
Summary: ASTERISK-26095: chan_iax2: Deadlock
Reporter: Ben Crox (Ben Crox)
Labels:
Date Opened: 2016-06-08 00:59:34
Date Closed: 2016-06-08 06:47:43
Priority: Major
Regression?
Status: Closed/Complete
Components: Channels/chan_iax2
Versions: 12.5.1 13.7.1
Frequency of Occurrence: Frequent
Related Issues:
Environment: Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64) wcte13xp+ d161:800a Wildcard TE131/TE133
Attachments:
Description: Asterisk ran out of threads during some IAX2-to-DAHDI calls.
Roughly 1 out of every 20+ such calls ends this way.

The box handles only about 6k calls per month, of which about 400 go from IAX2 to DAHDI, so 20 hangs per month should be considered quite severe.
No FreePBX, no HylaFAX, just Asterisk + libpri + DAHDI + AMI applications.

When frozen, one CPU core may sit at 100% usage; memory usage is only 20% to 40%, and hard disks, network, and NAS are all fine.

The CLI is accessible, but no new log / verbose / debug messages appear.
"core show locks" reveals a huge number of MUTEX locks from chan_iax2.so.
The fd limit is not yet a problem for any other new process.

"core restart" does not work (and may hang the CLI).
"service asterisk restart" does not work.
The only way out is to hard-kill safe_asterisk or reboot the box.
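
Because "core show locks" output can run to many screens, a per-file tally makes it easier to confirm that chan_iax2 dominates the list. The following is a minimal sketch, not a tested tool: it assumes each lock entry contains a fragment such as "Lock #0 (chan_iax2.c)", which may differ between Asterisk versions, and the helper names count_locks_by_file and capture_locks are hypothetical. Note that "core show locks" is only available in builds compiled with DEBUG_THREADS.

```python
import re
import subprocess
from collections import Counter

# Assumed shape of a lock entry, e.g. "=== ---> Lock #0 (chan_iax2.c): MUTEX ...".
# The exact layout may vary by Asterisk version; adjust the pattern if needed.
LOCK_LINE = re.compile(r"Lock #\d+ \(([^)]+)\)")


def count_locks_by_file(text):
    """Count lock entries per source file in 'core show locks' output."""
    return Counter(match.group(1) for match in LOCK_LINE.finditer(text))


def capture_locks(timeout=10):
    """Run 'core show locks' through the remote CLI of a running Asterisk."""
    result = subprocess.run(
        ["asterisk", "-rx", "core show locks"],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout
```

On the affected box, count_locks_by_file(capture_locks()).most_common() would list the files with the most lock entries first. The timeout is there because the CLI itself may stop responding while the system is frozen.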


(Version 13.1-cert7)
Comments:

By: Asterisk Team (asteriskteam) 2016-06-08 00:59:34.608-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Joshua C. Colp (jcolp) 2016-06-08 05:24:26.773-0500

Thank you for the crash report. However, we need more information to investigate the crash. Please provide:

1. A backtrace generated from a core dump using the instructions provided on the Asterisk wiki [1].
2. Specific steps taken that lead to the crash.
3. All configuration information necessary to reproduce the crash.

Thanks!

[1]: https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace
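
For a process that is hung rather than crashed there may be no core file to analyse, so one common alternative is to attach gdb to the live process and dump every thread's stack. The sketch below only builds and runs such a command; the output path is an assumption, the helper names are hypothetical, and the wiki page linked above remains the authoritative procedure.

```python
import subprocess


def build_gdb_command(pid):
    """Build a gdb invocation that dumps every thread's stack, then exits."""
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "thread apply all bt full",
    ]


def dump_threads(pid, out_path="/tmp/asterisk-threads.txt"):
    """Write all thread backtraces of the process to out_path (assumed location)."""
    with open(out_path, "w") as out:
        subprocess.run(
            build_gdb_command(pid),
            stdout=out, stderr=subprocess.STDOUT,
            timeout=120, check=False,
        )
    return out_path
```

Backtraces are far more readable when Asterisk was built with DONT_OPTIMIZE, so the dump is best taken from such a build.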



By: Joshua C. Colp (jcolp) 2016-06-08 05:26:24.019-0500

As well - please try using a recent version of Asterisk 13 itself. Certified only receives fixes as a result of issues reported by Digium commercial customers and so your issue may be fixed in 13 itself.

By: Ben Crox (Ben Crox) 2016-06-08 06:26:12.442-0500

Thanks, Joshua, for following up. I need to recompile with DONT_OPTIMIZE and BETTER_BACKTRACES.

Since the crash is somewhat random, I don't have a specific flow to reproduce it.
I can avoid the crash by switching off the IAX2 channel.
(The same box had over 9 months of uptime and handled tens of thousands of calls before the IAX2 trouble started. It was running 12.5.1.)

Every time Asterisk freezes, at least one IAX2-to-DAHDI call is found in "core show channels".
The source of the IAX2 calls is another Asterisk 13.7.2 box connected by VPN, which so far has never hung.

Typical hang:

boxB: .... Dial(IAX2/id:pw@boxA/${DESTNUM})

boxA:
[from-iax2]
exten => _XXXX.,1,Dial(DAHDI/g0/${EXTEN})
exten => _XXXX.,n,Hangup
exten => h,1,Hangup
exten => t,1,Hangup
exten => i,1,Hangup

boxA:
app_dial.c: DAHDI/i1/XXXXXXXX-YY answered IAX2/BoxB-ZZZZ
bridge_channel.c: Channel DAHDI/i1/XXXXXXXX-YY joined 'simple_bridge' basic-bridge <....UUID>
bridge_channel.c: Channel IAX2/BoxB-ZZZZ joined 'simple_bridge' basic-bridge <....UUID>
  .... after several minutes
sig_pri.c: Span 1: Channel 0/1 got hangup request, cause 16
sig_pri.c: Span 1: Channel 0/1 got hangup, cause 102
( frozen )

boxB:
During that time, if another call via IAX2 starts:

NOTICE[XXXXX] chan_iax2.c: Auto-congesting call due to slow response

The same happens until boxA is restarted.

boxA:
Does not log the new call.

If boxA is not frozen, multiple concurrent IAX2 calls work.
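
Since boxA logs nothing while frozen, the "Auto-congesting" notice on boxB is the only log evidence of the hang. As a rough illustration, the notice can be counted in boxB's log to flag that boxA has probably frozen. This is a heuristic sketch only: the threshold of 3 is an arbitrary assumption, and the function names are hypothetical.

```python
# Notice text as it appears in boxB's log during a hang on boxA.
MARKER = "chan_iax2.c: Auto-congesting call due to slow response"


def count_auto_congest(lines):
    """Return how many log lines contain the auto-congestion notice."""
    return sum(1 for line in lines if MARKER in line)


def peer_looks_frozen(lines, threshold=3):
    """Heuristic: several notices in the scanned window suggest the far end is stuck."""
    return count_auto_congest(lines) >= threshold
```

The input can be any iterable of log lines, for example the tail of boxB's Asterisk log file; a single notice may just be a slow call, which is why a threshold is used.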


By: Ben Crox (Ben Crox) 2016-06-08 06:45:54.423-0500

Hmm, I hit an obstacle on the end-user side: they are not willing to move to the current release version.

My in-house boxes don't have such IAX2 issues, and I have not yet been able to reproduce the problem there.

I will switch that box and its network to handle only SIP + DAHDI.
So I may not be able to run experiments for this issue report.

Thanks for the team's help anyway. If I get updates and core dumps, I will come back to update or re-open the issue.


By: Joshua C. Colp (jcolp) 2016-06-08 06:47:43.591-0500

Suspending for now!