
Summary: ASTERISK-25386: Asterisk Chan_sip.c Deadlock. All SIP traffic stops
Reporter: Christopher (cadillackid)
Labels:
Date Opened: 2015-09-09 20:32:22
Date Closed: 2015-09-10 05:22:26
Priority: Major
Regression?
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions: 1.8.15.0 1.8.32.3
Frequency of Occurrence: Frequent
Related Issues:
Environment: CentOS 6.4, kernel 2.6.32-358.el6.i686, Core i7-3770K, Jetway Q77, 8 GB RAM. Static flat files created by Execs.
Attachments:
(0) 1441822220.asterisk-core-show-locks.txt
(1) 1441822220.asterisk-core-show-taskprocessors.txt
(2) 1441822220.asterisk-core-show-threads.txt
(3) 1441822220.asterisk-module-show.txt
(4) 1441822220.asterisk-sip-show-channels.txt
(5) 1441822220.gdb-bt-thread-apply-all-bt.txt
Description: All SIP traffic stops at random. "core show channels" never returns, and "core show locks" shows a deadlock in the channel driver. Our "core show locks" output looks similar to issues 21228 and 25213; however, we are NOT using realtime. On 1.8.15-cert5 this happened on average about once every 50,000 completed calls. We installed 1.8.32.3 built with DONT_OPTIMIZE, DEBUG_THREADS and BETTER_BACKTRACES, and now we are lucky to get 3,000 calls before it locks up. Each time, our backtraces and lock output look similar. Existing calls keep progressing through applications such as voicemail and queues until they reach a point where chan_sip is required (i.e., a call stays in the queue until it is time to ring an agent, and then it hangs in dead air). No SIP registrations are accepted, and all NOTIFY traffic stops as well. The AMI continues to function. The CLI command "sip show channels" returns, but "core show channels" hangs. Pertinent backtraces will be attached.
Comments:

By: Asterisk Team (asteriskteam) 2015-09-09 20:32:23.823-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Christopher (cadillackid) 2015-09-09 20:40:25.161-0500

Attached are the backtraces, lock output and other system information from the time of the system crash. We are currently running 1.8.32.3; these were taken by attaching to the running process before we restarted it.

By: Christopher (cadillackid) 2015-09-09 20:55:08.604-0500

It looks like our deadlock occurs in pbx_core, since its queue sits at 18.

Perhaps it is between threads 0xb6387b70 and 0xb6effb70?

0xb6effb70: it almost looks like a device state change was being performed at the time. We use a lot of BLFs.


By: Joshua C. Colp (jcolp) 2015-09-10 05:22:13.503-0500

Per the Asterisk versions page [1], the maintenance (bug fix) support for the Asterisk branch you are using has ended. For continued maintenance support please move to a supported branch of Asterisk. After testing with a supported branch, if you find this problem has not been resolved, please open a new issue against the latest version of that Asterisk branch.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Versions



By: Nuno Ferreira (nferreira) 2016-02-25 09:22:42.408-0600

Hi Christopher, I'm facing problems similar to the ones you described here. Unfortunately, I haven't been able to reproduce them in my lab.
When you hit the hang at around the 3,000-call mark, was the system in production, or did you manage to replicate it on a test server?