Summary: ASTERISK-24595: chan_sip.c Deadlock on SMP
Reporter: Grigory Milev (weekend)
Labels:
Date Opened: 2014-12-05 04:14:33.000-0600
Date Closed: 2015-01-27 12:43:43.000-0600
Priority: Major
Regression?
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions: 11.14.1 11.14.2 11.15.0
Frequency of Occurrence: Frequent
Related Issues:
Environment: Linux home 3.0.35PD14.0.0 #8 SMP PREEMPT Tue Nov 25 22:18:16 MSK 2014 armv7l GNU/Linux Platform: iMX6 quad (armh) OS: Altlinux sisyphus based.
Attachments:
( 0) backtrace-threads.txt.bz2
( 1) core-show-locks.txt.bz2
( 2) debuglog.bz2
( 3) more-logs.tar.bz2
Description: When Asterisk runs on all CPU cores, SIP deadlocks a few times per day:
{noformat}
[2014-12-03 12:50:30] ERROR[26839] lock.c: chan_sip.c line 20849 (show_channels_cb): Deadlock? waited 85 sec for mutex 'cur'?
[2014-12-03 12:50:30] ERROR[26839] lock.c: chan_sip.c line 3940 (retrans_pkt): 'cur' was locked here.
[2014-12-03 12:50:35] ERROR[26839] lock.c: chan_sip.c line 20849 (show_channels_cb): Deadlock? waited 90 sec for mutex 'cur'?
{noformat}
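The two message shapes above (a waiter reporting how long it has been blocked on a mutex, and a line naming where that mutex was taken) can be paired up mechanically when a log contains many of them. What follows is a minimal Python sketch, not part of Asterisk: the regular expressions are inferred only from the lines quoted in this report, and `summarize_lock_errors` is a hypothetical helper name.

```python
import re

# Assumption: these patterns cover only the message shapes quoted in this
# report; they are not derived from the Asterisk source.
LOCK_LINE = re.compile(
    r"\[(?P<ts>[^\]]+)\] ERROR\[(?P<tid>\d+)\] lock\.c: "
    r"(?P<file>\S+) line (?P<line>\d+) \((?P<func>\w+)\): (?P<msg>.*)"
)
WAITED = re.compile(r"Deadlock\? waited (?P<secs>\d+) sec for mutex '(?P<mutex>[^']+)'")
HELD = re.compile(r"'(?P<mutex>[^']+)' was locked here\.")


def summarize_lock_errors(lines):
    """Group lock.c errors by mutex name: who waits, who holds, longest wait."""
    summary = {}
    for raw in lines:
        m = LOCK_LINE.match(raw.strip())
        if not m:
            continue  # not a lock.c error line
        site = f"{m['file']}:{m['line']} ({m['func']})"
        waited = WAITED.search(m["msg"])
        held = HELD.search(m["msg"])
        name = (waited or held)["mutex"] if (waited or held) else None
        if name is None:
            continue
        entry = summary.setdefault(
            name, {"waiter": None, "holder": None, "max_wait": 0}
        )
        if waited:
            entry["waiter"] = site
            entry["max_wait"] = max(entry["max_wait"], int(waited["secs"]))
        else:
            entry["holder"] = site
    return summary


# The three lines quoted in the description.
SAMPLE = [
    "[2014-12-03 12:50:30] ERROR[26839] lock.c: chan_sip.c line 20849 (show_channels_cb): Deadlock? waited 85 sec for mutex 'cur'?",
    "[2014-12-03 12:50:30] ERROR[26839] lock.c: chan_sip.c line 3940 (retrans_pkt): 'cur' was locked here.",
    "[2014-12-03 12:50:35] ERROR[26839] lock.c: chan_sip.c line 20849 (show_channels_cb): Deadlock? waited 90 sec for mutex 'cur'?",
]
summary = summarize_lock_errors(SAMPLE)
```

For the sample, the summary names `show_channels_cb` as the waiter and `retrans_pkt` as the place where `cur` was locked. That only identifies the two code sites; it does not show what `retrans_pkt` was itself blocked on, which is why the full 'core show locks' output and a backtrace are still needed.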
When I run Asterisk pinned to a single core (taskset -a 0x0001 asterisk -fn), there is no deadlock at all, but the following dead SIP channels appear:
{noformat}
home*CLI> sip show channels
Peer             User/ANR         Call ID          Format           Hold     Last Message    Expiry     Peer
37.17.17.73      205              74439a312ea2eab  (nothing)        No       Init: NOTIFY               <guest>
37.17.114.246    205              30a22f2359ff7ee  (nothing)        No       Init: NOTIFY               <guest>
37.17.114.246    205              1b4153685e957cd  (nothing)        No       Init: NOTIFY               <guest>
{noformat}
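For reference, the `0x0001` argument in the taskset command above is a hexadecimal CPU bitmask in which the lowest-order bit stands for the first logical CPU, so that run confines Asterisk to CPU 0. The short Python sketch below decodes such a mask; `cpus_from_mask` is a hypothetical helper written for illustration, not a taskset or Asterisk function.

```python
def cpus_from_mask(mask):
    """Decode a taskset-style hex affinity mask into a list of CPU indices."""
    value = int(mask, 16)  # accepts both "0x0001" and "1"
    cpus = []
    bit = 0
    while value:
        if value & 1:
            cpus.append(bit)  # bit N set means logical CPU N is allowed
        value >>= 1
        bit += 1
    return cpus


single_core = cpus_from_mask("0x0001")  # the mask used in this report
all_four = cpus_from_mask("0xf")        # every core of a quad-core system
```

On the quad-core board described in the environment, a mask of `0xf` would allow all four cores, which corresponds to the configuration where the deadlock was observed.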
Comments:By: Rusty Newton (rnewton) 2014-12-05 08:56:30.041-0600

There isn't enough information here to investigate.

Can you provide a trace after the deadlock occurs? [Follow the linked instructions|https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace#GettingaBacktrace-GettingInformationForADeadlock]

Be sure to include the "core show locks" output [and an Asterisk log|https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information] including VERBOSE and DEBUG message types.


By: Grigory Milev (weekend) 2014-12-10 01:50:04.121-0600

I collected those log files after SIP locked up (it was not responding at all); I then ran sip reload and got the deadlocks.

Without SMP (Asterisk pinned to one core), SIP never locks up.

By: Grigory Milev (weekend) 2014-12-12 04:56:15.719-0600

Additional logs.

By: Rusty Newton (rnewton) 2015-01-09 19:52:45.158-0600

Looks like you are missing symbols for Asterisk. Can you resolve that and get a new trace and 'core show locks' output? That of course means a new log too. Everything. Thanks!

By: Grigory Milev (weekend) 2015-01-10 02:18:08.434-0600

Which symbols exactly are missing? If possible, please describe the steps. This bug is not simple to reproduce, because the system is in use all the time and works normally on one CPU core, except for the following:
[2015-01-10 08:02:31] WARNING[4965] chan_sip.c: Timeout on 8d97c592027efe566b77822b261b75f8 on non-critical invite transaction.

By: Matt Jordan (mjordan) 2015-01-12 09:36:09.530-0600

It is pretty clear looking at your thread backtrace that something is not correct in the Asterisk binary/modules:

{noformat}
Thread 27 (Thread 0x2b1873e0 (LWP 25986)):
#0  0x2ada55e4 in __libc_do_syscall () from /lib/libpthread.so.0
#1  0x2ada1146 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
#2  0x2abf0266 in pthread_cond_wait () from /lib/libc.so.6
#3  0x00124c54 in __ast_cond_wait ()
#4  0x001b2f6c in ?? ()
#5  0x001b2f6c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
{noformat}

Not only do you have symbols stripped out, but you also have stack frames that are mucked up. Something here is odd, and developers aren't going to be able to figure it out from what you've provided.
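A quick way to check a backtrace for the two problems described here before attaching it is to count frames that GDB could not resolve (`??`) and to look for a "Backtrace stopped" line. The Python sketch below does that under the assumption that frames look like the ones quoted in this comment; `inspect_backtrace` is a hypothetical helper, not a GDB or Asterisk tool.

```python
import re

# Assumption: frame lines follow the shape seen in this thread, e.g.
# "#4  0x001b2f6c in ?? ()". Other GDB output styles are not handled.
FRAME = re.compile(r"^#(?P<num>\d+)\s+(?:0x[0-9a-f]+ in )?(?P<func>\S+) \(")


def inspect_backtrace(text):
    """Count frames, unresolved frames, and detect a truncated unwind."""
    frames = 0
    unresolved = 0
    truncated = False
    for line in text.splitlines():
        line = line.strip()
        m = FRAME.match(line)
        if m:
            frames += 1
            if m["func"] == "??":
                unresolved += 1  # GDB found no symbol for this address
        elif line.startswith("Backtrace stopped:"):
            truncated = True     # GDB gave up unwinding the stack
    return {"frames": frames, "unresolved": unresolved, "truncated": truncated}


# The thread quoted in this comment.
SAMPLE_BT = """\
Thread 27 (Thread 0x2b1873e0 (LWP 25986)):
#0  0x2ada55e4 in __libc_do_syscall () from /lib/libpthread.so.0
#1  0x2ada1146 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
#2  0x2abf0266 in pthread_cond_wait () from /lib/libc.so.6
#3  0x00124c54 in __ast_cond_wait ()
#4  0x001b2f6c in ?? ()
#5  0x001b2f6c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
"""
report = inspect_backtrace(SAMPLE_BT)
```

For the quoted thread this reports six frames, two of them unresolved, and a truncated unwind. The check is deliberately coarse: a frame such as `__ast_cond_wait ()` has a name but no arguments or source location, which also suggests missing debug information, and this sketch does not flag it.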

There are good instructions on the wiki on how to get information during a deadlock, including what options should be selected in Asterisk to build it correctly:

https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

Please follow the linked instructions, getting the output of 'core show locks' as well as the output from GDB.

By: Matt Jordan (mjordan) 2015-01-27 12:43:51.100-0600

Suspended due to lack of activity. Please request a bug marshal in #asterisk-bugs on the IRC network irc.freenode.net to reopen the issue should you have the additional information requested.  Further information can be found at http://www.asterisk.org/developers/bug-guidelines