Summary: | ASTERISK-24595: chan_sip.c Deadlock on SMP | ||
Reporter: | Grigory Milev (weekend) | Labels: | |
Date Opened: | 2014-12-05 04:14:33.000-0600 | Date Closed: | 2015-01-27 12:43:43.000-0600 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | Channels/chan_sip/General |
Versions: | 11.14.1 11.14.2 11.15.0 | Frequency of Occurrence | Frequent |
Related Issues: | |||
Environment: | Linux home 3.0.35PD14.0.0 #8 SMP PREEMPT Tue Nov 25 22:18:16 MSK 2014 armv7l GNU/Linux Platform: iMX6 quad (armh) OS: Altlinux sisyphus based. | Attachments: | ( 0) backtrace-threads.txt.bz2 ( 1) core-show-locks.txt.bz2 ( 2) debuglog.bz2 ( 3) more-logs.tar.bz2 |
Description: | When Asterisk runs on all CPU cores, SIP deadlocks a few times per day:
{noformat}
[2014-12-03 12:50:30] ERROR[26839] lock.c: chan_sip.c line 20849 (show_channels_cb): Deadlock? waited 85 sec for mutex 'cur'?
[2014-12-03 12:50:30] ERROR[26839] lock.c: chan_sip.c line 3940 (retrans_pkt): 'cur' was locked here.
[2014-12-03 12:50:35] ERROR[26839] lock.c: chan_sip.c line 20849 (show_channels_cb): Deadlock? waited 90 sec for mutex 'cur'?
{noformat}
When I pin Asterisk to a single core (taskset -a 0x0001 asterisk -fn), there is no deadlock at all, but the following dead SIP channels appear:
{noformat}
home*CLI> sip show channels
Peer            User/ANR  Call ID          Format     Hold  Last Message  Expiry  Peer
37.17.17.73     205       74439a312ea2eab  (nothing)  No    Init: NOTIFY          <guest>
37.17.114.246   205       30a22f2359ff7ee  (nothing)  No    Init: NOTIFY          <guest>
37.17.114.246   205       1b4153685e957cd  (nothing)  No    Init: NOTIFY          <guest>
{noformat} | ||
Comments: | By: Rusty Newton (rnewton) 2014-12-05 08:56:30.041-0600
There isn't enough information here to investigate. Can you provide a trace after the deadlock occurs? [Follow the linked instructions|https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace#GettingaBacktrace-GettingInformationForADeadlock] Be sure to include the "core show locks" output [and an Asterisk log|https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information] including VERBOSE and DEBUG message types.

By: Grigory Milev (weekend) 2014-12-10 01:50:04.121-0600
I collected those log files after SIP locked up (not responding at all); I ran sip reload and got the deadlocks. Without SMP (Asterisk pinned to one core), SIP never locks up.

By: Grigory Milev (weekend) 2014-12-12 04:56:15.719-0600
Additional logs.

By: Rusty Newton (rnewton) 2015-01-09 19:52:45.158-0600
Looks like you are missing symbols for Asterisk. Can you resolve that and get a new trace and 'core show locks' output? That of course means a new log too. Everything. Thanks!

By: Grigory Milev (weekend) 2015-01-10 02:18:08.434-0600
Which symbols exactly are missing? If possible, please describe the steps. It is not simple to reproduce this bug, because the system is in use all the time and works normally on one CPU core, except for the following:
[2015-01-10 08:02:31] WARNING[4965] chan_sip.c: Timeout on 8d97c592027efe566b77822b261b75f8 on non-critical invite transaction.

By: Matt Jordan (mjordan) 2015-01-12 09:36:09.530-0600
It is pretty clear looking at your thread backtrace that something is not correct in the Asterisk binary/modules:
{noformat}
Thread 27 (Thread 0x2b1873e0 (LWP 25986)):
#0 0x2ada55e4 in __libc_do_syscall () from /lib/libpthread.so.0
#1 0x2ada1146 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
#2 0x2abf0266 in pthread_cond_wait () from /lib/libc.so.6
#3 0x00124c54 in __ast_cond_wait ()
#4 0x001b2f6c in ?? ()
#5 0x001b2f6c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
{noformat}
Not only do you have symbols stripped out, but you also have stack frames that are mucked up. Something here is odd, and developers aren't going to be able to figure it out from what you've provided. There are good instructions on the wiki on how to get information during a deadlock, including what options should be selected in Asterisk to build it correctly: https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace
Please follow the linked instructions, getting the output of 'core show locks' as well as the output from GDB.

By: Matt Jordan (mjordan) 2015-01-27 12:43:51.100-0600
Suspended due to lack of activity. Please request a bug marshal in #asterisk-bugs on the IRC network irc.freenode.net to reopen the issue should you have the additional information requested. Further information can be found at http://www.asterisk.org/developers/bug-guidelines |
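The lock.c errors quoted in the description pair a waiting call site with the site that currently holds the mutex. As a rough triage aid, the sketch below pulls those pairs out of a log. It is not part of Asterisk: the function name `summarize_lock_waits` and the regular expressions are assumptions derived only from the three log lines shown in this report, so other lock.c message variants may not match.

```python
import re

# Patterns modelled only on the lock.c lines quoted in this report (assumption).
WAIT_RE = re.compile(
    r"lock\.c: (?P<file>\S+) line (?P<line>\d+) \((?P<func>\w+)\): "
    r"Deadlock\? waited (?P<secs>\d+) sec for mutex '(?P<mutex>[^']+)'\?"
)
HELD_RE = re.compile(
    r"lock\.c: (?P<file>\S+) line (?P<line>\d+) \((?P<func>\w+)\): "
    r"'(?P<mutex>[^']+)' was locked here\."
)


def summarize_lock_waits(log_text):
    """Map each waiting call site to its mutex, longest wait, and holder site.

    A "was locked here" line is attributed to the most recent "Deadlock?"
    line that precedes it in the log.
    """
    summary = {}
    last_waiter = None
    for line in log_text.splitlines():
        m = WAIT_RE.search(line)
        if m:
            last_waiter = f"{m['file']}:{m['line']} ({m['func']})"
            entry = summary.setdefault(
                last_waiter,
                {"mutex": m["mutex"], "max_wait": 0, "holder": None},
            )
            entry["max_wait"] = max(entry["max_wait"], int(m["secs"]))
            continue
        m = HELD_RE.search(line)
        if m and last_waiter is not None:
            summary[last_waiter]["holder"] = f"{m['file']}:{m['line']} ({m['func']})"
    return summary


# The three lines from the description, used as sample input.
SAMPLE = """\
[2014-12-03 12:50:30] ERROR[26839] lock.c: chan_sip.c line 20849 (show_channels_cb): Deadlock? waited 85 sec for mutex 'cur'?
[2014-12-03 12:50:30] ERROR[26839] lock.c: chan_sip.c line 3940 (retrans_pkt): 'cur' was locked here.
[2014-12-03 12:50:35] ERROR[26839] lock.c: chan_sip.c line 20849 (show_channels_cb): Deadlock? waited 90 sec for mutex 'cur'?
"""

if __name__ == "__main__":
    for waiter, info in summarize_lock_waits(SAMPLE).items():
        print(waiter, "waits on", info["mutex"], "held at", info["holder"],
              "for up to", info["max_wait"], "sec")
```

Run against the sample, this reports that show_channels_cb is waiting on 'cur' while retrans_pkt holds it. That only narrows down where to look; it does not replace the unstripped GDB backtrace and 'core show locks' output requested above.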