Summary: ASTERISK-20550: Deadlock between SIP pvts being placed in container and CLI command 'sip show channels'
Reporter: David Brillert (aragon)
Labels:
Date Opened: 2012-10-10 18:55:01
Date Closed: 2013-03-25 14:10:03
Priority: Major
Regression? No
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions: SVN
Frequency of Occurrence: Occasional
Related Issues:
Environment:
Attachments: (0) cli.txt (1) refcountdebug.txt (2) sip_show_channels_ref_count_log.rar (3) thread_apply_bt.txt
Description: While testing for ref counting issues to debug another bug report, I ran into a deadlock. 'core show locks' was unresponsive, so I attached with gdb. Ref count debugging was enabled, so I am attaching: the gdb 'thread apply all bt' output, the CLI output from before the lock, and the ref count file.
Comments:

By: David Brillert (aragon) 2012-10-11 08:32:29.246-0500
I reproduced another deadlock this morning. System was busy and from the * CLI I typed sip show channels and whammo deadlock. sip show channels ref count log.rar attached.

By: Matt Jordan (mjordan) 2012-10-11 08:49:29.205-0500
This ends up being a locking inversion between the sip_pvt and the dialogs container. When handling {{parse_register_contact}}, the sip_pvt is already locked, and a new sip_pvt is created, locking the dialogs container. When the CLI command is executed, the container is locked first; then each sip_pvt is locked. The backtrace shows this for sip_pvt {{0x2aaad8532618}}; dialog container {{0x558a3c8}}.

By: David Brillert (aragon) 2012-10-11 08:53:12.765-0500
I can test a patch ASAP, no delays here...

By: David Brillert (aragon) 2012-10-11 09:00:21.329-0500
Could sip show peer XX cause a similar issue?

By: Matt Jordan (mjordan) 2012-10-11 09:01:41.694-0500
'sip show peer' won't, because it doesn't have to interact with the dialogs. As much as I'd like to say I have a patch in hand, I don't - was merely providing analysis based on your backtraces to help whoever tackles this issue.

By: David Brillert (aragon) 2012-10-11 09:03:47.032-0500
Thanks, at least I know to stay away from sip show channels on the production system.

By: Jonathan Rose (jrose) 2012-10-26 09:44:44.784-0500
Looks like a pvt is locked in register_verify, register_verify calls parse_register_contact which calls sip_poke_peer which calls dialog_unlink_all. Then dialog_unlink_all performs an ao2_t_unlink against dialogs which causes the locking inversion. It's kinda funny since the pvt we are working on in that function (dialog_unlink_all) seems to be a completely different pvt from the one that was locked by register_verify. There might also be other things using the dialogs list, this is just the first one I found that seems definite.

By: David Brillert (aragon) 2012-11-07 11:39:52.653-0600
Just following up on this. Ping, any progress?

By: Matt Jordan (mjordan) 2013-03-25 09:21:04.384-0500
If you can, please try the patch attached to ASTERISK-21068, ASTERISK-21068-1.8.diff. It should resolve the issue based on the backtraces attached to the various issues.

By: David Brillert (aragon) 2013-03-25 10:17:35.947-0500
Sorry, I am not able to test that patch.

By: Matt Jordan (mjordan) 2013-03-25 12:38:11.025-0500
Something wrong with the patch? Or are you not running that version of Asterisk? If needed I can port it to a particular affected version.

By: David Brillert (aragon) 2013-03-25 13:46:08.668-0500
Best I can say is that I can no longer reproduce that problem on that site and I cannot test the patch.

By: Matt Jordan (mjordan) 2013-03-25 14:09:47.462-0500
Well, that's not the worst thing in the world :-) I'll close this issue out to ASTERISK-21068. We've had a number of people report the same problem - if you do run into it again, try the patch out on that issue. It should keep REGISTER request parsing from deadlocking if you happen to also be running a CLI command at the same time.
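
Illustrative note: the locking inversion described in the comments above can be sketched with two plain pthread mutexes. This is only a minimal sketch and not the patch from ASTERISK-21068, whose contents are not shown in this issue. The names {{container_lock}}, {{pvt_lock}}, {{cli_show_channels}}, {{register_request}} and {{run_demo}} are hypothetical stand-ins, not chan_sip identifiers. The CLI path takes the container lock and then the pvt lock, as described for 'sip show channels'. The REGISTER path starts out holding the pvt lock; blocking on the container lock at that point would be the inversion, so the sketch shows one common way to avoid it, which is to drop the pvt lock before touching the container.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins: one mutex for the dialogs container, one for a sip_pvt. */
static pthread_mutex_t container_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t pvt_lock = PTHREAD_MUTEX_INITIALIZER;

static int rows_shown;       /* written only while holding both locks */
static int dialogs_unlinked; /* written only while holding container_lock */

/* CLI path: lock the container first, then the pvt it is visiting. */
static void *cli_show_channels(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&container_lock);
        pthread_mutex_lock(&pvt_lock);
        rows_shown++;
        pthread_mutex_unlock(&pvt_lock);
        pthread_mutex_unlock(&container_lock);
    }
    return NULL;
}

/* REGISTER path: begins with the pvt locked and later needs the container.
 * Taking container_lock while still holding pvt_lock would invert the
 * CLI path's order and could deadlock, so the pvt is released first. */
static void *register_request(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&pvt_lock);
        /* ... request handling that needs the pvt ... */
        pthread_mutex_unlock(&pvt_lock);      /* drop before the container */

        pthread_mutex_lock(&container_lock);
        dialogs_unlinked++;                   /* stands in for an unlink */
        pthread_mutex_unlock(&container_lock);

        pthread_mutex_lock(&pvt_lock);        /* re-acquire to finish up */
        pthread_mutex_unlock(&pvt_lock);
    }
    return NULL;
}

/* Runs both paths concurrently; returns the total operations completed,
 * or -1 if a thread could not be started. */
int run_demo(int iterations)
{
    pthread_t cli, reg;

    rows_shown = 0;
    dialogs_unlinked = 0;
    if (pthread_create(&cli, NULL, cli_show_channels, &iterations) != 0)
        return -1;
    if (pthread_create(&reg, NULL, register_request, &iterations) != 0) {
        pthread_join(cli, NULL);
        return -1;
    }
    pthread_join(cli, NULL);
    pthread_join(reg, NULL);
    return rows_shown + dialogs_unlinked;
}
```

Calling {{run_demo(1000)}} from a small {{main}} (built with {{-pthread}}) should return once both threads finish. If {{register_request}} instead kept {{pvt_lock}} held while locking {{container_lock}}, the two threads could each end up waiting on the lock the other holds, which is the hang reported here.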