Summary: ASTERISK-20550: Deadlock between SIP pvts being placed in container and CLI command 'sip show channels'
Reporter: David Brillert (aragon)
Labels:
Date Opened: 2012-10-10 18:55:01
Date Closed: 2013-03-25 14:10:03
Priority: Major
Regression? No
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions: SVN
Frequency of Occurrence: Occasional
Related Issues: is duplicated by ASTERISK-21068 Asterisk is freezing (since 1.8.18.0 to 1.8.20.1) when doing 'core show channels' AND receiving 'SIP register'
Environment:
Attachments:
( 0) cli.txt
( 1) refcountdebug.txt
( 2) sip_show_channels_ref_count_log.rar
( 3) thread_apply_bt.txt
Description: While testing for ref counting issues to debug another bug report, I ran into a deadlock.
'core show locks' was unresponsive, so I attached with gdb.
Ref count debugging was enabled so I am attaching:
gdb thread apply all bt
cli before lock
ref count file
Comments:
By: David Brillert (aragon) 2012-10-11 08:32:29.246-0500

I reproduced another deadlock this morning.
The system was busy, and when I typed 'sip show channels' at the * CLI, whammo, deadlock.
sip_show_channels_ref_count_log.rar attached.

By: Matt Jordan (mjordan) 2012-10-11 08:49:29.205-0500

This ends up being a locking inversion between the sip_pvt and the dialogs container.  When handling {{parse_register_contact}}, the sip_pvt is already locked, and a new sip_pvt is created, locking the dialogs container.  When the CLI command is executed, the container is locked first; then each sip_pvt is locked.

The backtrace shows this for sip_pvt {{0x2aaad8532618}}; dialog container {{0x558a3c8}}.
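
The inversion described above can be reproduced in miniature. The following is only a sketch using plain pthreads, not chan_sip code: `dialogs_lock` and `pvt_lock` are hypothetical stand-ins for the dialogs container lock and a single sip_pvt lock, and `demo_inversion` is an invented name. One thread takes the pvt lock first (as the REGISTER path does), the other takes the container lock first (as the CLI path does). `pthread_mutex_trylock` is used for the second lock so that the sketch reports the conflict instead of hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the dialogs container lock and one sip_pvt lock. */
static pthread_mutex_t dialogs_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t pvt_lock = PTHREAD_MUTEX_INITIALIZER;

static atomic_int holding_first; /* threads that hold their first lock */
static atomic_int tried_second;  /* threads that have attempted their second lock */

struct path {
    pthread_mutex_t *first;  /* lock taken first on this code path */
    pthread_mutex_t *second; /* lock wanted while still holding the first */
    int second_result;       /* return value of the trylock on 'second' */
};

/* Wait until both threads have reached the same point. */
static void rendezvous(atomic_int *counter)
{
    atomic_fetch_add(counter, 1);
    while (atomic_load(counter) < 2) {
        sched_yield();
    }
}

static void *run_path(void *arg)
{
    struct path *p = arg;

    pthread_mutex_lock(p->first);
    rendezvous(&holding_first); /* both threads now hold their first lock */

    /* A blocking lock here would never return; trylock reports EBUSY instead. */
    p->second_result = pthread_mutex_trylock(p->second);
    rendezvous(&tried_second);  /* keep 'first' held until both have tried */

    if (p->second_result == 0) {
        pthread_mutex_unlock(p->second);
    }
    pthread_mutex_unlock(p->first);
    return NULL;
}

/* Returns 1 when each path found its second lock held by the other path. */
int demo_inversion(void)
{
    struct path reg = { &pvt_lock, &dialogs_lock, 0 }; /* REGISTER-like order */
    struct path cli = { &dialogs_lock, &pvt_lock, 0 }; /* CLI-like order */
    pthread_t t1, t2;

    atomic_store(&holding_first, 0);
    atomic_store(&tried_second, 0);
    pthread_create(&t1, NULL, run_path, &reg);
    pthread_create(&t2, NULL, run_path, &cli);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    return reg.second_result == EBUSY && cli.second_result == EBUSY;
}
```

Calling `demo_inversion()` from a test program returns 1: each thread holds the lock the other one wants. With blocking `pthread_mutex_lock` calls in place of the trylock, both threads would wait forever, which matches the hang seen in the backtrace.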

By: David Brillert (aragon) 2012-10-11 08:53:12.765-0500

I can test a patch ASAP, no delays here...

By: David Brillert (aragon) 2012-10-11 09:00:21.329-0500

Could sip show peer XX cause a similar issue?

By: Matt Jordan (mjordan) 2012-10-11 09:01:41.694-0500

'sip show peer' won't, because it doesn't have to interact with the dialogs.

As much as I'd like to say I have a patch in hand, I don't - was merely providing analysis based on your backtraces to help whoever tackles this issue.

By: David Brillert (aragon) 2012-10-11 09:03:47.032-0500

Thanks, at least I know to stay away from sip show channels on the production system.

By: Jonathan Rose (jrose) 2012-10-26 09:44:44.784-0500

Looks like a pvt is locked in register_verify; register_verify calls parse_register_contact, which calls sip_poke_peer, which calls dialog_unlink_all. Then dialog_unlink_all performs an ao2_t_unlink against dialogs, which causes the locking inversion. It's kinda funny, since the pvt we are working on in that function (dialog_unlink_all) seems to be a completely different pvt from the one that was locked by register_verify.

There might also be other things using the dialogs list, this is just the first one I found that seems definite.
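
One general way to avoid this kind of inversion is to release the already-held pvt lock before touching the container, so that the container lock is never requested while a pvt lock is held. The sketch below illustrates that idea with plain pthreads. It is an assumption-laden illustration, not the patch attached to ASTERISK-21068: `container_lock`, `owner_pvt_lock`, `container_count` and `unlink_dialog_safely` are all hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins: the container lock and the lock on the pvt
 * that the caller is already holding. */
static pthread_mutex_t container_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t owner_pvt_lock = PTHREAD_MUTEX_INITIALIZER;
static int container_count = 1; /* pretend one dialog is linked */

/*
 * Precondition: the caller holds owner_pvt_lock.
 * Drops that lock before taking container_lock, so this path never
 * waits for the container while holding a pvt. The pvt lock is
 * re-acquired before returning, so the caller's view is unchanged.
 * Returns the number of entries left in the container.
 */
int unlink_dialog_safely(void)
{
    int remaining;

    pthread_mutex_unlock(&owner_pvt_lock);

    pthread_mutex_lock(&container_lock);
    container_count--;            /* stands in for the unlink operation */
    remaining = container_count;  /* read while the container is locked */
    pthread_mutex_unlock(&container_lock);

    pthread_mutex_lock(&owner_pvt_lock);
    return remaining;
}
```

The cost of this approach is that anything guarded by the pvt lock may change while the lock is dropped, so the caller has to re-check any state it read before the call.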

By: David Brillert (aragon) 2012-11-07 11:39:52.653-0600

Just following up on this.  Ping, any progress?

By: Matt Jordan (mjordan) 2013-03-25 09:21:04.384-0500

If you can, please try the patch attached to ASTERISK-21068, ASTERISK-21068-1.8.diff. It should resolve the issue based on the backtraces attached to the various issues.

By: David Brillert (aragon) 2013-03-25 10:17:35.947-0500

Sorry, I am not able to test that patch.

By: Matt Jordan (mjordan) 2013-03-25 12:38:11.025-0500

Something wrong with the patch? Or are you not running that version of Asterisk?

If needed I can port it to a particular affected version.

By: David Brillert (aragon) 2013-03-25 13:46:08.668-0500

Best I can say is that I can no longer reproduce that problem on that site and I cannot test the patch.

By: Matt Jordan (mjordan) 2013-03-25 14:09:47.462-0500

Well, that's not the worst thing in the world :-)

I'll close this issue out to ASTERISK-21068. We've had a number of people report the same problem - if you do run into it again, try the patch out on that issue. It should keep REGISTER request parsing from deadlocking if you happen to also be running a CLI command at the same time.