[Home]

Summary:ASTERISK-17297: sip deadlock with explanation
Reporter:David Vossel (dvossel)Labels:
Date Opened:2011-01-27 12:44:15.000-0600Date Closed:2011-04-18 11:22:58
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Channels/chan_sip/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) asterisk_core_locks_14.02.2011.txt
( 1) coreshowdeadlocks.txt
Description:I've been doing quite a bit of load testing of chan_sip and chan_iax for the media architecture changes.  When I push the calls per a second up past a certain limit I am seeing a consistent deadlock occurring.  I've investigated it and here's what is happening.

In chan_sip's handle_request_do() function we have to lock both the pvt and the channel at the same time.  This involves deadlock avoidance and is a mess.  After a number of locking attempts fail on the channel, the request is queued to be handled at a later time.  This would typically only occur under very heavy load.

Queued requests are also handled in the handle_request_do() function.  Once both a pvt and a channel are locked, the queued requests are handled first, and then the actual request triggering the handle_request_do function's invocation is processed... This is where the problem is...

Between process_request_queue() and calling handle_incoming() the channel may be unlocked.  We can detect this by inspecting the nounlock variable which is passed to process_request_queue() but we do not.  If the channel remains unlocked entering handle_incoming(), it is very possible a deadlock will occur since many of the code paths in handle_incoming will try to grab a recursive channel lock using the channel API.  In the core show deadlocks output I have included ast_queue_frame causes this to occur.


I have a setup that reproduces this problem consistently.
Comments:By: Tan Tuerel (thsgmbh) 2011-02-15 02:07:55.000-0600

Hi, I tried your patch from the reviewboard (adjusted for asterisk 1.6.2) and still get deadlocks in handle_request_do (in association with find_call)...

Not sure, if my deadlock is really related to your bug/patch (but it seems to be so)!?!

I have attached a "core show locks"...

By: Tan Tuerel (thsgmbh) 2011-02-15 02:18:51.000-0600

Supplement:

The deadlock occurs while shutting down asterisk (core stop gracefully). Shortly before shutting down I executed some callfiles (with local channels and  extensions with BLF subscriptions)...

By: Gregory Hinton Nietsky (irroot) 2011-02-15 03:00:20.000-0600

Hi there please try somthing >= 1.6.2.17-rc2 and look @ ASTERISK-17387 this problem has bothered me for long time back to 1.4

By: David Vossel (dvossel) 2011-02-15 09:58:07.000-0600

I've also created a patch for this issue.  It can be found here. https://reviewboard.asterisk.org/r/1100/

By: Tan Tuerel (thsgmbh) 2011-02-16 01:46:42.000-0600

Hi...

I already use the latest svn branch 1.6.2, actually 307624...

Now, a applied your patch (from 0018790) in addition to the already in svn integrated one from jpeeler (0018441) (<< this patch did NOT resolve my issue).

By: Tan Tuerel (thsgmbh) 2011-02-16 02:02:15.000-0600

@dvossel:  I've also integrated your patch, but this didn't resolve my problem as well.

By: Digium Subversion (svnbot) 2011-04-18 10:23:47

Repository: asterisk
Revision: 314067

U   branches/1.8/channels/chan_sip.c

------------------------------------------------------------------------
r314067 | dvossel | 2011-04-18 10:23:46 -0500 (Mon, 18 Apr 2011) | 22 lines

Remove the need for deadlock avoidance in chan_sip do_monitor.

Deadlock avoidance between the sip pvt and the pvt->owner is
very difficult.  Now that channel's are ao2 objects, this complication
is no longer necessary.  It turns out the pvt's msg queue only
exists because of deadlock avoidance (when deadlock avoidance fails
msgs were added to a queue to be processed later), so this goes away as well.

The technique used in the new sip_lock_pvt_full() function should
be used as a template for replacing all locations where deadlock
avoidance occurs between a channel tech_pvt and the pvt's owner.
My hope is that this will begin a reversal of the invalid channel
driver locking architecture we have been using for so long.

This patch also resolves an issue where the pvt->owner gets
unlocked during processing the msg queue.

(closes issue ASTERISK-17297)
Reported by: dvossel

Review: https://reviewboard.asterisk.org/r/1182/

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=314067

By: Digium Subversion (svnbot) 2011-04-18 11:22:57

Repository: asterisk
Revision: 314078

_U  trunk/
U   trunk/channels/chan_sip.c

------------------------------------------------------------------------
r314078 | dvossel | 2011-04-18 11:22:56 -0500 (Mon, 18 Apr 2011) | 29 lines

Merged revisions 314067 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.8

........
 r314067 | dvossel | 2011-04-18 10:23:45 -0500 (Mon, 18 Apr 2011) | 22 lines
 
 Remove the need for deadlock avoidance in chan_sip do_monitor.
 
 Deadlock avoidance between the sip pvt and the pvt->owner is
 very difficult.  Now that channel's are ao2 objects, this complication
 is no longer necessary.  It turns out the pvt's msg queue only
 exists because of deadlock avoidance (when deadlock avoidance fails
 msgs were added to a queue to be processed later), so this goes away as well.
 
 The technique used in the new sip_lock_pvt_full() function should
 be used as a template for replacing all locations where deadlock
 avoidance occurs between a channel tech_pvt and the pvt's owner.
 My hope is that this will begin a reversal of the invalid channel
 driver locking architecture we have been using for so long.
 
 This patch also resolves an issue where the pvt->owner gets
 unlocked during processing the msg queue.
 
 (closes issue ASTERISK-17297)
 Reported by: dvossel
 
 Review: https://reviewboard.asterisk.org/r/1182/
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=314078