[Home]

Summary:ASTERISK-18747: Deadlock in chan_sip / event on send mwi / unsubscribe
Reporter:Gregory Hinton Nietsky (irroot)Labels:
Date Opened:2011-10-24 05:14:58Date Closed:2011-11-07 17:44:25.000-0600
Priority:BlockerRegression?Yes
Status:Closed/CompleteComponents:Channels/chan_sip/Subscriptions
Versions:1.8.7.0 Frequency of
Occurrence
One Time
Related
Issues:
must be completed before resolvingASTERISK-18499 Asterisk 1.8.8.0 Release Blockers
must be completed before resolvingASTERISK-18608 Asterisk 10.0.0-rc1 Blockers
is related toASTERISK-18760 Deadlock in SVN 1.8 version 342359
Environment:Tin Cans And StringAttachments:( 0) jira_asterisk_18747_v1.8.patch
( 1) sip_mwi_dl.patch
Description:if the peer is destroyed while sending mwi notice there is a deadlock.

this is trivial to fix make sure the ref is held for the peer when locked.

introduced by r335319 where the peer was locked not ref'd


=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 0xb7438b70 (tps_processing_function started at [  451] taskprocessor.c ast_taskprocessor_get())
=== ---> Lock #0 (event.c): RDLOCK 1452 handle_event &(&ast_event_subs[event_types[i]])->lock 0x820fe68 (1)
=== ---> Lock #1 (chan_sip.c): MUTEX 25057 sip_send_mwi_to_peer peer 0xade54b38 (1)
=== ---> Waiting for Lock #2 (astobj2.c): MUTEX 493 internal_ao2_link c 0xb461fa80 (1)
=== --- ---> Locked Here: astobj2.c line 657 (internal_ao2_callback)
=== -------------------------------------------------------------------
===
=== Thread ID: 0xb4262b70 (do_monitor           started at [25271] chan_sip.c restart_monitor())
=== ---> Lock #0 (astobj2.c): MUTEX 657 internal_ao2_callback c 0xb461fa80 (1)
=== ---> Waiting for Lock #1 (event.c): WRLOCK 961 ast_event_unsubscribe &(&ast_event_subs[sub->type])->lock 0x820fe68 (1)
=== -------------------------------------------------------------------
===

FROM BT

#2  0x08115f1f in __ast_rwlock_wrlock (filename=0x81ce9d1 "event.c", line=961, func=0x81cf3e7 "ast_event_unsubscribe",
   t=0x820fe68, name=0x81cef98 "&(&ast_event_subs[sub->type])->lock") at /usr/src/asterisk-1.8/include/asterisk/lock.h:302
       res = 0
       lt = (struct ast_lock_track *) 0x9848e18
       canlog = 1
       bt = (struct ast_bt *) 0x9848f44
       __PRETTY_FUNCTION__ = "__ast_rwlock_wrlock"
#3  0x080e4504 in ast_event_unsubscribe (sub=0xb6685a20) at event.c:288
       event = (struct ast_event *) 0xb772cff4
       __PRETTY_FUNCTION__ = "ast_event_unsubscribe"
#4  0xb454d249 in sip_destroy_peer (peer=0xb635b7b8) at /usr/src/asterisk-1.8.6/include/asterisk/netsock2.h:95
       __p = 0
       __x = 3060292128
       __PRETTY_FUNCTION__ = "sip_destroy_peer"
#5  0xb454d26d in sip_destroy_peer (peer=0xb6685250) at /usr/src/asterisk-1.8.6/include/asterisk/netsock2.h:95
       __PRETTY_FUNCTION__ = "sip_destroy_peer"
#6  0xb454d647 in realtime_peer (newpeername=0xb6685250 "1272", addr=0x0, devstate_only=0, which_objects=-1234677216)
   at /usr/src/asterisk-1.8.6/include/asterisk/netsock2.h:95
       peer = (struct sip_peer *) 0x0
       var = (struct ast_variable *) 0x1c0602
       varregs = (struct ast_variable *) 0x0
       tmp = (struct ast_variable *) 0x33413
       peerlist = (struct ast_config *) 0x0
       ipaddr = "5<B6>\000\000\000\000\000\017}\000(\037&<B4>G<D6>T<B4>PRh<B6><FF><FF><FF><FF>", '\0' <repeats 19 times>
       portstring = "<B8><B7>5<B6><B8><B7>"
       realtimeregs = 0
       __PRETTY_FUNCTION__ = "realtime_peer"
#7  0xb454d2d2 in sip_destroy_peer (peer=0xb6685250) at /usr/src/asterisk-1.8.6/include/asterisk/netsock2.h:95
       __PRETTY_FUNCTION__ = "sip_destroy_peer"
#8  0x080860d1 in internal_ao2_ref (user_data=0xb6685250, delta=-1) at astobj2.c:93
Comments:By: Gregory Hinton Nietsky (irroot) 2011-10-24 05:37:07.333-0500

Make sure im holding a ref for the dialog and peer while working on them to prevent deadlock.

By: Richard Mudgett (rmudgett) 2011-11-01 16:36:50.857-0500

[^jira_asterisk_18747_v1.8.patch] should fix the deadlock between the subscription's RWLOCK and the dialogs container lock.

By: Richard Mudgett (rmudgett) 2011-11-01 17:19:58.519-0500

If you still have the backtrace and core show locks files, please attach them.  I had to guess that the ao2 lock needed was for the dialogs container and where it was obtained.

By: David Vossel (dvossel) 2011-11-07 16:43:13.477-0600

Just as a side note to this issue.

Gregory, holding a reference to an object has absolutely nothing to do with preventing a deadlock... or locking at all.  Threads are given a reference to an object just to guarantee the object will not be destroyed while the thread is accessing it.  A reference will never prevent a deadlock from occurring... EVER, never never ever never.

never.

I have used some tricks in several places where I lock an object that has a reference to another object to do some magic that may be confusing.  I'll attempt to explain.

Say we have an object pointer called blarg, and blarg owns some member data called scooter.  For some stupid reason we need to lock both blarg and scooter at the same time, but we only have a reference to blarg and the proper locking order is scooter then blarg...  This is a trick I use.

lock(blarg);
/*
* Now that blarg is locked, we are guaranteed
* that scooter's reference owned by blarg is safe in
* the thread we are currently acting in.  The problem is
* scooter is only safe to access while blarg is locked.
* The good news is that we can safely take a reference
* of scooter for our own thread without even having to
* lock scooter at all.  Again... blarg has its own reference
* to scooter, but we only control that reference
* when blarg is locked.
*/

if (blarg->scooter) {
  tmpscooter = ref(blarg->scooter, +1);
}
unlock(blarg)

/*
 * Now that tmpscooter is referenced for this thread
 * without blarg having to be locked, we can safely lock
 * scooter and then blarg in the correct locking order.
 */

if (tmpscooter) {
  lock(tmpscooter);
}

lock(blarg);

Let me know if there is anything else I can do to explain locking, ref counting, or Asterisk concurrency in any way.

Hooray!

By: David Vossel (dvossel) 2011-11-07 17:44:25.520-0600

I'm closing this.  Richard's patch will most definitely resolve the issue.

By: Gregory Hinton Nietsky (irroot) 2011-11-09 06:06:10.984-0600

Indeed looks like a keeper the logic with me ref +1 the bits was to prevent them dying keeping them around for a while longer and not prevent the deadlock that looked to be a destroy happening at the same time as something usefull happening ....

if the destroy can be delayed till the locks are released then the deadlock in the destroy func wont happen
till i release the ref and let it explode.

thank you for the info above will come in useful big time.