[Home]

Summary:ASTERISK-21657: asterisk locks up after running traffic
Reporter:Alejandro Orellana (alejandro orellana)Labels:
Date Opened:2013-04-19 12:35:09Date Closed:2013-05-17 19:15:06
Priority:MajorRegression?
Status:Closed/CompleteComponents:Applications/SLA Channels/chan_sip/General
Versions:Frequency of
Occurrence
Constant
Related
Issues:
Environment:linux ubuntu , asterisk-1.8.2.3Attachments:( 0) asteriskLocks.txt
( 1) file
Description:after running traffic asterisk locks up
it takes between 2-5 hours
i am using SLA feature. the SLA has 12 devices associated with it.
using 10 with tcp and 2 with udp
Comments:By: Alejandro Orellana (alejandro orellana) 2013-04-19 12:38:03.814-0500

core show locks  output

By: Rusty Newton (rnewton) 2013-04-23 17:43:00.067-0500

A few things:

* Please read through https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines
* You are on an older version of the 1.8 branch. Many issues have been fixed since then. If at all possible, please re-test and provide new debug from the latest release of 1.8 or the latest SVN revision of 1.8
* Attach any text debug or output as .txt so that it's easy to read from the browser
* Provide an Asterisk log https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information containing VERBOSE and DEBUG (set to level 5 or higher). We only need the first minute or right up until the lock happens.
* You mention "core show locks" output, but I don't see that attached to the issue. Can you re-compile Asterisk as mentioned here https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace#GettingaBacktrace-GettingInformationForADeadlock. That is recompile with both DEBUG_THREADS and BETTER_BACKTRACES, then provide both the backtrace and core show locks output..



By: Alejandro Orellana (alejandro orellana) 2013-04-24 08:14:52.864-0500

Thanks
i will try to get the logs requested. a couple of things came out in  my testing
i removed SLA component from the system and use a normal Dial command to the set of devices, with same results
i am attaching the "core show locks output" file again. BTW i can see it being attached in the ticket

Thanks

By: Alejandro Orellana (alejandro orellana) 2013-04-24 08:18:10.508-0500

core show locks output

By: Alejandro Orellana (alejandro orellana) 2013-04-24 08:19:11.174-0500

I have attached the file with the core show locks output.
Thank you

By: Alejandro Orellana (alejandro orellana) 2013-04-26 07:24:09.770-0500

we have done some investigating and seems like the lockup is related  to  a SUBSCRIBE receive for a channel,
this subscription is due to the fact  we are using BLF (hints ) file.  the following analysis is from  our partner "Asteria", the threads id may differ but gives a pretty good ide of the problem. Hope this helps:

Thread 11619 (LWP 23061) in handle_statechange locks three locks:
{noformat}
               ast_rdlock_contexts();
               ao2_lock(hints);
               ao2_lock(hint);
{noformat}

Then calls into chan_sip
{noformat}
               /* For extension callbacks */
               AST_LIST_TRAVERSE(&hint->callbacks, cblist, entry) {
                       cblist->callback(hint->exten->parent->name, hint->exten->exten, state, cblist->data);
               }
{noformat}

Then tries to lock one more lock:
{noformat}
static int cb_extensionstate(char *context, char* exten, int state, void *data)
{
       struct sip_pvt *p = data;

       sip_pvt_lock(p);
{noformat}

But deadlocks because p is already locked by 23140.

At the same exact time, Thread 11583 (LWP 23140) gets an incoming SUBSCRIBE to 2008, locks these locks:
{noformat}
chan_sip.c:24138
       /* Process request, with netlock held, and with usual deadlock avoidance */
       for (lockretry = 10; lockretry > 0; lockretry--) {
               ast_mutex_lock(&netlock);
{noformat}
chan_sip.c:24152
{noformat}
               if (!p->owner || !ast_channel_trylock(p->owner))
                       break;  /* locking succeeded */
{noformat}
And attempts to get the "hint":
{noformat}
      /* If this is a subscription we actually just need to see if a hint exists for the extension */
       if (req->method == SIP_SUBSCRIBE) {
               char hint[AST_MAX_EXTENSION];
               int which = 0;
               if (ast_get_hint(hint, sizeof(hint), NULL, 0, NULL, p->context, uri) ||
                   (ast_get_hint(hint, sizeof(hint), NULL, 0, NULL, p->context, decoded_uri) && (which = 1))) {
{noformat}

Which calls this short function:
{noformat}
static struct ast_exten *ast_hint_extension(struct ast_channel *c, const char *context, const char *exten)
{
       struct ast_exten *e;
       ast_rdlock_contexts();
       e = ast_hint_extension_nolock(c, context, exten);
       ast_unlock_contexts();
       return e;
}
{noformat}

Where it deadlocks on the call to ast_rdlock_contexts() because 23061 is already holding that lock.

Despite the name, ast_rdlock_contexts is just a mutex, there cannot be multiple readers.


By: Matt Jordan (mjordan) 2013-04-26 09:18:46.625-0500

Which version of Asterisk are you running?

From your analysis, it does not appear to be a recent version of Asterisk 1.8. There were a lot of locking issues involving subscriptions and device state in early versions of 1.8 - before this issue goes any further, this needs to be reproduced in Asterisk 1.8.21.0.

By: Rusty Newton (rnewton) 2013-05-17 19:14:57.628-0500

Suspended due to lack of activity. Please request a bug marshal in #asterisk-bugs on the IRC network irc.freenode.net to reopen the issue should you have the additional information requested.  Further information can be found at http://www.asterisk.org/developers/bug-guidelines