Summary: | ASTERISK-21657: asterisk locks up after running traffic | ||
Reporter: | Alejandro Orellana (alejandro orellana) | Labels: | |
Date Opened: | 2013-04-19 12:35:09 | Date Closed: | 2013-05-17 19:15:06 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | Applications/SLA Channels/chan_sip/General |
Versions: | Frequency of Occurrence | Constant | |
Related Issues: | |||
Environment: | linux ubuntu , asterisk-1.8.2.3 | Attachments: | ( 0) asteriskLocks.txt ( 1) file |
Description: | after running traffic asterisk locks up
it takes between 2-5 hours i am using SLA feature. the SLA has 12 devices associated with it. using 10 with tcp and 2 with udp | ||
Comments: | By: Alejandro Orellana (alejandro orellana) 2013-04-19 12:38:03.814-0500 core show locks output By: Rusty Newton (rnewton) 2013-04-23 17:43:00.067-0500 A few things: * Please read through https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines * You are on an older version of the 1.8 branch. Many issues have been fixed since then. If at all possible, please re-test and provide new debug from the latest release of 1.8 or the latest SVN revision of 1.8 * Attach any text debug or output as .txt so that it's easy to read from the browser * Provide an Asterisk log https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information containing VERBOSE and DEBUG (set to level 5 or higher). We only need the first minute or right up until the lock happens. * You mention "core show locks" output, but I don't see that attached to the issue. Can you re-compile Asterisk as mentioned here https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace#GettingaBacktrace-GettingInformationForADeadlock. That is recompile with both DEBUG_THREADS and BETTER_BACKTRACES, then provide both the backtrace and core show locks output.. By: Alejandro Orellana (alejandro orellana) 2013-04-24 08:14:52.864-0500 Thanks i will try to get the logs requested. a couple of things came out in my testing i removed SLA component from the system and use a normal Dial command to the set of devices, with same results i am attaching the "core show locks output" file again. BTW i can see it being attached in the ticket Thanks By: Alejandro Orellana (alejandro orellana) 2013-04-24 08:18:10.508-0500 core show locks output By: Alejandro Orellana (alejandro orellana) 2013-04-24 08:19:11.174-0500 I have attached the file with the core show locks output. Thank you By: Alejandro Orellana (alejandro orellana) 2013-04-26 07:24:09.770-0500 we have done some investigating and seems like the lockup is related to a SUBSCRIBE receive for a channel, this subscription is due to the fact we are using BLF (hints ) file. the following analysis is from our partner "Asteria", the threads id may differ but gives a pretty good ide of the problem. Hope this helps: Thread 11619 (LWP 23061) in handle_statechange locks three locks: {noformat} ast_rdlock_contexts(); ao2_lock(hints); ao2_lock(hint); {noformat} Then calls into chan_sip {noformat} /* For extension callbacks */ AST_LIST_TRAVERSE(&hint->callbacks, cblist, entry) { cblist->callback(hint->exten->parent->name, hint->exten->exten, state, cblist->data); } {noformat} Then tries to lock one more lock: {noformat} static int cb_extensionstate(char *context, char* exten, int state, void *data) { struct sip_pvt *p = data; sip_pvt_lock(p); {noformat} But deadlocks because p is already locked by 23140. At the same exact time, Thread 11583 (LWP 23140) gets an incoming SUBSCRIBE to 2008, locks these locks: {noformat} chan_sip.c:24138 /* Process request, with netlock held, and with usual deadlock avoidance */ for (lockretry = 10; lockretry > 0; lockretry--) { ast_mutex_lock(&netlock); {noformat} chan_sip.c:24152 {noformat} if (!p->owner || !ast_channel_trylock(p->owner)) break; /* locking succeeded */ {noformat} And attempts to get the "hint": {noformat} /* If this is a subscription we actually just need to see if a hint exists for the extension */ if (req->method == SIP_SUBSCRIBE) { char hint[AST_MAX_EXTENSION]; int which = 0; if (ast_get_hint(hint, sizeof(hint), NULL, 0, NULL, p->context, uri) || (ast_get_hint(hint, sizeof(hint), NULL, 0, NULL, p->context, decoded_uri) && (which = 1))) { {noformat} Which calls this short function: {noformat} static struct ast_exten *ast_hint_extension(struct ast_channel *c, const char *context, const char *exten) { struct ast_exten *e; ast_rdlock_contexts(); e = ast_hint_extension_nolock(c, context, exten); ast_unlock_contexts(); return e; } {noformat} Where it deadlocks on the call to ast_rdlock_contexts() because 23061 is already holding that lock. Despite the name, ast_rdlock_contexts is just a mutex, there cannot be multiple readers. By: Matt Jordan (mjordan) 2013-04-26 09:18:46.625-0500 Which version of Asterisk are you running? From your analysis, it does not appear to be a recent version of Asterisk 1.8. There were a lot of locking issues involving subscriptions and device state in early versions of 1.8 - before this issue goes any further, this needs to be reproduced in Asterisk 1.8.21.0. By: Rusty Newton (rnewton) 2013-05-17 19:14:57.628-0500 Suspended due to lack of activity. Please request a bug marshal in #asterisk-bugs on the IRC network irc.freenode.net to reopen the issue should you have the additional information requested. Further information can be found at http://www.asterisk.org/developers/bug-guidelines |