
Summary: ASTERISK-18487: Daily deadlock issue
Reporter: Jason Legault (jlegault)
Labels:
Date Opened: 2011-09-08 13:41:26
Date Closed: 2011-09-15 10:46:41
Priority: Critical
Regression? No
Status: Closed/Complete
Components: Applications/app_queue
Versions: 1.8.6.0
Frequency of Occurrence: Frequent
Related Issues: duplicates ASTERISK-18101 Asterisk 1.8 Deadlock in app_queue
Environment: Linux Debian 2.6.26-2-amd64 x86_64
Attachments: ( 0) deadlock_gdb.txt
Description: I have a deadlock issue with Asterisk 1.8.6.0 that happens once a day at peak times (150-200 or so calls being recorded). The problem has been present since about 1.6.2.9; I've upgraded to every version since, and they all have the same problem. RTP continues to work for existing calls, but new calls can't be made.

"netstat -anp | grep 5060" shows Recv-Q of 124680.  

I attached gdb to the running PID and ran "info thread" and "thread apply all bt". The output is attached.
Comments:

By: Paul Belanger (pabelanger) 2011-09-13 00:23:50.531-0500

We really need the output of 'core show locks' to help trace this down.

By: Leif Madsen (lmadsen) 2011-09-13 11:15:20.296-0500

Requesting feedback from the reporter.

By: Jason Legault (jlegault) 2011-09-13 12:15:27.728-0500

I enabled DEBUG_THREADS and the system resources were at capacity after reaching 30 concurrent calls. This system usually deadlocks at 200+ calls.  I'm not sure what to try next.

I tried the patch from issue ASTERISK-18101 and haven't had a deadlock yet. Do you think it could be the same issue?

210 active calls
16978 calls processed
System uptime: 23 hours, 50 minutes, 43 seconds



By: Gregory Hinton Nietsky (irroot) 2011-09-13 12:42:01.849-0500

It's quite likely this is related to ASTERISK-18101:

Thread 390 (Thread 0x44fd3950 (LWP 7495)):
#0  0x00007fa298926384 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007fa298921c0b in _L_lock_312 () from /lib/libpthread.so.0
#2  0x00007fa298921631 in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x00000000004df614 in __ast_pthread_mutex_lock (filename=0x591fa4 "astobj2.c", lineno=842,
   func=0x592150 "internal_ao2_iterator_next", mutex_name=0x59216b "a->c", t=0x300a290) at lock.c:244
#4  0x000000000044487c in __ao2_lock (user_data=0x300a2e8, file=0x591fa4 "astobj2.c", func=0x592150 "internal_ao2_iterator_next",
   line=842, var=0x59216b "a->c") at astobj2.c:157
#5  0x0000000000445e74 in internal_ao2_iterator_next (a=0x44fcaef0, q=0x44fcae80) at astobj2.c:842
#6  0x00000000004462bc in __ao2_iterator_next (a=0x44fcaef0) at astobj2.c:920
#7  0x00007fa279927440 in update_queue (q=0x3ca6698, member=0x5fbd248, callcompletedinsl=0, newtalktime=1554) at app_queue.c:4019
#8  0x00007fa27992c5b3 in try_calling (qe=0x44fcd1f0, options=0x44fcd12d "", announceoverride=0x44fcd12f "", url=0x44fcd12e "",


Thread 187 (Thread 0x452bb950 (LWP 1341)):
#0  0x00007fa298926384 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007fa298921c0b in _L_lock_312 () from /lib/libpthread.so.0
#2  0x00007fa298921631 in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x00000000004df614 in __ast_pthread_mutex_lock (filename=0x591fa4 "astobj2.c", lineno=842,
   func=0x592150 "internal_ao2_iterator_next", mutex_name=0x59216b "a->c", t=0x300a290) at lock.c:244
#4  0x000000000044487c in __ao2_lock (user_data=0x300a2e8, file=0x591fa4 "astobj2.c", func=0x592150 "internal_ao2_iterator_next",
   line=842, var=0x59216b "a->c") at astobj2.c:157
#5  0x0000000000445e74 in internal_ao2_iterator_next (a=0x452b2ee0, q=0x452b2e70) at astobj2.c:842
#6  0x00000000004462bc in __ao2_iterator_next (a=0x452b2ee0) at astobj2.c:920
#7  0x00007fa279927440 in update_queue (q=0x3a9c688, member=0x7b2b858, callcompletedinsl=0, newtalktime=501) at app_queue.c:4019
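
To see how many threads are stuck on the same lock in a large "thread apply all bt" dump, the trace can be grouped by the lock each thread is waiting on. The following is a minimal sketch, not an Asterisk tool: the function name group_waiters is hypothetical, and the regular expressions assume the __ast_pthread_mutex_lock frame layout exactly as it appears in the excerpt above (filename, lineno, func, mutex_name, t). Other builds or gdb versions may format frames differently.

```python
import re
from collections import defaultdict

# gdb thread header, e.g. "Thread 390 (Thread 0x44fd3950 (LWP 7495)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(")

# Asterisk lock wrapper frame as printed in the excerpt above (assumed layout).
# Captures: source file, line number, mutex expression, mutex address.
LOCK_RE = re.compile(
    r'__ast_pthread_mutex_lock \(filename=\S+ "([^"]+)", lineno=(\d+),'
    r'\s+func=\S+ "[^"]+", mutex_name=\S+ "([^"]+)", t=(0x[0-9a-fA-F]+)'
)


def group_waiters(dump):
    """Map (file, line, mutex_name, address) to the gdb thread numbers
    whose backtrace contains a matching lock-wait frame."""
    threads = {}
    current = None
    for raw in dump.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
        elif current is not None:
            threads[current].append(line)

    waiters = defaultdict(list)
    for tid, frames in threads.items():
        # Frames wrap across lines in gdb output, so join before matching.
        match = LOCK_RE.search(" ".join(frames))
        if match:
            key = (match.group(1), int(match.group(2)),
                   match.group(3), match.group(4))
            waiters[key].append(tid)
    return dict(waiters)
```

Run against the two threads quoted above, this would put threads 390 and 187 under the same key, since both wait on "a->c" at astobj2.c:842 with t=0x300a290. It only shows who is waiting; finding the thread that holds the lock still needs 'core show locks' or a manual read of the full trace.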

By: Jason Legault (jlegault) 2011-09-13 14:16:27.327-0500

So far, no deadlock after applying the patch.

By: Leif Madsen (lmadsen) 2011-09-13 15:41:44.005-0500

OK, I'm actually going to mark this as a duplicate of ASTERISK-18101 then.