
Summary: ASTERISK-16390: ConfBridge crashes Asterisk
Reporter: Thomas Nilsen (mutex)
Labels:
Date Opened: 2010-07-19 06:52:51
Date Closed: 2011-06-07 14:00:41
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Applications/app_confbridge
Versions:
Frequency of Occurrence:
Related Issues: is related to ASTERISK-16640 ConfBridge crashes when leave simultaneously
Environment:
Attachments:
( 0) asterisk_calls.png
( 1) asterisk_crash_log_gdb.txt
( 2) backtrace_core.txt
( 3) backtrace_rev278465.txt
( 4) backtrace.txt
( 5) backtrace2.txt
( 6) backtrace-threads_09082010.txt
( 7) backtrace-threads_10082010.txt
( 8) backtrace-threads.txt
( 9) backtrace-threads-03082010.txt
(10) console_info.txt
Description: I have an Asterisk installation used mainly for ConfBridge.

Operating system:

Linux atlas 2.6.18-194.3.1.el5.centos.plusPAE #1 SMP Wed May 19 10:00:02 EDT 2010 i686 i686 i386 GNU/Linux

Usage:

ConfBridge is hosting a small number of conferences with a variable number of attendees.
Typically there are 1-2 speakers per conference room and n muted listeners.
Conferences are typically 1-2 hours long.

Description of behaviour:

The crash seems to occur typically when a lot of callers are hanging up. It usually happens after 1-2 hours (near or at the end of a conference), so it looks like it happens while users are leaving the conference, but I am not sure. It ends in a total crash of Asterisk.


Confbridge configuration:

ConfBridge is used with "cM(something)" for speakers or "mM(something)1" for listeners.

Codecs / Media:

All codecs are the same: alaw, to and from a single endpoint (a Sonus NBS/GSX gateway).

System hardware:

Dell 2850 4 cpu Intel Xeon with 8GB ram.

****** STEPS TO REPRODUCE ******

Reproducible with the current setup / configuration. The time of the crash is not certain, but it always happens during a usage session as described in "Description" above.
Comments:
By: Thomas Nilsen (mutex) 2010-07-19 06:58:58

I'm willing to allow debugging and SSH access on my system if that helps.

By: Paul Belanger (pabelanger) 2010-07-19 09:58:05

Your backtrace is optimized (see below), you will need to upload a new trace.
---
Thank you for your bug report. In order to move your issue forward, we require a backtrace from the core file produced after the crash. Please see the doc/backtrace.txt file in your Asterisk source directory.

Also, be sure you have DONT_OPTIMIZE enabled in menuselect within the Compiler Flags section, then:

make install

after enabling, reproduce the crash, and then execute the instructions in doc/backtrace.txt.

When complete, attach that file to this issue report. Thanks!

By: Leif Madsen (lmadsen) 2010-07-20 10:17:09

From what I can see, you're probably using res_timing_pthread -- don't use that as it is known to be unstable.

Using res_timing_dahdi is probably your best bet. You may also be able to get some better results over pthread with res_timing_timerfd, but that is only available on certain kernels.

By: Thomas Nilsen (mutex) 2010-07-20 18:02:17

Thanks for your replies. I have now compiled without code optimization.

I will try to get a usable backtrace from Asterisk the next time it hangs or dumps core.

I will also take a look at the other timing sources.

Today I was unable to get a backtrace during the hang, but I will get one next time. I'm not sure this was the same problem, since I have had core dumps earlier, but it might be due to the same issue.

See the attachment for console info (I tried to execute 'core show channels' with a variable success rate).

Top also shows CPU usage of 128% for Asterisk. :)

At this point Asterisk has limited responsiveness (it is alive in the console), but gives NO SIP response whatsoever when the gateway sends calls to it.



By: Leif Madsen (lmadsen) 2010-07-21 12:26:04

res_timing_pthread should be fixed by this commit: http://svn.digium.com/view/asterisk?view=rev&revision=278465

If you try after that commit, then likely the system should stabilize.

By: Thomas Nilsen (mutex) 2010-07-25 11:56:36

I updated and recompiled using rev 278465.

Same problem. It seems that Asterisk hangs in a deadlock or something similar.

See attached backtrace_rev278465.txt

By: Thomas Nilsen (mutex) 2010-07-25 12:07:38

I can see that the backtrace says 'No debug symbols found'. Maybe I forgot to compile with the DONT_OPTIMIZE flag on the last build, or do I have to enable it with some other flag?

The problem seems very reproducible, so if someone can help with a call generator to set up 40-60 calls, I guess we could get a good debug environment.

By: Paul Belanger (pabelanger) 2010-07-25 12:27:18

Yes, you need to upload an unoptimized backtrace.  See below for instructions.
---
Thank you for your bug report. In order to move your issue forward, we require a backtrace from the core file produced after the crash. Please see the doc/backtrace.txt file in your Asterisk source directory.

Also, be sure you have DONT_OPTIMIZE enabled in menuselect within the Compiler Flags section, then:

make install

after enabling, reproduce the crash, and then execute the instructions in doc/backtrace.txt.

When complete, attach that file to this issue report. Thanks!

By: Thomas Nilsen (mutex) 2010-08-03 15:02:44

I get the same error every day now. It does not seem to segfault every time; the last few times it has just hung. I can connect to the CLI, show channels, etc., but there is no response on SIP.

I just read some other issues. Could mine be related to 0017747?

I also use the Asterisk manager HTTP interface to read channel statistics into monitoring software.

Attached is today's backtrace from the hanging Asterisk.

Thanks.

By: Thomas Nilsen (mutex) 2010-08-03 15:28:23

I'm not familiar with the Asterisk source code, but could it be that a mutex (mutex_name=0x81c7684 "&bridge_channel->lock") is not released properly somewhere?

It seems that many of the threads are waiting for this one.

By: Thomas Nilsen (mutex) 2010-08-10 14:47:16

I have attached two backtraces from a deadlocked Asterisk. This bug really prevents the use of ConfBridge in production systems of any kind.

By: Thomas Nilsen (mutex) 2010-08-23 01:14:41

Aren't the backtraces usable? Asterisk is extremely unstable with ConfBridge. Please tell me if there is anything more I can provide to support the debugging process.

By: Leif Madsen (lmadsen) 2010-08-31 15:26:49

Issue acknowledged.

By: Thomas Nilsen (mutex) 2010-09-03 01:57:15

If there is a patch for the lock operations when exiting the conference, I would be more than happy to test it on my system. Just message me if you need some assistance.



By: Clod Patry (junky) 2010-09-14 00:13:09

I would think there's a deadlock. Could you give us the output of "core show locks" from your CLI?

Thanks

By: Thomas Nilsen (mutex) 2010-09-14 01:13:46

atlas*CLI> core show locks
No such command 'core show locks' (type 'core show help core show locks' for other possible commands)

By: Thomas Nilsen (mutex) 2010-09-14 06:32:01

I recompiled Asterisk and I have the command now.

I will post the output as soon as Asterisk crashes again.

By: Thomas Nilsen (mutex) 2010-09-15 19:10:38

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: -1213207664 (do_monitor           started at [24340] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 9089 add_header_max_forwards dialog 0x98f5478 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x811ccec]
       /usr/sbin/asterisk(__ast_pthread_mutex_lock+0xaa) [0x81165f9]
       /usr/sbin/asterisk(__ao2_lock+0x4a) [0x8085ccc]
       /usr/lib/asterisk/modules/chan_sip.so [0xad80d1]
       /usr/lib/asterisk/modules/chan_sip.so [0xad9dd8]
       /usr/lib/asterisk/modules/chan_sip.so [0xadf8ca]
       /usr/lib/asterisk/modules/chan_sip.so [0xb25760]
       /usr/sbin/asterisk(ast_sched_runq+0x188) [0x81786fd]
       /usr/lib/asterisk/modules/chan_sip.so [0xb24c72]
       /usr/sbin/asterisk [0x81935d6]
       /lib/libpthread.so.0 [0x165832]
       /lib/libc.so.6(clone+0x5e) [0x499e0e]
=== ---> Lock #1 (chan_sip.c): MUTEX 24312 do_monitor &monlock 0xb66c80 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x811ccec]
       /usr/sbin/asterisk(__ast_pthread_mutex_lock+0xaa) [0x81165f9]
       /usr/lib/asterisk/modules/chan_sip.so [0xb24c62]
       /usr/sbin/asterisk [0x81935d6]
       /lib/libpthread.so.0 [0x165832]
       /lib/libc.so.6(clone+0x5e) [0x499e0e]
=== ---> Tried and failed to get Lock #2 (chan_sip.c): MUTEX 3403 retrans_pkt pkt->owner->owner 0x9908c28 (0)
       /usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x811ccec]
       /usr/sbin/asterisk(__ast_pthread_mutex_trylock+0xaa) [0x8116956]
       /usr/sbin/asterisk(__ao2_trylock+0x4a) [0x8085e1e]
       /usr/lib/asterisk/modules/chan_sip.so [0xabbf2b]
       /usr/sbin/asterisk(ast_sched_runq+0x188) [0x81786fd]
       /usr/lib/asterisk/modules/chan_sip.so [0xb24c72]
       /usr/sbin/asterisk [0x81935d6]
       /lib/libpthread.so.0 [0x165832]
       /lib/libc.so.6(clone+0x5e) [0x499e0e]
=== -------------------------------------------------------------------
===
=== Thread ID: -1215546480 (multiplexed_thread_function started at [  274] bridge_multiplexed.c multiplexed_add_or_remove())
=== ---> Lock #0 (bridge_multiplexed.c): MUTEX 221 multiplexed_thread_function multiplexed_thread 0x986ac50 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x811ccec]
       /usr/sbin/asterisk(__ast_pthread_mutex_lock+0xaa) [0x81165f9]
       /usr/sbin/asterisk(__ao2_lock+0x4a) [0x8085ccc]
       /usr/lib/asterisk/modules/bridge_multiplexed.so [0x204578]
       /usr/sbin/asterisk [0x81935d6]
       /lib/libpthread.so.0 [0x165832]
       /lib/libc.so.6(clone+0x5e) [0x499e0e]
=== ---> Lock #1 (channel.c): MUTEX 4550 ast_write chan 0x9908c28 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x811ccec]
       /usr/sbin/asterisk(__ast_pthread_mutex_trylock+0xaa) [0x8116956]
       /usr/sbin/asterisk(__ao2_trylock+0x4a) [0x8085e1e]
       /usr/sbin/asterisk(ast_write+0xe7) [0x80b32a4]
       /usr/lib/asterisk/modules/bridge_multiplexed.so [0x204e6f]
       /usr/sbin/asterisk(ast_bridge_handle_trip+0x19f) [0x808c5fe]
       /usr/lib/asterisk/modules/bridge_multiplexed.so [0x204644]
       /usr/sbin/asterisk [0x81935d6]
       /lib/libpthread.so.0 [0x165832]
       /lib/libc.so.6(clone+0x5e) [0x499e0e]
=== ---> Waiting for Lock #2 (chan_sip.c): MUTEX 6017 sip_write p 0x98f5478 (1)
       /usr/sbin/asterisk(ast_bt_get_addresses+0x19) [0x811ccec]
       /usr/sbin/asterisk(__ast_pthread_mutex_lock+0xaa) [0x81165f9]
       /usr/sbin/asterisk(__ao2_lock+0x4a) [0x8085ccc]
       /usr/lib/asterisk/modules/chan_sip.so [0xabbee1]
       /usr/sbin/asterisk(ast_sched_runq+0x188) [0x81786fd]
       /usr/lib/asterisk/modules/chan_sip.so [0xb24c72]
       /usr/sbin/asterisk [0x81935d6]
       /lib/libpthread.so.0 [0x165832]
       /lib/libc.so.6(clone+0x5e) [0x499e0e]
=== --- ---> Locked Here: chan_sip.c line 9089 (add_header_max_forwards)
=== -------------------------------------------------------------------
===
=======================================================================
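
Editorial sketch (not part of the original comment): the dump above can be read as a small wait-for graph. The Python below is a minimal illustration, assuming only the line layout visible in this dump, of how the thread and lock lines could be scanned for two threads that each hold a lock the other wants. The regular expressions and the function names `parse_locks` and `find_inversions` are hypothetical helpers written for this example, not part of Asterisk. The sample text contains only the thread and lock lines from the dump, with wrapped lines rejoined. A failed trylock does not block a thread by itself, so the sketch treats "Tried and failed" and "Waiting for" alike purely as "wanted" locks.

```python
import re

# Matches "=== Thread ID: <id> (<thread function> started at ..."
THREAD_RE = re.compile(r"^=== Thread ID: (-?\d+) \((\S+)")
# Matches held locks and, when the optional prefix is present, wanted locks.
LOCK_RE = re.compile(
    r"^=== ---> (Waiting for |Tried and failed to get )?Lock #\d+ "
    r"\([^)]*\): \w+ \d+ \S+ .*?(0x[0-9a-fA-F]+) \(\d+\)"
)


def parse_locks(dump):
    """Return {thread_id: {"name", "held", "wanted"}} from a lock dump."""
    threads = {}
    current = None
    for line in dump.splitlines():
        m = THREAD_RE.match(line)
        if m:
            current = {"name": m.group(2), "held": set(), "wanted": set()}
            threads[m.group(1)] = current
            continue
        m = LOCK_RE.match(line)
        if m and current is not None:
            kind = "wanted" if m.group(1) else "held"
            current[kind].add(m.group(2))
    return threads


def find_inversions(threads):
    """List pairs of thread names where each holds a lock the other wants."""
    ids = sorted(threads)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            ta, tb = threads[a], threads[b]
            if ta["wanted"] & tb["held"] and tb["wanted"] & ta["held"]:
                pairs.append((ta["name"], tb["name"]))
    return pairs


SAMPLE = """
=== Thread ID: -1213207664 (do_monitor           started at [24340] chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 9089 add_header_max_forwards dialog 0x98f5478 (1)
=== ---> Lock #1 (chan_sip.c): MUTEX 24312 do_monitor &monlock 0xb66c80 (1)
=== ---> Tried and failed to get Lock #2 (chan_sip.c): MUTEX 3403 retrans_pkt pkt->owner->owner 0x9908c28 (0)
=== Thread ID: -1215546480 (multiplexed_thread_function started at [  274] bridge_multiplexed.c multiplexed_add_or_remove())
=== ---> Lock #0 (bridge_multiplexed.c): MUTEX 221 multiplexed_thread_function multiplexed_thread 0x986ac50 (1)
=== ---> Lock #1 (channel.c): MUTEX 4550 ast_write chan 0x9908c28 (1)
=== ---> Waiting for Lock #2 (chan_sip.c): MUTEX 6017 sip_write p 0x98f5478 (1)
"""

if __name__ == "__main__":
    for first, second in find_inversions(parse_locks(SAMPLE)):
        print(first, "<->", second)
```

On this sample the sketch reports the pair do_monitor and multiplexed_thread_function: the SIP monitor thread holds the dialog lock 0x98f5478 and wants the channel lock 0x9908c28, while the bridge thread holds 0x9908c28 and is waiting for 0x98f5478. Whether this is the root cause of the hang is not established by the dump alone.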

By: Thomas Nilsen (mutex) 2010-09-23 08:30:52

Could one of the developers please e-mail me directly?

By: thomas987 (thomas987) 2010-09-27 07:09:15

Please let me know how I can assist in fixing this bug. I can reproduce the error very easily/quickly in my setup so any code you might be working on to fix this issue I can help in testing.

By: Leif Madsen (lmadsen) 2010-10-04 10:16:35

Your issue is in queue, please be patient, and we will get to it as time permits and developer resources become available.

By: David Vossel (dvossel) 2011-05-06 17:20:06

Can someone please verify if this issue is reproducible in the new Trunk version of ConfBridge?  I have done quite a bit of work in that area lately.