[Home]

Summary:ASTERISK-18885: confbridge, video hangs, Exceptionally long queue length queuing to Bridge
Reporter:Dmitry Melekhov (slesru)Labels:
Date Opened:2011-11-18 05:33:52.000-0600Date Closed:2012-01-13 10:49:22.000-0600
Priority:MajorRegression?
Status:Closed/CompleteComponents:Applications/app_confbridge
Versions:10.0.0-rc2 Frequency of
Occurrence
Frequent
Related
Issues:
is the original version of this clone:ASTERISK-19763 Confbridge with SIP to Local channel results in hung video; Exceptionally long queue length queuing to Bridge
Environment:Centos 5.7/x86Attachments:( 0) chan_bridge_cleanup_v2.diff
( 1) chan_bridge_cleanup_v3.diff
( 2) confbridge.conf
( 3) DebugLog
( 4) myDebugLog
( 5) sip.conf
Description:Hello!


We testing confbridge with video with  follow_talker , connected with 2 clients, ekiga 3.3.2.
First several seconds works OK, but then (I guess when switching video to talker) video from asterisk stops and I see many messages
[Nov 18 15:28:41] WARNING[31286]: channel.c:1488 __ast_queue_frame: Exceptionally long queue length queuing to Bridge/0xb786908-input
[Nov 18 15:28:42] WARNING[31261]: channel.c:1488 __ast_queue_frame: Exceptionally long queue length queuing to Bridge/0xb786908-input
[Nov 18 15:28:43] WARNING[31261]: channel.c:1488 __ast_queue_frame: Exceptionally long queue length queuing to Bridge/0xb786908-input

after this only way to get video again is to reistablish call...
Comments:By: Leif Madsen (lmadsen) 2011-11-18 08:04:17.818-0600

More information required. At a minimum we'll need debug logs and the codecs in use on the bridge.

By: Leif Madsen (lmadsen) 2011-11-18 08:04:53.132-0600

Please provide a sip.conf and your ConfBridge() configuration in both the dialplan and the ConfBridge configuration file.

By: Dmitry Melekhov (slesru) 2011-11-18 10:24:07.011-0600

confbridge.conf

By: Dmitry Melekhov (slesru) 2011-11-18 10:25:04.450-0600

sip.conf with removed other (non related to confbridge test) users

By: Dmitry Melekhov (slesru) 2011-11-18 10:27:51.952-0600

dialplan to call confbridge:

exten =>6066,1,ConfBridge(1)

codecs are PCMU for audio and h261 for video.
and sorry, I don't know how to do debug for confbridge.
Could you tell what should I do?

Thank you!

By: Matt Jordan (mjordan) 2011-11-18 14:36:54.797-0600

Getting a debug log for ConfBridge is the same as getting a debug log for Asterisk.  This document will provide instructions on how to collect debugging logs from an Asterisk machine for the purpose of helping bug marshals troubleshoot an issue: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information



By: Dmitry Melekhov (slesru) 2011-11-20 22:54:14.234-0600

this is log from another (less loaded) server with almost the same config, only difference is x86-64 and confbridge number 4406, sip debug is on.


By: Dmitry Melekhov (slesru) 2011-11-20 22:55:08.097-0600

Hello!

Just produced debug.
My colleague was in before I turned debug on, then he reentered to confbridge.

Thank you!


By: Matt Jordan (mjordan) 2011-11-21 09:20:21.113-0600

Your log file doesn't demonstrate the actual bug that you reported, wherein the frames are piled up in channel.c:

[Nov 18 15:28:43] WARNING[31261]: channel.c:1488 __ast_queue_frame: Exceptionally long queue length queuing to Bridge/0xb786908-input

If you follow the instructions on the wiki, you should have all log message types being logged to the log file - including ERROR, WARNING, VERBOSE, DEBUG, etc.  Please make sure that the logger.conf that you've edited has the same filters as noted in the wiki article.  Before starting your test, make sure that all the log settings have been turned to the appropriate levels.  Then execute the test that reproduces the problem, and attach the logs.

By: Dmitry Melekhov (slesru) 2011-11-21 11:55:50.955-0600

Hello!

As I wrote, I did debug on another server- with x86-64 , not x86, just because it is hard to find time to run debug on production server. I did all right, according to instructions , and, behaviour, i.e. video stuck on talker switching is always reproducable. This may mean that problem is not in long queue, imho. So , there is debug log with problem.
I'll try to provide debug from x86 server tomorrow.

Thank you!

By: Dmitry Melekhov (slesru) 2011-11-21 22:59:05.919-0600

Hello!

Here is debug log with these warning.
And I were only user at this confbgridge...


By: Dmitry Melekhov (slesru) 2011-11-21 23:01:10.879-0600

btw, just because this warning appears even if there is no other users in conference and , thus, there is no talker video switching possible I guess video stuck is not related to this message ...


By: Dmitry Melekhov (slesru) 2011-11-22 00:43:52.941-0600

And just got crash ( may be somebody called confbridge, I don't know):

(gdb) bt
#0  0xb788c424 in __kernel_vsyscall ()
#1  0x48292df0 in raise () from /lib/libc.so.6
#2  0x48294701 in abort () from /lib/libc.so.6
#3  0x482cb3ab in __libc_message () from /lib/libc.so.6
#4  0x482d36c5 in _int_free () from /lib/libc.so.6
#5  0x482d3b09 in free () from /lib/libc.so.6
#6  0x081a00ed in ast_timer_close (handle=0x97589c8) at timing.c:150
#7  0x080b07ae in ast_channel_destructor (obj=0x99efe90) at channel.c:2459
#8  0x0808862a in internal_ao2_ref (user_data=0x99efe90, delta=-1) at astobj2.c:267
#9  0x08088567 in __ao2_ref (user_data=0x99efe90, delta=-1) at astobj2.c:237
#10 0x080b1fbc in ast_hangup (chan=0x99efe90) at channel.c:2906
#11 0x08092ab9 in bridge_channel_thread (data=0x976ec80) at bridging.c:1128
#12 0x081aacb2 in dummy_start (data=0x972d788) at utils.c:1010
#13 0x483fc832 in start_thread () from /lib/libpthread.so.0
#14 0x4833c46e in clone () from /lib/libc.so.6

By: David Vossel (dvossel) 2011-11-22 09:59:16.832-0600

What timing source are you using, timerfd, pthread, dahdi?

By: Dmitry Melekhov (slesru) 2011-11-22 10:41:51.102-0600

asterisk*CLI> module show like timing
Module                         Description                              Use Count
res_timing_dahdi.so            DAHDI Timing Interface                   1        
1 modules loaded


Thank you!

By: Matt Jordan (mjordan) 2011-11-22 12:35:41.627-0600

While we continue to investigate why the Ekiga endpoint is not triggering the video source switch logic, can you test with two video endpoints that are supported?  There is a list of the endpoints that we've tested with here:

https://wiki.asterisk.org/wiki/display/AST/ConfBridge+10

Thanks

Matt

By: Dmitry Melekhov (slesru) 2011-11-22 23:22:31.859-0600

Hello!

My colleague is out of office, so I reproduced the same problem with jitsi and call from dahdi (pri) channel:
[Nov 23 09:20:40] WARNING[9815]: channel.c:1488 __ast_queue_frame: Exceptionally long queue length queuing to Bridge/0x8b3afd8-input
[Nov 23 09:20:40] WARNING[9815]: channel.c:1488 __ast_queue_frame: Exceptionally long queue length queuing to Bridge/0x8b3afd8-input
[Nov 23 09:20:41] WARNING[9665]: channel.c:1488 __ast_queue_frame: Exceptionally long queue length queuing to Bridge/0x8b3afd8-input
[Nov 23 09:20:42] WARNING[9815]: channel.c:1488 __ast_queue_frame: Exceptionally long queue length queuing to Bridge/0x8b3afd8-input
[Nov 23 09:20:42] WARNING[9815]: channel.c:1488 __ast_queue_frame: Exceptionally long queue length queuing to Bridge/0x8b3afd8-input

Thank you!


By: Dmitry Melekhov (slesru) 2011-11-24 02:37:44.521-0600

tested with 3 points with jitsi.
Yes, it works sometimes :-)

but the same problem

[Nov 24 12:29:09] WARNING[1488] channel.c: Exceptionally long queue length queuing to Bridge/0x9fad628-input                                  
[Nov 24 12:29:10] WARNING[1488] channel.c: Exceptionally long queue length queuing to Bridge/0x9fad628-input                                  
[Nov 24 12:29:10] WARNING[1488] channel.c: Exceptionally long queue length queuing to Bridge/0x9fad628-input                                  
[Nov 24 12:29:11] WARNING[1415] channel.c: Exceptionally long queue length queuing to Bridge/0x9fad628-input                                  
[Nov 24 12:29:11] WARNING[1488] channel.c: Exceptionally long queue length queuing to Bridge/0x9fad628-input                                  
[Nov 24 12:29:12] WARNING[1415] channel.c: Exceptionally long queue length queuing to Bridge/0x9fad628-input                                  
[Nov 24 12:29:12] WARNING[1415] file.c: File confbridge-leave does not exist in any format                                                    
[Nov 24 12:29:12] WARNING[1415] file.c: Unable to open confbridge-leave (format (slin)): No such file or directory                            
[Nov 24 12:29:12] WARNING[1415] channel.c: Hard hangup called by thread -1356481648 on Bridge/0x9fad628-output, while fd is blocked by thread
[Nov 24 12:29:12] ERROR[1498] lock.c: channel.c line 3784 (__ast_read): Error obtaining mutex: Invalid argument                              
[Nov 24 12:29:12] ERROR[1498] lock.c: channel.c line 4294 (__ast_read): mutex 'chan' freed more times than we've locked!                      
[Nov 24 12:29:12] ERROR[1498] lock.c: channel.c line 4294 (__ast_read): Error releasing mutex: Invalid argument                              
[Nov 24 12:29:12] ERROR[1498] lock.c: channel.c line 2780 (ast_hangup): Error obtaining mutex: Invalid argument                              
[Nov 24 12:29:12] ERROR[1498] lock.c: pbx.c line 9868 (pbx_builtin_getvar_helper): Error obtaining mutex: Invalid argument                    
[Nov 24 12:29:12] ERROR[1498] lock.c: pbx.c line 9890 (pbx_builtin_getvar_helper): mutex 'chan' freed more times than we've locked!          
[Nov 24 12:29:12] ERROR[1498] lock.c: pbx.c line 9890 (pbx_builtin_getvar_helper): Error releasing mutex: Invalid argument                    
[Nov 24 12:29:12] ERROR[1498] lock.c: cel.c line 482 (ast_cel_report_event): Error obtaining mutex: Invalid argument                          
[Nov 24 12:29:12] ERROR[1498] lock.c: cel.c line 487 (ast_cel_report_event): mutex 'chan' freed more times than we've locked!                
[Nov 24 12:29:12] ERROR[1498] lock.c: cel.c line 487 (ast_cel_report_event): Error releasing mutex: Invalid argument                          
[Nov 24 12:29:12] ERROR[1498] lock.c: channel.c line 2874 (ast_hangup): mutex 'chan' freed more times than we've locked!                      
[Nov 24 12:29:12] ERROR[1498] lock.c: channel.c line 2874 (ast_hangup): Error releasing mutex: Invalid argument                              
[Nov 24 12:29:12] ERROR[1498] lock.c: ccss.c line 3599 (ast_cc_offer): Error obtaining mutex: Invalid argument                                
[Nov 24 12:29:12] ERROR[1498] lock.c: ccss.c line 3601 (ast_cc_offer): mutex 'caller_chan' freed more times than we've locked!              

and, finally, crash

(gdb) bt
#0  0x09fe9cc0 in ?? ()
#1  0x081a00d1 in ast_timer_close (handle=0x9c74918) at timing.c:148
#2  0x080b07ae in ast_channel_destructor (obj=0x9ff4420) at channel.c:2459
#3  0x0808862a in internal_ao2_ref (user_data=0x9ff4420, delta=-1) at astobj2.c:267
#4  0x08088567 in __ao2_ref (user_data=0x9ff4420, delta=-1) at astobj2.c:237
#5  0x080b1fbc in ast_hangup (chan=0x9ff4420) at channel.c:2906
#6  0x08092ab9 in bridge_channel_thread (data=0x9fe4630) at bridging.c:1128
#7  0x081aacb2 in dummy_start (data=0xa06a308) at utils.c:1010
#8  0x483fc832 in start_thread () from /lib/libpthread.so.0
#9  0x4833c46e in clone () from /lib/libc.so.6


By: Matt Jordan (mjordan) 2011-12-01 13:13:40.245-0600

Are you running with DEBUG_THREADS and/or DONT_OPTIMIZE?

Either one of those options will, depending on the system and video flow, potentially slow the system down enough that you would run into this.

If you are using those two options, please recompile without them and retest.



By: Dmitry Melekhov (slesru) 2011-12-01 23:39:53.366-0600

Recompiled without debug and don't optimize.

got this after some time anyway with 3 users:

[Dec  2 09:36:37] WARNING[29800]: channel.c:1488 __ast_queue_frame: Exceptionally long queue length queuing to Bridge/0xa1417b0-input
[Dec  2 09:36:38] WARNING[29800]: channel.c:1488 __ast_queue_frame: Exceptionally long queue length queuing to Bridge/0xa1417b0-input
[Dec  2 09:36:38] WARNING[29800]: channel.c:1488 __ast_queue_frame: Exceptionally long queue length queuing to Bridge/0xa1417b0-input


but looks like it works anyway.
we'll do more tests, looks like we have problems with jitsi on windows - one of computers runs windows.

By: Dmitry Melekhov (slesru) 2011-12-02 00:33:20.014-0600

Well, it works far better without debug, but crashes anyway:

[Dec  2 10:31:41] WARNING[29794]: channel.c:2854 ast_hangup: Hard hangup called by thread -1252263024 on Bridge/0xa1417b0-output, while fd is blocked by thread -1251599472 in procedure ast_waitfor_nandfds!  Expect a failure
asterisk*CLI>

and then crash


By: Dmitry Melekhov (slesru) 2011-12-02 00:34:49.409-0600

forget to mention, this (crash) happened when I kicked off user from confbridge conference.



By: Matt Jordan (mjordan) 2011-12-02 09:03:54.704-0600

Dmitry - in the case of the crash, can you attach a backtrace?

By: Dmitry Melekhov (slesru) 2011-12-02 09:12:08.822-0600

Sorry, no :-(
Asterisk was compiled with no debug threads, so I decided backtrace will be useless and deleted core file...


By: Matt Jordan (mjordan) 2011-12-02 13:15:47.279-0600

DEBUG_THREADS may not be needed.  DONT_OPTIMIZE would be good to have on.  If you can reproduce the crash with that configuration and attach the backtrace, that'd be great.

Thanks!

By: Dmitry Melekhov (slesru) 2011-12-05 00:19:31.944-0600

Hello!

I'll do tests later, but just did 2 calls to bridge conference from Dahdi (PRI ) channel and got segfault:

(gdb) bt
#0  0xb750005d in ?? ()
#1  0x0819c965 in ast_timer_close (handle=0xb59a5d88) at timing.c:148
#2  0x080b06de in ast_channel_destructor (obj=0xb528aa18) at channel.c:2459
#3  0x0808855a in internal_ao2_ref (user_data=0xb528aa18, delta=-1) at astobj2.c:267
#4  0x08088497 in __ao2_ref (user_data=0xb528aa18, delta=-1) at astobj2.c:237
#5  0x080b1eec in ast_hangup (chan=0xb528aa18) at channel.c:2906
#6  0x080929e9 in bridge_channel_thread (data=0xb59d2148) at bridging.c:1128
#7  0x081a63d1 in dummy_start (data=0xb5287c00) at utils.c:1010
#8  0x47a47832 in start_thread () from /lib/libpthread.so.0
#9  0x47b5f46e in clone () from /lib/libc.so.6


and
[Dec  5 10:15:06] WARNING[25846] channel.c: Exceptionally long voice queue length queuing to Bridge/0xb59a81f0-input                          
[Dec  5 10:15:08] WARNING[25846] channel.c: Exceptionally long voice queue length queuing to Bridge/0xb59a81f0-input                          
[Dec  5 10:15:09] WARNING[25846] channel.c: Exceptionally long voice queue length queuing to Bridge/0xb59a81f0-input                          
[Dec  5 10:15:10] WARNING[25846] channel.c: Exceptionally long voice queue length queuing to Bridge/0xb59a81f0-input      
in messages

Thank you!


By: Dmitry Melekhov (slesru) 2011-12-08 22:26:45.670-0600

Unfortunately, my colleagues now (last several days) test another conference solution, so I have no camera :-)
I guess I'll not have it at least until next week, so no tests until then ;-)

Thank you!


By: David Vossel (dvossel) 2011-12-29 17:47:44.103-0600

I do not know if this will resolve your issue or not, but I've attached a patch that cleans up several issues I found in chan_bridge.  It would be worth testing this patch to see if it helps.

By: Dmitry Melekhov (slesru) 2011-12-30 01:29:39.032-0600

Hello!

First test shows that problem is fixed.
Thank you!

We'll do more tests, but, I'm afraid only after Jan 10.

Thank you!

And Happy New Year! :-)




By: David Vossel (dvossel) 2011-12-30 11:04:45.248-0600

Hey, that patch addressed some of the issues I saw, but not all of them.  I found some new stuff this morning.  If you could test version 2 of the patch as well that would be great. I am attaching it now.

By: David Vossel (dvossel) 2011-12-30 12:00:25.829-0600

I deleted version 1 of the patch.  Only version 2 exists now.

By: Dmitry Melekhov (slesru) 2011-12-31 03:47:26.618-0600

Thank you!

Installed new patch :-)
But can't test until Jan 10 because of long holidays in Russia ,and, thus no testers available ;-)


By: David Vossel (dvossel) 2012-01-04 16:52:10.322-0600

Okay, I have now uploaded version 3 of the patch.  Version 1 and 2 were aimed at fixing the "exceptionally long queue" issue.  Version 3 combines the other patches and a new fix that addresses the crash issue that was also reported here.


By: Matt Lehner (mlehner) 2012-01-06 16:51:49.996-0600

I am experiencing the segfault issue when leaving a ConfBridge. It happens when multiple channels are all leaving the ConfBridge very close to each other or at the same time.

Background is that we are using ConfBridge to handle paging for phones. We send commands to the phones outside of asterisk to tell them to call the ConfBridge as well as hangup. If we ask the phones to quit at the same time.. we will get the segfault, if we add a delay it doesn't.

It happens in the same place as the above comment, ast_timer_close. That scenario is the easiest to replicate. However, I did sometimes see it happen elsewhere inside the bridge destroy methods.. but I did not grab the backtrace.

v3 of the patch did not help the seg fault crashes.

By: Matt Jordan (mjordan) 2012-01-09 08:27:50.061-0600

If you're having a segmentation fault with v3 of this patch, please provide a backtrace (with the DONT_OPTIMIZE flag turned on).

By: Dmitry Melekhov (slesru) 2012-01-10 01:33:59.971-0600

Did more tests, can't reproduce Exceptionally long queue length queuing :-)
So issue is fixed.
Thank you!