[Home]

Summary:ASTERISK-25852: chan_iax2: Exceptionally long voice queue length with trunking
Reporter:Bobby Hakimi (bobbymc)Labels:
Date Opened:2016-03-17 02:11:24Date Closed:2018-01-02 08:44:23.000-0600
Priority:MajorRegression?
Status:Closed/CompleteComponents:Channels/chan_iax2
Versions:11.21.2 Frequency of
Occurrence
Frequent
Related
Issues:
Environment:dell r815 64 core OS gentooAttachments:( 0) backtrace-threads4.txt
( 1) core-show-locks4.txt
( 2) damned_call_server.png
( 3) full.4.crash
Description:randomly this error comes up which creates deadlocks and causes all current sip or iax channels to drop

channel.c: Exceptionally long voice queue length queuing to
{noformat}
[2016-03-16 21:55:14] WARNING[44200][C-0000012b] channel.c: Exceptionally long voice queue length queuing to Local/69696969696969696@vicidial-000000cd;1
[2016-03-16 21:55:14] WARNING[23555] channel.c: Exceptionally long voice queue length queuing to IAX2/safesoft-dialer-trunk-2087813841-797
[2016-03-16 21:55:14] WARNING[44218][C-0000012d] channel.c: Exceptionally long voice queue length queuing to Local/69696969696969696@vicidial-000000cf;1
[2016-03-16 21:55:14] WARNING[23555] channel.c: Exceptionally long voice queue length queuing to IAX2/safesoft-dialer-trunk-2087813836-11006
[2016-03-16 21:55:14] WARNING[8980][C-0000021f] channel.c: Exceptionally long voice queue length queuing to Local/78600093@vicidial-0000014e;2
[2016-03-16 21:55:15] WARNING[64607][C-0000019a] channel.c: Exceptionally long voice queue length queuing to Local/69696969696969696@vicidial-0000010b;1
[2016-03-16 21:55:15] WARNING[23557] channel.c: Exceptionally long voice queue length queuing to IAX2/safesoft-dialer-trunk-2087813836-10347
[2016-03-16 21:55:15] WARNING[17830][C-00000168] channel.c: Exceptionally long voice queue length queuing to Local/69696969696969696@vicidial-000000f1;1
[2016-03-16 21:55:15] WARNING[5805][C-00000218] channel.c: Exceptionally long voice queue length queuing to Local/78600095@vicidial-0000014a;2
[2016-03-16 21:55:15] WARNING[23559] channel.c: Exceptionally long voice queue length queuing to IAX2/safesoft-dialer-trunk-2087813841-4706
[2016-03-16 21:55:16] WARNING[7346][C-0000021a] channel.c: Exceptionally long voice queue length queuing to Local/78600097@vicidial-0000014b;2
[2016-03-16 21:55:16] WARNING[7876][C-0000021d] channel.c: Exceptionally long voice queue length queuing to Local/78600010@vicidial-0000014d;2
[2016-03-16 21:55:16] WARNING[3790][C-00000216] channel.c: Exceptionally long voice queue length queuing to Local/78600092@vicidial-00000149;2
[2016-03-16 21:55:16] WARNING[44200][C-0000012b] channel.c: Exceptionally long voice queue length queuing to Local/69696969696969696@vicidial-000000cd;1
[2016-03-16 21:55:16] WARNING[23554] channel.c: Exceptionally long voice queue length queuing to IAX2/safesoft-dialer-trunk-2087813841-797
[2016-03-16 21:55:16] WARNING[23559] channel.c: Exceptionally long voice queue length queuing to IAX2/safesoft-dialer-trunk-2087813836-4657
[2016-03-16 21:55:16] WARNING[44218][C-0000012d] channel.c: Exceptionally long voice queue length queuing to Local/69696969696969696@vicidial-000000cf;1
[2016-03-16 21:55:17] WARNING[24068][C-00000172] channel.c: Exceptionally long voice queue length queuing to Local/69696969696969696@vicidial-000000f7;1
[2016-03-16 21:55:17] WARNING[7872][C-0000021b] channel.c: Exceptionally long voice queue length queuing to Local/69696969696969696@vicidial-0000014c;1
{noformat}
Comments:By: Asterisk Team (asteriskteam) 2016-03-17 02:11:24.944-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Bobby Hakimi (bobbymc) 2016-03-17 02:14:54.087-0500

i can reproduce this on demand


By: Richard Mudgett (rmudgett) 2016-03-17 10:54:20.557-0500

Looking at [^core-show-locks4.txt] and [^backtrace-threads4.txt] it appears that there may be a missing unlock somewhere.  The thread holding a lock is not still in the area where the lock was obtained.

By: Bobby Hakimi (bobbymc) 2016-03-17 10:56:50.729-0500

it is currently happening as we speak, please let me know what else i can do. its in production so i turned off debugging when compiling asterisk, i can reproduce it myself at night again if you guys need another backtrace or core show locks. thank you again in advanced, hope we can resolve this quick, i am really desperate to solve this soon as our clients are threatening to leave.

btw i may be wrong but it spears when we do a "core reload" that this issue happens more often around that time

thank you again

By: Joshua C. Colp (jcolp) 2016-03-17 10:59:38.460-0500

There's no guarantees on how fast this will get looked at. If it's urgent you may want to try to find a consultant to look deeper into things more immediately.

By: Bobby Hakimi (bobbymc) 2016-03-17 11:16:53.064-0500

i will try that in the meantime, Joshua Colp may i email you to get more information on how to best do this?

By: Joshua C. Colp (jcolp) 2016-03-17 11:18:34.152-0500

I don't have any suggestions or a list to give for such a thing. You can ask on the asterisk-biz[1] mailing list.

[1] http://lists.digium.com/mailman/listinfo/asterisk-biz

By: Bobby Hakimi (bobbymc) 2016-03-19 04:42:28.332-0500

found the cause of this, its a combination of things, under heavy load when doing a dialplan reload or core reload a few times with the mixture of allot of monitor applications recording calls, this issue happens right away, i've stopped using the monitor application to record calls and now the core reload doesn't cause this issue anymore, thought i let others know incase digium doesn't attempt to fix it

By: Bobby Hakimi (bobbymc) 2016-03-23 16:18:55.265-0500

issue is back again, stopped doing core reload and it still happens, but it appears to happen more with core reload commands run

By: Bobby Hakimi (bobbymc) 2016-03-28 18:05:16.401-0500

it appears to happen when we make heavy use of iax2 trunks. i disabled calls from being places via iax2 trunks and the issue has not happended so far

By: Bobby Hakimi (bobbymc) 2016-03-30 18:14:04.184-0500

narrowed the issue to asterisk db, under heavy load and when using DUNDi and setting the cachetime to 5 seconds causes the asterisk db to hang. this happens because we have about 6k routes shared in dundi and setting the cachetime to 5 secodns would cause the asterisk db to hang because of the large amount of dundi routes that need to be filled into the asterisk database,. setting the cachetime to somehting like 12 hours seems to have help fix the issue. i will report back again in a week once my findings have been confirmed


By: Bobby Hakimi (bobbymc) 2016-04-07 02:09:44.660-0500

its been happening much less then before but we still occasionally get these errors. looks like this happens more often when there is higher load on the server.

By: Adam D Topkek (Atechhito) 2016-11-22 11:10:48.687-0600

We are also having the very same issue. When we have THE ISSUE it throws the same exact errors your showing up there. It is also a more heavily used server as yours is. We think we have found the trigger for causing the error too. We are running freepbx, I'm not sure if you are too. How we caused the server to crash on demand:

~ The server is running normally in production
1) Called in with a cell phone to one of our numbers
2) Made a change then clicked the apply button in freepbx gui. Watched the asterisk CLI as the asterisk reload flooded my terminal.
3) Parked the call while the asterisk reload was reloading and still outputting lines.
4) Get flooded by: Exceptionally long voice queue length queuing

Running Asterisk 11.23.0 and FreePBX 13.0.190.5

Are you able to recreate the error using that method? We are trying a few things to attempt to fix it now that we discovered how to reproduce it. If you learn anything more give a reply, lets figure this thing out its VERY annoying.

By: Adam D Topkek (Atechhito) 2016-12-05 10:23:15.366-0600

We Upgraded Asterisk from 11.23 to 11.25 and now we cannot get it to crash on demand nor have we had a crash. Will update if something changes.

By: Rusty Newton (rnewton) 2016-12-05 10:47:04.098-0600

[~Atechhito] Hopefully something went in that fixed it...

Can anyone reproduce this on Asterisk 13?  I ask because Asterisk 11 went into Security Fix only on 2016-10-25.

By: Michael Cargile (mcargile) 2016-12-20 12:26:19.434-0600

I have been running into this issue with Asterisk 11.25.1. On a high volume system with the verbosity turned up and calls going into WaitForSilence the system locks up within 15 minutes. I am wondering if it has something to do with the absolutely massive amount of CLI output WaitForSilence produces with high verbosity. This would make sense for the other people reporting this issue as well as they are issuing reloads which produce large amounts of CLI output as well. Perhaps this issue is related to the CLI output. I have set the verbosity to 1 and the system has not crashed in the last two hours. If it goes the rest of the day without a crash, tomorrow I will run on high verbosity for a while to see if that is still crashing the system.

I know Asterisk 11 is in Security Fix mode, but if I am correct an attacker would just need to figure out how to produce a ton of CLI output to on a high volume system to lock up / crash Asterisk on demand. A SIP brute force attack could do this.

By: Michael Cargile (mcargile) 2016-12-20 12:33:49.045-0600

Well of course it locked up as soon as I hit submitted on my last comment. Still it did seem to run longer without locking up with the verbosity set very low.

By: Bobby Hakimi (bobbymc) 2016-12-20 13:14:08.208-0600

same for me, it locks up on high volume traffic, i drilled the issue to the astdb file, for me DUNDi cache expire was set to 15 seconds and because of the high volume the astdb file was getting updated often and when it gets updated often under high load, the system deadlocks. hope this helps you guys

By: Bobby Hakimi (bobbymc) 2016-12-20 13:18:59.343-0600

sorry forgot to tell you guys, i also used freepbx but not much traffic was going to the freepbx agi scripts or dial plans

By: Adam D Topkek (Atechhito) 2016-12-21 08:49:30.035-0600

So with 11.25 it is not crashing near as often but it still crashes at times. It's always on an "apply changes" or "fwconsole reload" involving Freepbx.

We tested, on another server that had 11.23, during the night when no one was using the system and even while under no load parking while doing an "apply changes" or "fwconsole reload" crashed it. The server had ample memory and CPU resources while the changes were applying.

I don't think we use DUNDi at all. dundi.conf is completely commented out. We mainly use chan_SIP for our phones. I wonder if there is a parking cache that was expiring...?

By: Adam D Topkek (Atechhito) 2016-12-21 09:13:26.540-0600

!https://issues.asterisk.org/jira/secure/attachment/54774/damned_call_server.png!

In the middle of a freepbx "apply changes" or "fwconsole reload" we got this output then directly after a flood of:

WARNING... channel.c1310 __ast_queue_fame: Exceptionally long voice queue length queuing to IAX2/ULCR-9530

By: Abhay Gupta (agupta) 2017-09-15 06:54:52.565-0500

I am also facing a similar issue , wondering if anything was found for the same and ours is asterisk 13.17.1 and this happens with IAX and PJSIP channels


By: ADTopkek (ADTopkek) 2017-09-15 09:51:26.548-0500

The problem was the Max stack size. Its set to 128 by default in asterisk. We were using FreePBX so every parking lot we had took up 1 stack, every outbound route took 1 stack, and modules took up a random amount. All together it went over 128 till it started causing stack errors. We had about 50 parking lots.

FreePBX seems to be fixing it in their newest version: https://issues.freepbx.org/browse/FREEPBX-15656

By: Joshua C. Colp (jcolp) 2017-12-18 08:43:49.378-0600

I've looked over the logs here and the core parts involved have been rewritten and the locking implications have changed which I think may cause the problem to no longer occur.

Does this occur under recent versions of Asterisk?

By: ADTopkek (ADTopkek) 2017-12-18 08:47:03.071-0600

It still did it in asterisk 14 for a while but me and a few others bugged Freepbx to increase the stack size and they raised it to 512 on the most current Distro 14 with Asterisk 14 and above.

By: Asterisk Team (asteriskteam) 2018-01-02 08:44:23.026-0600

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidlines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines