ASTERISK-26956: Exceptionally long voice

[Home]

Summary: ASTERISK-26956: Exceptionally long voice > queue length queuing

Reporter: Ted G (tgwaste) Labels:

Date Opened: 2017-04-23 17:38:20 Date Closed: 2017-05-03 14:24:27

Priority: Major Regression?

Status: Closed/Complete Components:

Versions: 14.4.0 Frequency of
Occurrence

Related
Issues:

Environment: Debian GNU/Linux 8.7 (jessie) Attachments: ( 0) backtrace-threads.20170423.txt
( 1) core-show-locks.20170423.txt

Description: Hello,
I am getting tons of these messages in my logs a few times a day which causes my system to be completely dead:

[Apr 20 21:39:15] WARNING[24491][C-00000197]: channel.c:1162 _astqueue_frame: Exceptionally long voice queue length queuing to Local/s@announce-0000015b;1
[Apr 23 15:32:06] WARNING[20645] channel.c: Exceptionally long voice queue length queuing to CBAnn/12-000001f0;1

I am having a very difficult time figuring out what is causing it other than 'its a channel deadlock' which doesn't really tell me how to fix the issue. This just started happening around ~6 days ago.

This problem is extremely catastrophic as the only fix is to completely restart the asterisk system.

Included are backtrace and core-show-locks logs. Please assist this desperate fool. :)

Comments: By: Asterisk Team (asteriskteam) 2017-04-23 17:38:21.171-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].
By: Ted G (tgwaste) 2017-04-23 21:01:20.947-0500

Leak?

Tasks: 190 total, 2 running, 188 sleeping, 0 stopped, 0 zombie
%Cpu(s): 41.2 us, 51.1 sy, 0.0 ni, 7.4 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
KiB Mem: 16427168 total, 14702436 used, 1724732 free, 125352 buffers
KiB Swap: 15721468 total, 67096 used, 15654372 free. 3509960 cached Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27979 root 20 0 9946.8m 7.517g 16656 S 359.9 48.0 669:42.52 asterisk

Going to compile with MALLOC_DEBUG and obtain mmlog

Perhaps high resource usage is due to enabling DEBUG_THREADS, DONT_OPTIMIZE, BETTER_BACKTRACES
By: Richard Mudgett (rmudgett) 2017-04-24 09:42:15.755-0500

The core show locks and backtrace do not show a deadlock. The exceptionally long voice queue in this case just shows that something is not processing frames fast enough. DEBUG_THREADS does add quite a bit of processing overhead so I hope that you do not normally have that option enabled.
By: Ted G (tgwaste) 2017-04-25 14:47:23.629-0500

Its very difficult to diagnose because all this debugging stuff puts an enormous strain on my system.

Here is what I have noticed. It seems to be mostly local channels, i.e.: Local/s@announce-0000015b

My purpose for these is to play announcements into conferences so that everyone can hear them and thus to create these I do the following:

Create a call file like so and move it to: /var/spool/asterisk/outgoing

{quote}
Channel: Local/s@announce
Set: ARG1=11
Application: Playback
Data: /opt/sipx/wav/msg1&/opt/sipx/wav/msg2
{quote}

Which triggers an event in extensions.conf:

{quote}
[announce]
exten => s,1,NoOp
same => n,Answer
same => n,ConfBridge($ARG1,bridge,user)
same => n,Hangup
{quote}

So maybe something with that method is causing these channels to deadlock somehow. Ive been doing this for more than a year and this problem only cropped up a ~week ago.

I removed the { and } around ARG1 because it was being formatted badly for this page.
By: Ted G (tgwaste) 2017-04-26 01:08:47.476-0500

Problem just happened again, tons of these and froze system:

[Apr 25 23:01:02] WARNING[12664][C-00002989]: channel.c:1145 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/s@announce-00002c4a;1

This time I peeked in /var/spool/asterisk/outgoing/ and saw a bunch of .call files sitting there with an additional entry at the bottom of each that looks like:

StartRetry: 2188 1 (1493186605)

I think tomorrow I will turn these announcements off and see if the system can go 48 hours without a crash.
By: Rusty Newton (rnewton) 2017-04-26 16:22:45.072-0500

If you can post a debug log showing what is going on before and during the issue, that may provide some clues.

https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information
By: Ted G (tgwaste) 2017-04-26 20:07:42.005-0500

Gosh... I think I know what the problem might be

# egrep 'rtpstart|rtpend' /etc/asterisk/rtp.conf
; Defaults are rtpstart=5000 and rtpend=31000
rtpstart=10000
rtpend=20000

my open ports:

--dports 5060:5080,10000:10500

*sigh*

*crosses fingers*

280 permit tcp any host xxx.xxx.xxx.xxx range 10000 10200

*double sigh*

By: Ted G (tgwaste) 2017-04-26 20:27:21.285-0500

Can anyone confirm the possibility of channels locking up because of blocked ports?

By: Joshua C. Colp (jcolp) 2017-04-27 11:53:25.222-0500

A lack of open ports shouldn't cause a lock up.
By: Ted G (tgwaste) 2017-04-27 16:22:59.873-0500

Im still watching. Some of my users were reporting their keys would suddenly stop responding and the system would go silent. Playing around with a simply C script (https://community.asterisk.org/t/c-issue-wait-for-digit/70493) I noticed that I would not receive any results back from key presses so I was thinking maybe the user getting locked up and not able to do anything caused a cascade that led to a lockup as at this point they would have hung up or at least tried to. When I fixed the open ports to match rtp.conf the issues went away so I'm hoping the 'Exceptionally long voice queue' issues will go away as well. So far I have not seen any since I have made the change. Once I'm at at least 3 days without issue I will consider the matter solved.

By: Ted G (tgwaste) 2017-04-30 22:05:12.849-0500

Well its been over 3 days now and there hasn't been a single issue. Looks like ports was ultimately the problem.

By: Joshua C. Colp (jcolp) 2017-05-03 14:24:27.379-0500

I'm suspending this for now per your comment, but if it reoccurs you can reopen it automatically by commenting again.
By: Vittorio Giordano (vittogio74) 2017-10-03 00:21:02.057-0500

I followed the steps, checking the rtp ports, but the issue is still there. Asterisk randomly hangs. Did you oticed any further failure after you applied this "fix"? i am running Asterisk 11-cert on centos7. I have both SIP and PSTN channels
By: Ted G (tgwaste) 2017-10-03 11:52:33.798-0500

No, no further issues. Have you tried disabling firewall completely? Maybe try updating to newest version if possible?
By: Asterisk Team (asteriskteam) 2017-10-03 11:52:34.181-0500

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.
By: Vittorio Giordano (vittogio74) 2017-10-05 21:32:34.814-0500

the erro started with Asterisk version 11.25. then i moved back to the certified one, but, same issue. I noticed this when, looking at the console i saw Warining on max file size and then the hang of Asterisk and as workaround I changed the value in asterisk.conf from 1000 to 2048. Now i do not see anymore the hanging for the max file, but "just" the exceptionally long..... . It is like or a wrong channel closure during hangup, as i see that the system when this error occurs starts calling, recursively some local channels.
Firewall was not active with both version of Asterisk.
Now, i have Asterisk 11.6-cert17, Dahdi is 2.10.2. LibPRI 1,6.0. Kernel 3.10.0-693.2.2.el7.x86_64
Actually i removed one T435 card, as i had 4, so now i have 3xTE435 + 1xTCE400P. . Thanks
By: Vittorio Giordano (vittogio74) 2017-10-12 10:58:53.891-0500

We have been able to re-produce the issue, so finding the cause: SIP channel hangup is not correctly performed by the phone (hardware) when the digits (DTMF) sent over the AMI for the "hold" function are not correctly read by the system. Due to this the channel of the caller is not correctly closed and it is "frozen". Looking at the Asterisk console we see that the number of active channels starts increasing and the console starts getting full of Exceptionally long queueing message. Once forcing the closure of the caller channel by using the hangup button, we see on the console the "Autodestruct" messages. Then the whole system hang, The asterisk process is still available and the only was to stop and start it is to kill the process with the "-9" option. This has been reproduced with kernel 3.0 and Asterisk 11
By: Vittorio Giordano (vittogio74) 2017-10-20 04:54:03.465-0500

I have been running a test session on on CentOS 6.9 and Asterisk 11 and No issue at all.
By: NESE KIZIL (neshredkid) 2017-10-24 10:29:05.740-0500

so what is your conclusion
By: Ted G (tgwaste) 2017-10-24 10:49:08.663-0500

Trying setting rtp port range to like 100 -> 200 then in your firewall settings only open ports 100,101 then make a ton of calls and use a lot of touch tones. Might give you better chances of coming across the error.

Personally: this happened for weeks and as soon as I opened the correct ports in the firewall the issue went away. My call volume is very high though.

By: Asterisk Team (asteriskteam) 2017-10-24 10:49:08.824-0500

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.
By: NESE KIZIL (neshredkid) 2017-10-24 12:14:17.232-0500

Hello Ted,
Actually there is no fw in our case as Vittorio, we handle the traffic in local Network.
And also we have lots of traffic as well about 20K calls per day.Both pjsip and sip endpoints have the same problems. We restart the asterisk everyday.

By: Vittorio Giordano (vittogio74) 2017-10-24 14:30:10.585-0500

From my experience, considering anyhow the correct settings for the RTP port, i see the certified version 11 working without issue on 2,6 kernels (what i use is CentOS 6.9). I'll keep anyhow monitoring the server and I will run some tests with version 13 certified, kernels 3.0.
Just a quick question, to who has still the issue: Are you using ODBC, what version? Do you noticed disconnection/connection on the asterisk console?
I know that this question could not be related to the topic, but as i have not a dump core, i also noticed that even those disappeared when moving back to kernel 2.6, while with the kernel 3.0 i had also a lot of. Again could not be related, so it is just for further investigation and understanding. Thank you