Summary: | ASTERISK-26956: Exceptionally long voice > queue length queuing | ||
Reporter: | Ted G (tgwaste) | Labels: | |
Date Opened: | 2017-04-23 17:38:20 | Date Closed: | 2017-05-03 14:24:27 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | |
Versions: | 14.4.0 | Frequency of Occurrence | |
Related Issues: | |||
Environment: | Debian GNU/Linux 8.7 (jessie) | Attachments: | ( 0) backtrace-threads.20170423.txt ( 1) core-show-locks.20170423.txt |
Description: | Hello,
I am getting tons of these messages in my logs a few times a day which causes my system to be completely dead: [Apr 20 21:39:15] WARNING[24491][C-00000197]: channel.c:1162 _astqueue_frame: Exceptionally long voice queue length queuing to Local/s@announce-0000015b;1 [Apr 23 15:32:06] WARNING[20645] channel.c: Exceptionally long voice queue length queuing to CBAnn/12-000001f0;1 I am having a very difficult time figuring out what is causing it other than 'its a channel deadlock' which doesn't really tell me how to fix the issue. This just started happening around ~6 days ago. This problem is extremely catastrophic as the only fix is to completely restart the asterisk system. Included are backtrace and core-show-locks logs. Please assist this desperate fool. :) | ||
Comments: | By: Asterisk Team (asteriskteam) 2017-04-23 17:38:21.171-0500 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. By: Ted G (tgwaste) 2017-04-23 21:01:20.947-0500 Leak? Tasks: 190 total, 2 running, 188 sleeping, 0 stopped, 0 zombie %Cpu(s): 41.2 us, 51.1 sy, 0.0 ni, 7.4 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st KiB Mem: 16427168 total, 14702436 used, 1724732 free, 125352 buffers KiB Swap: 15721468 total, 67096 used, 15654372 free. 3509960 cached Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 27979 root 20 0 9946.8m 7.517g 16656 S 359.9 48.0 669:42.52 asterisk Going to compile with MALLOC_DEBUG and obtain mmlog Perhaps high resource usage is due to enabling DEBUG_THREADS, DONT_OPTIMIZE, BETTER_BACKTRACES By: Richard Mudgett (rmudgett) 2017-04-24 09:42:15.755-0500 The core show locks and backtrace do not show a deadlock. The exceptionally long voice queue in this case just shows that something is not processing frames fast enough. DEBUG_THREADS does add quite a bit of processing overhead so I hope that you do not normally have that option enabled. By: Ted G (tgwaste) 2017-04-25 14:47:23.629-0500 Its very difficult to diagnose because all this debugging stuff puts an enormous strain on my system. Here is what I have noticed. It seems to be mostly local channels, i.e.: Local/s@announce-0000015b My purpose for these is to play announcements into conferences so that everyone can hear them and thus to create these I do the following: Create a call file like so and move it to: /var/spool/asterisk/outgoing {quote} Channel: Local/s@announce Set: ARG1=11 Application: Playback Data: /opt/sipx/wav/msg1&/opt/sipx/wav/msg2 {quote} Which triggers an event in extensions.conf: {quote} [announce] exten => s,1,NoOp same => n,Answer same => n,ConfBridge($ARG1,bridge,user) same => n,Hangup {quote} So maybe something with that method is causing these channels to deadlock somehow. Ive been doing this for more than a year and this problem only cropped up a ~week ago. I removed the { and } around ARG1 because it was being formatted badly for this page. By: Ted G (tgwaste) 2017-04-26 01:08:47.476-0500 Problem just happened again, tons of these and froze system: [Apr 25 23:01:02] WARNING[12664][C-00002989]: channel.c:1145 __ast_queue_frame: Exceptionally long voice queue length queuing to Local/s@announce-00002c4a;1 This time I peeked in /var/spool/asterisk/outgoing/ and saw a bunch of .call files sitting there with an additional entry at the bottom of each that looks like: StartRetry: 2188 1 (1493186605) I think tomorrow I will turn these announcements off and see if the system can go 48 hours without a crash. By: Rusty Newton (rnewton) 2017-04-26 16:22:45.072-0500 If you can post a debug log showing what is going on before and during the issue, that may provide some clues. https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information By: Ted G (tgwaste) 2017-04-26 20:07:42.005-0500 Gosh... I think I know what the problem might be # egrep 'rtpstart|rtpend' /etc/asterisk/rtp.conf ; Defaults are rtpstart=5000 and rtpend=31000 rtpstart=10000 rtpend=20000 my open ports: --dports 5060:5080,10000:10500 *sigh* *crosses fingers* 280 permit tcp any host xxx.xxx.xxx.xxx range 10000 10200 *double sigh* By: Ted G (tgwaste) 2017-04-26 20:27:21.285-0500 Can anyone confirm the possibility of channels locking up because of blocked ports? By: Joshua C. Colp (jcolp) 2017-04-27 11:53:25.222-0500 A lack of open ports shouldn't cause a lock up. By: Ted G (tgwaste) 2017-04-27 16:22:59.873-0500 Im still watching. Some of my users were reporting their keys would suddenly stop responding and the system would go silent. Playing around with a simply C script (https://community.asterisk.org/t/c-issue-wait-for-digit/70493) I noticed that I would not receive any results back from key presses so I was thinking maybe the user getting locked up and not able to do anything caused a cascade that led to a lockup as at this point they would have hung up or at least tried to. When I fixed the open ports to match rtp.conf the issues went away so I'm hoping the 'Exceptionally long voice queue' issues will go away as well. So far I have not seen any since I have made the change. Once I'm at at least 3 days without issue I will consider the matter solved. By: Ted G (tgwaste) 2017-04-30 22:05:12.849-0500 Well its been over 3 days now and there hasn't been a single issue. Looks like ports was ultimately the problem. By: Joshua C. Colp (jcolp) 2017-05-03 14:24:27.379-0500 I'm suspending this for now per your comment, but if it reoccurs you can reopen it automatically by commenting again. By: Vittorio Giordano (vittogio74) 2017-10-03 00:21:02.057-0500 I followed the steps, checking the rtp ports, but the issue is still there. Asterisk randomly hangs. Did you oticed any further failure after you applied this "fix"? i am running Asterisk 11-cert on centos7. I have both SIP and PSTN channels By: Ted G (tgwaste) 2017-10-03 11:52:33.798-0500 No, no further issues. Have you tried disabling firewall completely? Maybe try updating to newest version if possible? By: Asterisk Team (asteriskteam) 2017-10-03 11:52:34.181-0500 This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable. By: Vittorio Giordano (vittogio74) 2017-10-05 21:32:34.814-0500 the erro started with Asterisk version 11.25. then i moved back to the certified one, but, same issue. I noticed this when, looking at the console i saw Warining on max file size and then the hang of Asterisk and as workaround I changed the value in asterisk.conf from 1000 to 2048. Now i do not see anymore the hanging for the max file, but "just" the exceptionally long..... . It is like or a wrong channel closure during hangup, as i see that the system when this error occurs starts calling, recursively some local channels. Firewall was not active with both version of Asterisk. Now, i have Asterisk 11.6-cert17, Dahdi is 2.10.2. LibPRI 1,6.0. Kernel 3.10.0-693.2.2.el7.x86_64 Actually i removed one T435 card, as i had 4, so now i have 3xTE435 + 1xTCE400P. . Thanks By: Vittorio Giordano (vittogio74) 2017-10-12 10:58:53.891-0500 We have been able to re-produce the issue, so finding the cause: SIP channel hangup is not correctly performed by the phone (hardware) when the digits (DTMF) sent over the AMI for the "hold" function are not correctly read by the system. Due to this the channel of the caller is not correctly closed and it is "frozen". Looking at the Asterisk console we see that the number of active channels starts increasing and the console starts getting full of Exceptionally long queueing message. Once forcing the closure of the caller channel by using the hangup button, we see on the console the "Autodestruct" messages. Then the whole system hang, The asterisk process is still available and the only was to stop and start it is to kill the process with the "-9" option. This has been reproduced with kernel 3.0 and Asterisk 11 By: Vittorio Giordano (vittogio74) 2017-10-20 04:54:03.465-0500 I have been running a test session on on CentOS 6.9 and Asterisk 11 and No issue at all. By: NESE KIZIL (neshredkid) 2017-10-24 10:29:05.740-0500 so what is your conclusion By: Ted G (tgwaste) 2017-10-24 10:49:08.663-0500 Trying setting rtp port range to like 100 -> 200 then in your firewall settings only open ports 100,101 then make a ton of calls and use a lot of touch tones. Might give you better chances of coming across the error. Personally: this happened for weeks and as soon as I opened the correct ports in the firewall the issue went away. My call volume is very high though. By: Asterisk Team (asteriskteam) 2017-10-24 10:49:08.824-0500 This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable. By: NESE KIZIL (neshredkid) 2017-10-24 12:14:17.232-0500 Hello Ted, Actually there is no fw in our case as Vittorio, we handle the traffic in local Network. And also we have lots of traffic as well about 20K calls per day.Both pjsip and sip endpoints have the same problems. We restart the asterisk everyday. By: Vittorio Giordano (vittogio74) 2017-10-24 14:30:10.585-0500 From my experience, considering anyhow the correct settings for the RTP port, i see the certified version 11 working without issue on 2,6 kernels (what i use is CentOS 6.9). I'll keep anyhow monitoring the server and I will run some tests with version 13 certified, kernels 3.0. Just a quick question, to who has still the issue: Are you using ODBC, what version? Do you noticed disconnection/connection on the asterisk console? I know that this question could not be related to the topic, but as i have not a dump core, i also noticed that even those disappeared when moving back to kernel 2.6, while with the kernel 3.0 i had also a lot of. Again could not be related, so it is just for further investigation and understanding. Thank you |