[Home]

Summary:ASTERISK-25456: Asterisk becomes unresponsive - res_rtp_asterisk: WebRTC STUN Deadlock
Reporter:Bobby Hakimi (bobbymc)Labels:
Date Opened:2015-10-08 16:52:01Date Closed:2018-01-02 08:30:30.000-0600
Priority:MajorRegression?
Status:Closed/CompleteComponents:Channels/chan_sip/WebSocket Resources/res_rtp_asterisk
Versions:11.17.1 11.20.0 11.21.2 Frequency of
Occurrence
Constant
Related
Issues:
Environment:gentooAttachments:( 0) backtrace-threads-5.txt
( 1) core-show-locks-5.txt
( 2) full.zip
Description:this issue happens when the iax2 trunk between two servers happens and there is some network issue that causes the connection to break. once this happens asterisk gets flooded with these errors and eventually the system becomes unresponsive

{noformat}
socket_process_meta: Received trunked frame before first full voice frame
[2015-10-08 13:18:13] WARNING[23623]: chan_iax2.c:9993 socket_process_meta: Received trunked frame before first full voice frame
[2015-10-08 13:18:14] WARNING[23620]: chan_iax2.c:3642 __attempt_transmit: Max retries exceeded to host XXX.XXX.XXX.XXX on IAX2/dialer-outbound-2552 (type = 6, subclass = 11, ts=140008, seqno=28)
[2015-10-08 13:18:17] WARNING[20761][C-0000047b]: res_srtp.c:407 ast_srtp_unprotect: SRTP unprotect failed with: authentication failure 110
[2015-10-08 13:18:21] WARNING[23622]: chan_iax2.c:3642 __attempt_transmit: Max retries exceeded to host XXX.XXX.XXX.XXX on IAX2/dialer-outbound-2552 (type = 6, subclass = 2, ts=147010, seqno=29)
[2015-10-08 13:18:24] WARNING[23623]: chan_iax2.c:3642 __attempt_transmit: Max retries exceeded to host XXX.XXX.XXX.XXX on IAX2/dialer-outbound-2552 (type = 6, subclass = 11, ts=150007, seqno=30)
   -- Remote UNIX connection
[2015-10-08 13:18:34] WARNING[23617]: chan_iax2.c:3642 __attempt_transmit: Max retries exceeded to host XXX.XXX.XXX.XXX on IAX2/dialer-outbound-2552 (type = 6, subclass = 11, ts=160006, seqno=31)
[2015-10-08 13:18:38] WARNING[1055][C-00000b69]: res_srtp.c:407 ast_srtp_unprotect: SRTP unprotect failed with: authentication failure 110
[2015-10-08 13:18:42] WARNING[23617]: chan_iax2.c:3642 __attempt_transmit: Max retries exceeded to host XXX.XXX.XXX.XXX on IAX2/dialer-outbound-2552 (type = 6, subclass = 2, ts=168010, seqno=32)
[2015-10-08 13:18:44] WARNING[23622]: chan_iax2.c:3642 __attempt_transmit: Max retries exceeded to host XXX.XXX.XXX.XXX on IAX2/dialer-outbound-2552 (type = 6, subclass = 11, ts=170006, seqno=33)
 == WebSocket connection from 'XXX.XX.XXX.XXX:17450' for protocol 'sip' accepted using version '13'
 == Setting global variable 'MEETMEADMINSTATUS' to 'OK'
   -- <SIP/138-00000169> Playing 'conf-kicked.slin' (language 'en')
{noformat}
Comments:By: Asterisk Team (asteriskteam) 2015-10-08 16:52:02.522-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Rusty Newton (rnewton) 2015-10-08 20:14:44.321-0500

Thank you for taking the time to report this bug and helping to make Asterisk better. Unfortunately, we cannot work on this bug because your description did not include enough information. Please read over the Asterisk Issue Guidelines [1] which discusses the information necessary for your issue to be resolved and the format that information needs to be in. We would be grateful if you would then provide a more complete description of the problem. At a minimum, we need:

1. The specific steps or actions you took that caused you to encounter the problem.
2. The behavior you expected and the location of documentation that led you to that expectation.
3. The behavior you actually encountered.

To demonstrate the issue in detail, please include Asterisk log files generated per the instructions on the wiki [2]. If applicable, please ensure that protocol-level trace debugging is enabled, e.g., 'sip set debug on' if the issue involves chan_sip, and configuration information such as dialplan and channel configuration.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

[2] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information



By: Bobby Hakimi (bobbymc) 2015-10-09 19:25:30.069-0500

1. The specific steps or actions you took that caused you to encounter the problem:

we have a application that places calls out from asterisk box a, once the calls pick up they get sent to asterisk box b where they get connected to a meetme room. this happens hundred of times every day, once in a while we get this message and asterisk freezes.

2. The behavior you expected and the location of documentation that led you to that expectation:

i expect the iax trunk to connect between both servers and the call to continue as normal

3. The behavior you actually encountered:

eventually asterisk egst flooded with this error messages and it becomes unresponsive. i have to kill -9 the application and restart it. i am how ever able to get into the CLI but no calls are able at that point to be placed.

after those error logs the rest logs turn into this:



[2015-10-09 17:07:50] WARNING[8896] chan_iax2.c: Max retries exceeded to host 208.78.138.41 on IAX2/safesoft-dialer-outbound-2087813841-10078 (type = 6, subclass = 2, ts=567018, seqno=91)
[2015-10-09 17:07:51] WARNING[3086] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[3336] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[13311] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[10115] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[8896] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[8896] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[10115] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[8896] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[3086] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[3336] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[3336] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[3086] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[3336] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[13311] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[13311] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[3336] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[13311] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:51] WARNING[10115] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:52] WARNING[8896] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:52] WARNING[3086] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:52] WARNING[3086] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:52] WARNING[8896] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:52] WARNING[8896] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:52] WARNING[10115] chan_iax2.c: Received trunked frame before first full voice frame
[2015-10-09 17:07:52] WARNING[8896] chan_iax2.c: Received trunked frame before first full voice frame



By: Bobby Hakimi (bobbymc) 2015-10-14 14:42:38.336-0500

i have suspicion that this may be caused due to res_timing_timerfd.so

By: Rusty Newton (rnewton) 2015-10-15 17:10:35.244-0500

You need to provide us with a method to reproduce the issue, or provide us with enough debug to identify the issue simply.

It doesn't appear that you know how to reproduce the issue at will so you'll need to provide us additional debug.

You didn't provide the debug logs requested (instructions at the link in my last comment). Perhaps they are unavailable if you are unable to reproduce the issue.

To make any progress with the issue we'll at least need a packet capture showing all the IAX network traffic at the time of the issue and a debug log (again as described in the instructions) captured at the same time.

Thanks Bobby!

By: Rusty Newton (rnewton) 2015-10-15 17:11:13.641-0500

{quote}
i have suspicion that this may be caused due to res_timing_timerfd.so
{quote}
Additionally if you suspect an issue here, please try a different timer and see how that works. Report back what happens. Thanks!

By: Bobby Hakimi (bobbymc) 2015-10-15 17:19:22.050-0500

Thank you for the response. i will start the logging of this and report next time it happens,  i have a feeling this is related to this bug ASTERISK-25139



By: Bobby Hakimi (bobbymc) 2015-10-15 17:34:45.383-0500

i tried the dahdi timer and i still had a crash today, currently running all debug logs on and waiting for the next crash, i had to set core debug to 3 instead of 5 because the logs were getting so massively big that i was running out of space. hope that is ok

By: Rusty Newton (rnewton) 2015-10-16 13:00:59.713-0500

Yeah debug 3 will probably be fine!

Remember to attach new debug to the issue as files, either .txt, or .pcap where appropriate. You can find the option under "More > Attach Files".

Thanks Bobby.

By: Bobby Hakimi (bobbymc) 2015-10-16 21:11:33.459-0500

i back ported the this fix ASTERISK-25139 to 11.17.1 and so far no crash, i will know for sure by next week Wednesday if this is truly fixed or not.

By: Bobby Hakimi (bobbymc) 2015-10-20 14:42:34.769-0500

this is the full debug log for the crash

this is the error that kept scrolling and ultimately caused asterisk to not respond and lose all audio on all channels:

channel.c: Exceptionally long voice queue length queuing to Local/78600008@vicidial-00000082;2

By: Bobby Hakimi (bobbymc) 2015-10-20 14:45:28.827-0500

after back porting ASTERISK-25139 the crash we got today was one we had gotten before, im not sure if its even related to this error or if i should open a new ticket.

this is the error that kept scrolling which caused all audio channels to drop off and asterisk eventually stopped responding.

[2015-10-20 10:46:28] WARNING[10438][C-000000c9] channel.c: Exceptionally long voice queue length queuing to Local/78600008@vicidial-00000082;2

the attached zip file included a debug log for iax and sip and core set debug to 3. please let me know if there is anything else that is needed. it appears to even several times a day and i have no clue how to generate it on demand.

thank you






By: Bobby Hakimi (bobbymc) 2015-10-20 14:53:49.162-0500

incase it helps, this channel is a monitor inside a meetme room that records the conference call and was hung up on. there is another bug that reported a similar issue but no one was able to reproduce it : ASTERISK-23998

By: Bobby Hakimi (bobbymc) 2015-10-21 19:50:20.982-0500

i just had the same crash with the iax error i described initially. the log debug is 400 mb, even if i zip it i dont think i can upload due to the max upload size, any recommendations?


By: Bobby Hakimi (bobbymc) 2015-10-21 19:59:06.891-0500

look or this channel: IAX2/safesoft-dialer-outbound-2087813836-7728

By: Bobby Hakimi (bobbymc) 2015-10-23 13:43:47.652-0500

i can now confirm all of these issues are caused by meetme becoming unresponsive. im not sure how to troubleshoot why meetme freezes and causes the whole system to freeze. we rely heavily on meetme and im not sure what else to do at this point. please let me know what you like me to do to troubelshoot this issue

By: Bobby Hakimi (bobbymc) 2015-10-26 19:19:39.420-0500

is there a update on this? its been almost a week and asterisk still hangs every day, it seems to be related to the heavy use of meetme or confbridge. i switched the application to use confbridge and i still have the same effect. i either get "Received trunked frame before first full voice frame" and the iax error above or the Local channel error and asterisk drops all audio and becomes unresponsive when in CLI.



By: Bobby Hakimi (bobbymc) 2015-10-27 09:04:44.255-0500

more debug logs

By: Rusty Newton (rnewton) 2015-10-27 19:05:44.652-0500

Bobby any updates will be posted to the issue and you will receive E-mail notification of the issue.

When one of us has time to look into the logs then the status will change.

I'd like to note that massive logs are generally not helpful. If possible try to attach the last few minutes or several thousands lines right before and after the issue occurs. The log messages around that time are what we want to look at.

Be sure to point out *where* in the log file the issue begins. If you cannot reproduce the issue or point out where the issue occurs in the logs then we won't have much to go on. Even with that this issue may be impossible to trace down without the ability to reproduce.

If you believe the issue is a deadlock occurring then be sure to follow the instructions here:

https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace#GettingaBacktrace-GettingInformationForADeadlock

By: Bobby Hakimi (bobbymc) 2015-10-27 20:50:57.936-0500

i finally found out the cause of these issues. its a deadlock issue and here is a dump of the core show locks resutls. does this help?

By: Bobby Hakimi (bobbymc) 2015-10-29 14:25:20.668-0500

this is a backtrace and show locks of the deadlock that causes this and other issues

By: Bobby Hakimi (bobbymc) 2015-11-03 13:20:58.621-0600

i just attached all debug logs, backtraces and deadlock list needed for this issue. please let me know what else you need.

digium paid support said it was due to a STUN issue, when searching around i saw the same issue on this ticket and the same exact behavior
ASTERISK-24968  



By: Rusty Newton (rnewton) 2015-11-04 09:06:05.013-0600

Since we can't reproduce this issue we'll need as much information as possible. Probably want to go ahead and post your Asterisk configuration files (of course sanitize private data).

By: Rusty Newton (rnewton) 2015-11-04 09:07:11.410-0600

Additionally did Digium support identify the specific issue with STUN?

Edit: I've talked with the developer that looked over this and the issue occurs due to a problem in the underlying library used for ICE/STUN/TURN support for WebRTC.

By: Bobby Hakimi (bobbymc) 2015-11-04 12:00:53.149-0600

do you still need me to post my asterisk configs?

By: Bobby Hakimi (bobbymc) 2015-11-06 02:06:06.547-0600

i got a $5k bounty for anyone that can fix this

By: Bobby Hakimi (bobbymc) 2015-11-11 00:41:28.977-0600

do you know if there is a goal in the future to fix this bug? webrtc worked flawless in asterisk 11 besides this issue.



By: Stefano Fancello (Stell0) 2016-02-02 02:42:59.743-0600

I've got same problem on two boxes, is there any updates on this? Does anyone have a workaround?

By: Bobby Hakimi (bobbymc) 2016-03-10 11:37:42.581-0600

@Stefano Fancello from what i heard digium is not planning on fixing this anytime soon =( also another update, this ICE/STUN/TURN bug is also effecting any extension that registers to the server and makes use of the ICE/STUN server.

i was getting the same deadlocks so i had to remove ICE support completely from asterisk and im keeping a eye out for the deadlocks

By: Stefano Fancello (Stell0) 2016-03-17 05:05:44.490-0500

@Bobby Hakimi I was wrong, my deadlock was ASTERISK-25476, workarounded with a patch

By: Joshua C. Colp (jcolp) 2017-12-18 06:29:12.720-0600

Asterisk 13 now uses upstream PJSIP on the system or using bundled. Work was done in PJSIP to fix some locking problems in this area, as well as some changes in Asterisk.

We're using ICE/STUN/TURN more heavily now and have not seen this. Is it still a problem in a supported version?

By: Asterisk Team (asteriskteam) 2018-01-02 08:30:30.899-0600

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidlines [1].
[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines