
Summary: ASTERISK-21696: Assertion error results in crash in pjproject's ICE worker thread
Reporter: James Mortensen (jmort253)
Labels:
Date Opened: 2013-04-25 15:43:33
Date Closed: 2017-12-13 09:16:21.000-0600
Priority: Major
Regression? Yes
Status: Closed/Complete
Components: Resources/res_rtp_asterisk
Versions: 11.4.0
Frequency of Occurrence: Occasional
Related Issues:
is duplicated by ASTERISK-22196 Asterisk crashes setting RTP source address to the browser
is related to ASTERISK-20762 Asterisk Crash, assertion failed, in res_rtp_asterisk thread (ice_worker_thread)
is related to ASTERISK-22889 Segmentation fault when RTP going via ICE
is related to ASTERISK-22938 Asterisk Crashes with assert fail for 'ype <= PJ_ICE_CAND_TYPE_RELAYED'
is related to ASTERISK-23017 Crash on inbound calls using WebRTC config with ICE Servers -signal 6 abort, while in ice_worker_thread
Environment: EC2 - Ubuntu 12.10 - root@ip-10-188-135-200:/opt/asterisk-11.4.0-rc1/sbin# uname -a Linux ip-10-188-135-200 3.5.0-21-generic #32-Ubuntu SMP Tue Dec 11 18:51:59 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Attachments:
( 0) ast-crash-logs.txt
( 1) backtrace_dont_optimize.txt
( 2) backtrace_ice.txt
( 3) backtrace.txt
( 4) backtrace2.txt
Description: Asterisk 11.4.0-rc1 crashes on incoming calls. The crash is occasional and doesn't happen for every call.

Below is the output from obtaining the backtrace as found here: https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

[Edit by Rusty Newton - removed inline backtrace - please don't paste debug inline. Always attach as a separate file.
https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines ]
Comments:

By: James Mortensen (jmort253) 2013-04-25 15:45:48.108-0500

Attached is the backtrace itself.

By: Rusty Newton (rnewton) 2013-04-29 19:40:17.694-0500

How often does the crash occur? Is it fairly easy to reproduce?

Did you recompile with DONT_OPTIMIZE and BETTER_BACKTRACES before gathering the backtrace?

If you can reproduce it, can you also provide a full Asterisk log (VERBOSE and DEBUG) showing what's happening right before the crash occurs? https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information


By: James Mortensen (jmort253) 2013-04-29 23:52:30.062-0500

Hi Rusty,

It doesn't take long to replicate the crash. I just made a few inbound calls and Asterisk crashed. I verified that BETTER_BACKTRACES and DONT_OPTIMIZE were both enabled in make menuselect.

Attached is the last portion of the full log before the crash, with debug and verbose both enabled and set to 11.

I'll work on another backtrace extraction from the latest core dump and will attach that later.


By: James Mortensen (jmort253) 2013-04-30 00:02:17.791-0500

Attached is the backtrace for the same crash. One question: when running the first command from the gdb tool, is it normal to see this output?

root@ip-10-188-135-200:/opt/asterisk-11.4.0-rc1/etc/asterisk# gdb -se "asterisk" -ex "bt full" -ex "thread apply all bt" --batch -c core > /tmp/backtrace.txt

warning: Can't read pathname for load map: Input/output error.

Hope this helps

By: Rusty Newton (rnewton) 2013-05-01 17:36:55.685-0500

Attaching the ast-crash-logs.rtf as .txt.

By: James Mortensen (jmort253) 2013-05-01 17:40:33.729-0500

Hi Rusty,

Does that second backtrace have everything you need? I'm afraid I'm not 100% sure what they're supposed to look like and am happy to keep working on this if you need more information.

By: Rusty Newton (rnewton) 2013-05-01 18:02:39.896-0500

Looks like it does. Thank you. I'll acknowledge it from here and it'll go in the queue. It also looks like it may be similar to ASTERISK-20762 .. but more experienced eyes will have to determine that.  If the developers need anything else they'll post on the issue. I'll link this one with ASTERISK-20762.

By: Matt Jordan (mjordan) 2013-05-01 21:31:59.062-0500

As an aside, since this is occurring in pjproject's ICE worker thread, you can probably prevent the crashes by disabling ICE support in {{rtp.conf}}.

By: James Mortensen (jmort253) 2013-05-01 21:53:54.817-0500

Hi Matt,

That's good to know. Unfortunately, we're using WebRTC to make calls from my cell phone to a Google Chrome app. ICE is a necessary component to traverse the NAT, so unless I'm just missing something, we won't be able to disable ICE.

Just out of curiosity, I disabled icesupport, and I get one-way audio TO Chrome FROM Asterisk, because Chrome doesn't receive any ICE candidates from Asterisk to let Chrome know where to send audio. This might be more than you need to know, but it might give you more of an idea of what's going on.

Hope this helps!

By: Matt Jordan (mjordan) 2013-05-01 22:29:44.615-0500

Yup, if you're using SIP over WebSockets you will need the ICE support provided by pjproject. Some folks will have it enabled without really needing it, so it was worth a shot.

By: James Mortensen (jmort253) 2013-05-03 19:52:05.341-0500

As an experiment, I tried replacing pjproject 2.0 with pjproject 2.1 from this Asterisk fork of PJSIP:  https://github.com/asterisk/pjproject  To get it to build, I had to copy over all of the Makefiles from the 2.0 version, copy over the .mak files, and make a small change to res_rtp_asterisk.c that modifies the call to pj_ice_sess_create to account for an additional parameter.

After two or three inbound calls, Asterisk crashed, just like with pjproject 2.0.  I'm not sure if this is helpful to you or not, but if a backtrace from this experiment will help at all, please let me know and I'll put another one together.

By: Pedro Howat (phowat) 2013-10-07 11:29:36.250-0500

This is in no way a fix, but I was able to prevent these crashes with no apparent side effect by simply commenting out the assertion in pjlib. I'm using this patch until I can dig into this code a little deeper and understand the real issue.

{code}
--- /tmp/asterisk-11.5.1/res/pjproject/pjlib/src/pj/timer.c     2012-07-01 14:28:57.000000000 -0300
+++ pjproject/pjlib/src/pj/timer.c      2013-10-02 13:36:26.000000000 -0300
@@ -460,7 +460,7 @@
    pj_time_val expires;

    PJ_ASSERT_RETURN(ht && entry && delay, PJ_EINVAL);
-    PJ_ASSERT_RETURN(entry->cb != NULL, PJ_EINVAL);
+    //PJ_ASSERT_RETURN(entry->cb != NULL, PJ_EINVAL);

    /* Prevent same entry from being scheduled more than once */
    PJ_ASSERT_RETURN(entry->_timer_id < 1, PJ_EINVALIDOP);
{code}
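
For context, here is a minimal sketch of what a check like the one commented out above protects against. It is not pjlib's timer code: every name in it ({{sketch_timer_entry}}, {{sketch_schedule}}, {{sketch_poll}}) is hypothetical, and it only assumes the general pattern of "validate the callback when scheduling, call it when the timer fires". With the guard removed, an entry whose callback is NULL could be queued, and firing it later would call through a NULL function pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not pjlib's timer API. */
enum { SKETCH_OK = 0, SKETCH_EINVAL = 1 };

typedef struct sketch_timer_entry {
    void (*cb)(struct sketch_timer_entry *entry); /* called on expiry */
    int scheduled;                                /* 1 while queued */
    int fired;                                    /* number of times fired */
} sketch_timer_entry;

/* Refuse to queue an entry that could not be fired safely later. */
static int sketch_schedule(sketch_timer_entry *entry)
{
    if (entry == NULL || entry->cb == NULL)
        return SKETCH_EINVAL; /* the kind of guard the patch disables */
    entry->scheduled = 1;
    return SKETCH_OK;
}

/* Fire a queued entry. Returns 1 if the callback ran, 0 otherwise. */
static int sketch_poll(sketch_timer_entry *entry)
{
    if (entry == NULL || !entry->scheduled)
        return 0;
    entry->scheduled = 0;
    /* Holds only because sketch_schedule() rejected NULL callbacks. */
    assert(entry->cb != NULL);
    entry->cb(entry);
    return 1;
}

/* Example callback: count how often the entry fired. */
static void sketch_on_expiry(sketch_timer_entry *entry)
{
    entry->fired++;
}
```

In this sketch, scheduling an entry with a NULL callback is rejected with an error code, so the caller can react to it; dropping the guard would move the failure from scheduling time to expiry time.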

By: Vytis Valentinavičius (xytis) 2013-12-03 12:41:06.403-0600

To keep pjproject's assertions from aborting Asterisk, build pjproject with 'CFLAGS=-DNDEBUG'. Like the patch above, this is only a temporary workaround.
This can be done using 'user.mak' in the pjproject directory.
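
A rough sketch of why defining NDEBUG can differ from deleting the check outright. It assumes an assert-and-return macro that still evaluates the condition and returns an error code when the hard assertion is compiled out; whether pjlib's {{PJ_ASSERT_RETURN}} behaves this way depends on its build configuration, so verify against the pjlib headers you actually build. The {{SKETCH_*}} names are hypothetical, and {{SKETCH_ENABLE_ASSERT}} stands in for "built without NDEBUG".

```c
#include <assert.h>
#include <stddef.h>

enum { SKETCH_OK = 0, SKETCH_EINVAL = 1 };

/* Hypothetical switch: define SKETCH_ENABLE_ASSERT to get the aborting
 * behaviour; leave it undefined to model a build with assertions off. */
#ifdef SKETCH_ENABLE_ASSERT
#  define SKETCH_ASSERT(expr) assert(expr)
#else
#  define SKETCH_ASSERT(expr) ((void)0)
#endif

/* Assert in checked builds, but always return an error on failure. */
#define SKETCH_ASSERT_RETURN(expr, retval) \
    do { \
        if (!(expr)) { \
            SKETCH_ASSERT(expr); \
            return (retval); \
        } \
    } while (0)

typedef void (*sketch_cb)(void);

/* Validate a callback the way a scheduling function might. */
static int sketch_validate(sketch_cb cb)
{
    SKETCH_ASSERT_RETURN(cb != NULL, SKETCH_EINVAL);
    return SKETCH_OK;
}

/* Placeholder callback used only to have a non-NULL value to pass. */
static void sketch_noop(void)
{
}
```

Under that assumption, a build with assertions disabled turns the abort into an error return that the caller can handle, whereas commenting the line out removes the validation entirely.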

By: Andrea Suisani (s1ckpig) 2014-01-31 10:35:56.981-0600

I'm experiencing the same problem, as I described in ASTERISK-23225. The Asterisk version I'm currently using is 11.7.0, and even though I applied the patch you suggested, we were able to crash the server in production again, though with a different backtrace (segfault instead of SIGABRT).

I've now recompiled the binary without compiler optimizations; the next time it crashes I can attach the full backtrace here. For now I'm going to attach only the one that I got with compiler optimization on.

By: Andrea Suisani (s1ckpig) 2014-02-04 05:19:54.233-0600

Attached backtrace generated by asterisk compiled with DONT_OPTIMIZE set to true

By: Andrea Suisani (s1ckpig) 2014-03-05 07:49:03.440-0600

Just wanna say that the problem is still present in the just released asterisk 11.8.0

It seems that the crash/abort happens less frequently though.

The stack trace looks the same. As soon as I have the chance I'll attach a new one just for confirmation.

By: Corey Farrell (coreyfarrell) 2017-12-13 09:16:21.110-0600

We no longer embed pjproject the way we did in Asterisk 11. pjproject can be a bit overaggressive with assertions, so they should only be enabled during development/testing. In Asterisk 13+, when you use bundled pjproject, assertions are not enabled by default.

----

Per the Asterisk versions page [1], the maintenance (bug fix) support for the Asterisk branch you are using has ended. For continued maintenance support please move to a supported branch of Asterisk. After testing with a supported branch, if you find this problem has not been resolved, please open a new issue against the latest version of that Asterisk branch.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Versions