Summary: ASTERISK-24875: Randomly get segfaults processing WEBRTC calls with app_konference
Reporter: Jacques Brooks (itsjacques)
Labels:
Date Opened: 2015-03-13 13:06:34
Date Closed: 2015-04-02 15:56:04
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Resources/res_rtp_asterisk
Versions: 11.16.0 13.2.0
Frequency of Occurrence: Frequent
Related Issues:
Environment: CentOS Linux
Attachments:
( 0) asterisklog.gz
( 1) backtrace.txt
( 2) backtrace2.txt
( 3) badmagiclog
( 4) extensions.conf
( 5) refs.txt
( 6) sip.conf
Description: Asterisk randomly crashes when processing WEBRTC calls. The crashes do not seem to depend on the number of calls being handled or on how long Asterisk has been running: we have crashed with fewer than 10 calls and with over 200 calls, and with Asterisk running for less than 20 minutes as well as after it had been running for hours. We are using version 13.2 with sip.conf. We tried to convert to pjsip.conf but were not very successful (we would get only one or two calls up before crashing), so we reverted to sip.conf.
Comments:

By: Richard Mudgett (rmudgett) 2015-03-13 13:46:39.117-0500

Thank you for your bug report. In order to move your issue forward, we require a backtrace[1] from the core file produced after the crash. Also, be sure you have DONT_OPTIMIZE enabled in menuselect within the Compiler Flags section, then:

make install

After enabling, reproduce the crash, and then execute the backtrace[1] instructions. When complete, attach that file to this issue report.

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace



By: Jacques Brooks (itsjacques) 2015-03-13 15:25:06.164-0500

Backtrace attached. Thanks

By: Jacques Brooks (itsjacques) 2015-03-15 07:57:35.203-0500

Hi Joshua. Not sure if I'm supposed to do something based on your update; are you just indicating the module where the problem lies or you'd like me to attach the res_rtp_asterisk module that we're running?

By: Joshua C. Colp (jcolp) 2015-03-15 08:00:49.585-0500

I was updating the issue with the module, nothing more.

By: Rusty Newton (rnewton) 2015-03-16 13:15:56.241-0500

Please attach both the sip.conf you are using successfully and the pjsip.conf that you used when observing the crash.

If you can reproduce the issue, please grab an Asterisk log captured during the time of the crash.

https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

Use "Send Back" or "Enter Feedback" to send the issue back after you have this information. Thanks.

By: Jacques Brooks (itsjacques) 2015-03-17 09:13:30.315-0500

I turned on the required logging, but after a couple of hours without a segfault I shut the logging off because the log size was close to 8G. I noticed that at some point the message "astobj2.c: bad magic number for object 0xfa5910. Object is likely destroyed" starts spewing out at close to 6,000 times a second. I'm wondering whether this already indicates that the underlying problem that will eventually cause the segfault has kicked in. If so, can I shut off logging at that point and submit the log, or do I need to wait for the segfault to occur?
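
For a log of this size, it may help to locate where the "bad magic number" messages first appear and which object addresses they name before trimming the file for upload. The following is a minimal sketch, not part of the Asterisk tooling: the function name `summarize_bad_magic` and the placeholder sample lines are hypothetical, and the pattern assumes only the message text quoted above.

```python
import re
from collections import Counter

# Matches the message text quoted in the comment above; the address varies per object.
BAD_MAGIC = re.compile(r"bad magic number for object (0x[0-9a-fA-F]+)")


def summarize_bad_magic(lines):
    """Return (line number of the first match, Counter of object addresses).

    `lines` can be any iterable of strings, including an open file, so a
    multi-gigabyte log is read one line at a time rather than loaded whole.
    """
    first = None
    counts = Counter()
    for number, line in enumerate(lines, start=1):
        match = BAD_MAGIC.search(line)
        if match:
            if first is None:
                first = number
            counts[match.group(1)] += 1
    return first, counts


# Hypothetical sample: the placeholder lines stand in for unrelated log output.
sample_log = [
    "placeholder: unrelated log line",
    "placeholder: another unrelated log line",
    "astobj2.c: bad magic number for object 0xfa5910. Object is likely destroyed",
    "astobj2.c: bad magic number for object 0xfa5910. Object is likely destroyed",
]

first, counts = summarize_bad_magic(sample_log)
print(first, dict(counts))
```

To run it against a real log, pass an open file handle instead of the sample list; the first line number then gives a point from which the log can be cut down.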

By: Rusty Newton (rnewton) 2015-03-23 09:09:17.530-0500

If you are not getting a segfault immediately, but you are getting the 'bad magic number' messages, then you may first want to gather some reference count debugging output.

Please read through this guide: https://wiki.asterisk.org/wiki/display/AST/Reference+Count+Debugging and follow the steps under Enabling Reference Count Logs. You can stop Asterisk and process/provide the output of the refs log right after you begin seeing the 'bad magic number' messages.

If you are doing this in Asterisk 13, and not using res_pjsip, then please disable res_pjsip by unloading the relevant modules before following those steps. How to disable res_pjsip: https://wiki.asterisk.org/wiki/display/AST/Migrating+from+chan_sip+to+res_pjsip#Migratingfromchan_siptores_pjsip-Disablingres_pjsipandchan_pjsip



By: Rusty Newton (rnewton) 2015-03-23 09:09:39.529-0500

Remember to press 'Enter Feedback' or 'Send Back' once you have the required debug.

By: Jacques Brooks (itsjacques) 2015-03-23 10:32:23.123-0500

I've attached a backtrace, sip.conf and a log file. As I alluded to before, I'm assuming that the "bad magic" errors indicate that the problem has started to kick in, so I've only included the log starting from 10 minutes before the "bad magic" errors began. This time I didn't get the same segfault error but a general protection error: kernel: asterisk[536] general protection ip:7f23f2d97435 sp:7f22eae8a9a8 error:0 in libc-2.12.so[7f23f2c6c000+18a000]. The pjsip modules were not in the modules.conf file, so I expected not to see any messages pertaining to pjsip, but as you can see in the logs there definitely are pjsip messages; I'm hoping that's not an issue for troubleshooting. You had also requested the pjsip.conf that we used when we tested with pjsip, but I haven't attached it: the box we tested on got corrupted and the pjsip.conf was lost, so I would have to start from scratch.
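
The kernel message quoted above gives the faulting instruction pointer together with the base address and size of the libc mapping, so the offset of the fault inside the library can be worked out by subtraction. The sketch below shows that arithmetic; the function name `fault_offset` and the regular expression are illustrative assumptions about this one message format, not a general parser for kernel logs.

```python
import re

# Fields as they appear in the quoted message: ip, sp, error code, then
# module[base+size]. All numeric fields are treated as hexadecimal.
KERNEL_FAULT = re.compile(
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+) "
    r"in (?P<module>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(message):
    """Return (module name, offset of the faulting ip inside that mapping)."""
    match = KERNEL_FAULT.search(message)
    if match is None:
        raise ValueError("message does not look like a kernel fault line")
    ip = int(match["ip"], 16)
    base = int(match["base"], 16)
    size = int(match["size"], 16)
    # Sanity check: the ip should fall inside the mapping the kernel reported.
    if not base <= ip < base + size:
        raise ValueError("ip lies outside the reported mapping")
    return match["module"], ip - base


line = (
    "kernel: asterisk[536] general protection ip:7f23f2d97435 "
    "sp:7f22eae8a9a8 error:0 in libc-2.12.so[7f23f2c6c000+18a000]"
)
module, offset = fault_offset(line)
print(module, hex(offset))  # libc-2.12.so 0x12b435
```

An offset like this could then be looked up with a symbol tool against the matching libc build, although a full backtrace from the core file remains the more reliable source.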

By: Jacques Brooks (itsjacques) 2015-03-23 10:33:07.250-0500

I've attached applicable files

By: Rusty Newton (rnewton) 2015-03-25 15:42:34.884-0500

I consulted with Joshua C. Colp (jcolp) about the issue. It appears as if the two backtraces may represent two different potential issues.

To further investigate we will need:

* Please disable pjsip. All you have to do is follow the instructions linked in my previous comment.
* The refs debug that was requested previously.
* Your Asterisk dialplan so that we can better understand the call flow.

For the next backtrace, please provide the log and refs debug that correlate.

Again, be sure that you disable pjsip before collecting the next set of debug.

By: Abhay Gupta (agupta) 2015-03-27 03:18:52.258-0500

I saw that you are using app_konference. Could that be the cause of this crash?

By: Jacques Brooks (itsjacques) 2015-03-27 07:29:16.527-0500

We've been using app_konference since our product went live 4-5 years ago. Asterisk 13 did need a newer version of app_konference to work properly. With the possibility in mind that it could be app_konference, we fell back to Asterisk 11.16, which let us use a prior version of app_konference that we knew had no crashing issues processing "regular" SIP traffic, but we still crashed when throwing WEBRTC into the mix. Of course the combination of app_konference and WEBRTC could be the culprit, but I'm not expert enough to comment on that (-:

By: Jacques Brooks (itsjacques) 2015-04-02 12:10:30.438-0500

So it does indeed look like it's an app_konference issue when using WEBRTC. We put together a very simple dialplan (as you can see) and we started getting the bad magic errors only when invoking app_konference with WEBRTC. When we redid the same test using confbridge we didn't get bad magic errors, at least for the simple (and short) test that we did.

Not sure how I'm supposed to proceed now. Assuming it's an app_konference issue, does this leave your purview, and should I try to contact the app_konference originator(s)?

By: Matt Jordan (mjordan) 2015-04-02 15:55:42.235-0500

Correct. That module is not supported by the Asterisk project. You'll need to contact that project for any bug fixes.

Sorry we can't help you here - and good luck!

By: Matt Jordan (mjordan) 2015-04-02 15:56:04.288-0500

Closing as "Not a Bug" in Asterisk.