Summary: ASTERISK-26853: res_rtp_asterisk: Crash in pjnath when receiving packet
Reporter: Adagio (studioadagio)
Labels:
Date Opened: 2017-03-10 07:28:30.000-0600
Date Closed: 2017-04-21 13:12:03
Priority: Major
Regression? No
Status: Closed/Complete
Components: Resources/res_rtp_asterisk
Versions: 13.14.0 14.2.0
Frequency of Occurrence: Frequent
Related Issues:
duplicates ASTERISK-26912 Asterisk Crash on pj_ice_sess_on_rx_pkt
is related to ASTERISK-26835 res_rtp_asterisk: Crash when freeing RTCP address string
Environment: Debian jessie
Attachments:
Description:
Hi,
We have a business application that uses both conventional telephony and VoIP.
We use the PJSIP library to make VoIP calls from mobile devices (Android & iOS). On the server side we have Asterisk with PJSIP.
Sometimes the Asterisk process crashes with "double free or corruption". This happens shortly after the INVITE transaction has finished (we hear about 0.5 s of sound) and only if the call was started on an Android device.
We tried to reproduce the crash with other softphones (Zoiper, CSipSimple, Ekiga) and with pjsua in the CLI, but it doesn't crash. It also doesn't crash when the iOS app is used. So it seems that the problem is with our Android implementation, but we don't know where to look for the solution.
We tried the workarounds from ASTERISK-25274 and ASTERISK-25275, but nothing worked.
This crash occurs about once in 200 calls.
After using Valgrind (valgrind.org) to analyze Asterisk's memory, we restarted Asterisk and the crash now happens more often. Is there a link?
You will find the backtrace and debug output in the attachments.
We tried Asterisk versions: 13.14 and 14.2
PJSIP versions: 2.5.5, 2.6
(We tried changing the audio codec, but nothing changed.)
Thanks a lot
Comments:

By: Asterisk Team (asteriskteam) 2017-03-10 07:28:31.420-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Joshua C. Colp (jcolp) 2017-03-12 17:57:20.782-0500

Thank you for taking the time to report this bug and helping to make Asterisk better. Unfortunately, we cannot work on this bug because your description did not include enough information. Please read over the Asterisk Issue Guidelines [1], which discuss the information necessary for your issue to be resolved and the format that information needs to be in. We would be grateful if you would then provide a more complete description of the problem. At a minimum, we need:

1. The specific steps or actions you took that caused you to encounter the problem.
2. The behavior you expected and the location of documentation that led you to that expectation.
3. The behavior you actually encountered.

To demonstrate the issue in detail, please include Asterisk log files generated per the instructions on the wiki [2]. If applicable, please ensure that protocol-level trace debugging is enabled, e.g., 'sip set debug on' if the issue involves chan_sip, and configuration information such as dialplan and channel configuration.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

[2] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information



By: Adagio (studioadagio) 2017-03-13 09:50:13.190-0500

Thank you Joshua
Here are our answers:
1. The specific steps or actions you took that caused you to encounter the problem.
> Make calls with our Android application, which uses the PJSIP library

2. The behavior you expected and the location of documentation that led you to that expectation.
> We simply expect Asterisk not to crash

3. The behavior you actually encountered.
> Sometimes Asterisk crashes when it receives a call from the Android application using the PJSIP library

See the attached file debug.txt.
Is that enough, or do you need more from us?
Thanks a lot

By: Joshua C. Colp (jcolp) 2017-03-13 10:13:41.641-0500

If a crash occurs we also need a backtrace[1] as well as details about the configuration to see what is in use. For example your debug shows that you are using ICE, but you did not mention this initially.

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

By: Joshua C. Colp (jcolp) 2017-03-13 11:04:45.703-0500

You will also need to stop marking your log as a "Contribution" or it will not appear without signing the license agreement.

By: Adagio (studioadagio) 2017-03-13 12:30:13.373-0500

I already signed the license agreement, but it seems to be pending review.
I previously attached gdb.txt, which is my backtrace (https://issues.asterisk.org/jira/secure/attachment/55125/gdb.txt) and contains this:

[Edit by Rusty - removed inline debug, please attach to issue instead]

I don't know what else I could give you...

By: Joshua C. Colp (jcolp) 2017-03-14 05:34:09.687-0500

If you mark an attachment as a contribution (which your logs ARE NOT) then they won't show up until the license agreement is approved which can take some time. You can attach them again not marking them as a contribution and they will instantly appear.

We will also need console output and configuration. As it is, your problem isn't with PJSIP, it's with the ICE support.

By: Adagio (studioadagio) 2017-03-15 12:08:42.861-0500

Asterisk configuration

By: Adagio (studioadagio) 2017-03-15 12:11:43.603-0500

Hi Joshua,
See the attached files config.tar.gz and message.log.
Is "message.log" what you need for console output?
Thanks

By: Rusty Newton (rnewton) 2017-03-17 11:17:38.379-0500

The messages.log looks good, thanks.

Please follow https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace exactly.

Once you have the backtrace, attach it to the issue as a .txt file.

Be sure to follow the instructions exactly and compile with the requested compiler flags.

The backtrace should be gathered with:

gdb -se "asterisk" -ex "bt full" -ex "thread apply all bt" --batch -c core > /tmp/backtrace.txt

In addition please attach the valgrind output that you mentioned to the issue.

Do NOT mark your attachments as contributions or they will not show up.

By: Adagio (studioadagio) 2017-03-20 04:28:04.068-0500

Hi Rusty
The valgrind.txt and backtrace.txt files are attached.
Flags enabled for compilation:
List of flags:
DONT_OPTIMIZE
COMPILE_DOUBLE
DEBUG_THREAD
LOADABLE_MODULE
BETTER_BACKTRACES
MALLOC_DEBUG
BUILD_NATIVE
OPTIONAL_API

Thanks a lot

By: Joshua C. Colp (jcolp) 2017-03-20 18:58:49.705-0500

To confirm the environment - did you build PJSIP yourself? Does the problem occur if you use bundled?

By: Adagio (studioadagio) 2017-03-22 07:49:25.254-0500

Hi Joshua
We tried both, building PJSIP ourselves and not. It makes no difference.
We tested with 2.5.5 and 2.6.

By: Richard Mudgett (rmudgett) 2017-03-23 10:27:58.466-0500

From the backtrace it looks like a similar reentrancy cause as issue ASTERISK-26835.

By: Richard Mudgett (rmudgett) 2017-03-27 17:26:24.012-0500

It turns out that this issue and ASTERISK-26835 really are the same issue but just with different aspects of the RTP struct.  When I tried to have separate patches, they became so interdependent that I had to create one patch to fix the reentrancy problems with the RTP struct.

A patch is up for review at https://gerrit.asterisk.org/#/c/5341/  It needs some real-world testing.  I have run it through the testsuite a couple times and done some test calls.
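
For readers unfamiliar with the term, the general kind of reentrancy problem mentioned above can be sketched in a few lines. This is only a generic illustration in Python, not the Gerrit patch and not Asterisk or pjproject code; `Session`, `stop_unsafe`, `stop_safe`, and the callback helper are hypothetical names. It shows how a callback that re-enters a teardown routine can release the same resource twice (the analogue of a "double free"), and how detaching the resource before invoking the callback avoids that.

```python
class Session:
    """Hypothetical stand-in for a shared struct that owns one resource."""

    def __init__(self):
        self.resource = "ice-session"   # placeholder for an owned resource
        self.release_count = 0          # how many times it was released

    def _release(self, resource):
        # In C this would be the free(); a second call is a double free.
        self.release_count += 1

    def stop_unsafe(self, on_stop):
        # Bug: the resource is still attached while the callback runs,
        # so a re-entrant call sees it and releases it as well.
        if self.resource is not None:
            on_stop(self)
            self._release(self.resource)
            self.resource = None

    def stop_safe(self, on_stop):
        # Detach first: a re-entrant call now finds nothing to release.
        resource, self.resource = self.resource, None
        if resource is not None:
            on_stop(self)
            self._release(resource)


def make_reentrant_callback(method_name):
    """Return a callback that re-enters the given stop method exactly once."""
    state = {"fired": False}

    def callback(session):
        if not state["fired"]:
            state["fired"] = True
            getattr(session, method_name)(callback)

    return callback


unsafe = Session()
unsafe.stop_unsafe(make_reentrant_callback("stop_unsafe"))

safe = Session()
safe.stop_safe(make_reentrant_callback("stop_safe"))

print(unsafe.release_count, safe.release_count)
```

In the unsafe variant the release runs twice; in the safe variant it runs once. Real code shared between threads would typically also need locking, which this single-threaded sketch leaves out.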

By: Adagio (studioadagio) 2017-03-28 10:42:15.486-0500

Hi Asterisk team,
We tried the patch, but twice we hit a possible deadlock.

We ran the "sip reload" command, but it never returned to the console.
"core stop now" and "core restart now" don't work either.

We had to kill the Asterisk process to restart it.

Attached files:
messages_deadlock
verbose_deadlock
debug_deadlock

By: Richard Mudgett (rmudgett) 2017-03-28 10:48:15.497-0500

In the future please attach text files with a {{.txt}} extension as requested in the issue guidelines wiki page.  JIRA sucks at allowing you to view the file directly without that extension.  Thanks.

By: Adagio (studioadagio) 2017-03-28 10:57:28.900-0500

Ok
Understood... sorry

By: Richard Mudgett (rmudgett) 2017-03-28 13:06:03.341-0500

The logs aren't showing enough information to determine what is going on to identify a deadlock with the patch.  The classic deadlock can be shown by the CLI "core show locks" output as described by [1] with the menuselect compilation flags DONT_OPTIMIZE, DEBUG_THREADS, and BETTER_BACKTRACES enabled.  Along with that output a gdb backtrace [2] is also useful to determine what is going on for a deadlock.

By the way, to which version of Asterisk did you apply the patch?

[1] https://wiki.asterisk.org/wiki/display/AST/CLI+commands+useful+for+debugging
[2] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace
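
As a rough illustration of what a "classic" deadlock looks like, the sketch below has two threads take the same two locks in opposite order. It is a generic Python example with hypothetical names (`lock_a`, `lock_b`, `worker`, `report`), not Asterisk code and not a reproduction of this issue. A timeout is used so the demo finishes instead of hanging; in a real deadlock both threads would block forever, and a per-thread report of held and wanted locks is what lets you spot the cycle.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
both_hold_first = threading.Barrier(2)    # both threads hold their first lock
both_tried_second = threading.Barrier(2)  # both second-lock attempts are done
report = {}                               # thread name -> got the second lock?


def worker(name, first, second):
    with first:
        both_hold_first.wait()
        # A real deadlock would block forever here; the timeout keeps
        # the demonstration finite.
        got_second = second.acquire(timeout=0.2)
        report[name] = got_second
        both_tried_second.wait()
        if got_second:
            second.release()


t1 = threading.Thread(target=worker, args=("thread-1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("thread-2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()

print(report)
```

Neither thread obtains its second lock, because each one is held by the other. Acquiring locks in one consistent order everywhere is the usual way to rule out this pattern.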

By: Adagio (studioadagio) 2017-03-29 08:57:19.204-0500

Hi Richard,
See the attached files core-show-locks.txt and backtrace-threads.txt.
We followed the "Getting Information For A Deadlock" instructions to produce the files.

About the Asterisk version: we downloaded it directly from https://gerrit.asterisk.org/#/c/5341/ (download section).
Do you need anything else from us?
Thanks

By: Richard Mudgett (rmudgett) 2017-03-30 15:41:31.126-0500

New patch up on gerrit.  Still at https://gerrit.asterisk.org/#/c/5341/

By: Adagio (studioadagio) 2017-04-03 09:32:29.933-0500

Hi Asterisk team,
We installed your patch and made almost 30 calls.
Asterisk crashed with a "segfault" error.
We tried again to verify and had the same issue.

Files attached:
backtrace_segfault.txt
debug_segfault.txt
messages_segfault.txt
verbose_segfault.txt

By: Richard Mudgett (rmudgett) 2017-04-03 12:58:29.448-0500

New patch to fix today's segfault up on gerrit.

By: Adagio (studioadagio) 2017-04-04 10:58:46.981-0500

Hi Asterisk team,

We installed your patch and made almost 20 calls.
Asterisk still crashed with a "segfault" error.

Files attached:
verbose_segfault_2.txt
backtrace_segfault_2.txt
debug_segfault_2.txt

By: Richard Mudgett (rmudgett) 2017-04-04 16:53:26.139-0500

New patch to fix today's segfault up on gerrit.

By: Richard Mudgett (rmudgett) 2017-04-06 13:17:17.446-0500

New patch up on gerrit.  The new patch adds more protection from reinvites restarting ICE negotiations.

By: Richard Mudgett (rmudgett) 2017-04-12 12:35:34.336-0500

Patch version 6 up on gerrit is expected to be merged in a few days (after a merge conflict is resolved).  For those testing the patch, I haven't heard about how well the patch is working for you.

By: Adagio (studioadagio) 2017-04-13 10:41:25.412-0500

Hi Richard,
Thank you for this patch. Unfortunately, we haven't tested it yet. The previous one is currently in production, but we have trouble with simultaneous calls. No more crashes! But the voice is cut when many calls are made simultaneously.
Currently our developers cannot install the new patch; they are busy until next week. But we will test it on Tuesday.
Do you think our current trouble (voice cut) can be fixed by the new patch?

By: Richard Mudgett (rmudgett) 2017-04-13 13:18:38.594-0500

I cannot say if the "voice cut" could be fixed by the newer patch version.  The "voice cut" could be caused by something else entirely.  However, it is the last patch version that gets merged and not earlier versions.

By: Adagio (studioadagio) 2017-04-21 12:52:59.923-0500

Hi Asterisk team,
Sorry for the delay.
Since we moved to the last two patches, we get awful voice quality on some calls.
As I told you, the voice of all participants is cut.
Because our customers are on the production server, we prefer that they get an occasional crash rather than this awful voice quality.
So we uninstalled the patch and moved back to our old buggy version...
See the attached files verbose_voice_cut.txt and debug_voice_cut.txt.

By: Friendly Automation (friendly-automation) 2017-04-21 13:12:04.127-0500

Change 5342 merged by George Joseph:
rtp_engine/res_rtp_asterisk: Fix RTP struct reentrancy crashes.

[https://gerrit.asterisk.org/5342|https://gerrit.asterisk.org/5342]

By: Friendly Automation (friendly-automation) 2017-04-21 13:12:27.445-0500

Change 5341 merged by George Joseph:
rtp_engine/res_rtp_asterisk: Fix RTP struct reentrancy crashes.

[https://gerrit.asterisk.org/5341|https://gerrit.asterisk.org/5341]

By: Richard Mudgett (rmudgett) 2017-04-21 13:39:48.644-0500

"Voice cut" is so vague as to be meaningless.  Audio quality issues are unlikely to show up in the logs you provided.

* Does the voice stutter or sound choppy?  Do you have DEBUG_THREADS enabled?  DEBUG_THREADS will slow the system's performance so much that you could get a choppy or stuttering voice.  These patches add more locking to protect from reentrancy.
* Does the voice just stop even though the call is still connected?  Simply stopping could be a deadlock but you would also see channels hanging around after the calls are supposed to be gone.  With DEBUG_THREADS enabled you can get "core show locks" output and a backtrace [1].
* Does this happen to all calls?
* Are the calls in a conference (ConfBridge)?  You mentioned participants this time which implies a conference.
* I see in the logs that you are using both chan_sip and chan_pjsip on the same system.  There might be an interaction between these channel drivers.

Also the patch is currently going through the final automatic integration tests to merge.

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

By: Friendly Automation (friendly-automation) 2017-04-21 15:47:40.947-0500

Change 5343 merged by George Joseph:
rtp_engine/res_rtp_asterisk: Fix RTP struct reentrancy crashes.

[https://gerrit.asterisk.org/5343|https://gerrit.asterisk.org/5343]

By: Adagio (studioadagio) 2017-04-24 03:38:44.678-0500

Hi Richard,
"Voice cut" was how I perceived it; it is hatched on both voices. I'll post an example of the sound here.
The voice does not stutter and the sound is not choppy. Yes, we have DEBUG_THREADS enabled.
It doesn't happen on all calls. It seems to happen when there are simultaneous calls, but we cannot be sure about that.

About conferences: yes, every call is in a conference. Sometimes 2 participants, sometimes more.
But the voice is hatched even if there are only 2 participants.

I hope this helps.

By: Asterisk Team (asteriskteam) 2017-04-24 03:38:46.647-0500

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.

By: Richard Mudgett (rmudgett) 2017-04-24 10:10:25.211-0500

That audio was choppy and DEBUG_THREADS can cause that because of the higher overhead imposed by the lock tracking.  The higher overhead is why DEBUG_THREADS is not recommended for normal operations.

The patch to fix the original issue has been merged so the issue is closed.