Summary: ASTERISK-28839: Sporadic crashes with Segmentation fault
Reporter: Joeran Vinzens (jvinzens)
Labels: fax patch
Date Opened: 2020-04-16 04:34:50
Date Closed: 2020-04-22 15:06:07
Priority: Major
Regression?:
Status: Closed/Complete
Components: Core/General
Versions: 17.3.0
Frequency of Occurrence: Frequent
Related Issues: is related to ASTERISK-28811 Crash occurs when fax session switches from T.38 to audio
Environment: Asterisk 17.3, built from the GitHub 17.3 branch, running on Debian buster
Attachments:
( 0) asterisk17_1:1.0.0+0~20191118130339.4.d388b49+buster_amd64.deb
( 1) asterisk17_1:1.0.0+0~20200407073332.14.c50f0c5+buster_amd64.deb
( 2) asterisk17-dbgsym_1:1.0.0+0~20200407073332.14.c50f0c5+buster_amd64.deb
( 3) ASTERISK-28811-2.diff
( 4) core_asterisk_32124_11_113_1587025086-locks.txt
( 5) core_asterisk_32124_11_113_1587025086-thread1.txt
Description: We are facing crashes (about twice a week on each system), and so far we do not know why they happen. We have now enabled core dumps so that we can capture some information (unfortunately, we are not the right people to analyze them).

Asterisk is connected to a FastAGI server running locally on the same machine, and some Perl AGI scripts are triggered by the Asterisk dialplan.

The PJSIP configuration is quite small since we have just a couple of endpoints (registration and the like are handled by Kamailio, our SBC).

Comments:

By: Asterisk Team (asteriskteam) 2020-04-16 04:34:51.512-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Joeran Vinzens (jvinzens) 2020-04-16 04:38:57.145-0500

Please find attached Thread 1 from the gdb backtrace (it seems to contain the crashing thread).

The full and brief logs contain phone numbers of our customers. We will provide them if needed, but it would be great if that were not necessary. If they are needed, we would appreciate being able to share this data in a more secure way.

By: Joeran Vinzens (jvinzens) 2020-04-16 06:00:28.855-0500

One more thing: we have one custom patch in our Asterisk build to enable lawful interception. The patch duplicates the RTP stream when a manager event is received. So far this patch has never had a negative influence on Asterisk, and we did not see crashes at this frequency before upgrading to Asterisk 17.x. However, the patch we are using might be causing the issue.


By: George Joseph (gjoseph) 2020-04-16 08:29:50.170-0500

Hi Joeran,

The full backtraces are going to be needed for this issue.  If you can run {{ast_coredumper --tarball-coredumps --no-default-search <path_to_existing_coredump>}} you'll get a tarball that includes the coredump itself, the text output files and the asterisk binaries.  This way we can examine the memory of the process as it was at the time of the crash.

The tarball will probably be quite large and, as you said, would contain sensitive information, so if you could upload it to Google Drive, Dropbox, etc. and send the link to asteriskteam@digium.com, that'd be great.  Use the subject "ASTERISK-28839 Coredump Files".  Once we download them, we'll let you know and you can remove the file from your hosting service.

Thanks.


By: Kevin Harwell (kharwell) 2020-04-16 17:21:06.166-0500

Also what's the call scenario during a crash? Does it involve fax? Simple end to end call? Something else? Can you attach your pjsip.conf as well?

Thanks!

By: Joeran Vinzens (jvinzens) 2020-04-17 06:33:46.056-0500

As requested, I have sent an email with a link to the coredump, with pjsip.conf attached.

The call scenario is quite simple. Only simple calls are involved, so no features are used within the call. Call setup contains some "Set" and some "AGI" applications, but nothing fancy. All calls are simple calls using the Dial command.

As for fax: if a fax call was involved, I haven't seen it. We have enabled T.38 forwarding, but as far as I have investigated, I haven't seen anything like this.

In addition, we have just installed a vanilla Asterisk 17.3 on one of the machines to check whether our patch is involved or not. As soon as we have more results, I will let you know.

Many thanks!
Jöran

By: George Joseph (gjoseph) 2020-04-17 08:21:02.290-0500

It looks like the zip file you sent was missing the actual coredump itself.  Can you get that to us?

We _think_ that this issue may be related to another crash which [~kharwell] is currently working on.


By: George Joseph (gjoseph) 2020-04-17 08:43:19.356-0500

We've got the actual coredump now.


By: Joshua C. Colp (jcolp) 2020-04-17 12:03:00.981-0500

Unfortunately the binaries provided are stripped, which has made it impossible to investigate the core dump in depth. Is it possible to get unstripped ones?

Also, the two other cases so far have been narrowed down to T.38, so this may be similar.

By: Joeran Vinzens (jvinzens) 2020-04-17 12:34:41.225-0500

Please find attached the binaries we used.
The upload contains the deb package, which includes all the binaries we compiled, and the debug symbols.

By: Joshua C. Colp (jcolp) 2020-04-20 06:17:44.071-0500

Unfortunately the two packages don't match, one appears to be from 2019 and the other this year.
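
The mismatch is visible in the attachment names themselves. As a rough illustration only, the sketch below compares the build timestamp and commit hash embedded in two package file names. The file name layout is an assumption inferred solely from the attachments on this issue, and `build_id` and `same_build` are hypothetical helper names, not part of any Asterisk or Debian tooling. Matching names are only a quick sanity check and do not prove that the symbols belong to the binary.

```python
import re

# Assumed layout, inferred only from the attachment names on this issue:
#   <package>_<version>~<timestamp>.<count>.<commit>+<distro>_<arch>.deb
PATTERN = re.compile(
    r"^(?P<package>[^_]+)_(?P<version>.+)~(?P<timestamp>\d{14})\."
    r"(?P<count>\d+)\.(?P<commit>[0-9a-f]+)\+(?P<distro>[^_]+)_(?P<arch>[^.]+)\.deb$"
)


def build_id(filename):
    """Return (timestamp, commit) parsed from a package file name."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected package file name: {filename}")
    return match["timestamp"], match["commit"]


def same_build(binary_pkg, dbgsym_pkg):
    """True if both file names carry the same timestamp and commit hash."""
    return build_id(binary_pkg) == build_id(dbgsym_pkg)


OLD_BINARY = "asterisk17_1:1.0.0+0~20191118130339.4.d388b49+buster_amd64.deb"
NEW_BINARY = "asterisk17_1:1.0.0+0~20200407073332.14.c50f0c5+buster_amd64.deb"
DBGSYM = "asterisk17-dbgsym_1:1.0.0+0~20200407073332.14.c50f0c5+buster_amd64.deb"

if __name__ == "__main__":
    print("old binary matches dbgsym:", same_build(OLD_BINARY, DBGSYM))
    print("new binary matches dbgsym:", same_build(NEW_BINARY, DBGSYM))
```

With the three attachment names from this issue, the first binary package (attachment 0) does not line up with the debug symbols, while attachment 1 does.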

By: Joeran Vinzens (jvinzens) 2020-04-20 06:37:33.592-0500

Sorry, I uploaded the old Asterisk binary package. I have now attached the currently installed one, which matches the debug symbols already attached.

By: Joeran Vinzens (jvinzens) 2020-04-20 09:04:51.976-0500

Hi, we installed a vanilla Asterisk on one of our machines and have encountered a crash there as well. Following your suggestion, we searched for a fax involved in the scenario.

The new stack trace led us to a PBX thread handling a call on which we received a T.38 fax re-INVITE. Asterisk forwards the re-INVITE to the other call leg. After that, it takes 4.6 seconds until Asterisk crashes, and we do not see anything else for the PBX thread.

The incoming Re-Invite contains an SDP:
{code}
v=0
o=user 801993 801994 IN IP4 1.2.3.4
s=call
c=IN IP4 217.10.77.156
t=0 0
m=image 20764 udptl t38
a=T38FaxVersion:1
a=T38MaxBitRate:14400
a=T38FaxTranscodingMMR
a=T38FaxTranscodingJBIG
a=T38FaxRateManagement:transferredTCF
a=T38FaxUdpEC:t38UDPRedundancy
a=T38FaxUdpEC:t38UDPFEC
a=T38FaxMaxDatagram:512
a=sendrecv
a=ptime:20
{code}

The SDP Asterisk is sending out is:

{code}
v=0
o=- 0 3 IN IP4 212.9.44.9
s=sipgate VoIP GW
c=IN IP4 212.9.44.9
t=0 0
m=image 4198 udptl t38
a=T38FaxVersion:1
a=T38MaxBitRate:14400
a=T38FaxTranscodingMMR
a=T38FaxTranscodingJBIG
a=T38FaxRateManagement:transferredTCF
a=T38FaxMaxDatagram:1006
a=T38FaxUdpEC:t38UDPRedundancy
{code}
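
For readers comparing the two bodies, the sketch below lists the T.38 attributes that differ between the incoming and the outgoing SDP quoted above. It is a minimal illustration with hypothetical helper names (`t38_attributes`, `diff_t38`), not code from Asterisk, and it deliberately ignores non-T.38 lines such as `a=sendrecv` and `a=ptime`. It says nothing about the cause of the crash; it only makes the differences easy to see.

```python
def t38_attributes(sdp):
    """Collect the a=T38* attributes of an SDP body, keeping repeated names."""
    attrs = {}
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("a=T38"):
            continue
        # Flag-style attributes (no ':') get an empty string as their value.
        name, _, value = line[2:].partition(":")
        attrs.setdefault(name, []).append(value)
    return attrs


def diff_t38(incoming, outgoing):
    """Map each differing attribute name to (incoming values, outgoing values)."""
    a, b = t38_attributes(incoming), t38_attributes(outgoing)
    return {
        name: (a.get(name, []), b.get(name, []))
        for name in sorted(a.keys() | b.keys())
        if sorted(a.get(name, [])) != sorted(b.get(name, []))
    }


INCOMING = """\
m=image 20764 udptl t38
a=T38FaxVersion:1
a=T38MaxBitRate:14400
a=T38FaxTranscodingMMR
a=T38FaxTranscodingJBIG
a=T38FaxRateManagement:transferredTCF
a=T38FaxUdpEC:t38UDPRedundancy
a=T38FaxUdpEC:t38UDPFEC
a=T38FaxMaxDatagram:512
a=sendrecv
a=ptime:20
"""

OUTGOING = """\
m=image 4198 udptl t38
a=T38FaxVersion:1
a=T38MaxBitRate:14400
a=T38FaxTranscodingMMR
a=T38FaxTranscodingJBIG
a=T38FaxRateManagement:transferredTCF
a=T38FaxMaxDatagram:1006
a=T38FaxUdpEC:t38UDPRedundancy
"""

if __name__ == "__main__":
    for name, (inc, out) in diff_t38(INCOMING, OUTGOING).items():
        print(f"{name}: incoming={inc} outgoing={out}")
```

For the two bodies above, the differing attributes are `T38FaxMaxDatagram` (512 incoming, 1006 outgoing) and `T38FaxUdpEC` (the outgoing SDP drops `t38UDPFEC`).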

If you need further details (backtrace etc.), let us know!

By: Joshua C. Colp (jcolp) 2020-04-20 10:08:23.711-0500

Can you please try the attached patch?

By: Laura Geisen (LGeisen) 2020-04-21 03:18:57.308-0500

Thanks for the quick patch!

We've installed the patched asterisk in our production environment and will report again as soon as we have any news. If we do not encounter any crashes for two full days, we will also update this issue as this would already be an improvement.

Regards,
Laura (a colleague of Jöran)

By: Friendly Automation (friendly-automation) 2020-04-22 10:10:22.219-0500

Change 14274 merged by Friendly Automation:
fax: Fix crashes in PJSIP re-negotiation scenarios.

[https://gerrit.asterisk.org/c/asterisk/+/14274|https://gerrit.asterisk.org/c/asterisk/+/14274]

By: Friendly Automation (friendly-automation) 2020-04-22 10:10:29.303-0500

Change 14299 merged by Friendly Automation:
fax: Fix crashes in PJSIP re-negotiation scenarios.

[https://gerrit.asterisk.org/c/asterisk/+/14299|https://gerrit.asterisk.org/c/asterisk/+/14299]

By: Friendly Automation (friendly-automation) 2020-04-22 10:15:31.826-0500

Change 14298 merged by Joshua Colp:
fax: Fix crashes in PJSIP re-negotiation scenarios.

[https://gerrit.asterisk.org/c/asterisk/+/14298|https://gerrit.asterisk.org/c/asterisk/+/14298]

By: Laura Geisen (LGeisen) 2020-04-27 06:44:08.587-0500

We are pretty happy with the patch and have had no crashes on any of the three patched machines for over 5 days now.

We would consider this fixed from our side and look forward to the patched release.

Thanks again for the quick reaction and fix!

Regards,
Laura

By: Friendly Automation (friendly-automation) 2020-04-30 10:52:19.419-0500

Change 14371 merged by George Joseph:
fax: Fix crashes in PJSIP re-negotiation scenarios.

[https://gerrit.asterisk.org/c/asterisk/+/14371|https://gerrit.asterisk.org/c/asterisk/+/14371]