
Summary: ASTERISK-28811: Crash occurs when fax session switches from T.38 to audio
Reporter: Alexey Vasilyev (vasilevalex)
Labels: fax patch
Date Opened: 2020-04-07 02:13:02
Date Closed: 2020-04-22 10:10:23
Priority: Major
Regression? Yes
Status: Closed/Complete
Components: pjproject/pjsip
Versions: 16.9.0
Frequency of Occurrence: One Time
Related Issues:
is related to ASTERISK-28839 Sporadic crashes with Segmentation fault
is related to ASTERISK-28846 stream: Enforce formats immutability
Environment: CentOS Linux release 7.7.1908 (Core) 3.10.0-1062.4.3.el7.x86_64
Attachments:
( 0) ASTERISK-28811-2.diff
( 1) cisco-pbx.txt
( 2) core.28811.tar.gz
( 3) crash1-backtrace.txt
( 4) crash1-sip-trace.txt
( 5) crash2-backtrace.txt
( 6) crash2-sip-trace.txt
( 7) fax_491.txt
( 8) pbx-fax.txt
( 9) pjsip.conf
(10) sip-flow-488.txt
(11) sip-trace-488.txt
Description: While sending a fax from a Cisco SPA112 device through several Asterisk servers, the most recently updated server (Asterisk 16.9.0) crashed. We can't reproduce the crash: sometimes faxes are sent fine, while from other Cisco SPA112 devices faxes simply stopped sending (receiving works fine). After downgrading to 16.8.0 everything works fine again.
Comments:
By: Asterisk Team (asteriskteam) 2020-04-07 02:13:03.153-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Alexey Vasilyev (vasilevalex) 2020-04-07 03:05:02.801-0500

Backtraces

By: George Joseph (gjoseph) 2020-04-07 10:39:40.082-0500

Did this issue happen with earlier versions of Asterisk?  If not, can you pinpoint which version the issue first appeared?


By: Alexey Vasilyev (vasilevalex) 2020-04-07 12:55:41.764-0500

This first happened in version 16.9.0. Downgrading to 16.8.0 fixed the issue. The problem happened in function sip_session_refresh() in the file /usr/lib64/asterisk/modules/res_pjsip_session.so which was significantly modified from 16.8.0 to 16.9.0.

By: Kevin Harwell (kharwell) 2020-04-13 13:55:57.792-0500

[~vasilevalex], could you attach your pjsip.conf file, or at least the endpoint configurations for the involved parties, along with the relevant dialplan? Also, please attach an Asterisk debug log with SIP tracing enabled [1] of a good run of the scenario (one where Asterisk does not crash) and, if possible (although it might be hard due to the sporadic nature of the problem), similar logging for a bad run (one where Asterisk does crash).

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

Thanks!

By: Alexey Vasilyev (vasilevalex) 2020-04-14 06:58:45.270-0500

Two endpoints. Call was from pbx to fax. Due to legal reasons, I can't attach dial plan and debug log, but there is nothing special there.

By: Kevin Harwell (kharwell) 2020-04-14 11:13:28.253-0500

What about a pcap of a "good run" then? Seeing the expected call flow and SDPs would probably be helpful. I might then be able to set up a test on my end to replicate it.

I understand there are some things that you can't make public. If you can't attach it here, would it be a problem to email that information to asteriskteam@digium.com? We could then attach it to our associated, restricted (non-public) internal issue.

By: Alexey Vasilyev (vasilevalex) 2020-04-15 08:36:21.255-0500

I attached SIP traces that were captured while Asterisk 16.9.0 was running, before the crash occurred later. cisco-pbx.txt is the SIP call from the Cisco SPA112 to the first Asterisk server (pbx1) in the chain. The call then goes to Asterisk 16.9.0, then to another server (pbx2), and then to the fax server. pbx-fax.txt is the SIP trace for the same call, but for the last leg. I don't know if it helps, but all these calls failed: the Cisco could not send a fax. When we downgraded the server in the middle to 16.8.0, all the faxes started working again. It looks like a similar call crashed 16.9.0.

By: Joshua C. Colp (jcolp) 2020-04-20 07:05:33.131-0500

Can you please try applying the attached patch and retry your faxing?

By: Joshua C. Colp (jcolp) 2020-04-20 10:08:05.575-0500

Here is a slightly updated version.

By: Alexei Gradinari (alexei gradinari) 2020-04-20 11:37:11.848-0500

We observed the same or related crashes with two different types of off-nominal re-negotiation.

1. 491 Another INVITE transaction in progress
files: crash1-sip-trace.txt, crash1-backtrace.txt

2. 488 Not Acceptable Here
files: crash2-sip-trace.txt, crash2-backtrace.txt

All files were edited to remove private information about IP addresses, usernames and caller IDs, so the length values are incorrect.
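
For anyone who wants to replay redacted traces like these, the stale length values can be recomputed. The following is a minimal sketch, not part of the attachments: the `fix_content_length` helper and the sample message are hypothetical, and it assumes each message is available as raw bytes with CRLF line endings.

```python
import re


def fix_content_length(message: bytes) -> bytes:
    """Rewrite the Content-Length header to match the actual body size.

    Hypothetical helper for redacted SIP traces; it handles the long
    header name and the compact form ("l").
    """
    head, sep, body = message.partition(b"\r\n\r\n")
    if not sep:
        # No blank line separating headers from body: leave untouched.
        return message
    new_header = b"Content-Length: " + str(len(body)).encode("ascii")
    # Replace an existing Content-Length (or compact "l") header line.
    head, count = re.subn(
        rb"(?im)^(content-length|l)[ \t]*:[^\r\n]*", new_header, head
    )
    if count == 0:
        # Header was missing entirely: append it.
        head += b"\r\n" + new_header
    return head + sep + body


# Sample redacted message with a stale length value (made-up data).
redacted = (
    b"INVITE sip:fax@example.invalid SIP/2.0\r\n"
    b"Content-Type: application/sdp\r\n"
    b"Content-Length: 999\r\n"
    b"\r\n"
    b"v=0\r\n"
)
fixed = fix_content_length(redacted)
```

The body is measured in bytes rather than characters, which is what Content-Length counts.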


By: Joshua C. Colp (jcolp) 2020-04-20 11:42:22.151-0500

Are these before the current patch that is up, or with the patch applied?

By: Kevin Harwell (kharwell) 2020-04-20 11:43:54.817-0500

[~alexei gradinari] Is that with or without the attached patch ([^ASTERISK-28811-2.diff]) applied?

By: Alexei Gradinari (alexei gradinari) 2020-04-20 12:02:35.866-0500

Without the patch.
I'm compiling Asterisk with this patch right now. I will run the patched Asterisk and let you know about further results.
I uploaded 2 SIP traces and backtraces so you know that there are 2 places where it crashes.


By: Joshua C. Colp (jcolp) 2020-04-20 12:11:31.628-0500

The core of the issue is the same for both.

By: Alexey Vasilyev (vasilevalex) 2020-04-20 13:04:40.766-0500

Thanks. I'll try to test with patch tomorrow

By: Alexei Gradinari (alexei gradinari) 2020-04-20 14:12:49.244-0500

With the patch, there is no crash in case "2. 488 Not Acceptable Here",
but Asterisk did not send a re-INVITE with SDP after the 488.


By: Joshua C. Colp (jcolp) 2020-04-20 14:18:09.996-0500

I'm not sure what you mean by that. Can you clarify further what you are expecting/what is seen in previous versions in comparison to this?

By: Alexei Gradinari (alexei gradinari) 2020-04-20 14:46:08.697-0500

FAX1 sends an INVITE (VOICE) to Asterisk; Asterisk sends an INVITE (VOICE) to FAX2.
FAX2 replies with 200 to Asterisk; Asterisk replies with 200 to FAX1.

FAX1 detects the fax tone and sends a re-INVITE (T.38) to Asterisk; Asterisk sends a re-INVITE (T.38) to FAX2.
FAX2 replies 488 to Asterisk; Asterisk replies 488 to FAX1.
FAX1 switches back to voice because T.38 is not supported and sends a re-INVITE (VOICE) to Asterisk; Asterisk DOES NOT send a re-INVITE (VOICE) to FAX2.

FAX1 sends BYE to Asterisk; Asterisk sends BYE to FAX2.
FAX2 replies 481 Call Leg/Transaction Does Not Exist to Asterisk... I think because Asterisk didn't switch to VOICE.

In version 16.9.0 without the patch, Asterisk always crashed on 488; see my files crash2-backtrace.txt and crash2-sip-trace.txt.
I didn't check this scenario with version 16.8.0.

The "481 Call Leg/Transaction Does Not Exist" bothers me in this scenario.
I think Asterisk should switch to voice after 488.
But maybe this is not a related issue.


By: Joshua C. Colp (jcolp) 2020-04-20 14:51:12.777-0500

I don't believe that issue is related. Switching to voice in that scenario isn't required, and I don't believe Asterisk has ever done so. This is because when a re-INVITE is sent, the previous SDP negotiation and state are kept and only replaced if the re-INVITE was successful. This means that if you send a re-INVITE and it receives a 488, things continue on as if the re-INVITE was never attempted.

"During the session, either Alice or Bob may decide to change the
  characteristics of the media session.  This is accomplished by
  sending a re-INVITE containing a new media description.  This re-
  INVITE references the existing dialog so that the other party knows
  that it is to modify an existing session instead of establishing a
  new session.  The other party sends a 200 (OK) to accept the change.
  The requestor responds to the 200 (OK) with an ACK.  If the other
  party does not accept the change, he sends an error response such as
  488 (Not Acceptable Here), which also receives an ACK.  However, the
  failure of the re-INVITE does not cause the existing call to fail -
  the session continues using the previously negotiated
  characteristics."

The 481 would be concerning, but I don't think anything that has been done would have changed anything there.
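
The re-INVITE behaviour described above can be modelled with a minimal sketch. This is not Asterisk or PJSIP code: the `MediaSession` class and its method names are hypothetical, and the model only tracks which media type is active on one dialog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaSession:
    """Toy model of one SIP dialog's negotiated media (hypothetical, not Asterisk code)."""

    active: str                    # currently negotiated media, e.g. "audio"
    pending: Optional[str] = None  # offer carried by an in-flight re-INVITE

    def send_reinvite(self, offer: str) -> None:
        # Only one re-INVITE may be outstanding at a time in this model.
        if self.pending is not None:
            raise RuntimeError("another INVITE transaction in progress")
        self.pending = offer

    def on_final_response(self, status: int) -> str:
        # A 2xx commits the offer; any other final response (such as 488)
        # discards it and leaves the previously negotiated media untouched.
        if self.pending is None:
            raise RuntimeError("no re-INVITE outstanding")
        if 200 <= status < 300:
            self.active = self.pending
        self.pending = None
        return self.active


session = MediaSession(active="audio")
session.send_reinvite("t38")
after_488 = session.on_final_response(488)  # rejected: session stays on audio
session.send_reinvite("t38")
after_200 = session.on_final_response(200)  # accepted: session switches to T.38
```

In this model a rejected T.38 offer never touches `active`, so no further re-INVITE is needed to "return" to audio.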

By: Alexei Gradinari (alexei gradinari) 2020-04-21 16:49:05.366-0500

[~jcolp],
Today I applied both of your latest patches: "fax: Fix crashes in PJSIP re-negotiation scenarios." and "stream: Enforce formats immutability and ensure formats exist.".
I can confirm there are no more crashes and even no more 481 on BYE.


By: Alexey Vasilyev (vasilevalex) 2020-04-22 04:23:14.595-0500

I've applied only one patch, ASTERISK-28811-2.diff. We have tested the same scenario. Faxes from the Cisco SPA work fine now.
In the test we used the same device, and before the patch we could not send any faxes at all.
Thanks!

By: Joshua C. Colp (jcolp) 2020-04-22 04:51:28.756-0500

Glad to hear it everyone!

By: Friendly Automation (friendly-automation) 2020-04-22 10:10:23.672-0500

Change 14274 merged by Friendly Automation:
fax: Fix crashes in PJSIP re-negotiation scenarios.

[https://gerrit.asterisk.org/c/asterisk/+/14274|https://gerrit.asterisk.org/c/asterisk/+/14274]

By: Friendly Automation (friendly-automation) 2020-04-22 10:10:27.916-0500

Change 14299 merged by Friendly Automation:
fax: Fix crashes in PJSIP re-negotiation scenarios.

[https://gerrit.asterisk.org/c/asterisk/+/14299|https://gerrit.asterisk.org/c/asterisk/+/14299]

By: Friendly Automation (friendly-automation) 2020-04-22 10:15:33.032-0500

Change 14298 merged by Joshua Colp:
fax: Fix crashes in PJSIP re-negotiation scenarios.

[https://gerrit.asterisk.org/c/asterisk/+/14298|https://gerrit.asterisk.org/c/asterisk/+/14298]

By: Alexei Gradinari (alexei gradinari) 2020-04-29 15:01:04.476-0500

[~jcolp],

I was able to catch the case "491 Another INVITE transaction in progress" with version 16.10.0-rc2 (file fax_491.txt).
The good news: Asterisk didn't crash.
The bad news: the T.38 re-INVITE failed.

Should I open a new issue?

By: Joshua C. Colp (jcolp) 2020-04-29 15:49:53.125-0500

Yes, that would be a separate unrelated issue.

By: Friendly Automation (friendly-automation) 2020-04-30 10:52:22.853-0500

Change 14371 merged by George Joseph:
fax: Fix crashes in PJSIP re-negotiation scenarios.

[https://gerrit.asterisk.org/c/asterisk/+/14371|https://gerrit.asterisk.org/c/asterisk/+/14371]