Summary: ASTERISK-27666: chan_sip: Crash processing CANCEL request
Reporter: Leandro Dardini (ldardini)
Labels: patch
Date Opened: 2018-02-11 00:59:52.000-0600
Date Closed: 2018-02-13 08:01:11.000-0600
Priority: Minor
Regression?:
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions: 13.19.0
Frequency of Occurrence: Constant
Related Issues:
Environment: CentOS 6.9 64bit
Attachments:
( 0) core.srv02-2018-02-11T07-50-18+0100-brief.txt
( 1) core.srv02-2018-02-11T07-50-18+0100-full.txt
( 2) core.srv02-2018-02-11T07-50-18+0100-locks.txt
( 3) core.srv02-2018-02-11T07-50-18+0100-thread1.txt
( 4) core.srv02-2018-02-11T17-53-10+0100-brief.txt
( 5) core.srv02-2018-02-11T17-53-10+0100-full.txt
( 6) core.srv02-2018-02-11T17-53-10+0100-locks.txt
( 7) core.srv02-2018-02-11T17-53-10+0100-thread1.txt
( 8) crash-27666.pcap
( 9) crashlog.pcap
(10) debug_log_27666
(11) jira_asterisk_27666_v13.19.0.patch
Description: I usually log the HANGUPCAUSE when the Dial command doesn't succeed. I discovered that, with one particular provider, this crashes Asterisk. Here is the simple Asterisk dialplan:
{noformat}
9999 => {
    Dial(SIP/999383371137@crashprovider);
    NoOp(This is CallEnd - DIALSTATUS is ${DIALSTATUS} - HANGUPCAUSE is ${HANGUPCAUSE});
}
{noformat}

Attached are the crash report and the pcap of the traffic with the server. The call was made from extension 103 (103-DEVEL).

Comments:

By: Asterisk Team (asteriskteam) 2018-02-11 00:59:53.371-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Richard Mudgett (rmudgett) 2018-02-11 09:52:25.549-0600

We require additional debug to continue with triage of your issue. Please follow the instructions on the wiki [1] for how to collect debugging information from Asterisk. For expediency, where possible, attach the debug with a '.txt' file extension so that the debug will be usable for further analysis.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

From the pcap:
Who is 138.201.140.216?
Who is 81.92.186.135?
Looking at the SDP, both sides seem to be Asterisk 13.19.0

Asterisk crashed trying to process a CANCEL.  From the pcap 138.201.140.216 seems to be your Asterisk.

A debug log of the whole call from the crashed Asterisk would be useful to know what Asterisk was thinking before the crash.


Please be aware that there is no active maintainer of chan_sip. It is wholly supported by the community and response times will reflect that.

By: Leandro Dardini (ldardini) 2018-02-11 11:00:53.794-0600

I feel so sorry when I submit an issue and you remind me I have not given you all details or followed all required steps. I am fully aware of the chan_sip age and maintenance state. Hoping to do better on the second try:

IP 138.201.140.216 is my development box. Running Asterisk 13.19.0.
IP 81.92.186.135 is my provider's PBX. I have now learned that it is running Asterisk too.

This time I made the call from extension 104.

I have attached the core dump log, the pcap for this call, and the debug log.

By: Richard Mudgett (rmudgett) 2018-02-11 15:00:34.616-0600

Looking at the pcaps, there appear to be two things wrong here. The provider is botching the SIP transaction and we are crashing on their CANCEL.

When we send the INVITE to the provider, they respond with a 100 Trying followed by a reINVITE. They should not be sending us a reINVITE: it is too early, since they have not yet sent a final response to our INVITE. We rightly respond to their reINVITE with a 481 (call does not exist), which they ACK. They then repeatedly try new reINVITE transactions without completing them. When they give up, they send a CANCEL and we crash. They then send us a 480 (unavailable) final response to the original INVITE. The 480 response shows that they didn't misdirect a final response before they sent us a reINVITE.
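
For illustration only, the exchange above can be replayed against a toy model. This is a minimal sketch and not the attached patch: the thread does not state the root cause inside chan_sip.c, and the names `Dialog` and `handle_request`, the Call-ID, and the CSeq numbers are all hypothetical. It only shows the general defensive pattern of answering a CANCEL that matches no pending INVITE with a 481 (the behavior RFC 3261 section 9.2 recommends) rather than assuming the state exists.

```python
from dataclasses import dataclass
from typing import Optional

NO_SUCH_TRANSACTION = "481 Call/Transaction Does Not Exist"


@dataclass
class Dialog:
    """Toy per-call state (hypothetical; not chan_sip's real structure)."""
    call_id: str
    established: bool = False                  # True only after a 2xx to the first INVITE
    pending_invite_cseq: Optional[int] = None  # incoming INVITE still awaiting a final response


def handle_request(dialog: Dialog, method: str, cseq: int) -> Optional[str]:
    """Return the status line to send for an incoming request, or None for ACK."""
    if method == "INVITE":
        if not dialog.established:
            # A reINVITE before any final response: there is no dialog to modify.
            return NO_SUCH_TRANSACTION
        dialog.pending_invite_cseq = cseq
        return "100 Trying"
    if method == "ACK":
        return None  # ACK never gets a response
    if method == "CANCEL":
        # Guard: only cancel an INVITE that is actually pending.
        if dialog.pending_invite_cseq != cseq:
            return NO_SUCH_TRANSACTION
        dialog.pending_invite_cseq = None
        return "200 OK"
    return "501 Not Implemented"


# Replay of the provider's behavior as described above (CSeq values are made up).
dialog = Dialog(call_id="example-call-id")
incoming = [("INVITE", 102), ("ACK", 102), ("INVITE", 103), ("INVITE", 104), ("CANCEL", 104)]
replies = [handle_request(dialog, method, cseq) for method, cseq in incoming]
```

In this model every early reINVITE is rejected with a 481, so by the time the CANCEL arrives there is no pending INVITE for it to match, and the guard turns what would otherwise be a lookup on missing state into another 481.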

By: Richard Mudgett (rmudgett) 2018-02-11 15:33:30.001-0600

[^jira_asterisk_27666_v13.19.0.patch] - This does the minimum to prevent the crash and may be all that is really needed. I haven't set up a SIPp scenario to confirm.

Apply {{patch -p1 -i jira_asterisk_27666_v13.19.0.patch}} to see if it fixes the crash. We can't do anything about the provider messing up the SIP transaction, but we shouldn't crash as a result.

By: Leandro Dardini (ldardini) 2018-02-12 09:45:46.815-0600

I confirm the patch fixes the issue.

By: Friendly Automation (friendly-automation) 2018-02-13 08:01:12.892-0600

Change 8202 merged by Jenkins2:
chan_sip.c: Fix crash processing CANCEL.

[https://gerrit.asterisk.org/8202|https://gerrit.asterisk.org/8202]

By: Friendly Automation (friendly-automation) 2018-02-13 08:09:20.842-0600

Change 8201 merged by Jenkins2:
chan_sip.c: Fix crash processing CANCEL.

[https://gerrit.asterisk.org/8201|https://gerrit.asterisk.org/8201]

By: Friendly Automation (friendly-automation) 2018-02-13 08:10:44.049-0600

Change 8200 merged by Jenkins2:
chan_sip.c: Fix crash processing CANCEL.

[https://gerrit.asterisk.org/8200|https://gerrit.asterisk.org/8200]