Summary: ASTERISK-25371: Crash in hangup at chan_pjsip.c:1749 when Asterisk attempts to generate hangup event
Reporter: Abhay Gupta (agupta)
Labels: pjsip
Date Opened: 2015-09-03 21:02:09
Date Closed: 2017-12-22 00:29:13.000-0600
Priority: Major
Regression?:
Status: Closed/Complete
Components: Channels/chan_pjsip
Versions: 13.5.0 16.4.0
Frequency of Occurrence: Frequent
Related Issues:
Environment: Linux ubuntu 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Attachments:
( 0) 8sep.txt
( 1) first.txt
( 2) fourth.txt
( 3) full.txt
( 4) full.txt
( 5) putty1.log
( 6) second.txt
( 7) succ_fail.txt
( 8) third.txt
Description: Asterisk frequently crashes in the hangup function of chan_pjsip.c, at chan_pjsip.c:1749:

#0  0x00007f8413ab2d42 in hangup (data=0x7f84840b17e8) at chan_pjsip.c:1749

It looks like channel is NULL:

#0  0x00007f8413ab2d42 in hangup (data=0x7f84840b17e8) at chan_pjsip.c:1749
       h_data = 0x7f84840b17e8
       ast = 0x7f8484003cb8
       channel = 0x0
       pvt = 0x7f8448905bb0
       session = 0x7f848402a2e0
       cause = 0

All the core dumps, with bt, bt full and thread apply all bt output, are attached.

Comments:

By: Asterisk Team (asteriskteam) 2015-09-03 21:02:10.818-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Abhay Gupta (agupta) 2015-09-03 21:04:24.458-0500

A few core dumps are attached. They occurred at different times, but all in the same function.

By: Abhay Gupta (agupta) 2015-09-03 21:37:12.775-0500

I traced the issue in the full log to the following sequence.

An incoming call arrives on chan_pjsip, and the dialplan specifies a Goto for it.
The extension named in the Goto does not exist.
Asterisk then looks for an invalid-extension handler, which does not exist either.

Asterisk then crashes, producing the attached core dumps:
{noformat}
Sep  4 03:53:14 ubuntu kernel: [205663.026837] asterisk[17520]: segfault at 0 ip 00007f0c6a127d42 sp 00007f0c62388c90 error 4 in chan_pjsip.so[7f0c6a11e000+12000]
[Sep  4 03:53:14] VERBOSE[20121][C-00002afd] pbx.c: Executing [s@default:1] Goto("PJSIP/voipountna-00003f46", "8312631566,1") in new stack
[Sep  4 03:53:14] VERBOSE[20121][C-00002afd] pbx.c: Goto (default,8312631566,1)
[Sep  4 03:53:14] WARNING[20121][C-00002afd] pbx.c: Channel 'PJSIP/voipountna-00003f46' sent to invalid extension but no invalid handler: context,exten,priority=default,8312631566,1
[Sep  4 03:53:19] Asterisk 13.5.0 built by root @ ubuntu on a x86_64 running Linux on 2015-09-03 04:36:50 UTC
{noformat}

By: Abhay Gupta (agupta) 2015-09-03 21:38:06.769-0500

Full log snippets from exactly the time of the segmentation fault.

By: Abhay Gupta (agupta) 2015-09-04 22:28:39.991-0500

SIP signalling of the call causing the crash:

{noformat}
[Sep  5 00:42:38] VERBOSE[11248] res_pjsip_logger.c: <--- Received SIP request (837 bytes) from UDP:176.227.212.66:5060 --->
INVITE sip:e3ce91b2-5367-44e3-a230-be93b954cf7f@103.19.196.156:5060 SIP/2.0
Via: SIP/2.0/UDP 176.227.212.66:5060;branch=z9hG4bK26f1ae264cbffa4a
From: <sip:15125356188@176.227.212.66>;tag=7f89a1ad4d2bc17d
To: "5127176649" <sip:5127176649@103.19.196.156>;tag=1be061f8-3400-4c45-9b5e-38ee93715b1c
User-Agent: VOS3000 V2.1.2.0
CSeq: 1 INVITE
Call-ID: 42d44cbb-c965-4ed3-b2d1-17a0acdadaf0
Contact: <sip:15125356188@176.227.212.66:5060>
Max-Forwards: 70
Allow: INVITE, ACK, CANCEL, BYE, OPTIONS, INFO, UPDATE, PRACK
Supported: timer
Session-Expires: 1800;refresher=uac
Content-Length: 208
Content-Type: application/sdp

v=0
o=- 1441395703 1441395704 IN IP4 67.23.255.10
s=VOS3000
c=IN IP4 67.23.255.10
t=0 0
m=audio 10306 RTP/AVP 0 101
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=sendrecv

[Sep  5 00:42:39] VERBOSE[12783][C-00006d2d] pbx.c: Spawn extension (default, s, 2) exited non-zero on 'PJSIP/voipout-0000b429'
[Sep  5 00:42:39] VERBOSE[9649] res_pjsip_logger.c: <--- Transmitting SIP request (444 bytes) to UDP:176.227.212.66:5060 --->
CANCEL sip:15125356188@176.227.212.66:5060 SIP/2.0
Via: SIP/2.0/UDP 103.19.196.156:5060;rport;branch=z9hG4bKPjdde2cc56-c4e1-4fed-9db3-abfb68e5a0a9
From: "5127176649" <sip:5127176649@103.19.196.156>;tag=1be061f8-3400-4c45-9b5e-38ee93715b1c
To: <sip:15125356188@176.227.212.66>
Call-ID: 42d44cbb-c965-4ed3-b2d1-17a0acdadaf0
CSeq: 19431 CANCEL
Reason: Q.850;cause=16
Max-Forwards: 70
User-Agent: Asterisk PBX 13.5.0
Content-Length:  0

{noformat}

By: Rusty Newton (rnewton) 2015-09-05 10:23:56.189-0500

Thanks for all of the data you have provided so far.

In addition, can you provide a full log including the "DEBUG" log channel type? It would be useful to see exactly what is happening right up until the time of the crash. Please provide a backtrace along with that crash, and be sure that Asterisk was built with BETTER_BACKTRACES and DONT_OPTIMIZE, of course.

Oh! Please include pjsip logger output in the same log as well.

Thanks again!



By: Abhay Gupta (agupta) 2015-09-07 20:02:21.609-0500

Attached is the full debug log from the time of the crash, along with the gdb output of bt, bt full and thread apply all bt.

By: Abhay Gupta (agupta) 2015-09-07 22:18:28.897-0500

I have tried to trace the complete log of the call that I believe caused the crash. This is what I noticed:

1. The call was originated and ran into a dial timeout.
2. The call was transferred to extension i,1, which runs the Hangup application.
3. An OriginateResponse event with failure arrives.
4. A SoftHangupRequest event is created.
5. A SIP CANCEL is created and sent.
6. The debug message is: pjsip:         endpoint ...Request msg CANCEL/cseq=8486 (tdta0x7ffe44021e30): skipping target resolution because address is already set
7. There is no Hangup manager event, and Asterisk crashes while trying to generate the Hangup event.

By: Abhay Gupta (agupta) 2015-09-07 22:51:30.662-0500

At the top of the file is the debug output for a CANCEL with the proper Hangup manager event, from when it was working fine.

At the bottom is the failure case, where the channel seems to have changed as a new INVITE arrived. The CANCEL is supposed to go out on channel PJSIP/novanetpharma1-0000f60c, whereas in the full log channel PJSIP/2026-0000f674 appears, followed by the crash.

By: Rusty Newton (rnewton) 2015-09-08 15:52:26.899-0500

Thank you for all the additional detail. We'll let you know if we need any additional information, but in the meantime feel free to attach anything else helpful you find.

By: Abhay Gupta (agupta) 2017-12-22 00:23:44.920-0600

This problem is not seen in the latest Asterisk version, 13.18.4.

By: Abhay Gupta (agupta) 2019-06-04 01:35:16.434-0500

I got the same issue again with the latest Asterisk, so the problem still seems to be there. I am wondering why, in the static int hangup function in chan_pjsip.c, there is no check for NULL:

struct hangup_data *h_data = data;
struct ast_channel *ast = h_data->chan;
struct ast_sip_channel_pvt *channel = ast_channel_tech_pvt(ast);
struct ast_sip_session *session = channel->session;
int cause = h_data->cause;

What could be the reason for channel to be NULL? In that case we get a segfault as soon as we try to read channel->session through the NULL pointer.
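To illustrate the question above, here is a minimal sketch using mock types. The names `mock_channel`, `mock_pvt`, `mock_session`, `mock_tech_pvt()` and `session_or_null()` are hypothetical stand-ins chosen for this example; they are not the real Asterisk structures or API. The sketch only shows that the session lookup reads through the result of the accessor, so a NULL return has to be tested before `->session` is evaluated.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real Asterisk types (assumptions). */
struct mock_session { int id; };
struct mock_pvt { struct mock_session *session; };
struct mock_channel { struct mock_pvt *tech_pvt; };

/* Mock accessor: returns whatever is stored, which may be NULL. */
struct mock_pvt *mock_tech_pvt(const struct mock_channel *chan)
{
    return chan->tech_pvt;
}

/* Resolve the session only if every link in the chain is present. */
struct mock_session *session_or_null(const struct mock_channel *chan)
{
    struct mock_pvt *channel;

    if (!chan) {
        return NULL;
    }
    channel = mock_tech_pvt(chan);
    /* Without this test, channel->session would read through NULL. */
    return channel ? channel->session : NULL;
}
```

With a helper of this shape, a caller can treat a NULL result as "nothing left to hang up" instead of crashing in the declaration block.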

By: Asterisk Team (asteriskteam) 2019-06-04 01:35:17.237-0500

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.

By: Chris Savinovich (csavinovich) 2019-06-10 16:54:03.556-0500

Hello Abhay. Since this issue was originally reported so long ago (4 years) and the traces are old, it is best if we tackle it fresh, meaning with new logs. Would you please submit new versions of all relevant traces? A step-by-step description of how to recreate it will help (I see there is one description from 2015, but could you please review it to make sure I am able to replicate the issue with a current version). It is better if you open a new issue and indicate in its description the number of this issue (ASTERISK-25371) so that we can link them in the future.
Thank you.
Chris


By: Abhay Gupta (agupta) 2019-06-11 01:47:31.733-0500

The issue remains the same. Because it is a race condition between threads, it crops up only once every month or two and is difficult to reproduce. I see no point in creating a fresh issue, since I know the race condition that leads to it.

I have submitted a patch against the latest Asterisk version. The hangup function in all versions has the same mistake: it accesses members of a structure without checking that the structure exists, and tries to free them as well. The patch checks the channel and session data structures before accessing them, which will solve this issue.
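The general shape of such a guard can be sketched as follows. This is not the merged patch; it is a self-contained mock under stated assumptions. All `mock_*` names, the `refs` counter and the return codes are hypothetical and only imitate the pattern: touch the session only when both pointers are present, and release the task data on every path.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real Asterisk types (assumptions). */
struct mock_session { int terminated; };
struct mock_pvt { struct mock_session *session; };
struct mock_channel { struct mock_pvt *tech_pvt; int refs; };
struct mock_hangup_data { struct mock_channel *chan; int cause; };

/* Allocate task data and take a (mock) reference on the channel. */
struct mock_hangup_data *mock_hangup_data_alloc(struct mock_channel *chan, int cause)
{
    struct mock_hangup_data *h_data = malloc(sizeof(*h_data));

    if (!h_data) {
        return NULL;
    }
    chan->refs++;
    h_data->chan = chan;
    h_data->cause = cause;
    return h_data;
}

/*
 * Guarded hangup task. Returns 0 if a session was terminated and -1 if
 * there was nothing left to hang up. The task data is freed either way.
 */
int mock_hangup(void *data)
{
    struct mock_hangup_data *h_data = data;
    struct mock_channel *ast = h_data->chan;
    struct mock_pvt *channel = ast->tech_pvt;
    int res = -1;

    /* Only touch the session when both pointers are present. */
    if (channel && channel->session) {
        channel->session->terminated = 1;
        channel->session = NULL;
        ast->tech_pvt = NULL;
        res = 0;
    }

    /* Cleanup that does not depend on channel or session. */
    ast->refs--;
    free(h_data);
    return res;
}
```

In this mock, a second hangup task that runs after `tech_pvt` has already been cleared returns -1 and still drops its reference, rather than dereferencing a NULL pointer.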

By: Abhay Gupta (agupta) 2019-06-11 01:56:54.334-0500

Please assign this issue to me.

By: Friendly Automation (friendly-automation) 2019-06-12 08:52:04.028-0500

Change 11448 merged by George Joseph:
chan_pjsip.c: Check for channel and session to not be NULL in hangup

[https://gerrit.asterisk.org/c/asterisk/+/11448|https://gerrit.asterisk.org/c/asterisk/+/11448]

By: Friendly Automation (friendly-automation) 2019-06-12 08:52:12.833-0500

Change 11447 merged by George Joseph:
chan_pjsip.c: Check for channel and session to not be NULL in hangup

[https://gerrit.asterisk.org/c/asterisk/+/11447|https://gerrit.asterisk.org/c/asterisk/+/11447]

By: Friendly Automation (friendly-automation) 2019-06-12 08:52:25.750-0500

Change 11444 merged by George Joseph:
chan_pjsip.c: Check for channel and session to not be NULL in hangup

[https://gerrit.asterisk.org/c/asterisk/+/11444|https://gerrit.asterisk.org/c/asterisk/+/11444]