Summary: ASTERISK-27568: PJSIP: Crash during SIP attended transfer.
Reporter: Bryan Walters (gamegamer43)
Labels: pjsip
Date Opened: 2018-01-09 12:04:38.000-0600
Date Closed: 2018-03-01 08:40:20.000-0600
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Channels/chan_pjsip Resources/res_pjsip_refer
Versions: 13.18.5
Frequency of Occurrence: Constant
Related Issues:
Environment:
Attachments: ( 0) backtrace.txt
Description: We've had reports from users of Asterisk 13.18.5 where Asterisk core dumps frequently when using chan_pjsip. Reviewing this with our team, it appears that chan_pjsip_session_end checks session->channel for validity and later calls ast_channel_hangupcause(session->channel). However, between the check and the call to ast_channel_hangupcause, something sets session->channel to NULL, causing Asterisk to core dump.
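
For illustration only, the check-then-use hazard described above can be avoided in general by reading the pointer once under a lock and holding a counted reference while it is used. The sketch below is standalone C and is not code from the Asterisk tree: `demo_session`, `demo_channel`, and every `demo_*` function are hypothetical stand-ins, and this is not necessarily how the issue was resolved.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real Asterisk structures. */
struct demo_channel {
    int refcount;    /* protected by the owning session's lock in this sketch */
    int hangupcause;
};

struct demo_session {
    pthread_mutex_t lock;
    struct demo_channel *channel; /* another thread may set this to NULL */
};

/* Read the pointer once, under the lock, and take a reference to it. */
struct demo_channel *demo_session_channel_ref(struct demo_session *s)
{
    struct demo_channel *chan;

    pthread_mutex_lock(&s->lock);
    chan = s->channel;
    if (chan) {
        chan->refcount++;
    }
    pthread_mutex_unlock(&s->lock);
    return chan;
}

/* Drop a reference taken above (a real implementation would free at zero). */
void demo_channel_unref(struct demo_session *s, struct demo_channel *chan)
{
    pthread_mutex_lock(&s->lock);
    assert(chan->refcount > 0);
    chan->refcount--;
    pthread_mutex_unlock(&s->lock);
}

/* What the "other thread" does: disassociate the channel from the session. */
void demo_session_detach(struct demo_session *s)
{
    struct demo_channel *chan;

    pthread_mutex_lock(&s->lock);
    chan = s->channel;
    s->channel = NULL;
    if (chan) {
        assert(chan->refcount > 0);
        chan->refcount--; /* release the session's own reference */
    }
    pthread_mutex_unlock(&s->lock);
}

/* Returns the hangup cause, or -1 if the session has no channel. */
int demo_session_end(struct demo_session *s)
{
    struct demo_channel *chan = demo_session_channel_ref(s);
    int cause;

    if (!chan) {
        return -1;
    }
    /* A concurrent detach can no longer invalidate the local pointer. */
    cause = chan->hangupcause;
    demo_channel_unref(s, chan);
    return cause;
}
```

The point of the sketch is that `demo_session_end` never dereferences `s->channel` a second time; it only uses the local pointer it obtained together with a reference.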
Comments:

By: Asterisk Team (asteriskteam) 2018-01-09 12:04:38.992-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Kevin Harwell (kharwell) 2018-01-10 14:39:29.210-0600

This is marked as a regression. When did users start noticing the issue after upgrading? For instance 13.17->13.18 or 13.18.x->13.18.5?

It looks like this happened during an attended transfer. Are all the crashes the same? In the same spot that is?

Anything else you can provide with regards to the scenario in which the crash occurs might help us duplicate/track down the problem. Like was it just a basic transfer between two parties? Local channels, queue, etc... involved?

Log output could be helpful especially if it contains debug and a sip trace.

Thanks!



By: Bryan Walters (gamegamer43) 2018-01-10 16:42:18.915-0600

This started occurring for the user after upgrading from 13.18.4 to 13.18.5. Looking over a bunch of crashes, they all happen at the same spot. Let me see if I can dig up some logs to attach to the issue to further assist you guys.

By: Kevin Harwell (kharwell) 2018-01-11 17:50:29.968-0600

Hrm, interesting. There is only one issue fixed between 13.18.4 and 13.18.5, and it doesn't appear that it would cause or affect something like this. Hopefully the logs are helpful.

By: Kevin Harwell (kharwell) 2018-01-18 12:18:00.354-0600

Putting in "waiting on feedback" while we await log files.

By: Richard Mudgett (rmudgett) 2018-02-07 12:20:47.117-0600

The log is huge but it doesn't have the right log levels to show what is happening at the time of the crash.  Asterisk restarts several times in that log without any log messages hinting at what happened.  However, looking at the backtrace, I see that the crash happens during a SIP attended transfer.  The transferrer channel completes hanging up and disassociates itself from the session in another thread even though there is supposed to be protection from that happening.

I think refer_attended_task() needs to push ast_sip_session_end_if_deferred() onto the transferrer's serializer to avoid the problem.
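
As a rough illustration of why pushing work onto a serializer helps: a serializer runs queued tasks one at a time, in push order, so two tasks that touch the same session can never overlap. The sketch below is a standalone, single-threaded model of that idea. All `demo_*` names are hypothetical, and it does not use or reproduce the real res_pjsip task or serializer API.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_TASKS 8

typedef int (*demo_task_fn)(void *data);

/* Hypothetical FIFO of tasks; a real serializer drains this on a worker thread. */
struct demo_serializer {
    demo_task_fn fn[DEMO_MAX_TASKS];
    void *data[DEMO_MAX_TASKS];
    size_t head;
    size_t tail;
};

/* Queue a task; returns 0 on success, -1 if the queue is full. */
int demo_push_task(struct demo_serializer *ser, demo_task_fn fn, void *data)
{
    if (ser->tail == DEMO_MAX_TASKS) {
        return -1;
    }
    ser->fn[ser->tail] = fn;
    ser->data[ser->tail] = data;
    ser->tail++;
    return 0;
}

/* Run every queued task, strictly one after another, in push order. */
size_t demo_run_tasks(struct demo_serializer *ser)
{
    size_t ran = 0;

    while (ser->head < ser->tail) {
        size_t i = ser->head++;

        assert(ser->fn[i] != NULL);
        ser->fn[i](ser->data[i]);
        ran++;
    }
    return ran;
}

/* Hypothetical session state touched by both tasks. */
struct demo_xfer_session {
    int has_channel;
    int end_deferred;
    int ended;
};

/* Models the transferrer channel finishing its hangup. */
int demo_hangup_task(void *data)
{
    struct demo_xfer_session *s = data;

    s->has_channel = 0;
    return 0;
}

/* Models ending the session only if that was deferred earlier. */
int demo_end_if_deferred_task(void *data)
{
    struct demo_xfer_session *s = data;

    if (s->end_deferred) {
        s->end_deferred = 0;
        s->ended = 1;
    }
    return 0;
}
```

Because both tasks go through the same queue, the "end if deferred" step sees a consistent session state instead of racing with the hangup in another thread.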

By: Richard Mudgett (rmudgett) 2018-02-07 12:27:40.047-0600

I think this crash potential has been around for a while.

By: Friendly Automation (friendly-automation) 2018-03-01 08:40:22.090-0600

Change 8375 merged by Jenkins2:
res_pjsip_refer.c: Fix attended transfer race condition crash.

[https://gerrit.asterisk.org/8375|https://gerrit.asterisk.org/8375]

By: Friendly Automation (friendly-automation) 2018-03-01 08:44:48.179-0600

Change 8376 merged by Jenkins2:
res_pjsip_refer.c: Fix attended transfer race condition crash.

[https://gerrit.asterisk.org/8376|https://gerrit.asterisk.org/8376]

By: Friendly Automation (friendly-automation) 2018-03-01 08:45:50.778-0600

Change 8377 merged by Jenkins2:
res_pjsip_refer.c: Fix attended transfer race condition crash.

[https://gerrit.asterisk.org/8377|https://gerrit.asterisk.org/8377]

By: Friendly Automation (friendly-automation) 2018-03-01 08:58:21.584-0600

Change 8399 merged by Jenkins2:
res_pjsip_refer.c: Fix attended transfer race condition crash.

[https://gerrit.asterisk.org/8399|https://gerrit.asterisk.org/8399]