Summary: ASTERISK-26764: chan_pjsip: Crash looking up PJSIP call-id on hungup channel.
Reporter: Richard Mudgett (rmudgett)
Labels:
Date Opened: 2017-02-01 12:07:41.000-0600
Date Closed: 2017-03-13 08:27:41
Priority: Major
Regression?:
Status: Closed/Complete
Components: Channels/chan_pjsip
Versions: 13.13.1
Frequency of Occurrence: One Time
Related Issues: is duplicated by ASTERISK-26857 chan_pjsip: Dialplan function race condition
Environment: kubuntu 10.04 32-bit
Attachments:
( 0) asterisk_26764_testsuite_logs.zip
( 1) backtrace.txt
( 2) full_backtrace.txt
Description: Got a crash during testsuite test:
channels/pjsip/transfers/blind_transfer/caller_refer_only

res_hep_rtcp was processing a stasis bus message and trying to look up the PJSIP channel's call-id in assign_uuid().  This is a third-party thread trying to get the call-id of a channel that may get hung up while the lookup is in progress.  The dialplan function CHANNEL(pjsip,call-id) calls pjsip_acf_channel_read(), which indirectly calls read_pjsip() in another thread.  read_pjsip() then calls channel_read_pjsip(), which can crash if the channel has been hung up by the time execution gets there.
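
To illustrate the kind of protection this race calls for, here is a minimal sketch in plain C with pthreads. It is not the Asterisk code or the eventual patch. The names demo_channel, demo_session, demo_channel_init(), demo_read_call_id() and demo_hangup() are all hypothetical stand-ins. The idea shown is only that the reader copies the call-id while holding the channel lock and checks that the private data still exists, while hangup detaches that data under the same lock before freeing it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-call PJSIP data attached to a channel. */
struct demo_session {
    char call_id[64];
};

/* Hypothetical stand-in for a channel; tech_pvt becomes NULL on hangup. */
struct demo_channel {
    pthread_mutex_t lock;
    struct demo_session *tech_pvt;
};

/* Create the session data and attach it to the channel. Returns 0 on success. */
int demo_channel_init(struct demo_channel *chan, const char *call_id)
{
    struct demo_session *session = calloc(1, sizeof(*session));

    if (!session) {
        return -1;
    }
    snprintf(session->call_id, sizeof(session->call_id), "%s", call_id);
    pthread_mutex_init(&chan->lock, NULL);
    chan->tech_pvt = session;
    return 0;
}

/*
 * Reader used by a third-party thread. The copy happens entirely under the
 * channel lock, so the session cannot be freed while it is being read.
 * Returns 0 and fills buf on success, -1 if the channel is already hung up.
 */
int demo_read_call_id(struct demo_channel *chan, char *buf, size_t len)
{
    int res = -1;

    assert(buf != NULL);
    if (len == 0) {
        return -1;
    }
    buf[0] = '\0';

    pthread_mutex_lock(&chan->lock);
    if (chan->tech_pvt) {
        snprintf(buf, len, "%s", chan->tech_pvt->call_id);
        res = 0;
    }
    pthread_mutex_unlock(&chan->lock);
    return res;
}

/* Hangup: detach the session under the lock, then free it outside the lock. */
void demo_hangup(struct demo_channel *chan)
{
    struct demo_session *session;

    pthread_mutex_lock(&chan->lock);
    session = chan->tech_pvt;
    chan->tech_pvt = NULL;
    pthread_mutex_unlock(&chan->lock);

    free(session);
}
```

With this shape, a reader that loses the race gets -1 and an empty buffer instead of dereferencing freed memory. The real code path is more involved because read_pjsip() runs in another thread, so the check has to hold at the point where that thread actually touches the session.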
Comments:

By: Daniel Journo (journo) 2017-02-27 12:34:30.575-0600

Full Backtrace attached.

By: Daniel Journo (journo) 2017-02-27 13:06:53.817-0600

Frequency for me is about 10 times today on 4 separate servers.
13.14.0-rc1 was installed 3 weeks ago, and suddenly today, it's broken.
I did make a few dialplan changes 3 days ago but I've tested the dialplan and it appears to work fine.
The CLI doesn't show anything obvious, nor does it point to any specific line in the dialplan.


By: Daniel Journo (journo) 2017-03-07 02:04:11.491-0600

Although I've seen this crash a few times on my production servers, I can't seem to recreate it using the testsuite test.
I've run the test over 500 times without a crash.

The only difference between my production servers and my development machine is the pjsip version.
Production is running against pjsip 2.4.5 and development is running pjsip 2.6.

Did you have to do anything else to get it to crash during the testsuite?


By: Richard Mudgett (rmudgett) 2017-03-07 11:17:55.325-0600

I didn't have to do anything particularly special and this crash only happened once to me.  My test box is a nine-year-old Dell Vostro 200 with kubuntu 10.04 installed.  On the test box, I build Asterisk with bundled pjproject, no optimization, and BETTER_BACKTRACES to get the best backtraces.  I enable MALLOC_DEBUG and DO_CRASH to catch memory corruption issues when detected.  I build, install, and load just about all modules.  In my case it was a race-condition interaction with the res_hep_rtcp module using CHANNEL(pjsip,call-id) on a channel that was going away.  The res_hep_rtcp module is not needed by this particular test as it is not what the test is trying to verify.  My initial examination of the crash showed that CHANNEL(pjsip,xxx) needs better protection from pjsip channels that may disappear while trying to get the requested channel information.

By: Joshua C. Colp (jcolp) 2017-03-13 08:27:41.226-0500

Duplicated by ASTERISK-26857, whoops.