Summary: ASTERISK-26623: res_pjsip: Crash when calling PJSIPShowEndpoint
Reporter: Jørgen H (jorgen)
Labels:
Date Opened: 2016-11-25 03:39:25.000-0600
Date Closed: 2017-02-28 10:25:48.000-0600
Priority: Major
Regression?
Status: Closed/Complete
Components: Resources/res_pjsip
Versions: 14.2.0
Frequency of Occurrence:
Related Issues: is related to ASTERISK-26899 Unable to apply outbound proxy on request to qualify
Environment: linux x64
Attachments: (0) bt_original.txt
(1) bt.txt
(2) patch.diff
Description: Crash caused by an AMI PJSIPShowEndpoint call.
Possibly a race condition: the status object appears to be NULL when ast_str_append is called on line 1252 of res/res_pjsip/pjsip_options.c.

Backtrace excerpt:
{noformat}
#0  format_contact_status (obj=obj@entry=0x7f6436c41158, arg=arg@entry=0x7f6439c26060, flags=flags@entry=0) at res_pjsip/pjsip_options.c:1252
       wrapper = 0x7f6436c41158
       contact = 0x7f6436bf4528
       ami = 0x7f6439c26060
       status = 0x0
       buf = 0x7f6437de0fa0
       endpoint = 0x7f64378f7428
       __PRETTY_FUNCTION__ = "format_contact_status"
#1  0x00007f64e4a0f518 in ast_sip_for_each_contact (aor=0x7f643772d5f0, on_contact=0x7f64e49f8ab0 <format_contact_status>, arg=0x7f6439c26060) at res_pjsip/location.c:674
       contact = 0x7f6436bf4528
       wrapper = 0x7f6436c41158
       aor_id = 0x7f6434e8b4b0 "xxxxxxxx"
       contacts = 0x7f64482bba38
       i = {c = 0x7f64482bba38, last_node = 0x7f6436bae668, complete = 0, flags = 0}
       res = 0
       object = 0x7f6436bf4528
       __PRETTY_FUNCTION__ = "ast_sip_for_each_contact"
{noformat}
[Edit by Rusty - removed rest of BT and copied the whole thing to bt_original.txt. Please don't include large chunks of debug in the description field]
Comments:

By: Asterisk Team (asteriskteam) 2016-11-25 03:39:29.754-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Joshua C. Colp (jcolp) 2016-11-28 06:34:56.000-0600

Can you please attach the full "thread apply all bt" as an attachment as well as the console output before the crash?

By: Jørgen H (jorgen) 2016-11-28 07:10:50.365-0600

full backtrace

By: Rusty Newton (rnewton) 2016-11-28 13:49:31.225-0600

Thanks for the additional trace. Can you reproduce the issue at will? We would like to see an Asterisk log captured during the reproduction.

https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information



By: Jørgen H (jorgen) 2016-11-28 14:33:36.875-0600

It happens approximately once per day.

The logfile didn't have any interesting data with debug disabled.
I can't really enable debug logging 24/7 on this server, as it is in production and there is too much logging if I enable it.

However, Thread 1 and Thread 5 from the backtrace are both dealing with the same endpoint/aor.
Max_contacts is set to 1, so it could be that the AMI is trying to get info from the previous endpoint/aor that was deleted between calls when the new endpoint registered.

Perhaps the delete endpoint + register same endpoint sequence is not atomic with respect to a PJSIPShowEndpoint call arriving in between?
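
The suspected interleaving can be replayed deterministically in a single thread. The sketch below only illustrates the hypothesis and is not Asterisk code: `struct status_store`, the `store_*` helpers, `replay_suspected_race`, and the contact ids are all hypothetical names, and the real storage and locking in res_pjsip are not modelled.

```c
#include <stddef.h>
#include <string.h>

#define MAX_STATUS 4

/* Hypothetical status store keyed by contact id (not the Asterisk sorcery API). */
struct status_store {
	const char *ids[MAX_STATUS];
	size_t count;
};

static void store_add(struct status_store *s, const char *id)
{
	if (s->count < MAX_STATUS) {
		s->ids[s->count++] = id;
	}
}

static void store_remove(struct status_store *s, const char *id)
{
	for (size_t i = 0; i < s->count; i++) {
		if (strcmp(s->ids[i], id) == 0) {
			/* Fill the hole with the last entry and shrink the store. */
			s->ids[i] = s->ids[--s->count];
			return;
		}
	}
}

static const char *store_find(const struct status_store *s, const char *id)
{
	for (size_t i = 0; i < s->count; i++) {
		if (strcmp(s->ids[i], id) == 0) {
			return s->ids[i];
		}
	}
	return NULL;
}

/*
 * Replays the suspected ordering of events.
 * Returns 1 if the reader ends up with no status for its contact.
 */
static int replay_suspected_race(void)
{
	struct status_store statuses = { { NULL }, 0 };
	const char *old_contact = "aor1-old"; /* hypothetical contact id */
	const char *snapshot;

	store_add(&statuses, old_contact);

	/* Reader (the AMI action): picks up the contact currently on the AOR. */
	snapshot = old_contact;

	/* Writer (re-registration, max_contacts=1): old contact replaced by a new one. */
	store_remove(&statuses, old_contact);
	store_add(&statuses, "aor1-new");

	/* Reader: looks up the status of the contact it picked up earlier. */
	return store_find(&statuses, snapshot) == NULL;
}
```

With this ordering the reader holds a contact whose status no longer exists, which would match the `status = 0x0` seen in frame #0 of the backtrace, if the hypothesis is right.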



By: Jørgen H (jorgen) 2016-12-01 14:26:20.252-0600

The problem seems to be solved by adding the following after status = ast_sorcery_retrieve_by_id(...) (line 1234):

if (!status) {
	ast_free(buf);
	return -1;
}

so the call to format_contact_status will fail instead of crashing. I'm not sure if this is the proper way to do it though.
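
For illustration, here is a self-contained sketch of the same guard pattern outside of Asterisk. It is an assumption-laden stand-in, not the actual patch: `struct contact_status`, `lookup_status`, and `format_contact_status_sketch` are hypothetical substitutes for the contact status object, `ast_sorcery_retrieve_by_id`, and `format_contact_status`, and the output field names are illustrative only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the contact status object. */
struct contact_status {
	const char *uri;
	int rtt_usec;
};

/* Hypothetical lookup: returns NULL when no status exists for the URI. */
static const struct contact_status *lookup_status(
	const struct contact_status *table, size_t n, const char *uri)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(table[i].uri, uri) == 0) {
			return &table[i];
		}
	}
	return NULL;
}

/* Formats one contact's status into out; returns 0 on success, -1 on failure. */
static int format_contact_status_sketch(
	const struct contact_status *table, size_t n,
	const char *uri, char *out, size_t out_len)
{
	const struct contact_status *status;
	char *buf = malloc(256);

	if (!buf) {
		return -1;
	}

	status = lookup_status(table, n, uri);
	if (!status) {
		/* The guard: release the buffer and fail instead of dereferencing NULL. */
		free(buf);
		return -1;
	}

	snprintf(buf, 256, "URI: %s\r\nRoundtripUsec: %d\r\n",
		status->uri, status->rtt_usec);
	snprintf(out, out_len, "%s", buf);
	free(buf);
	return 0;
}
```

Calling `format_contact_status_sketch` with a URI that has no entry in the table returns -1 and leaves `out` untouched, while a known URI fills `out` and returns 0. The buffer is freed on both paths, mirroring the `ast_free(buf)` in the snippet above.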


By: Richard Mudgett (rmudgett) 2016-12-01 17:43:55.150-0600

That change will work, and it is not the only place in the code handling the PJSIPShowEndpoint response that can return a -1 failure like that. When one of those error returns happens, though, I think the AMI response for that action is malformed. See res/res_pjsip/pjsip_configuration.c:ami_show_endpoint() when it reports "Unable to retrieve endpoint %s".

If you would like to create a patch:
https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process

By: Jørgen H (jorgen) 2016-12-06 08:23:04.560-0600

fix to problem

By: Rusty Newton (rnewton) 2016-12-09 08:50:51.518-0600

Thanks! The next step is to get the patch onto Gerrit. https://wiki.asterisk.org/wiki/display/AST/Git+Usage

By: Friendly Automation (friendly-automation) 2017-02-28 10:25:48.813-0600

Change 4967 merged by Joshua Colp:
res_pjsip: Fix crash when contact has no status

[https://gerrit.asterisk.org/4967|https://gerrit.asterisk.org/4967]

By: Friendly Automation (friendly-automation) 2017-02-28 12:36:55.696-0600

Change 5096 merged by Joshua Colp:
res_pjsip: Fix crash when contact has no status

[https://gerrit.asterisk.org/5096|https://gerrit.asterisk.org/5096]

By: Friendly Automation (friendly-automation) 2017-02-28 13:33:22.352-0600

Change 5097 merged by zuul:
res_pjsip: Fix crash when contact has no status

[https://gerrit.asterisk.org/5097|https://gerrit.asterisk.org/5097]