Summary: | ASTERISK-26623: res_pjsip: Crash when calling PJSIPShowEndpoint | ||||
Reporter: | Jørgen H (jorgen) | Labels: | |||
Date Opened: | 2016-11-25 03:39:25.000-0600 | Date Closed: | 2017-02-28 10:25:48.000-0600 | ||
Priority: | Major | Regression? | |||
Status: | Closed/Complete | Components: | Resources/res_pjsip | ||
Versions: | 14.2.0 | Frequency of Occurrence | |||
Related Issues: |
| ||||
Environment: | linux x64 | Attachments: | ( 0) bt_original.txt ( 1) bt.txt ( 2) patch.diff | ||
Description: | Crash caused by AMI PJSIPShowEndpoint call
Race condition? Looks like status object is null when calling ast_str_append on line 1252 of res/res_pjsip/pjsip_options.c.
Backtrace excerpt:
{noformat}
#0  format_contact_status (obj=obj@entry=0x7f6436c41158, arg=arg@entry=0x7f6439c26060, flags=flags@entry=0) at res_pjsip/pjsip_options.c:1252
        wrapper = 0x7f6436c41158
        contact = 0x7f6436bf4528
        ami = 0x7f6439c26060
        status = 0x0
        buf = 0x7f6437de0fa0
        endpoint = 0x7f64378f7428
        __PRETTY_FUNCTION__ = "format_contact_status"
#1  0x00007f64e4a0f518 in ast_sip_for_each_contact (aor=0x7f643772d5f0, on_contact=0x7f64e49f8ab0 <format_contact_status>, arg=0x7f6439c26060) at res_pjsip/location.c:674
        contact = 0x7f6436bf4528
        wrapper = 0x7f6436c41158
        aor_id = 0x7f6434e8b4b0 "xxxxxxxx"
        contacts = 0x7f64482bba38
        i = {c = 0x7f64482bba38, last_node = 0x7f6436bae668, complete = 0, flags = 0}
        res = 0
        object = 0x7f6436bf4528
        __PRETTY_FUNCTION__ = "ast_sip_for_each_contact"
{noformat}
[Edit by Rusty - removed rest of BT and copied the whole thing to bt_original.txt. Please don't include large chunks of debug in the description field] | ||||
Comments: | By: Asterisk Team (asteriskteam) 2016-11-25 03:39:29.754-0600
Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].
By: Joshua C. Colp (jcolp) 2016-11-28 06:34:56.000-0600
Can you please attach the full "thread apply all bt" as an attachment as well as the console output before the crash?
By: Jørgen H (jorgen) 2016-11-28 07:10:50.365-0600
full backtrace
By: Rusty Newton (rnewton) 2016-11-28 13:49:31.225-0600
Thanks for the additional trace. Can you reproduce the issue at will? We would like to see an Asterisk log captured during the reproduction. https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information
By: Jørgen H (jorgen) 2016-11-28 14:33:36.875-0600
It happens approximately once per day. The logfile didn't have any interesting data with debug disabled. I can't really enable debug logging 24/7 on this server, as it is in production and there is too much logging if I enable it. However, Thread 1 and Thread 5 from the backtrace are both dealing with the same endpoint/aor. Max_contacts is set to 1, so it could be that the AMI is trying to get info from the previous endpoint/aor that was deleted between calls from registering the new endpoint. Perhaps this delete endpoint + register same endpoint sequence is not atomic with regard to calling PJSIPShowEndpoint in between calls?
By: Jørgen H (jorgen) 2016-12-01 14:26:20.252-0600
The problem seems to be solved by adding the following after status = ast_sorcery_retrieve_by_id(...) (line 1234):
if (!status) { ast_free(buf); return -1; }
so that the call to format_contact_status will fail instead of crashing. I'm not sure if this is the proper way to do it, though.
By: Richard Mudgett (rmudgett) 2016-12-01 17:43:55.150-0600
That change will work, and it is not the only place in the code handling the PJSIPShowEndpoint response that can return a -1 failure like that. When one of those error returns happens, though, I think the AMI response for that action is malformed. See res/res_pjsip/pjsip_configuration.c:ami_show_endpoint() when it reports "Unable to retrieve endpoint %s". If you would like to create a patch: https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process
By: Jørgen H (jorgen) 2016-12-06 08:23:04.560-0600
fix to problem
By: Rusty Newton (rnewton) 2016-12-09 08:50:51.518-0600
Thanks! The next step is to get the patch onto Gerrit. https://wiki.asterisk.org/wiki/display/AST/Git+Usage
By: Friendly Automation (friendly-automation) 2017-02-28 10:25:48.813-0600
Change 4967 merged by Joshua Colp: res_pjsip: Fix crash when contact has no status [https://gerrit.asterisk.org/4967|https://gerrit.asterisk.org/4967]
By: Friendly Automation (friendly-automation) 2017-02-28 12:36:55.696-0600
Change 5096 merged by Joshua Colp: res_pjsip: Fix crash when contact has no status [https://gerrit.asterisk.org/5096|https://gerrit.asterisk.org/5096]
By: Friendly Automation (friendly-automation) 2017-02-28 13:33:22.352-0600
Change 5097 merged by zuul: res_pjsip: Fix crash when contact has no status [https://gerrit.asterisk.org/5097|https://gerrit.asterisk.org/5097] |
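
Illustration (not part of the original report): the guard discussed in the comments can be sketched outside of Asterisk as a lookup that may return NULL, followed by a formatter that fails with -1 instead of dereferencing the missing object. Everything in this sketch is an assumption made for illustration: struct contact_status, lookup_status() and format_contact() are hypothetical stand-ins, not the Asterisk types or the sorcery API, and the merged Gerrit changes may differ from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a stored contact status; NOT the Asterisk type. */
struct contact_status {
    const char *uri;
    int rtt_usec;
};

/*
 * Hypothetical lookup. Returns NULL when no status exists for the URI,
 * mimicking a retrieve-by-id call that loses a race with a deletion.
 */
static const struct contact_status *lookup_status(
    const struct contact_status *table, size_t count, const char *uri)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].uri, uri) == 0) {
            return &table[i];
        }
    }
    return NULL;
}

/*
 * Formats one contact into out. Returns 0 on success, or -1 when the
 * status is missing, so the caller sees a failure rather than a crash.
 */
static int format_contact(const struct contact_status *table, size_t count,
    const char *uri, char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);

    const struct contact_status *status = lookup_status(table, count, uri);
    if (!status) {
        /* The guard: bail out before touching status->... */
        out[0] = '\0';
        return -1;
    }

    snprintf(out, out_len, "URI: %s RTT: %d", status->uri, status->rtt_usec);
    return 0;
}
```

Calling format_contact() with a URI that is absent from the table returns -1 and leaves an empty string in the output buffer. As noted in the comments, the caller still has to turn that -1 into a well-formed error response.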