Summary: ASTERISK-26776: res_pjsip_pubsub: Crash when generating xpidf content
Reporter: Andrew Green (andrew867)
Labels:
Date Opened: 2017-02-08 12:59:55.000-0600
Date Closed: 2017-03-22 07:07:29
Priority: Minor
Regression?:
Status: Closed/Complete
Components: Resources/res_pjsip_pubsub
Versions: 13.13.1
Frequency of Occurrence: Constant
Related Issues: is duplicated by ASTERISK-26819 Crash in xpidf_to_string
Environment: FreePBX 13 SHMZ release 6.6 (Final)
Linux freepbx 2.6.32-504.8.1.el6.x86_64 #1 SMP Wed Jan 28 21:11:36 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Cisco 7962 using firmware SIP42.9-4-2SR2-2S
Attachments:
( 0) asterisk_console_and_pjsip_log.txt
( 1) Asterisk_crash_on_start.txt
( 2) backtrace.core.1757_crash_on_start.txt
( 3) backtrace.core.29832.txt
( 4) crash-xpidf.txt
( 5) dialplan.xml
( 6) extensions.conf
( 7) malloc-backtrace.txt
( 8) pjsip.conf
( 9) REGISTER_SUBSCRIBE_client.csv
(10) REGISTER_SUBSCRIBE_client.xml
(11) SEPmachere.cnf.xml
(12) softkeys.xml
(13) valgrind-output.txt
(14) XMLDefault.cnf.xml
Description: Asterisk crashes randomly when using chan_pjsip and Cisco 7962 phones. This is the first repeatable bug I found; the other crashes happened when the phone attempted registration and no longer occur after upgrading from 13.12 to 13.13.1. I have not tried other phone firmware versions, but I can upon request. Server backtrace is attached.

Actions to reproduce:
-Register Cisco 7962 to chan_pjsip using TCP transport (see attached XML config files)
-Dial an internal three digit extension, call completes correctly
-After the call, try typing the extension number again: Asterisk will crash and the phone will reset, indicating lost registration with the SIP server.

Note: On my phone external numbers do not cause this issue

Comments:

By: Asterisk Team (asteriskteam) 2017-02-08 12:59:56.474-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Andrew Green (andrew867) 2017-02-08 13:24:40.669-0600

Crash does not happen if you engage the line first, ex: press the line button, speaker button, or pick up the handset.

By: Joshua C. Colp (jcolp) 2017-02-09 07:54:03.120-0600

Thank you for taking the time to report this bug and helping to make Asterisk better. Unfortunately, we cannot work on this bug because your description did not include enough information. Please read over the Asterisk Issue Guidelines [1] which discusses the information necessary for your issue to be resolved and the format that information needs to be in. We would be grateful if you would then provide a more complete description of the problem. At a minimum, we need:

1. The specific steps or actions you took that caused you to encounter the problem.
2. The behavior you expected and the location of documentation that led you to that expectation.
3. The behavior you actually encountered.

To demonstrate the issue in detail, please include Asterisk log files generated per the instructions on the wiki [2]. If applicable, please ensure that protocol-level trace debugging is enabled, e.g., 'sip set debug on' if the issue involves chan_sip, and configuration information such as dialplan and channel configuration.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

[2] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

In this case the Asterisk console output and SIP signaling (pjsip set logger on) would be useful.

By: Andrew Green (andrew867) 2017-02-09 10:20:19.319-0600

I've attached the SIP and console output logs.

By: Andrew Green (andrew867) 2017-02-09 10:45:20.850-0600

Asterisk is now crashing on start even before it completes loading.

By: Andrew Green (andrew867) 2017-02-09 10:58:54.292-0600

I renamed res_pjsip_xpidf_body_generator.so to prevent Asterisk from loading that module (could have changed modules.conf but I'm not sure if FreePBX rewrites that file) and it seems to be starting and working without crashing.

By: Andrew Green (andrew867) 2017-02-10 09:56:36.166-0600

1. The specific steps or actions you took that caused you to encounter the problem.
Set up extension using chan_pjsip for Cisco 7962 phone (let me know if you need the PJSIP configs or other console output). Created XML config as attached to allow phone to connect to Asterisk server. Note I am using the Cisco phone behind a NAT and incoming calls work fine, and outgoing calls if you engage the line first.
2. The behavior you expected and the location of documentation that led you to that expectation.
Asterisk stays running and responds to the phone with a correct XML SIP SUBSCRIBE response.
3. The behavior you actually encountered.
Asterisk crashed and restarted (through safe_asterisk), and the phone rebooted due to lost registration. It also looks like the PJSIP modules somehow stored the failed SIP SUBSCRIBE, so Asterisk would not start without crashing; I've attached those logs as well. I worked around the crashes by removing res_pjsip_xpidf_body_generator.so from Asterisk's loaded modules to prevent an attempted response to the SUBSCRIBE message.

By: Richard Mudgett (rmudgett) 2017-02-24 12:49:47.285-0600

The backtrace on ASTERISK-26819 has useful symbols in it.

By: Joshua Elson (joshelson) 2017-03-10 14:09:44.460-0600

In case anyone has a chance to look at this before we do, I'm attaching an extensions.conf, pjsip.conf, and a SIPP scenario that will 100% of the time crash the box.

You can run the scenario by running:

sipp <asterisk-ip> -sf REGISTER_SUBSCRIBE_client.xml -inf REGISTER_SUBSCRIBE_client.csv -m 1 -l 1 -r 1 -t tn

Hope this helps.

By: Richard Mudgett (rmudgett) 2017-03-10 14:39:08.742-0600

[~joshelson] Please turn on MALLOC_DEBUG when recreating the crash as a malloc/free abort is a sign of memory corruption.

https://wiki.asterisk.org/wiki/display/AST/MALLOC_DEBUG+Compiler+Flag

By: Joshua Elson (joshelson) 2017-03-11 10:12:51.522-0600

Recreated with mmlog entry below:

1489248637 - New session
WARNING: High fence violation of 0x7febd4045f30 allocated at res_pjsip_pubsub.c allocate_subscription() line 1114
1489248650 - New session
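
For context, a "high fence" is a guard pattern placed just past the end of an allocation; a violation means something wrote beyond the end of the buffer. The following is a generic sketch of that idea only, not Asterisk's MALLOC_DEBUG implementation. The names fenced_alloc and high_fence_intact and the pattern bytes are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical guard pattern written immediately after the user region. */
#define FENCE_LEN 4
static const unsigned char FENCE_BYTES[FENCE_LEN] = { 0xde, 0xad, 0xbe, 0xef };

/* Allocate size usable bytes followed by FENCE_LEN guard bytes. */
static void *fenced_alloc(size_t size)
{
    unsigned char *raw = malloc(size + FENCE_LEN);

    if (raw == NULL) {
        return NULL;
    }
    memcpy(raw + size, FENCE_BYTES, FENCE_LEN);
    return raw;
}

/* Return 1 if the guard bytes after the user region are untouched, else 0. */
static int high_fence_intact(const void *ptr, size_t size)
{
    assert(ptr != NULL);
    return memcmp((const unsigned char *)ptr + size,
                  FENCE_BYTES, FENCE_LEN) == 0;
}
```

A debugging allocator would run a check like high_fence_intact when the block is freed or audited, and report the allocation site if the pattern has changed.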

By: Joshua Elson (joshelson) 2017-03-13 15:26:50.713-0500

This seems like a poor solution, but it does prevent the crash under the existing circumstances, for anyone that wants to test this:

https://gerrit.asterisk.org/#/c/5171/

The xpidf body here can be a lot bigger than 128... so additional space is certainly needed. But still probably needs a better fix to not crash.

By: Joshua Elson (joshelson) 2017-03-13 19:23:53.309-0500

Adding output for both reproducing the issue under valgrind and with debug malloc enabled.

By: Matthew Fredrickson (mattf) 2017-03-20 11:01:32.071-0500

Hey [~joshelson],

I talked a little bit with [~rmudgett], and from looking at your valgrind output, it looks like there might be a problem in the xml_print_node() function in pjproject.  If you look down the function, at these lines:

{code}
   /* Check for empty node. */
   if (node->content.slen==0 &&
       node->node_head.next==(pj_xml_node*)&node->node_head)
   {
       *p++ = ' ';
       *p++ = '/';
       *p++ = '>';
       return (int)(p-buf);
   }
{code}

it appears that this if block needs to check to see if there's enough room left in the buffer to write those three characters " />" before trying to write them.

I'm guessing that's where you need to focus your patching efforts (from your gerrit review).  Unfortunately, that would make it a bug in pjproject - but if you provide a patch against pjproject as a patch file in Asterisk's third-party/pjproject/patches/ directory we can make sure that it's applied against bundled builds of Asterisk.
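
The kind of space check described above could look roughly like the following self-contained sketch. It is not the pjproject code or the patch that was eventually merged; write_empty_close and print_empty_node are hypothetical helpers that only illustrate checking the remaining buffer space before writing " />".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write the closing " />" of an empty element at p,
 * given sizeleft bytes remaining in the output buffer.
 * Returns the number of bytes written, or -1 if they do not fit.
 */
static int write_empty_close(char *p, size_t sizeleft)
{
    static const char tail[] = " />";
    const size_t need = sizeof(tail) - 1;   /* 3 characters, no NUL */

    assert(p != NULL);
    if (sizeleft < need) {
        return -1;      /* caller must treat this as "buffer too small" */
    }
    memcpy(p, tail, need);
    return (int)need;
}

/*
 * Hypothetical caller: print "<name />" into buf of length len.
 * Returns the number of bytes written, or -1 on insufficient space.
 * No NUL terminator is written.
 */
static int print_empty_node(const char *name, char *buf, size_t len)
{
    size_t name_len = strlen(name);
    char *p = buf;
    int n;

    /* '<' plus the name must fit before anything is written. */
    if (len < 1 + name_len) {
        return -1;
    }
    *p++ = '<';
    memcpy(p, name, name_len);
    p += name_len;

    /* Check the space left before emitting the three closing characters. */
    n = write_empty_close(p, len - (size_t)(p - buf));
    if (n < 0) {
        return -1;
    }
    p += n;
    return (int)(p - buf);
}
```

Returning an error when the output does not fit lets the caller retry with a larger buffer or fail the request, instead of writing past the end of the allocation.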

By: Joshua Elson (joshelson) 2017-03-20 21:11:20.319-0500

[~mattf] and [~rmudgett] Thanks a bunch for the help. Was able to patch properly in pjsip and verify my sipp scenario passes now. And that all makes sense as to why that was happening.

Do we have a process for submitting patches upstream?

By: Matthew Fredrickson (mattf) 2017-03-21 08:42:17.652-0500

Hey Josh,

I talked with [~gjoseph] (he's done quite a few of these patches) and typically what he does is he'll post a review with the patch on gerrit, and also submit the patch for inclusion upstream with the pjproject guys.  When the upstream PJPROJECT guys accept it, usually we'll +2 the bundled patch and merge it into the Asterisk project's patch directory.  Hope that helps!  If you'd like some more "real-time" questions answered, we're also usually on irc.freenode.net in #asterisk-dev.

Hope that helps!
Matthew Fredrickson

By: Andrew Green (andrew867) 2017-03-21 09:37:55.448-0500

Just wanted to say thanks to everyone and how long do you think it will be until we see this patch included in a v13 release?

By: Joshua Elson (joshelson) 2017-03-21 09:44:46.099-0500

Patch is submitted against 13, 14, and trunk. Have to go through the normal process there.

If you want to manually apply it against your FreePBX distro, it's not too hard. Just drop the patch file into the third-party/pjproject/patches/ directory and completely rebuild Asterisk (run a make distclean first).

By: Sean Bright (seanbright) 2017-03-21 17:13:58.124-0500

[~andrew867], it will be in 13.15 & 14.4 when they are released, assuming you are using the bundled version of PJSIP. If you are using an external PJSIP you can patch it today.

By: Joshua Elson (joshelson) 2017-03-21 19:39:10.314-0500

[~gjoseph] and [~mattf]. This has been accepted as an upstream patch as well.

https://trac.pjsip.org/repos/changeset/5570

Not sure what that means for when this will be in pjsip, but they've accepted the patch as well.

Hope that helps!

By: Friendly Automation (friendly-automation) 2017-03-22 07:07:30.724-0500

Change 5171 merged by zuul:
pjsip: prevent memory corruption on creation of xml bodies

[https://gerrit.asterisk.org/5171|https://gerrit.asterisk.org/5171]

By: Friendly Automation (friendly-automation) 2017-03-22 07:18:08.136-0500

Change 5172 merged by Joshua Colp:
pjsip: prevent memory corruption on creation of xml bodies

[https://gerrit.asterisk.org/5172|https://gerrit.asterisk.org/5172]

By: Friendly Automation (friendly-automation) 2017-03-22 08:33:17.211-0500

Change 5173 merged by zuul:
pjsip: prevent memory corruption on creation of xml bodies

[https://gerrit.asterisk.org/5173|https://gerrit.asterisk.org/5173]