
Summary: ASTERISK-24635: PJSIP outbound PUBLISH crashes when no response is ever received
Reporter: Marco Paland (mpaland)
Labels:
Date Opened: 2014-12-22 10:55:07.000-0600
Date Closed: 2015-01-30 11:39:21.000-0600
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Resources/res_pjsip_publish_asterisk
Versions: 13.1.0
Frequency of Occurrence:
Related Issues:
Environment: Debian
Attachments:
(0) backtrace.txt
(1) debug.txt
(2) ext-hints.conf
(3) pjsip.conf
Description: Activating the configuration to publish device states to a remote system causes segfaults.
Asterisk segfaults almost exactly every 10 minutes when the publish configuration is present. This is reproducible on the production PBX.
During the test period no event/hint change occurred, so all device states were stable and unchanged.

After deactivating the publish configuration (commenting out all [dds-h] sections of the PJSIP config), Asterisk runs without any segfaults.

I set up an asterisk with debug symbols and created a backtrace, see attached file.

The segfault occurs every time (every 10 minutes) at line 196 of res_pjsip_outbound_publish.c.

The PJSIP config and all hints are attached.
Comments:

By: Rusty Newton (rnewton) 2014-12-22 18:21:59.450-0600

Thanks for the debug. Can you also provide an Asterisk full log, captured during the crash, including the "DEBUG" logger channel (verbose and debug levels turned up to 5)?

https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

By: Marco Paland (mpaland) 2014-12-29 07:51:22.070-0600

Added a debug log as requested.
The crash occurred right after the last line.

By: Marco Paland (mpaland) 2015-01-13 04:30:36.482-0600

I've attached the log file; please update the status. Thanks.

By: Matt Jordan (mjordan) 2015-01-13 08:11:37.233-0600

Your log file is interesting, in that it doesn't show any response being received for the outbound PUBLISH requests.

Is the remote system reachable? Does it show it attempting to respond?

By: Marco Paland (mpaland) 2015-01-13 08:30:18.821-0600

Yes, the remote system is reachable at the IP level, but it's running an older Asterisk version (11.9.0) that does not use PJSIP. It's scheduled for updating soon.

So I guess it can't respond, but this must not lead to local crashes every 10 minutes.
Imagine the VPN tunnel is down...


By: Félim Whiteley (felimwhiteley) 2016-04-08 06:05:24.854-0500

Hi All, a bit late to this chain but...

I'm seeing what is described here using asterisk-certified-13.1-cert4, and I'm curious whether this was backported to the certified release. I have two servers, both with remote device state set up. If serverA is offline, from the serverB console I see it trying a PUBLISH every few seconds until it relatively quickly shuts down cleanly with:

Disconnected from Asterisk server
Asterisk cleanly ending (0).
Executing last minute cleanups

As Marco states above, in a scenario with VPN links being offline this is going to take the serverB centre offline. Strangely, if serverA was up at some point and the serverB daemon has not been restarted since, it seems not to fail.

By: Joshua C. Colp (jcolp) 2016-04-08 06:08:41.178-0500

That change did not go into certified. Certified does not receive every bugfix, only fixes needed by support agreement customers.

By: Félim Whiteley (felimwhiteley) 2016-04-08 06:21:42.395-0500

Cheers for the reply! Was expecting this to be a shout into the ether...

Do you have an ETA on when the next certified will be released? It seems a fairly serious bug. I'd happily use 13.8 except I need the DPMA. I could potentially use a manual config of the phones via XML etc. but I'd like to avoid it.

By: Joshua C. Colp (jcolp) 2016-04-08 06:24:13.389-0500

I don't have an ETA on that. As for being a fairly serious bug - to a small subset of people, yes. As I said though it has not impacted customers using certified and thus was not pulled in.

By: Félim Whiteley (felimwhiteley) 2016-04-08 06:30:37.994-0500

Ok cheers, hopefully a new cert will be out soon so.

By: Félim Whiteley (felimwhiteley) 2016-05-05 08:45:34.159-0500

Oh, just tested cert7 and it seems to be fixed (it wasn't in cert5 and cert6). Appreciate that!

By: Félim Whiteley (felimwhiteley) 2016-07-03 09:44:10.549-0500

OK, correction: this is not fixed. Testing in a VM appeared to show it was, but once it was running in the live environment it died.

[101640.457522] asterisk[2547]: segfault at 1158 ip 00007f9611214f54 sp 00007f95ebffec00 error 4 in res_pjsip_outbound_publish.so[7f9611211000+7000]
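
For anyone trying to map the kernel line above back to source, the offset of the faulting instruction inside the module is the instruction pointer minus the module's load address. The following is a minimal sketch, assuming the usual x86-64 Linux "segfault at ... ip ... in module[base+size]" message layout; the function name module_offset and the regular expression are illustrative and not part of Asterisk.

```python
import re

# Kernel segfault line exactly as quoted in the comment above.
LINE = (
    "[101640.457522] asterisk[2547]: segfault at 1158 ip 00007f9611214f54 "
    "sp 00007f95ebffec00 error 4 in res_pjsip_outbound_publish.so[7f9611211000+7000]"
)

# Assumed message layout: "... ip <hex> ... in <module>[<base hex>+<size hex>]".
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<module>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def module_offset(line):
    """Return (module name, offset of the faulting instruction within the module)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a recognised kernel segfault line")
    ip = int(match["ip"], 16)
    base = int(match["base"], 16)
    size = int(match["size"], 16)
    offset = ip - base
    # Sanity check: the instruction pointer must lie inside the mapped region.
    if not 0 <= offset < size:
        raise ValueError("instruction pointer is outside the reported mapping")
    return match["module"], offset


module, offset = module_offset(LINE)
print(f"{module} + {offset:#x}")  # prints "res_pjsip_outbound_publish.so + 0x3f54"
```

With a build that has debug symbols, that offset can be handed to a tool such as `addr2line -f -e res_pjsip_outbound_publish.so 0x3f54` to get a function and line. The low fault address (0x1158) together with error code 4 (a user-mode read of an unmapped page) would be consistent with reading a field through a NULL pointer, though the line alone does not prove that.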



By: Kevin Harwell (kharwell) 2016-07-07 10:34:50.612-0500

[~felimwhiteley] The fix for this is not currently in the Asterisk 13.1-certified branch, so it is possible for the crash to still occur there.