
Summary: ASTERISK-23129: segfault in res_pjsip_pubsub.so
Reporter: Dan Jenkins (danjenkins)
Date Opened: 2014-01-10 04:02:37.000-0600
Date Closed: 2014-01-28 17:45:03.000-0600
Priority: Major
Status: Closed/Complete
Components: Resources/res_pjsip_pubsub
Versions: SVN 12.0.0
Frequency of Occurrence: One Time
Environment: debian
Attachments:
(0) ASTERISK-23129.diff
(1) backtrace.txt
(2) backtrace-2.txt
(3) gdb_debugging_on_backtrace-2.txt
(4) moar_debugging_from_process_exiting.txt
Description: Asterisk SVN-branch-12-r405131 died at some point last night.

Running dmesg gives me:

[7296736.015060] asterisk[7629]: segfault at 10 ip 00007feff0df441c sp 00007fefef92cdb0 error 4 in res_pjsip_pubsub.so[7feff0df1000+8000]

I'm currently running Asterisk within gdb, but it may be a while before it stops with the error.

Also, I'm not sure if it's relevant, but the last thing in /var/log/messages was:

[2014-01-09 23:55:28] ERROR[7629] astobj2.c: user_data is NULL

Comments:

By: Rusty Newton (rnewton) 2014-01-10 18:11:00.918-0600

Will put this into feedback until we see if we end up with a backtrace.

By: Dan Jenkins (djenkins) 2014-01-13 08:43:00.527-0600

backtrace.txt is from a Jan 10th core dump that I had missed.

backtrace-2.txt is from today, from a live gdb session with Asterisk running inside it.

When I run dmesg today I get:

[7335619.955058] asterisk[23966] trap int3 ip:7ffff6500c21 sp:7fffeec9bb18 error:0

And the last thing in /var/log/asterisk/messages is:

[2014-01-13 14:16:41] ERROR[25966] astobj2.c: user_data is NULL

By: Dan Jenkins (danjenkins) 2014-01-14 10:01:42.056-0600

So Asterisk died again :D

This time with more debugging, which I'm about to attach.

dmesg says:

[7682521.182821] asterisk[18472]: segfault at 10 ip 00007fa79d66f41c sp 00007fa79c1a7db0 error 4 in res_pjsip_pubsub.so[7fa79d66c000+8000]

Oh, and Asterisk didn't generate a core dump.

By: Dan Jenkins (djenkins) 2014-01-14 11:45:47.600-0600

Added more info; sending back for triage.

By: Kevin Harwell (kharwell) 2014-01-23 16:52:36.655-0600

Adding a patch that hopefully resolves the crash. What seems to be happening is this: if a subscription has been terminated, and the subscription's timeout/expires value is shorter than the time it takes for all pending transactions on the subscription to end, then the subscription's expiration timer will not have been canceled yet, so when it fires, sub will be NULL.


Really, though, the expiration timer should be canceled as soon as the subscription is terminated, but that will most likely require a change to the pjproject source. However, since the subscription has already been canceled, nothing needs to be done when the timer fires, so a NULL check in the Asterisk code should be sufficient to work around this problem.


By: Dan Jenkins (danjenkins) 2014-01-30 03:56:38.978-0600

I'm sure this fixes the crash, but I'm 100% sure it's causing all of my PJSIP registrations to be removed, and they never re-register themselves. However, the Digium phone does not see itself as having lost its connection to Asterisk: it still shows the icon saying it's connected.

If I try to register with Asterisk from a softphone, that won't connect either. I'm reverting the patch to see if my phones stay registered.