Summary: | ASTERISK-23129: segfault in res_pjsip_pubsub.so | ||
Reporter: | Dan Jenkins (danjenkins) | Labels: | |
Date Opened: | 2014-01-10 04:02:37.000-0600 | Date Closed: | 2014-01-28 17:45:03.000-0600 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | Resources/res_pjsip_pubsub |
Versions: | SVN 12.0.0 | Frequency of Occurrence | One Time |
Related Issues: | |||
Environment: | debian | Attachments: | ( 0) ASTERISK-23129.diff ( 1) backtrace.txt ( 2) backtrace-2.txt ( 3) gdb_debugging_on_backtrace-2.txt ( 4) moar_debugging_from_process_exiting.txt |
Description: | SVN-branch-12-r405131 died at some point last night.
Running dmesg gives me:
[7296736.015060] asterisk[7629]: segfault at 10 ip 00007feff0df441c sp 00007fefef92cdb0 error 4 in res_pjsip_pubsub.so[7feff0df1000+8000]
I'm currently running Asterisk within gdb, but it may be a while before it stops with the error.
Also, not sure if it's relevant, but the last thing in /var/log/messages was:
[2014-01-09 23:55:28] ERROR[7629] astobj2.c: user_data is NULL | ||
Comments: | By: Rusty Newton (rnewton) 2014-01-10 18:11:00.918-0600
Will put this into feedback until we see if we end up with a backtrace.
By: Dan Jenkins (djenkins) 2014-01-13 08:43:00.527-0600
So backtrace.txt was from Jan 10th, from a core dump that I had missed. backtrace-2.txt is from today, from a live gdb session with Asterisk running within it.
When I run dmesg today I get:
[7335619.955058] asterisk[23966] trap int3 ip:7ffff6500c21 sp:7fffeec9bb18 error:0
And the last thing in /var/log/asterisk/messages is:
[2014-01-13 14:16:41] ERROR[25966] astobj2.c: user_data is NULL
By: Dan Jenkins (danjenkins) 2014-01-14 10:01:42.056-0600
So Asterisk died again :D This time with more debugging, which I'm about to attach.
dmesg says:
[7682521.182821] asterisk[18472]: segfault at 10 ip 00007fa79d66f41c sp 00007fa79c1a7db0 error 4 in res_pjsip_pubsub.so[7fa79d66c000+8000]
Oh, and Asterisk didn't generate a core dump.
By: Dan Jenkins (djenkins) 2014-01-14 11:45:47.600-0600
Added more info, sending back for triage.
By: Kevin Harwell (kharwell) 2014-01-23 16:52:36.655-0600
Adding a patch that hopefully resolves the crash. What seems to be happening is this: if a subscription has been terminated, and the subscription timeout/expires is less than the time it takes for all pending transactions (currently on the subscription) to end, then the subscription timer will not have been canceled yet and sub will be null. Really, the expiration timer on the subscription should be canceled immediately when the subscription is terminated, but that will more than likely require a change to the pjproject source. However, since the subscription has already been canceled, nothing needs to be done, so a null check in the Asterisk code should be sufficient to work around this problem.
By: Dan Jenkins (danjenkins) 2014-01-30 03:56:38.978-0600
I'm sure this fixes the crashing, but I'm 100% sure it's causing all of my PJSIP registrations to be removed, and they never re-register themselves. However, the Digium phone does not see itself as having lost its connection to Asterisk; it still shows the icon saying it's connected. If I try to register with Asterisk on a softphone, that won't connect either. I'm unapplying the patch to see if my phones stay registered. |
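The null-check workaround described in Kevin Harwell's comment can be sketched roughly as follows. This is only an illustration of the idea, not the contents of ASTERISK-23129.diff: `struct subscription`, `struct evsub`, and `on_refresh_timer` are hypothetical stand-ins and are not Asterisk or pjproject names. The assumption is that a late timer callback can find its user data already cleared, and that returning early is safe because the subscription is already terminated.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a subscription object (not an Asterisk type). */
struct subscription {
    int refresh_count;
};

/* Hypothetical stand-in for the handle whose user data may be cleared. */
struct evsub {
    struct subscription *user_data; /* NULL once the subscription is terminated */
};

/*
 * Timer callback sketch.
 * Returns 0 if the subscription was refreshed, or -1 if the subscription
 * was already gone and the callback did nothing.
 */
static int on_refresh_timer(struct evsub *evsub)
{
    struct subscription *sub = evsub->user_data;

    if (sub == NULL) {
        /* Already terminated: a late timer has nothing left to do. */
        return -1;
    }

    /* Without the check above, this dereference would fault on NULL. */
    sub->refresh_count++;
    return 0;
}
```

Calling `on_refresh_timer` on a handle whose `user_data` is NULL returns -1 instead of dereferencing the pointer, which is the kind of fault the `segfault at 10` lines in dmesg suggest.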