[Home]

Summary:ASTERISK-26675: PJSIP Segmentation Fault grp_lock_acquire
Reporter:Ross Beer (rossbeer)Labels:
Date Opened:2016-12-23 08:04:47.000-0600Date Closed:2017-02-13 10:13:55.000-0600
Priority:MajorRegression?
Status:Closed/CompleteComponents:Channels/chan_pjsip
Versions:GIT Frequency of
Occurrence
Frequent
Related
Issues:
is related toASTERISK-26669 PJSIP Segfault 13.13.1 (Bundled PJSIP)
Environment:Fedora 23, PJSIP SVN 2.5.5 (Commit 5503) Attachments:( 0) pjproject-svn-tsx_timer_callback.patch
Description:Asterisk 13 GIT segfaults when trying to obtain a group lock when accessing a SIP Transaction:

{noformat}
#0  0x00007fe560cd97a8 in grp_lock_acquire (p=0x7fe574848758) at ../src/pj/lock.c:290
       glock = 0x7fe574848758
       lck = 0x0
#1  0x00007fe560cd97f2 in pj_grp_lock_acquire (grp_lock=<optimized out>) at ../src/pj/lock.c:478
#2  0x00007fe56cd45509 in tsx_timer_callback (theap=<optimized out>, entry=0x7fe57593a1b8) at ../src/pjsip/sip_transaction.c:1170
       event = {prev = 0x0, next = 0x7fe560cd57e7 <pj_elapsed_time+71>, type = PJSIP_EVENT_TIMER, body = {timer = {entry = 0x7fe57593a1b8}, tsx_state = {src = {rdata = 0x7fe57593a1b8, tdata = 0x7fe57593a1b8, timer = 0x7fe57593a1b8, status = 1972609464, data = 0x7fe57593a1b8}, tsx = 0x2e1bcb0, prev_state = 1624070230, type = 32741}, tx_msg = {tdata = 0x7fe57593a1b8}, tx_error = {tdata = 0x7fe57593a1b8, tsx = 0x2e1bcb0}, rx_msg = {rdata = 0x7fe57593a1b8}, user = {user1 = 0x7fe57593a1b8, user2 = 0x2e1bcb0, user3 = 0x7fe560cd5856 <pj_gettickcount+54>, user4 = 0x7fe57593a1b8}}}
       tsx = 0x7fe57593a038
#3  0x00007fe560ce3b1f in pj_timer_heap_poll (ht=0x2e1bcb0, next_delay=next_delay@entry=0x7fe55a8c4d70) at ../src/pj/timer.c:643
       node = 0x7fe57593a1b8
       grp_lock = 0x0
       now = {sec = 4625280, msec = 350}
       count = 1
#4  0x00007fe56cd342bb in pjsip_endpt_handle_events2 (endpt=0x2e1b9c8, max_timeout=max_timeout@entry=0x7fe55a8c4db0, p_count=p_count@entry=0x0) at ../src/pjsip/sip_endpoint.c:713
       timeout = {sec = 0, msec = 0}
       count = 0
       net_event_count = 0
       c = <optimized out>
#5  0x00007fe56cd34387 in pjsip_endpt_handle_events (endpt=<optimized out>, max_timeout=max_timeout@entry=0x7fe55a8c4db0) at ../src/pjsip/sip_endpoint.c:770
#6  0x00007fe55ec12d18 in monitor_thread_exec (endpt=<optimized out>) at res_pjsip.c:4029
       delay = {sec = 0, msec = 10}
#7  0x00007fe560cd4196 in thread_main (param=0x2e237a8) at ../src/pj/os_core_unix.c:541
       rec = 0x2e237a8
       result = <optimized out>
#8  0x00007fe5eeeb061a in start_thread (arg=0x7fe55a8c5700) at pthread_create.c:334
       __res = <optimized out>
       pd = 0x7fe55a8c5700
       now = <optimized out>
       unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140623043385088, 2214477618423267645, 140725615033839, 140623043385088, 8388608, 0, -2201747081483517635, -2202055161909214915}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
       not_first_call = <optimized out>
       pagesize_m1 = <optimized out>
       sp = <optimized out>
       freesize = <optimized out>
#9  0x00007fe5ee1ec5fd in clone () at /lib64/libc.so.6
{noformat}

The thread running at the time was :

{noformat}
Thread 1 (Thread 0x7fe55a8c5700 (LWP 26817)):
#0  0x00007fe560cd97a8 in grp_lock_acquire (p=0x7fe574848758) at ../src/pj/lock.c:290
#1  0x00007fe560cd97f2 in pj_grp_lock_acquire (grp_lock=<optimized out>) at ../src/pj/lock.c:478
#2  0x00007fe56cd45509 in tsx_timer_callback (theap=<optimized out>, entry=0x7fe57593a1b8) at ../src/pjsip/sip_transaction.c:1170
#3  0x00007fe560ce3b1f in pj_timer_heap_poll (ht=0x2e1bcb0, next_delay=next_delay@entry=0x7fe55a8c4d70) at ../src/pj/timer.c:643
#4  0x00007fe56cd342bb in pjsip_endpt_handle_events2 (endpt=0x2e1b9c8, max_timeout=max_timeout@entry=0x7fe55a8c4db0, p_count=p_count@entry=0x0) at ../src/pjsip/sip_endpoint.c:713
#5  0x00007fe56cd34387 in pjsip_endpt_handle_events (endpt=<optimized out>, max_timeout=max_timeout@entry=0x7fe55a8c4db0) at ../src/pjsip/sip_endpoint.c:770
#6  0x00007fe55ec12d18 in monitor_thread_exec (endpt=<optimized out>) at res_pjsip.c:4029
#7  0x00007fe560cd4196 in thread_main (param=0x2e237a8) at ../src/pj/os_core_unix.c:541
#8  0x00007fe5eeeb061a in start_thread (arg=0x7fe55a8c5700) at pthread_create.c:334
#9  0x00007fe5ee1ec5fd in clone () at /lib64/libc.so.6
{noformat}

Another segfault happened:

{noformat}
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/asterisk -f -vvvg -c'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fdb45d69417 in copy_node (ht=0x142b0e0, ht=0x142b0e0, moved_node=0x7fdb506cb730, slot=0) at ../src/pj/timer.c:136
136         ht->timer_ids[moved_node->_timer_id] = (int)slot;
[Current thread is 1 (Thread 0x7fdaecd08700 (LWP 29086))]
#0  0x00007fdb45d69417 in remove_node (ht=0x142b0e0, ht=0x142b0e0, moved_node=0x7fdb506cb730, slot=0) at ../src/pj/timer.c:136
       parent = <optimized out>
       moved_node = 0x7fdb506cb730
       removed_node = 0x7fdb20247030
#1  0x00007fdb45d69417 in remove_node (ht=ht@entry=0x142b0e0, slot=slot@entry=0) at ../src/pj/timer.c:244
       parent = <optimized out>
       moved_node = 0x7fdb506cb730
       removed_node = 0x7fdb20247030
#2  0x00007fdb45d69af1 in pj_timer_heap_poll (ht=0x142b0e0, next_delay=next_delay@entry=0x7fdaecd07d70) at ../src/pj/timer.c:630
       node = <optimized out>
       grp_lock = <optimized out>
       now = {sec = 4458703, msec = 334}
       count = 8
#3  0x00007fdb472742bb in pjsip_endpt_handle_events2 (endpt=0x142adf8, max_timeout=max_timeout@entry=0x7fdaecd07db0, p_count=p_count@entry=0x0) at ../src/pjsip/sip_endpoint.c:713
       timeout = {sec = 0, msec = 0}
       count = 0
       net_event_count = 0
       c = <optimized out>
#4  0x00007fdb47274387 in pjsip_endpt_handle_events (endpt=<optimized out>, max_timeout=max_timeout@entry=0x7fdaecd07db0) at ../src/pjsip/sip_endpoint.c:770
#5  0x00007fdb440aecc8 in monitor_thread_exec (endpt=<optimized out>) at res_pjsip.c:4007
       delay = {sec = 0, msec = 10}
#6  0x00007fdb45d5a196 in thread_main (param=0x1433b88) at ../src/pj/os_core_unix.c:541
       rec = 0x1433b88
       result = <optimized out>
#7  0x00007fdb813fa61a in start_thread (arg=0x7fdaecd08700) at pthread_create.c:334
       __res = <optimized out>
       pd = 0x7fdaecd08700
       now = <optimized out>
       unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140578252687104, -1527102281778558953, 140726077707343, 140578252687104, 8388608, 0, 1547968542935517207, 1547451494643304471}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
       not_first_call = <optimized out>
       pagesize_m1 = <optimized out>
       sp = <optimized out>
       freesize = <optimized out>
#8  0x00007fdb807365fd in clone () at /lib64/libc.so.6
{noformat}

While running thread:

{noformat}
Thread 1 (Thread 0x7fdaecd08700 (LWP 29086)):
#0  0x00007fdb45d69417 in remove_node (ht=0x142b0e0, ht=0x142b0e0, moved_node=0x7fdb506cb730, slot=0) at ../src/pj/timer.c:136
#1  0x00007fdb45d69417 in remove_node (ht=ht@entry=0x142b0e0, slot=slot@entry=0) at ../src/pj/timer.c:244
#2  0x00007fdb45d69af1 in pj_timer_heap_poll (ht=0x142b0e0, next_delay=next_delay@entry=0x7fdaecd07d70) at ../src/pj/timer.c:630
#3  0x00007fdb472742bb in pjsip_endpt_handle_events2 (endpt=0x142adf8, max_timeout=max_timeout@entry=0x7fdaecd07db0, p_count=p_count@entry=0x0) at ../src/pjsip/sip_endpoint.c:713
#4  0x00007fdb47274387 in pjsip_endpt_handle_events (endpt=<optimized out>, max_timeout=max_timeout@entry=0x7fdaecd07db0) at ../src/pjsip/sip_endpoint.c:770
#5  0x00007fdb440aecc8 in monitor_thread_exec (endpt=<optimized out>) at res_pjsip.c:4007
#6  0x00007fdb45d5a196 in thread_main (param=0x1433b88) at ../src/pj/os_core_unix.c:541
#7  0x00007fdb813fa61a in start_thread (arg=0x7fdaecd08700) at pthread_create.c:334
#8  0x00007fdb807365fd in clone () at /lib64/libc.so.6
{noformat}
Comments:By: Asterisk Team (asteriskteam) 2016-12-23 08:04:48.269-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Mark Michelson (mmichelson) 2016-12-28 13:06:40.598-0600

We require additional debug to continue with triage of your issue. Please follow the instructions on the wiki [1] for how to collect debugging information from Asterisk. For expediency, where possible, attach the debug with a '.txt' file extension so that the debug will be usable for further analysis.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

The backtraces are very helpful, but debug logs from Asterisk at the time of the crash can help us to reproduce and fix the issue better.

By: Ross Beer (rossbeer) 2016-12-28 13:42:09.661-0600

I have a patch which I am currently testing for PJSIP. Once confirmed it resolves the crash I will have it committed to gerrit.

By: Ross Beer (rossbeer) 2017-01-13 10:44:56.291-0600

Alexei Gradinari provided the attached patch to resolve this issue, I believe the patch resolves this issue.

By: Rusty Newton (rnewton) 2017-01-14 14:09:10.982-0600

Awesome that it solves the issue. [~alexei gradinari] will need to attach it to the issue and mark it as a contribution for us to run with it, or Alexei can push it onto Gerrit.

By: Joshua C. Colp (jcolp) 2017-01-31 12:07:32.036-0600

[~rossbeer] It would be nice to understand what particularly scenario caused this to happen and how it got there. Is a debug log as originally asked available?

By: Ross Beer (rossbeer) 2017-02-01 04:48:59.346-0600

After getting the latest GIT branch of 13 and using the packaged PJSIP I have not experienced this issue again.

I will continue to monitor and will update you if the issue happens again.