Summary: | ASTERISK-26675: PJSIP Segmentation Fault grp_lock_acquire | ||||
Reporter: | Ross Beer (rossbeer) | Labels: | |||
Date Opened: | 2016-12-23 08:04:47.000-0600 | Date Closed: | 2017-02-13 10:13:55.000-0600 | ||
Priority: | Major | Regression? | |||
Status: | Closed/Complete | Components: | Channels/chan_pjsip | ||
Versions: | GIT | Frequency of Occurrence | Frequent | ||
Related Issues: |
| ||||
Environment: | Fedora 23, PJSIP SVN 2.5.5 (Commit 5503) | Attachments: | ( 0) pjproject-svn-tsx_timer_callback.patch | ||
Description: | Asterisk 13 GIT segfaults when trying to obtain a group lock when accessing a SIP Transaction:
{noformat} #0 0x00007fe560cd97a8 in grp_lock_acquire (p=0x7fe574848758) at ../src/pj/lock.c:290 glock = 0x7fe574848758 lck = 0x0 #1 0x00007fe560cd97f2 in pj_grp_lock_acquire (grp_lock=<optimized out>) at ../src/pj/lock.c:478 #2 0x00007fe56cd45509 in tsx_timer_callback (theap=<optimized out>, entry=0x7fe57593a1b8) at ../src/pjsip/sip_transaction.c:1170 event = {prev = 0x0, next = 0x7fe560cd57e7 <pj_elapsed_time+71>, type = PJSIP_EVENT_TIMER, body = {timer = {entry = 0x7fe57593a1b8}, tsx_state = {src = {rdata = 0x7fe57593a1b8, tdata = 0x7fe57593a1b8, timer = 0x7fe57593a1b8, status = 1972609464, data = 0x7fe57593a1b8}, tsx = 0x2e1bcb0, prev_state = 1624070230, type = 32741}, tx_msg = {tdata = 0x7fe57593a1b8}, tx_error = {tdata = 0x7fe57593a1b8, tsx = 0x2e1bcb0}, rx_msg = {rdata = 0x7fe57593a1b8}, user = {user1 = 0x7fe57593a1b8, user2 = 0x2e1bcb0, user3 = 0x7fe560cd5856 <pj_gettickcount+54>, user4 = 0x7fe57593a1b8}}} tsx = 0x7fe57593a038 #3 0x00007fe560ce3b1f in pj_timer_heap_poll (ht=0x2e1bcb0, next_delay=next_delay@entry=0x7fe55a8c4d70) at ../src/pj/timer.c:643 node = 0x7fe57593a1b8 grp_lock = 0x0 now = {sec = 4625280, msec = 350} count = 1 #4 0x00007fe56cd342bb in pjsip_endpt_handle_events2 (endpt=0x2e1b9c8, max_timeout=max_timeout@entry=0x7fe55a8c4db0, p_count=p_count@entry=0x0) at ../src/pjsip/sip_endpoint.c:713 timeout = {sec = 0, msec = 0} count = 0 net_event_count = 0 c = <optimized out> #5 0x00007fe56cd34387 in pjsip_endpt_handle_events (endpt=<optimized out>, max_timeout=max_timeout@entry=0x7fe55a8c4db0) at ../src/pjsip/sip_endpoint.c:770 #6 0x00007fe55ec12d18 in monitor_thread_exec (endpt=<optimized out>) at res_pjsip.c:4029 delay = {sec = 0, msec = 10} #7 0x00007fe560cd4196 in thread_main (param=0x2e237a8) at ../src/pj/os_core_unix.c:541 rec = 0x2e237a8 result = <optimized out> #8 0x00007fe5eeeb061a in start_thread (arg=0x7fe55a8c5700) at pthread_create.c:334 __res = <optimized out> pd = 0x7fe55a8c5700 now = <optimized out> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140623043385088, 2214477618423267645, 140725615033839, 140623043385088, 8388608, 0, -2201747081483517635, -2202055161909214915}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = <optimized out> pagesize_m1 = <optimized out> sp = <optimized out> freesize = <optimized out> #9 0x00007fe5ee1ec5fd in clone () at /lib64/libc.so.6 {noformat} The thread running at the time was : {noformat} Thread 1 (Thread 0x7fe55a8c5700 (LWP 26817)): #0 0x00007fe560cd97a8 in grp_lock_acquire (p=0x7fe574848758) at ../src/pj/lock.c:290 #1 0x00007fe560cd97f2 in pj_grp_lock_acquire (grp_lock=<optimized out>) at ../src/pj/lock.c:478 #2 0x00007fe56cd45509 in tsx_timer_callback (theap=<optimized out>, entry=0x7fe57593a1b8) at ../src/pjsip/sip_transaction.c:1170 #3 0x00007fe560ce3b1f in pj_timer_heap_poll (ht=0x2e1bcb0, next_delay=next_delay@entry=0x7fe55a8c4d70) at ../src/pj/timer.c:643 #4 0x00007fe56cd342bb in pjsip_endpt_handle_events2 (endpt=0x2e1b9c8, max_timeout=max_timeout@entry=0x7fe55a8c4db0, p_count=p_count@entry=0x0) at ../src/pjsip/sip_endpoint.c:713 #5 0x00007fe56cd34387 in pjsip_endpt_handle_events (endpt=<optimized out>, max_timeout=max_timeout@entry=0x7fe55a8c4db0) at ../src/pjsip/sip_endpoint.c:770 #6 0x00007fe55ec12d18 in monitor_thread_exec (endpt=<optimized out>) at res_pjsip.c:4029 #7 0x00007fe560cd4196 in thread_main (param=0x2e237a8) at ../src/pj/os_core_unix.c:541 #8 0x00007fe5eeeb061a in start_thread (arg=0x7fe55a8c5700) at pthread_create.c:334 #9 0x00007fe5ee1ec5fd in clone () at /lib64/libc.so.6 {noformat} Another segfault happened: {noformat} Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `/usr/sbin/asterisk -f -vvvg -c'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007fdb45d69417 in copy_node (ht=0x142b0e0, ht=0x142b0e0, moved_node=0x7fdb506cb730, slot=0) at ../src/pj/timer.c:136 136 ht->timer_ids[moved_node->_timer_id] = (int)slot; [Current thread is 1 (Thread 0x7fdaecd08700 (LWP 29086))] #0 0x00007fdb45d69417 in remove_node (ht=0x142b0e0, ht=0x142b0e0, moved_node=0x7fdb506cb730, slot=0) at ../src/pj/timer.c:136 parent = <optimized out> moved_node = 0x7fdb506cb730 removed_node = 0x7fdb20247030 #1 0x00007fdb45d69417 in remove_node (ht=ht@entry=0x142b0e0, slot=slot@entry=0) at ../src/pj/timer.c:244 parent = <optimized out> moved_node = 0x7fdb506cb730 removed_node = 0x7fdb20247030 #2 0x00007fdb45d69af1 in pj_timer_heap_poll (ht=0x142b0e0, next_delay=next_delay@entry=0x7fdaecd07d70) at ../src/pj/timer.c:630 node = <optimized out> grp_lock = <optimized out> now = {sec = 4458703, msec = 334} count = 8 #3 0x00007fdb472742bb in pjsip_endpt_handle_events2 (endpt=0x142adf8, max_timeout=max_timeout@entry=0x7fdaecd07db0, p_count=p_count@entry=0x0) at ../src/pjsip/sip_endpoint.c:713 timeout = {sec = 0, msec = 0} count = 0 net_event_count = 0 c = <optimized out> #4 0x00007fdb47274387 in pjsip_endpt_handle_events (endpt=<optimized out>, max_timeout=max_timeout@entry=0x7fdaecd07db0) at ../src/pjsip/sip_endpoint.c:770 #5 0x00007fdb440aecc8 in monitor_thread_exec (endpt=<optimized out>) at res_pjsip.c:4007 delay = {sec = 0, msec = 10} #6 0x00007fdb45d5a196 in thread_main (param=0x1433b88) at ../src/pj/os_core_unix.c:541 rec = 0x1433b88 result = <optimized out> #7 0x00007fdb813fa61a in start_thread (arg=0x7fdaecd08700) at pthread_create.c:334 __res = <optimized out> pd = 0x7fdaecd08700 now = <optimized out> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140578252687104, -1527102281778558953, 140726077707343, 140578252687104, 8388608, 0, 1547968542935517207, 1547451494643304471}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = <optimized out> pagesize_m1 = <optimized out> sp = <optimized out> freesize = <optimized out> #8 0x00007fdb807365fd in clone () at /lib64/libc.so.6 {noformat} While running thread: {noformat} Thread 1 (Thread 0x7fdaecd08700 (LWP 29086)): #0 0x00007fdb45d69417 in remove_node (ht=0x142b0e0, ht=0x142b0e0, moved_node=0x7fdb506cb730, slot=0) at ../src/pj/timer.c:136 #1 0x00007fdb45d69417 in remove_node (ht=ht@entry=0x142b0e0, slot=slot@entry=0) at ../src/pj/timer.c:244 #2 0x00007fdb45d69af1 in pj_timer_heap_poll (ht=0x142b0e0, next_delay=next_delay@entry=0x7fdaecd07d70) at ../src/pj/timer.c:630 #3 0x00007fdb472742bb in pjsip_endpt_handle_events2 (endpt=0x142adf8, max_timeout=max_timeout@entry=0x7fdaecd07db0, p_count=p_count@entry=0x0) at ../src/pjsip/sip_endpoint.c:713 #4 0x00007fdb47274387 in pjsip_endpt_handle_events (endpt=<optimized out>, max_timeout=max_timeout@entry=0x7fdaecd07db0) at ../src/pjsip/sip_endpoint.c:770 #5 0x00007fdb440aecc8 in monitor_thread_exec (endpt=<optimized out>) at res_pjsip.c:4007 #6 0x00007fdb45d5a196 in thread_main (param=0x1433b88) at ../src/pj/os_core_unix.c:541 #7 0x00007fdb813fa61a in start_thread (arg=0x7fdaecd08700) at pthread_create.c:334 #8 0x00007fdb807365fd in clone () at /lib64/libc.so.6 {noformat} | ||||
Comments: | By: Asterisk Team (asteriskteam) 2016-12-23 08:04:48.269-0600 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. By: Mark Michelson (mmichelson) 2016-12-28 13:06:40.598-0600 We require additional debug to continue with triage of your issue. Please follow the instructions on the wiki [1] for how to collect debugging information from Asterisk. For expediency, where possible, attach the debug with a '.txt' file extension so that the debug will be usable for further analysis. Thanks! [1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information The backtraces are very helpful, but debug logs from Asterisk at the time of the crash can help us to reproduce and fix the issue better. By: Ross Beer (rossbeer) 2016-12-28 13:42:09.661-0600 I have a patch which I am currently testing for PJSIP. Once confirmed it resolves the crash I will have it committed to gerrit. By: Ross Beer (rossbeer) 2017-01-13 10:44:56.291-0600 Alexei Gradinari provided the attached patch to resolve this issue, I believe the patch resolves this issue. By: Rusty Newton (rnewton) 2017-01-14 14:09:10.982-0600 Awesome that it solves the issue. [~alexei gradinari] will need to attach it to the issue and mark it as a contribution for us to run with it, or Alexei can push it onto Gerrit. By: Joshua C. Colp (jcolp) 2017-01-31 12:07:32.036-0600 [~rossbeer] It would be nice to understand what particularly scenario caused this to happen and how it got there. Is a debug log as originally asked available? By: Ross Beer (rossbeer) 2017-02-01 04:48:59.346-0600 After getting the latest GIT branch of 13 and using the packaged PJSIP I have not experienced this issue again. I will continue to monitor and will update you if the issue happens again. |