Summary: ASTERISK-29196: res_pjsip: Segmentation fault
Reporter: Diogo Hartmann (Mauri de Souza Meneguzzo (3CPlus))
Labels: patch
Date Opened: 2020-12-04 09:03:38.000-0600
Date Closed: 2021-02-18 10:08:25.000-0600
Priority: Major
Regression?
Status: Closed/Complete
Components: Resources/res_pjsip
Versions: 18.1.0
Frequency of Occurrence: Constant
Related Issues:
Environment:
Attachments: ( 0) 0080-fix-sdp-neg-modify-local-offer.patch
( 1) ast-coredumper-files.tar.gz
Description: Asterisk instances running 18.1.0 are crashing with a segmentation fault; nothing is shown in the logs besides the segfault.

Since we upgraded to 18.1.0, this has been happening multiple times a day. Going back to 18.0.1 fixed the issue.

{noformat}
(gdb) bt
#0  0x00007fb40787f5bd in pj_strdup (pool=0x7fb37453c490, dst=0x7fb378140b18, src=0x0) at ../include/pj/string_i.h:40
#1  0x00007fb40782806c in pjmedia_sdp_neg_modify_local_offer2 (pool=0x7fb37453c490, neg=0x7fb3682f0f30, flags=1, local=0x7fb3680d1fb8)
   at ../src/pjmedia/sdp_neg.c:336
#2  0x00007fb4077a1d9e in inv_check_sdp_in_incoming_msg (inv=0x7fb3682f0c68, tsx=0x7fb3a8154208, rdata=0x7fb38832d3b8)
   at ../src/pjsip-ua/sip_inv.c:2084
#3  0x00007fb4077a5bbd in inv_on_state_early (inv=0x7fb3682f0c68, e=0x7fb2f4ad7a70) at ../src/pjsip-ua/sip_inv.c:4447
#4  0x00007fb40779f4a3 in mod_inv_on_tsx_state (tsx=0x7fb3a8154208, e=0x7fb2f4ad7a70) at ../src/pjsip-ua/sip_inv.c:736
#5  0x00007fb4077ec047 in pjsip_dlg_on_tsx_state (dlg=0x7fb3960c48a8, tsx=0x7fb3a8154208, e=0x7fb2f4ad7a70) at ../src/pjsip/sip_dialog.c:2129
#6  0x00007fb4077ec8b9 in mod_ua_on_tsx_state (tsx=0x7fb3a8154208, e=0x7fb2f4ad7a70) at ../src/pjsip/sip_ua_layer.c:178
#7  0x00007fb4077e499a in tsx_set_state (tsx=0x7fb3a8154208, state=PJSIP_TSX_STATE_PROCEEDING, event_src_type=PJSIP_EVENT_RX_MSG,
   event_src=0x7fb38832d3b8, flag=0) at ../src/pjsip/sip_transaction.c:1272
#8  0x00007fb4077e75a6 in tsx_on_state_proceeding_uac (tsx=0x7fb3a8154208, event=0x7fb2f4ad7b60) at ../src/pjsip/sip_transaction.c:2975
#9  0x00007fb4077e58fe in pjsip_tsx_recv_msg (tsx=0x7fb3a8154208, rdata=0x7fb38832d3b8) at ../src/pjsip/sip_transaction.c:1832
#10 0x00007fb4077e3ec1 in mod_tsx_layer_on_rx_response (rdata=0x7fb38832d3b8) at ../src/pjsip/sip_transaction.c:893
#11 0x00007fb4077c8d4d in pjsip_endpt_process_rx_data (endpt=0x55dc325f63b8, rdata=0x7fb38832d3b8, p=0x7fb3a5c32b00, p_handled=0x7fb2f4ad7c94)
   at ../src/pjsip/sip_endpoint.c:938
#12 0x00007fb3a5c00b33 in ?? () from /usr/lib/asterisk/modules/res_pjsip.so
#13 0x0000000000000000 in ?? ()
(gdb) bt full
#0  0x00007fb40787f5bd in pj_strdup (pool=0x7fb37453c490, dst=0x7fb378140b18, src=0x0) at ../include/pj/string_i.h:40
No locals.
#1  0x00007fb40782806c in pjmedia_sdp_neg_modify_local_offer2 (pool=0x7fb37453c490, neg=0x7fb3682f0f30, flags=1, local=0x7fb3680d1fb8)
   at ../src/pjmedia/sdp_neg.c:336
       new_offer = 0x7fb378140b18
       old_offer = 0x0
       media_used = '\000' <repeats 15 times>
       oi = 0
       status = 0
#2  0x00007fb4077a1d9e in inv_check_sdp_in_incoming_msg (inv=0x7fb3682f0c68, tsx=0x7fb3a8154208, rdata=0x7fb38832d3b8)
   at ../src/pjsip-ua/sip_inv.c:2084
       reoffer_sdp = 0x7fb3680d1fb8
       res_tag = {ptr = 0x7fb38833dd10 "_zE6W-HIusQmE8XhyLrtlBPE-PoerQAq\260\t\001", slen = 32}
       st_code = 183
       tsx_inv_data = 0x7fb3696e1ef8
       status = 32690
       msg = 0x7fb38833d710
       sdp_info = 0x7fb3881df488
#3  0x00007fb4077a5bbd in inv_on_state_early (inv=0x7fb3682f0c68, e=0x7fb2f4ad7a70) at ../src/pjsip-ua/sip_inv.c:4447
       tsx = 0x7fb3a8154208
       dlg = 0x7fb3960c48a8
#4  0x00007fb40779f4a3 in mod_inv_on_tsx_state (tsx=0x7fb3a8154208, e=0x7fb2f4ad7a70) at ../src/pjsip-ua/sip_inv.c:736
       dlg = 0x7fb3960c48a8
       inv = 0x7fb3682f0c68
#5  0x00007fb4077ec047 in pjsip_dlg_on_tsx_state (dlg=0x7fb3960c48a8, tsx=0x7fb3a8154208, e=0x7fb2f4ad7a70) at ../src/pjsip/sip_dialog.c:2129
       i = 3
#6  0x00007fb4077ec8b9 in mod_ua_on_tsx_state (tsx=0x7fb3a8154208, e=0x7fb2f4ad7a70) at ../src/pjsip/sip_ua_layer.c:178
       dlg = 0x7fb3960c48a8
#7  0x00007fb4077e499a in tsx_set_state (tsx=0x7fb3a8154208, state=PJSIP_TSX_STATE_PROCEEDING, event_src_type=PJSIP_EVENT_RX_MSG,
   event_src=0x7fb38832d3b8, flag=0) at ../src/pjsip/sip_transaction.c:1272
       e = {prev = 0x7fb2f4ad7aa0, next = 0x7fb40787f60b <pj_strdup+118>, type = PJSIP_EVENT_TSX_STATE, body = {timer = {
             entry = 0x7fb38832d3b8}, tsx_state = {src = {rdata = 0x7fb38832d3b8, tdata = 0x7fb38832d3b8, timer = 0x7fb38832d3b8,
               status = -2009934920, data = 0x7fb38832d3b8}, tsx = 0x7fb3a8154208, prev_state = 3, type = PJSIP_EVENT_RX_MSG}, tx_msg = {
             tdata = 0x7fb38832d3b8}, tx_error = {tdata = 0x7fb38832d3b8, tsx = 0x7fb3a8154208}, rx_msg = {rdata = 0x7fb38832d3b8}, user = {
             user1 = 0x7fb38832d3b8, user2 = 0x7fb3a8154208, user3 = 0x300000003, user4 = 0x7fb2f4ad7ad0}}}
       prev_state = PJSIP_TSX_STATE_PROCEEDING
#8  0x00007fb4077e75a6 in tsx_on_state_proceeding_uac (tsx=0x7fb3a8154208, event=0x7fb2f4ad7b60) at ../src/pjsip/sip_transaction.c:2975
No locals.
#9  0x00007fb4077e58fe in pjsip_tsx_recv_msg (tsx=0x7fb3a8154208, rdata=0x7fb38832d3b8) at ../src/pjsip/sip_transaction.c:1832
       event = {prev = 0x7fb2f4ad7bb0, next = 0x7fb3a897fdf8, type = PJSIP_EVENT_RX_MSG, body = {timer = {entry = 0x7fb38832d3b8}, tsx_state = {
             src = {rdata = 0x7fb38832d3b8, tdata = 0x7fb38832d3b8, timer = 0x7fb38832d3b8, status = -2009934920, data = 0x7fb38832d3b8},
             tsx = 0x7fb2f4ad7bb0, prev_state = 126258385, type = 32692}, tx_msg = {tdata = 0x7fb38832d3b8}, tx_error = {
             tdata = 0x7fb38832d3b8, tsx = 0x7fb2f4ad7bb0}, rx_msg = {rdata = 0x7fb38832d3b8}, user = {user1 = 0x7fb38832d3b8,
             user2 = 0x7fb2f4ad7bb0, user3 = 0x7fb407868cd1 <pj_mutex_unlock+84>, user4 = 0x7fb2f4ad7bb0}}}
#10 0x00007fb4077e3ec1 in mod_tsx_layer_on_rx_response (rdata=0x7fb38832d3b8) at ../src/pjsip/sip_transaction.c:893
       key = {ptr = 0x7fb3881df450 "c$z9hG4bKPjea8a882c-11d6-4b3d-8e1b-4d37e4b6ddc9", slen = 47}
       hval = 2807140748
       tsx = 0x7fb3a8154208
#11 0x00007fb4077c8d4d in pjsip_endpt_process_rx_data (endpt=0x55dc325f63b8, rdata=0x7fb38832d3b8, p=0x7fb3a5c32b00, p_handled=0x7fb2f4ad7c94)
   at ../src/pjsip/sip_endpoint.c:938
       msg = 0x7fb38833d710
       def_prm = {start_prio = 4105010352, start_mod = 0x55dc2fefb633 <__ao2_unlock+245>, idx_after_start = 4105010352, silent = 32690}
       mod = 0x7fb4078d8a20 <mod_tsx_layer>
       handled = 0
       i = 1
       status = 21980
#12 0x00007fb3a5c00b33 in ?? () from /usr/lib/asterisk/modules/res_pjsip.so
No symbol table info available.
#13 0x0000000000000000 in ?? ()
No symbol table info available.
{noformat}
Comments:

By: Asterisk Team (asteriskteam) 2020-12-04 09:03:39.454-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Sean Bright (seanbright) 2020-12-07 11:45:14.558-0600

Thank you for the crash report. However, we need more information to investigate the crash. Please provide:

1. A backtrace generated from a core dump using the instructions provided on the Asterisk wiki [1].
2. Specific steps taken that led to the crash.
3. All configuration information necessary to reproduce the crash.

Thanks!

[1]: https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace



By: Diogo Hartmann (Mauri de Souza Meneguzzo (3CPlus)) 2020-12-08 05:51:27.019-0600

Hi Sean,

1. I've sent via email a link to the ast_coredumper files alongside the core dump itself.
2. The only instances crashing on 18.1.0 are the ones where calls are placed; the ones that keep the agents are not affected (yes, we have separate instances for each). Those are all the details I have: the crashes occur randomly and, as shown in the logs I attached earlier, there are no relevant log messages.
3. Unfortunately, I do not have a reproducible case, but for us it is confirmed that the problem is tied to the 18.1.0 release: going back to 18.0.1 fixed the issue.

By: Joshua C. Colp (jcolp) 2020-12-08 05:53:37.335-0600

Sean does not work for Sangoma or have access to such information. If you wish to provide him directly with such information, you would have to work it out with him. This is why it is highly preferred to attach information when possible.

By: Diogo Hartmann (Mauri de Souza Meneguzzo (3CPlus)) 2020-12-08 06:59:02.356-0600

Thank you for explaining, Joshua. I've attached the ast coredumper files to this issue.

By: Joshua C. Colp (jcolp) 2020-12-09 04:19:25.119-0600

If this can be reproduced, can you please also attach a SIP trace as well as logging [1]? The issue is in SIP and SDP negotiation, so knowing the precise interaction would help in understanding what is going on.

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

By: Diogo Hartmann (Mauri de Souza Meneguzzo (3CPlus)) 2020-12-09 10:38:03.822-0600

Unfortunately, we don't have such information. We are unable to reproduce the issue in our staging environment, and the production builds have already been rolled back to Asterisk 18.0.1, where the problem is not present.

By: Joshua C. Colp (jcolp) 2021-02-05 05:47:56.131-0600

I've been able to reproduce the underlying issue and understand what is going on. The cause is a race condition that appears to have been around for quite a long time, so something in the newer version likely altered timing for you just enough to trigger it. It has to do with SDP, as I originally suspected, and requires particular circumstances.

This is an issue in PJSIP itself and not in Asterisk. I've attached a patch which can be placed into the "third-party/pjproject/patches" directory of Asterisk which should resolve it. Asterisk will need a fresh build though.

I've started an email dialog with Teluu to discuss the patch, and it is also going up for review.

Please do not disclose this or discuss it elsewhere as we are treating this as a security issue.

An additional note - I don't know what your deployment is like exactly, but this does require a failed SDP negotiation to occur as a result of a 183 Session Progress. That is: Asterisk sends an INVITE and receives a 183 Session Progress with SDP that it can't negotiate. This may indicate an issue elsewhere in your deployment.
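
For illustration only, here is a minimal C sketch of the failure mode described above. In the backtrace, `old_offer` is 0x0 inside `pjmedia_sdp_neg_modify_local_offer2` and `pj_strdup` is called with `src=0x0`, which suggests the function assumed an earlier negotiation had succeeded. The types `sdp_session` and `sdp_neg` and the function `modify_local_offer` below are hypothetical stand-ins, not the real PJMEDIA structures and not the attached patch. Treating the new SDP as an initial offer when no earlier negotiation succeeded is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for SDP negotiator state.
 * These are NOT the real PJMEDIA structures. */
typedef struct sdp_session {
    char origin_user[32];          /* user part of the "o=" line */
    unsigned long origin_version;  /* version part of the "o=" line */
} sdp_session;

typedef struct sdp_neg {
    sdp_session *active_local;  /* NULL until a negotiation has succeeded */
    sdp_session *neg_local;     /* offer currently being negotiated */
} sdp_neg;

/* Heap-allocate a copy of src; returns NULL if allocation fails. */
static sdp_session *clone_session(const sdp_session *src)
{
    sdp_session *copy = malloc(sizeof(*copy));
    if (copy != NULL) {
        *copy = *src;
    }
    return copy;
}

/* Build a modified local offer from `local`.
 * Returns 0 on success, -1 on allocation failure. */
static int modify_local_offer(sdp_neg *neg, const sdp_session *local)
{
    sdp_session *new_offer;

    assert(neg != NULL && local != NULL);

    new_offer = clone_session(local);
    if (new_offer == NULL) {
        return -1;
    }

    if (neg->active_local != NULL) {
        /* Normal re-offer: keep the previous origin user and bump the
         * origin version by one. */
        snprintf(new_offer->origin_user, sizeof(new_offer->origin_user),
                 "%s", neg->active_local->origin_user);
        new_offer->origin_version = neg->active_local->origin_version + 1;
    }
    /* Otherwise no earlier negotiation succeeded (for example, the SDP in
     * a 183 could not be negotiated), so there is no previous origin to
     * copy from. Assumption for this sketch: use the new SDP unchanged, as
     * if it were an initial offer. Dereferencing active_local here without
     * the check above is the kind of NULL access seen in the backtrace. */

    free(neg->neg_local);
    neg->neg_local = new_offer;
    return 0;
}
```

With `active_local` left NULL, calling `modify_local_offer` returns 0 and stores an unmodified copy of the new SDP instead of crashing; once `active_local` is set, the origin user is carried over and the version is incremented.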

By: Joshua C. Colp (jcolp) 2021-02-08 04:03:47.544-0600

Attaching patch from Teluu.

By: Diogo Hartmann (Mauri de Souza Meneguzzo (3CPlus)) 2021-02-08 06:04:37.710-0600

Thanks @Joshua. We will test the patch as soon as possible.

By: Friendly Automation (friendly-automation) 2021-02-18 10:08:26.195-0600

Change 15439 merged by Joshua Colp:
pjsip: Make modify_local_offer2 tolerate previous failed SDP.

[https://gerrit.asterisk.org/c/asterisk/+/15439|https://gerrit.asterisk.org/c/asterisk/+/15439]

By: Friendly Automation (friendly-automation) 2021-02-18 10:09:13.556-0600

Change 15461 merged by Joshua Colp:
pjsip: Make modify_local_offer2 tolerate previous failed SDP.

[https://gerrit.asterisk.org/c/asterisk/+/15461|https://gerrit.asterisk.org/c/asterisk/+/15461]

By: Friendly Automation (friendly-automation) 2021-02-18 10:09:32.324-0600

Change 15438 merged by Joshua Colp:
pjsip: Make modify_local_offer2 tolerate previous failed SDP.

[https://gerrit.asterisk.org/c/asterisk/+/15438|https://gerrit.asterisk.org/c/asterisk/+/15438]

By: Friendly Automation (friendly-automation) 2021-02-18 10:09:46.229-0600

Change 15437 merged by Joshua Colp:
pjsip: Make modify_local_offer2 tolerate previous failed SDP.

[https://gerrit.asterisk.org/c/asterisk/+/15437|https://gerrit.asterisk.org/c/asterisk/+/15437]

By: Friendly Automation (friendly-automation) 2021-02-18 10:10:03.797-0600

Change 15436 merged by Joshua Colp:
pjsip: Make modify_local_offer2 tolerate previous failed SDP.

[https://gerrit.asterisk.org/c/asterisk/+/15436|https://gerrit.asterisk.org/c/asterisk/+/15436]

By: Friendly Automation (friendly-automation) 2021-02-18 10:10:19.467-0600

Change 15435 merged by Joshua Colp:
pjsip: Make modify_local_offer2 tolerate previous failed SDP.

[https://gerrit.asterisk.org/c/asterisk/+/15435|https://gerrit.asterisk.org/c/asterisk/+/15435]

By: Friendly Automation (friendly-automation) 2021-02-18 10:10:32.979-0600

Change 15449 merged by Joshua Colp:
pjsip: Make modify_local_offer2 tolerate previous failed SDP.

[https://gerrit.asterisk.org/c/asterisk/+/15449|https://gerrit.asterisk.org/c/asterisk/+/15449]

By: Friendly Automation (friendly-automation) 2021-02-18 10:10:58.159-0600

Change 15434 merged by Joshua Colp:
pjsip: Make modify_local_offer2 tolerate previous failed SDP.

[https://gerrit.asterisk.org/c/asterisk/+/15434|https://gerrit.asterisk.org/c/asterisk/+/15434]

By: Friendly Automation (friendly-automation) 2021-02-18 10:16:37.930-0600

Change 15440 merged by Joshua Colp:
pjsip: Make modify_local_offer2 tolerate previous failed SDP.

[https://gerrit.asterisk.org/c/asterisk/+/15440|https://gerrit.asterisk.org/c/asterisk/+/15440]

By: Friendly Automation (friendly-automation) 2021-02-18 10:16:48.587-0600

Change 15462 merged by Joshua Colp:
pjsip: Make modify_local_offer2 tolerate previous failed SDP.

[https://gerrit.asterisk.org/c/asterisk/+/15462|https://gerrit.asterisk.org/c/asterisk/+/15462]