[Home]

Summary:ASTERISK-29411: Crash in pjsip_msg_find_hdr_by_name
Reporter:LA (learbia)Labels:patch pjsip
Date Opened:2021-04-30 12:42:43Date Closed:2022-03-30 15:13:04
Priority:CriticalRegression?
Status:Closed/CompleteComponents:Resources/res_pjsip_nat
Versions:18.9.0 19.1.0 Frequency of
Occurrence
Constant
Related
Issues:
is duplicated byASTERISK-29919 CRASH ASTERISK
is related toASTERISK-29535 Segmentation fault in libasteriskpj.so.2
Environment:Attachments:( 0) core-brief.txt
( 1) core-info.txt
( 2) core-locks.txt
( 3) core-thread1.txt
( 4) Log_full_asterisk_complete.txt
( 5) use_tdata_pool.diff
Description:Link with original gdb https://drive.google.com/file/d/1TjoMHSIOnhJNg4Q8ytid5g0_SUzMKCrB/view?usp=sharing
Comments:By: Asterisk Team (asteriskteam) 2021-04-30 12:42:43.684-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Joshua C. Colp (jcolp) 2021-05-02 10:19:53.458-0500

We require additional debug to continue with triage of your issue. Please follow the instructions on the wiki [1] for how to collect debugging information from Asterisk. For expediency, where possible, attach the debug with a '.txt' file extension so that the debug will be usable for further analysis.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information



By: LA (learbia) 2021-05-02 11:22:49.127-0500

The hour of crash is Apr 30 14:21:07 VERBOSE[35982]

I Started asterisk again in Apr 30 14:23:18


Attachment

/var/log/asterisk/full
core-info.txt
core-locks.txt
core-full.txt
core-brief.txt


By: Joshua C. Colp (jcolp) 2021-05-02 12:22:53.215-0500

The attached full log contains only verbose messages, and not the debug messages as requested.

By: LA (learbia) 2021-05-03 15:19:11.508-0500

Done



By: Benjamin Keith Ford (bford) 2021-05-04 11:38:12.410-0500

The logs didn't have any information on what's going on. From the backtrace, it looks like something is going on in pjproject when checking headers. Can we get a core file to examine in gdb?

By: LA (learbia) 2021-05-04 12:30:08.999-0500

Yes: Link with core GDB https://drive.google.com/file/d/1TjoMHSIOnhJNg4Q8ytid5g0_SUzMKCrB/view?usp=sharing

By: Benjamin Keith Ford (bford) 2021-05-04 14:02:18.123-0500

Thanks! I've uploaded the core file to our internal drive. You can remove it from yours now if you'd like. Also created an internal issue to investigate this further.

By: Nathan Bruning (nathanb) 2021-06-28 06:08:15.489-0500

We believe this is the same segfault as we're seeing in ASTERISK-27952.

This crash happens about once a week for our asterisk cluster. We've seen the crash happening in asterisk 15.3.0, then 16.3.0, then 16.13.0.

What we've gathered so far:
- During most of the crashes, we see some kind of network connectivity issues.
- {{msg}} contains this header list: {{Via,From,To,Contact,Callid,Cseq,Allow,Supported,Session-Expires,Min-SE,<corrupt>}}
- We have set a custom header on the call, which is not present in the header list
- This happens during an outbound dial attempt

We've tried reproducing by simulating packet loss, but no success so far.

Any updates in https://jira.digium.com/browse/SWP-11499 perhaps?


By: Joshua C. Colp (jcolp) 2021-06-28 06:14:41.718-0500

Any updates are posted on the ASTERISK issue, there is nothing on the SWP.

By: LA (learbia) 2021-07-06 16:10:40.627-0500

I did more tests and this happen when I lost connection with Internet. Maybe Reverse DNS , or DNS LOST

By: LA (learbia) 2021-10-22 10:32:35.182-0500

Always we have problem with connections, (DNS, Lost pacteks) we have a crash.

By: LA (learbia) 2021-10-22 10:34:24.918-0500

The same problem related by @Nathan Bruning

By: LA (learbia) 2022-02-02 15:26:59.100-0600

Tests with last version, we have the same problems:
{noformat}
!@!@!@! thread1.txt !@!@!@!

$1 = {si_signo = 11, si_errno = 0, si_code = 1, _sifields = {_pad = {33, 0 <repeats 27 times>}, _kill = {si_pid = 33, si_uid = 0}, _timer = {si_tid = 33, si_overrun = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 33, si_uid = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 33, si_uid = 0, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0x21}, _sigpoll = {si_band = 33, si_fd = 0}}}
Signal        Stop      Print   Pass to program Description
SIGSEGV       Yes       Yes     Yes             Segmentation fault

Thread 1 (Thread 0x7fd4f48d2700 (LWP 2756)):
#0  pj_stricmp (str1=str1@entry=0x19, str2=str2@entry=0x7fd4f48d1bc0) at ../include/pj/string_i.h:216
No locals.
#1  0x00007fdabbe25164 in pjsip_msg_find_hdr_by_name (msg=0x7fd88403ea80, name=name@entry=0x7fd4f48d1bc0, start=start@entry=0x0) at ../src/pjsip/sip_msg.c:362
       hdr = 0x1
       end = 0x7fd88403eaa8
#2  0x00007fda21e9778b in add_headers_to_message (headers=headers@entry=0x7fda220b9bc0 <request_headers>, tdata=0x7fd8843e9598) at res_pjsip/pjsip_global_headers.c:93
       name = {ptr = 0x7fd9b026f1aa "Max-Forwards", slen = 12}
       hdr = <optimized out>
       iter = 0x7fd9b026f140
       lock = 0x7fda220b9bc0 <request_headers>
       __PRETTY_FUNCTION__ = "add_headers_to_message"
#3  0x00007fda21e97843 in add_request_headers (tdata=<optimized out>) at res_pjsip/pjsip_global_headers.c:105
No locals.
#4  0x00007fdabbe2df76 in endpt_on_tx_msg (endpt=<optimized out>, tdata=0x7fd8843e9598) at ../src/pjsip/sip_endpoint.c:1116
       status = 0
       mod = 0x7fda220b8e40 <global_header_mod>
#5  0x00007fdabbe35120 in pjsip_transport_send (tr=0x7058c48, tdata=0x7fd8843e9598, addr=0x7fd8843e9788, addr_len=16, token=0x7fd88403f480, cb=0x7fdabbe2f730 <stateless_send_transport_cb>) at ../src/pjsip/sip_transport.c:931
       status = <optimized out>
#6  0x00007fdabbe2f8d0 in stateless_send_transport_cb (token=0x7fd88403f480, tdata=0x7fd8843e9598, sent=16) at ../src/pjsip/sip_util.c:1277
       cont = 1
       cur_addr = 0x7fd8843e9788
       cur_addr_len = 16
       via = 0x7fd88403eb90
       need_update_via = 1
#7  0x00007fdabbe2fc92 in stateless_send_resolver_callback (status=<optimized out>, token=0x7fd88403f480, addr=<optimized out>) at ../src/pjsip/sip_util.c:1378
       stateless_data = 0x7fd88403f480
       tdata = 0x7fd8843e9598
#8  0x00007fda21e7f374 in sip_resolve_invoke_user_callback (data=0x7fd884100eb8) at res_pjsip/pjsip_resolver.c:208
       __FUNCTION__ = "sip_resolve_invoke_user_callback"
       __PRETTY_FUNCTION__ = "sip_resolve_invoke_user_callback"
#9  0x000000000059e11e in ast_taskprocessor_execute (tps=tps@entry=0x7fda6c5ee4f0) at taskprocessor.c:1237
       local = {local_data = 0x7fd9ac2b6420, data = 0x7fda6c5ee4f0}
       t = 0x7fda4c01a4a0
       __PRETTY_FUNCTION__ = "ast_taskprocessor_execute"
{noformat}

By: N A (InterLinked) 2022-02-10 06:20:43.599-0600

FYI, this backtrace from ASTERISK-29846 appears to be related to this issue: https://issues.asterisk.org/jira/secure/attachment/61194/core-asterisk-2022-02-10T00-46-55Z-full.txt

By: Kevin Harwell (kharwell) 2022-02-28 12:59:46.484-0600

The previous core dump didn't have any of the supporting Asterisk modules, so I'm unable to load it locally. Next time it crashes please use the {{ast_coredumper}} script to collect the full core file and libraries, and then host the output somewhere accessible. Here's an example of the command to run after a crash:
{noformat}
$ /var/lib/asterisk/scripts/ast_coredumper --tarball-coredumps --no-default-search <path_to_coredump>
{noformat}
Also what OS is in use?

Thanks!

By: LA (learbia) 2022-03-05 14:24:50.101-0600

I use debian

This a full dump

https://drive.google.com/file/d/1F-GO25T-LH5va9A31ujTWkvfWIYBGppV/view?usp=sharing

By: Kevin Harwell (kharwell) 2022-03-09 10:48:15.148-0600

[~learbia]

Thanks for the full dump it helped!

I have still been unable to replicate the issue, but after some investigation I found an area of suspect in the code. Please apply the attached [^use_tdata_pool.diff] patch, and let me know if it fixes the issue for you.

By: LA (learbia) 2022-03-09 11:10:34.927-0600

Ok, I will apply. Tks.

By: LA (learbia) 2022-03-17 10:00:10.202-0500

Today I have a crash:

traps: asterisk[14122] general protection ip:45edfc sp:7fd3c9fc43a0 error:0 in asterisk[400000+2f0000]

By: Kevin Harwell (kharwell) 2022-03-17 10:39:48.340-0500

Argh was hoping that would have fixed it. Thanks for testing.

When you get a chance could you run {{ast_coredumper}} against it and upload the files someplace again?

By: LA (learbia) 2022-03-17 13:09:24.155-0500

Yes, I will to do it.

I think it could be another problem

By: Kevin Harwell (kharwell) 2022-03-17 13:23:25.161-0500

If you feel fairly confident that the problem is something else then go ahead and create a new issue for that, and add the core files/backtrace information there. As well as any relevant logging if you have it.

By: Kevin Harwell (kharwell) 2022-03-23 18:01:34.320-0500

I've put the attached patch up for review as it seems like it might have fixed this problem (even if later found to not be the exact cause it's something that still needed to be fixed).

Note, once the patch is submitted this issue will auto-close. If later it's found the issue is still a problem then commenting as such will reopen the issue.

By: LA (learbia) 2022-03-25 11:28:58.591-0500

What do you think?
{noformat}
!@!@!@! thread1.txt !@!@!@!

$1 = {si_signo = 11, si_errno = 0, si_code = 1, _sifields = {_pad = {1, 0 <repeats 27 times>}, _kill = {si_pid = 1, si_uid = 0}, _timer = {si_tid = 1, si_overrun = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 1, si_uid = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 1, si_uid = 0, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0x1}, _sigpoll = {si_band = 1, si_fd = 0}}}
Signal        Stop      Print   Pass to program Description
SIGSEGV       Yes       Yes     Yes             Segmentation fault

Thread 1 (Thread 0x7fb5a98ef700 (LWP 5307)):
#0  0x000000000045ec1c in __ao2_ref (user_data=0x5, delta=1, tag=tag@entry=0x0, file=file@entry=0x7fb682484452 "res_pjsip_session.c", line=3639, func=0x7fb682487f40 <__PRETTY_FUNCTION__.35305> "ast_sip_dialog_get_session") at astobj2.c:501
       obj = 0xffffffffffffffed
       current_value = -2109178048
       weakproxy = 0x7fb6e4245798
       __PRETTY_FUNCTION__ = "__ao2_ref"
#1  0x00007fb682483272 in ast_sip_dialog_get_session (dlg=<optimized out>) at res_pjsip_session.c:3639
       inv_session = <optimized out>
       session = 0x5
       __PRETTY_FUNCTION__ = "ast_sip_dialog_get_session"
#2  0x00007fb6824839a2 in session_outgoing_nat_hook (tdata=0x7fb4993af628, transport=0x7fb6e4245798) at res_pjsip_session.c:5486
       transport_state = 0x7fb6e4245c30
       hook = 0x0
       sdp_info = 0x7fb550843de8
       dlg = 0x7fb550843de8
       session = 0x0
       __FUNCTION__ = "session_outgoing_nat_hook"
       __PRETTY_FUNCTION__ = "session_outgoing_nat_hook"
#3  0x00007fb6822635d9 in nat_invoke_hook (obj=<optimized out>, arg=<optimized out>, flags=<optimized out>) at res_pjsip_nat.c:300
       hook = <optimized out>
       details = <optimized out>
#4  0x00000000004614dd in internal_ao2_traverse (self=self@entry=0x7fb5b13dc348, flags=flags@entry=OBJ_SEARCH_NONE, cb_fn=cb_fn@entry=0x7fb6822635c0 <nat_invoke_hook>, arg=arg@entry=0x7fb5a98eeb00, tag=tag@entry=0x0, file=file@entry=0x7fb6822649b9 "res_pjsip_nat.c", line=471, func=0x7fb682264e5e <__PRETTY_FUNCTION__.26624> "process_nat", type=AO2_CALLBACK_DEFAULT, data=0x0) at astobj2_container.c:328
       match = 3
       ret = 0x0
       cb_default = 0x7fb6822635c0 <nat_invoke_hook>
       node = 0x7fb5b02d45d8
       traversal_state = 0x7fb5a98ee9a0
       orig_lock = AO2_LOCK_REQ_MUTEX
       multi_container = 0x0
       multi_iterator = 0x0
#5  0x0000000000461965 in __ao2_callback (c=c@entry=0x7fb5b13dc348, flags=flags@entry=OBJ_SEARCH_NONE, cb_fn=cb_fn@entry=0x7fb6822635c0 <nat_invoke_hook>, arg=arg@entry=0x7fb5a98eeb00, tag=tag@entry=0x0, file=file@entry=0x7fb6822649b9 "res_pjsip_nat.c", line=471, func=0x7fb682264e5e <__PRETTY_FUNCTION__.26624> "process_nat") at astobj2_container.c:414
No locals.
{noformat}


By: Kevin Harwell (kharwell) 2022-03-25 11:41:19.820-0500

That's crashing in a different spot. It looks like the Asterisk sip session itself has gone way. It is possible though that while the attached patch might have fixed the original issue it uncovered another crash further along.

Either way I'd say go ahead and create a new issue for this new crash. When you do please attach the relevant backtrace and any other information you might think would be helpful.

Thanks!

By: LA (learbia) 2022-03-25 14:58:23.974-0500

Thank you so much. I will create a new issue.

By: Friendly Automation (friendly-automation) 2022-03-30 15:13:05.568-0500

Change 18253 merged by Friendly Automation:
res_pjsip_header_funcs: wrong pool used tdata headers

[https://gerrit.asterisk.org/c/asterisk/+/18253|https://gerrit.asterisk.org/c/asterisk/+/18253]

By: Friendly Automation (friendly-automation) 2022-03-30 15:13:56.959-0500

Change 18273 merged by Friendly Automation:
res_pjsip_header_funcs: wrong pool used tdata headers

[https://gerrit.asterisk.org/c/asterisk/+/18273|https://gerrit.asterisk.org/c/asterisk/+/18273]

By: Friendly Automation (friendly-automation) 2022-03-30 15:15:53.058-0500

Change 18272 merged by Kevin Harwell:
res_pjsip_header_funcs: wrong pool used tdata headers

[https://gerrit.asterisk.org/c/asterisk/+/18272|https://gerrit.asterisk.org/c/asterisk/+/18272]

By: Friendly Automation (friendly-automation) 2022-03-30 16:09:31.675-0500

Change 18274 merged by Kevin Harwell:
res_pjsip_header_funcs: wrong pool used tdata headers

[https://gerrit.asterisk.org/c/asterisk/+/18274|https://gerrit.asterisk.org/c/asterisk/+/18274]