[Home]

Summary:ASTERISK-25823: SIGSEGV, Segmentation fault. - ../sysdeps/x86_64/strlen.S: No such file or directory.
Reporter:Andreas Krüger (woopstar)Labels:
Date Opened:2016-03-01 06:35:22.000-0600Date Closed:2017-04-27 16:18:09
Priority:MajorRegression?
Status:Closed/CompleteComponents:Resources/res_pjsip Resources/res_pjsip_caller_id
Versions:13.7.2 Frequency of
Occurrence
Constant
Related
Issues:
duplicatesASTERISK-25337 Crash on PJSIP_HEADER Add P-Asserted-Identity when calling from Gosub
Environment:Ubuntu 14.04 - Asterisk 13.7.2Attachments:
Description:Asterisk is crashing with the following error, when we're trying to transfer a call. It seems it tries to call strlen which is either not available or the variable is null?

{code}
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x00007fffaa40e64f in modify_id_header (pool=0x7fffec006ea0, id=id@entry=0x7fffb135fa90, id_hdr=0x7fffec007588) at res_pjsip_caller_id.c:415
#2  0x00007fffaa40ee6a in caller_id_outgoing_request (session=0x7fffec00d3c8, tdata=0x7fffec006f48) at res_pjsip_caller_id.c:683
#3  0x00007fffb2513e98 in handle_outgoing_request (session=0x7fffec00d3c8, tdata=0x7fffec006f48) at res_pjsip_session.c:2251
#4  0x00007fffb25157f3 in ast_sip_session_send_request_with_cb (session=0x7fffec00d3c8, tdata=0x7fffec006f48, on_response=<optimized out>) at res_pjsip_session.c:1089
#5  0x00007fff9d2a68d6 in call (data=0x7fffb800fe18) at chan_pjsip.c:1658
#6  0x00000000005e936c in ast_taskprocessor_execute (tps=0x7fffec00ddd8) at taskprocessor.c:784
#7  0x00000000005f22cb in execute_tasks (data=0x7fffec00ddd8) at threadpool.c:1320
#8  0x00000000005e936c in ast_taskprocessor_execute (tps=0xaaaa08) at taskprocessor.c:784
#9  0x00000000005f0412 in threadpool_execute (pool=0xaaac08) at threadpool.c:351
#10 0x00000000005f1be6 in worker_active (worker=0x7fffe0000f28) at threadpool.c:1103
#11 0x00000000005f19a3 in worker_start (arg=0x7fffe0000f28) at threadpool.c:1023
#12 0x00000000005fdf6e in dummy_start (data=0x7fffe0000eb0) at utils.c:1237
#13 0x00007ffff60580a5 in start_thread (arg=0x7fffb1360700) at pthread_create.c:309
#14 0x00007ffff563bcfd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
{code}


And the full bt is here:

{code}
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
No locals.
#1  0x00007fffaa40e64f in modify_id_header (pool=0x7fffec006ea0, id=id@entry=0x7fffb135fa90, id_hdr=0x7fffec007588) at res_pjsip_caller_id.c:415
       name_buf_len = <optimized out>
       name_buf = <optimized out>
       id_name_addr = 0x7fffec007610
#2  0x00007fffaa40ee6a in caller_id_outgoing_request (session=0x7fffec00d3c8, tdata=0x7fffec006f48) at res_pjsip_caller_id.c:683
       from = 0x7fffec007588
       dlg = 0x7fffec00e3c8
       effective_id = {name = {str = 0x0, char_set = 1, presentation = 0, valid = 1 '\001'}, number = {str = 0x7fffb8024f70 "22343661", plan = 0, presentation = 0, valid = 1 '\001'}, subaddress = {str = 0x0,
           type = 0, odd_even_indicator = 0 '\000', valid = 0 '\000'}, tag = 0x0}
       connected_id = {name = {str = 0x0, char_set = 1, presentation = 0, valid = 1 '\001'}, number = {str = 0x7fffec0205a0 "22343661", plan = 0, presentation = 0, valid = 1 '\001'}, subaddress = {str = 0x0,
           type = 0, odd_even_indicator = 0 '\000', valid = 0 '\000'}, tag = 0x0}
       tdata = 0x7fffec006f48
       session = 0x7fffec00d3c8
#3  0x00007fffb2513e98 in handle_outgoing_request (session=0x7fffec00d3c8, tdata=0x7fffec006f48) at res_pjsip_session.c:2251
       supplement = 0x7fffec00e270
       req = {method = {id = PJSIP_INVITE_METHOD, name = {ptr = 0x7fffb3bb819b "INVITE", slen = 6}}, uri = 0x7fffec007480}
       __PRETTY_FUNCTION__ = "handle_outgoing_request"
#4  0x00007fffb25157f3 in ast_sip_session_send_request_with_cb (session=0x7fffec00d3c8, tdata=0x7fffec006f48, on_response=<optimized out>) at res_pjsip_session.c:1089
       on_response = <optimized out>
       tdata = 0x7fffec006f48
       session = 0x7fffec00d3c8
       inv_session = <optimized out>
#5  0x00007fff9d2a68d6 in call (data=0x7fffb800fe18) at chan_pjsip.c:1658
       channel = 0x7fffb800fe18
       session = 0x7fffec00d3c8
       pvt = <optimized out>
       tdata = 0x7fffec006f48
       res = 0
#6  0x00000000005e936c in ast_taskprocessor_execute (tps=0x7fffec00ddd8) at taskprocessor.c:784
       local = {local_data = 0x0, data = 0x5fc633 <ast_threadstorage_set_ptr+60>}
       t = 0x7fffb80015f0
       size = 1
       __PRETTY_FUNCTION__ = "ast_taskprocessor_execute"
#7  0x00000000005f22cb in execute_tasks (data=0x7fffec00ddd8) at threadpool.c:1320
       tps = 0x7fffec00ddd8
#8  0x00000000005e936c in ast_taskprocessor_execute (tps=0xaaaa08) at taskprocessor.c:784
       local = {local_data = 0x0, data = 0xaaabe8}
       t = 0x7fffb8024f00
       size = 11185160
       __PRETTY_FUNCTION__ = "ast_taskprocessor_execute"
#9  0x00000000005f0412 in threadpool_execute (pool=0xaaac08) at threadpool.c:351
       __PRETTY_FUNCTION__ = "threadpool_execute"
#10 0x00000000005f1be6 in worker_active (worker=0x7fffe0000f28) at threadpool.c:1103
       alive = 0
#11 0x00000000005f19a3 in worker_start (arg=0x7fffe0000f28) at threadpool.c:1023
       worker = 0x7fffe0000f28
       __PRETTY_FUNCTION__ = "worker_start"
#12 0x00000000005fdf6e in dummy_start (data=0x7fffe0000eb0) at utils.c:1237
       __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {140736166496000, -2886013801213841322, 1, 0, 140736166496704, 140736166496000, -2886013801239007146, 2886160201149067350}, __mask_was_saved = 0}},
         __pad = {0x7fffb135fef0, 0x0, 0x0, 0x0}}
       __cancel_routine = 0x451320 <ast_unregister_thread>
       __cancel_arg = 0x7fffb1360700
       __not_first_call = 0
       ret = 0x0
       a = {start_routine = 0x5f191c <worker_start>, data = 0x7fffe0000f28, name = 0x7fffe00008f0 "worker_start         started at [ 1077] threadpool.c worker_thread_start()"}
#13 0x00007ffff60580a5 in start_thread (arg=0x7fffb1360700) at pthread_create.c:309
       __res = <optimized out>
       pd = 0x7fffb1360700
       now = <optimized out>
       unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140736166496000, 2886160671884849238, 1, 0, 140736166496704, 140736166496000, -2886013801215938474, -2886139004426899370}, mask_was_saved = 0}}, priv = {pad = {
             0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
       not_first_call = <optimized out>
       pagesize_m1 = <optimized out>
       sp = <optimized out>
       freesize = <optimized out>
       __PRETTY_FUNCTION__ = "start_thread"
#14 0x00007ffff563bcfd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
{code}

It seems it was trigged, if we did not set the name on the channel. When we configured the name property on the channel, everything worked. So i guess according to Asterisk, the problem is in the file {code}res_pjsip_caller_id.c{code} at line 415. The validation check {code}if (id->name.valid) {{code} is true, but {code}id->name.str{code} is 0x0 when name is not set. This causes strlen to fail in the line {code}int name_buf_len = strlen(id->name.str) * 2 + 1;{code}


The following code was bogus:
{code}
// OutboundCallerID is fetched from mysql though ODBC, example: 22556644
if ("${OutboundCallerID}" != "") {
   Set(CALLERID(num)=${OutboundCallerID});
}
       
Dial(PJSIP/${number}@${TrunkName},${DIALTIMEOUT},${DIALOPTIONS}U(onConnect,${CallInfoId}));
{code}

And the following works:
{code}
// OutboundCallerID is fetched from mysql though ODBC, example: 22556644
if ("${OutboundCallerID}" != "") {
   Set(CALLERID(num)=${OutboundCallerID});
   Set(CALLERID(name)=${OutboundCallerID});
}
       
Dial(PJSIP/${number}@${TrunkName},${DIALTIMEOUT},${DIALOPTIONS}U(onConnect,${CallInfoId}));
{code}


Regarding the "pjsip set logger on" - it never reaches to this. No SIP packages is set, so this is just empty.
Comments:By: Asterisk Team (asteriskteam) 2016-03-01 06:35:23.622-0600

The severity of this issue has been automatically downgraded from "Blocker" to "Major". The "Blocker" severity is reserved for issues which have been determined to block the next release of Asterisk. This severity can only be set by privileged users. If this issue is deemed to block the next release it will be updated accordingly during the triage process.

By: Asterisk Team (asteriskteam) 2016-03-01 06:35:24.031-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Andreas Krüger (woopstar) 2016-03-01 06:36:58.557-0600

Forgot to add, that GDB said, that strlen.S was not found as file.

{code}
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffb1360700 (LWP 47935)]
strlen () at ../sysdeps/x86_64/strlen.S:106
106 ../sysdeps/x86_64/strlen.S: No such file or directory.
{code}

By: Rusty Newton (rnewton) 2016-03-01 15:04:44.383-0600

Sounds like are able to reproduce the crash on a transfer. Can you provide the dialplan that is used for the transfer along with a debug trace of the call in progress?

https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

Please follow those instructions exactly for the debug trace. Make sure to include the "pjsip set logger on" output and verify "DEBUG" messages are in the log.

By: Andreas Krüger (woopstar) 2016-03-02 04:01:39.952-0600

Hi Rusty,

You're right that this is caused on a transfer. <snip>

[Edit by Rusty - Moved your comment up into the description since it helps a ton. Thanks!]

By: Rusty Newton (rnewton) 2016-03-02 07:36:35.118-0600

Thanks!

By: Andreas Krüger (woopstar) 2016-03-02 07:40:44.556-0600

Sure! :) Hope it will be fixed soon and released

It prop. not that a common error though.

By: George Joseph (gjoseph) 2016-03-14 19:50:41.194-0500

I fixed a similar issue last month.  Can you check if that fix resolved your issue as well?

Pull the current 13 git branch to check.



By: Andreas Krüger (woopstar) 2016-03-15 03:31:23.503-0500

Do you remember the commit id? Would else have to check a lot :(

By: George Joseph (gjoseph) 2016-03-15 08:39:36.661-0500

34c64707d1aa346fb0e9c7f97e375d22dedf67d9
res_pjsip_caller_id: Fix segfault when replacing rpid or pai header


By: Andreas Krüger (woopstar) 2016-03-29 08:56:41.436-0500

Problem still exists. And now it has been moved to:

https://github.com/asterisk/asterisk/commit/f0799da3ac7e13638838063fe3cf0d041daea520#diff-dfef39ef55b6e289b29d9093cee50662R3917

The strlen function must validate, that it actually contains a string to do strlen on.

By: Kevin Harwell (kharwell) 2017-04-26 13:11:53.845-0500

I have been unable to duplicate this problem.

[~woopstar] Do you know if this still happens for you in the latest release of Asterisk 13 (13.7.2 is a few versions back now and some work has been done in the pjsip callerid area since then)?

Also, I've been looking through the code to see how the caller id name string could be NULL, but considered valid. One way is if in the dialplan it was specified as such. Do you see or have anywhere in your dialplan that sets the valid flag? Something similar to the following:
{noformat}
Set(CALLERID(name-valid)=1
{noformat}

By: Kevin Harwell (kharwell) 2017-04-26 13:53:01.996-0500

I take back some of the previous comment. There does appear to be other checks against caller id being NULL strewn throughout the code even when marked valid. So doing the same in this instance would not be unprecedented. That is also the quick and easy fix. It'd be nice to know the root cause of how it came to be in that state, but that may not be possible.

So barring that and/or more details I'll put a NULL check in. Which no matter the root cause is probably the right fix anyway given that via dialplan one can specify the caller id name to be valid when it is not.

By: Andreas Krüger (woopstar) 2017-04-26 13:55:53.783-0500

The problem was us not setting the name right. I believe you still need to do the strlen check and null check, but after we were setting the name the problem was never happening again.

Does that answer suffice ?

By: Kevin Harwell (kharwell) 2017-04-26 14:28:10.372-0500

Yeah that's fine. I have the info I need. Patch is up for review: https://gerrit.asterisk.org/#/c/5541

By: Friendly Automation (friendly-automation) 2017-04-27 16:18:09.934-0500

Change 5541 merged by Jenkins2:
res_pjsip/res_pjsip_callerid: NULL check on caller id name string

[https://gerrit.asterisk.org/5541|https://gerrit.asterisk.org/5541]

By: Friendly Automation (friendly-automation) 2017-04-27 16:43:27.004-0500

Change 5542 merged by Jenkins2:
res_pjsip/res_pjsip_callerid: NULL check on caller id name string

[https://gerrit.asterisk.org/5542|https://gerrit.asterisk.org/5542]

By: Friendly Automation (friendly-automation) 2017-04-27 16:48:56.659-0500

Change 5543 merged by Jenkins2:
res_pjsip/res_pjsip_callerid: NULL check on caller id name string

[https://gerrit.asterisk.org/5543|https://gerrit.asterisk.org/5543]