Summary: ASTERISK-27952: Segfault after pjsip hdr linked list corruption
Reporter: laszlovl (lvl)
Labels: pjsip
Date Opened: 2018-07-03 09:00:24
Date Closed: 2020-01-14 11:13:38.000-0600
Priority: Major
Regression?:
Status: Closed/Complete
Components: pjproject/pjsip
Versions: 15.3.0
Frequency of Occurrence: Occasional
Related Issues:
Environment:
Attachments: ( 0) backtrace.txt
Description: Twice now, we've experienced an Asterisk segfault caused by a corrupted "hdr" linked list. This happens only once in several thousand calls, so I haven't been able to identify a common trigger yet; as far as I can see, there was nothing out of the ordinary about the affected calls.

Might be related to ASTERISK-27792 and ASTERISK-26832.

As you can see, the first ten or so header entries in the list are completely normal, but eventually hdr.next points to an invalid memory address.
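The manual `.next.next.next` walk in gdb can also be expressed as a consistency check over the list. The following is a minimal sketch, not pjproject code: `struct node` and `list_check` are hypothetical names, and the struct only mirrors the `prev`/`next` links that lead each header. It assumes a circular list with a sentinel head, and it treats a NULL or misaligned `next` as corrupt before dereferencing it. That test is a heuristic: it would catch values like `0x6`, but not a garbage pointer that happens to be aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the prev/next links that lead each header. */
struct node {
    struct node *prev;
    struct node *next;
};

/*
 * Walk a circular list from its sentinel head.
 * Returns the number of elements if every link is consistent,
 * or -1 at the first link that fails a sanity check.
 * max_nodes bounds the walk so a corrupted cycle cannot loop forever.
 */
static int list_check(const struct node *head, int max_nodes)
{
    const struct node *cur = head;
    int count = 0;

    assert(head != NULL);
    for (;;) {
        const struct node *next = cur->next;

        /* Reject obviously bad pointers before touching them. */
        if (next == NULL || ((uintptr_t)next % _Alignof(struct node)) != 0)
            return -1;
        /* In an intact list, every forward link has a matching back link. */
        if (next->prev != cur)
            return -1;
        if (next == head)
            return count;
        if (++count > max_nodes)
            return -1;
        cur = next;
    }
}
```

A check like this, run at points where headers are added, could narrow down when the list first goes bad; whether that is practical at production call volumes is an open question.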

Case 1:

{code}
#0  pj_stricmp (str1=str1@entry=0x1e, str2=str2@entry=0x7f17a4baead0) at ../include/pj/string_i.h:216

#1  0x00007f18d1d549a5 in pjsip_msg_find_hdr_by_name (msg=0x7f181c245180, name=name@entry=0x7f17a4baead0, start=start@entry=0x0) at ../src/pjsip/sip_msg.c:363
       hdr = 0x6
       end = 0x7f181c2451a8
{code}
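The `str1=0x1e` in frame #0 is consistent with `hdr = 0x6` in frame #1: taking the address of the `name` member of a header located at `0x6` adds the member's offset to that bogus pointer. The sketch below illustrates the arithmetic. `struct hdr_mirror` and `name_field_address` are hypothetical; the struct copies only the field order that gdb prints above, and the offset of 24 bytes assumes an LP64 platform such as x86-64 Linux with a 4-byte `type` field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the fields gdb prints; not the real pjsip types. */
struct str_mirror {
    char *ptr;
    ptrdiff_t slen;
};

struct hdr_mirror {
    struct hdr_mirror *prev;
    struct hdr_mirror *next;
    int type;
    struct str_mirror name;
    struct str_mirror sname;
    void *vptr;
};

/* The list links are expected to lead the struct. */
static_assert(offsetof(struct hdr_mirror, prev) == 0, "prev must come first");

/* Address that &hdr->name evaluates to for a given (possibly bogus) hdr value. */
static uintptr_t name_field_address(uintptr_t hdr_value)
{
    return hdr_value + offsetof(struct hdr_mirror, name);
}
```

Under those assumptions, `name_field_address(0x6)` yields `0x1e`, so the crash in `pj_stricmp` is simply the first dereference of the corrupted `next` value rather than a separate fault.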

{code}
(gdb) print *msg.hdr.next.next.next.next.next.next.next.next.next.next
$23 = {
 prev = 0x7f187406d368,
 next = 0x7f18741e3568,
 type = PJSIP_H_OTHER,
 name = {
   ptr = 0x7f18d1ddfedd "Min-SE",
   slen = 6
 },
 sname = {
   ptr = 0x0,
   slen = 0
 },
 vptr = 0x7f18d2022d00 <min_se_hdr_vptr>
}
(gdb) print *msg.hdr.next.next.next.next.next.next.next.next.next.next.next
$24 = {
 prev = 0x7f1874041098,
 next = 0x6,
 type = 177,
 name = {
   ptr = 0x7f1874036040 "al-queuemember-0006aa1a;2\033[0m\", \"\033[1;35mARRAY(target_username xxxx \001",
   slen = 139743010334336
 },
 sname = {
   ptr = 0x0,
   slen = 4294967295
 },
 vptr = 0x0
}
{code}

Case 2:

{code}
#0  pjsip_hdr_print_on (hdr_ptr=0x7f3133322e36, buf=0x7f0fbc1f3430 "Content-Type: application/sdp\r\nContent-Length:   261\r\n\r\nv=0\r\no=- 572496747 572496749 xxxx"..., len=31096) at ../src/pjsip/sip_msg.c:584
       hdr = 0x7f3133322e36
#1  0x00007f1034e4ac85 in pjsip_msg_print (msg=0x7f0f180bff30, buf=0x7f0fbc1f30a8 "SIP/2.0 183 Session Progress\r\nVia: xxxx"..., size=<optimized out>) at ../src/pjsip/sip_msg.c:464
       p = 0x7f0fbc1f3430 "Content-Type: application/sdp\r\nContent-Length:   261\r\n\r\nv=0\r\no=- 572496747 572496749 xxxx"...
       end = 0x7f0fbc1fada8 "\250\255\037\274\017\177"
       len = <optimized out>
       hdr = 0x7f3133322e36
       clen_hdr = {ptr = 0x7f1034ed8eef "Content-Length: ", slen = 16}
{code}

{code}
(gdb) p *msg.hdr.next.next.next.next.next.next.next.next.next.next.next
$14 = {
 prev = 0x7f0fbc34a730,
 next = 0x7f0fbc1fb9e8,
 type = PJSIP_H_ALLOW,
 name = {
   ptr = 0x7f1034ed9284 "Allow",
   slen = 5
 },
 sname = {
   ptr = 0x7f1034ed9284 "Allow",
   slen = 5
 },
 vptr = 0x7f103511b3a0 <generic_array_hdr_vptr>
}
(gdb) p *msg.hdr.next.next.next.next.next.next.next.next.next.next.next.next
$15 = {
 prev = 0x362e3333322e3738,
 next = 0x7f3133322e36,
 type = PJSIP_H_CONTACT,
 name = {
   ptr = 0x7f1034ed27eb "Contact",
   slen = 7
 },
 sname = {
   ptr = 0x7f1034eec21a "m",
   slen = 1
 },
 vptr = 0x7f103511b340 <contact_hdr_vptr>
}
(gdb) p *msg.hdr.next.next.next.next.next.next.next.next.next.next.next.next.next
Cannot access memory at address 0x7f3133322e36
{code}
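One way to inspect corrupted pointer values such as the `prev` and `next` fields in Case 2 is to read their bytes as text, since a stray string write leaves printable characters behind. The sketch below is illustrative only: `pointer_as_text` is a hypothetical helper, and it assumes a little-endian host, where the lowest byte of the value is the first byte in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Render the 8 bytes of value, lowest byte first (memory order on a
 * little-endian host), into out. Non-printable bytes become '?'.
 * out must have room for 9 bytes including the terminator.
 */
static void pointer_as_text(uint64_t value, char out[9])
{
    assert(out != NULL);
    for (size_t i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)(value >> (8 * i));
        out[i] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '?';
    }
    out[8] = '\0';
}
```

Applied to the Case 2 values, all eight bytes of `prev` and the low five bytes of `next` decode to ASCII digits and dots, while the upper bytes of `next` still look like the top of a normal heap address. That pattern would be consistent with a short string being written over the start of the node, though it does not identify the writer.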
Comments:

By: Asterisk Team (asteriskteam) 2018-07-03 09:00:26.022-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Sean Bright (seanbright) 2018-07-03 12:25:07.782-0500

Can you attach the full backtrace? Or at least a few more frames so we can see where this is being called from.

By: Asterisk Team (asteriskteam) 2018-07-19 12:00:01.489-0500

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

By: laszlovl (lvl) 2018-07-24 06:23:55.983-0500

Sorry, didn't receive a notification for your comment.

I suspect that the full backtrace isn't very interesting because it's just the result of a corruption that previously happened elsewhere. Nonetheless, attaching it now.

By: Asterisk Team (asteriskteam) 2018-07-24 06:23:56.255-0500

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.

By: Joshua C. Colp (jcolp) 2018-07-24 07:02:07.086-0500

What is the configuration? Do you have any of the external signaling transport settings set? Do you have console output leading up to the crash?

By: laszlovl (lvl) 2018-07-24 08:02:45.913-0500

Either there was no particular console output, or it didn't make it to the syslog. The PJSIP configuration is mostly the defaults:

{code}
[proxy](!)
endpoint/context = from-proxy
endpoint/allow = opus,g722,alaw,ulaw,gsm,h263,h263p,h264
endpoint/dtmf_mode = auto
endpoint/rtp_timeout = 600
endpoint/rtp_timeout_hold = 3600

aor/qualify_frequency = 10

endpoint/rtp_symmetric = yes
endpoint/direct_media = no
{code}

By: Joshua C. Colp (jcolp) 2018-07-24 08:10:04.870-0500

That configuration is incomplete; there has to be a transport section in order for traffic to flow.

By: laszlovl (lvl) 2018-07-24 08:13:50.657-0500

OK, yes, but it's completely "default".

{code}
[transport-udp]
type = transport
protocol = udp
bind = 0.0.0.0:8066
{code}

By: Joshua C. Colp (jcolp) 2018-07-24 08:21:32.829-0500

PJSIP uses explicit configuration, so unlike chan_sip there is no "default": if you don't specify something like a transport, it does not exist. That's why the full configuration generally has to be provided.

By: Joshua C. Colp (jcolp) 2018-07-25 04:31:41.636-0500

What is the dialplan involved for this? Are you using the PJSIP_HEADER dialplan function at all?

By: Asterisk Team (asteriskteam) 2018-08-08 12:00:03.089-0500

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

By: laszlovl (lvl) 2018-08-24 09:23:32.619-0500

I no longer have the logs to replicate the exact dialplan used for these two cases. But yes, we use PJSIP_HEADER in nearly every call, both in "read" mode and in "add" mode.

By: Asterisk Team (asteriskteam) 2018-08-24 09:23:32.967-0500

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.

By: Benjamin Keith Ford (bford) 2018-08-27 11:32:39.399-0500

There isn't much to go off of here as far as replication goes since we don't have a scenario, and the backtrace looks like it's missing some information. Is Asterisk compiled with DONT_OPTIMIZE and BETTER_BACKTRACES [1], or is this something you will be able to run on your system? It will help pinpoint the problem if it occurs in the future. Make sure you run Asterisk with the -g option. If the problem pops up again, be sure to include the core files you get from ast_coredumper and the scenario that caused it.

[1]: https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

By: Richard Mudgett (rmudgett) 2018-08-27 11:42:02.629-0500

The patch for ASTERISK-27966 might fix this too.

By: laszlovl (lvl) 2018-09-03 05:13:04.065-0500

Asterisk was compiled with DONT_OPTIMIZE, but not with BETTER_BACKTRACES. I'll look into that.

I understand that it's a tricky issue to comment on. Will pull in 15.6 as soon as possible after it has landed, and report back here when the problem reappears.

By: Asterisk Team (asteriskteam) 2018-09-18 12:00:01.921-0500

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines