Summary: | ASTERISK-27952: Segfault after pjsip hdr linked list corruption | ||
Reporter: | laszlovl (lvl) | Labels: | pjsip |
Date Opened: | 2018-07-03 09:00:24 | Date Closed: | 2020-01-14 11:13:38.000-0600 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | pjproject/pjsip |
Versions: | 15.3.0 | Frequency of Occurrence | Occasional |
Related Issues: | |||
Environment: | Attachments: | ( 0) backtrace.txt | |
Description: | Twice now, we've experienced an Asterisk segfault which was caused by a corrupted "hdr" linked list. This happens only once every few thousand calls, so I'm not able to draw a connection yet, but as far as I can see there was nothing out of the ordinary for the affected calls.
Might be related to ASTERISK-27792 and ASTERISK-26832. As you can see the first couple of header entries in the list are completely normal, but eventually hdr.next points to an invalid memory address.

Case 1:
{code}
#0 pj_stricmp (str1=str1@entry=0x1e, str2=str2@entry=0x7f17a4baead0) at ../include/pj/string_i.h:216
#1 0x00007f18d1d549a5 in pjsip_msg_find_hdr_by_name (msg=0x7f181c245180, name=name@entry=0x7f17a4baead0, start=start@entry=0x0) at ../src/pjsip/sip_msg.c:363
        hdr = 0x6
        end = 0x7f181c2451a8
{code}
{code}
(gdb) print *msg.hdr.next.next.next.next.next.next.next.next.next.next
$23 = {
  prev = 0x7f187406d368,
  next = 0x7f18741e3568,
  type = PJSIP_H_OTHER,
  name = {
    ptr = 0x7f18d1ddfedd "Min-SE",
    slen = 6
  },
  sname = {
    ptr = 0x0,
    slen = 0
  },
  vptr = 0x7f18d2022d00 <min_se_hdr_vptr>
}
(gdb) print *msg.hdr.next.next.next.next.next.next.next.next.next.next.next
$24 = {
  prev = 0x7f1874041098,
  next = 0x6,
  type = 177,
  name = {
    ptr = 0x7f1874036040 "al-queuemember-0006aa1a;2\033[0m\", \"\033[1;35mARRAY(target_username xxxx \001",
    slen = 139743010334336
  },
  sname = {
    ptr = 0x0,
    slen = 4294967295
  },
  vptr = 0x0
}
{code}

Case 2:
{code}
#0 pjsip_hdr_print_on (hdr_ptr=0x7f3133322e36, buf=0x7f0fbc1f3430 "Content-Type: application/sdp\r\nContent-Length: 261\r\n\r\nv=0\r\no=- 572496747 572496749 xxxx"..., len=31096) at ../src/pjsip/sip_msg.c:584
        hdr = 0x7f3133322e36
#1 0x00007f1034e4ac85 in pjsip_msg_print (msg=0x7f0f180bff30, buf=0x7f0fbc1f30a8 "SIP/2.0 183 Session Progress\r\nVia: xxxx"..., size=<optimized out>) at ../src/pjsip/sip_msg.c:464
        p = 0x7f0fbc1f3430 "Content-Type: application/sdp\r\nContent-Length: 261\r\n\r\nv=0\r\no=- 572496747 572496749 xxxx"...
        end = 0x7f0fbc1fada8 "\250\255\037\274\017\177"
        len = <optimized out>
        hdr = 0x7f3133322e36
        clen_hdr = {ptr = 0x7f1034ed8eef "Content-Length: ", slen = 16}
{code}
{code}
(gdb) p *msg.hdr.next.next.next.next.next.next.next.next.next.next.next
$14 = {
  prev = 0x7f0fbc34a730,
  next = 0x7f0fbc1fb9e8,
  type = PJSIP_H_ALLOW,
  name = {
    ptr = 0x7f1034ed9284 "Allow",
    slen = 5
  },
  sname = {
    ptr = 0x7f1034ed9284 "Allow",
    slen = 5
  },
  vptr = 0x7f103511b3a0 <generic_array_hdr_vptr>
}
(gdb) p *msg.hdr.next.next.next.next.next.next.next.next.next.next.next.next
$15 = {
  prev = 0x362e3333322e3738,
  next = 0x7f3133322e36,
  type = PJSIP_H_CONTACT,
  name = {
    ptr = 0x7f1034ed27eb "Contact",
    slen = 7
  },
  sname = {
    ptr = 0x7f1034eec21a "m",
    slen = 1
  },
  vptr = 0x7f103511b340 <contact_hdr_vptr>
}
(gdb) p *msg.hdr.next.next.next.next.next.next.next.next.next.next.next.next.next
Cannot access memory at address 0x7f3133322e36
{code} | ||
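Editorial illustration (not part of the original report): the manual `.next.next...` walk in the gdb sessions above can be expressed as a back-link check on a circular doubly linked list, where every node's `next->prev` should point back at the node itself. The sketch below is a minimal, self-contained model of that idea. The names `demo_hdr`, `demo_list_init`, `demo_list_push_back`, and `demo_list_first_bad` are hypothetical and are not pjproject APIs; only the `prev`/`next` layout is borrowed from the dumps. It assumes the corrupted node is still in readable memory, as it was in Case 2, where the Contact entry's `prev` clearly no longer matches the preceding Allow entry. It cannot safely follow a wild pointer such as `0x6`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a header node: only the list links are modelled. */
typedef struct demo_hdr {
    struct demo_hdr *prev;
    struct demo_hdr *next;
} demo_hdr;

/* Make `head` an empty circular list: the sentinel points at itself. */
static void demo_list_init(demo_hdr *head)
{
    head->prev = head;
    head->next = head;
}

/* Link `node` in just before the sentinel, i.e. at the tail of the list. */
static void demo_list_push_back(demo_hdr *head, demo_hdr *node)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/*
 * Follow `next` pointers starting at the sentinel and verify each hop:
 * the node reached must point back (via `prev`) at the node we came from.
 *
 * Returns 0 if the walk returns to the sentinel with every hop consistent,
 * the 1-based number of the first inconsistent hop otherwise, or -1 if the
 * sentinel was not reached after max_nodes + 1 hops (possible cycle).
 *
 * Note: this dereferences the node reached by each hop, so it only helps
 * while that node is still mapped memory.
 */
static int demo_list_first_bad(const demo_hdr *head, int max_nodes)
{
    const demo_hdr *node = head;
    int hop;

    assert(head != NULL);
    for (hop = 1; hop <= max_nodes + 1; ++hop) {
        const demo_hdr *next = node->next;

        if (next == NULL || next->prev != node)
            return hop;          /* forward link or back-link is broken */
        if (next == head)
            return 0;            /* back at the sentinel: list is intact */
        node = next;
    }
    return -1;                   /* bound exceeded without closing the ring */
}
```

Calling `demo_list_first_bad(&head, 64)` on a list built with `demo_list_push_back` returns 0 while the links are intact. Overwriting the `prev` field of the third node makes it return 3, which mirrors how the damage in the dumps shows up one entry before the pointer that finally faults.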
Comments: | By: Asterisk Team (asteriskteam) 2018-07-03 09:00:26.022-0500
Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Sean Bright (seanbright) 2018-07-03 12:25:07.782-0500
Can you attach the full backtrace? Or at least a few more frames so we can see where this is being called from.

By: Asterisk Team (asteriskteam) 2018-07-19 12:00:01.489-0500
Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].
[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

By: laszlovl (lvl) 2018-07-24 06:23:55.983-0500
Sorry, didn't receive a notification for your comment. I suspect that the full backtrace isn't very interesting because it's just the result of a corruption that previously happened elsewhere. Nonetheless, attaching it now.

By: Asterisk Team (asteriskteam) 2018-07-24 06:23:56.255-0500
This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.

By: Joshua C. Colp (jcolp) 2018-07-24 07:02:07.086-0500
What is the configuration? Do you have any of the external signaling transport settings set? Do you have console output leading up to the crash?

By: laszlovl (lvl) 2018-07-24 08:02:45.913-0500
Either there was no particular console output, or it didn't make it to the syslog. The Pjsip configuration is mostly the defaults:
{code}
[proxy](!)
endpoint/context = from-proxy
endpoint/allow = opus,g722,alaw,ulaw,gsm,h263,h263p,h264
endpoint/dtmf_mode = auto
endpoint/rtp_timeout = 600
endpoint/rtp_timeout_hold = 3600
aor/qualify_frequency = 10
endpoint/rtp_symmetric = yes
endpoint/direct_media = no
{code}

By: Joshua C. Colp (jcolp) 2018-07-24 08:10:04.870-0500
That configuration is incomplete, there has to be a transport section in order for traffic to flow.

By: laszlovl (lvl) 2018-07-24 08:13:50.657-0500
OK, yes, but it's completely "default".
{code}
[transport-udp]
type = transport
protocol = udp
bind = 0.0.0.0:8066
{code}

By: Joshua C. Colp (jcolp) 2018-07-24 08:21:32.829-0500
PJSIP uses explicit configuration, so unlike chan_sip there is no "default" in the sense that if you don't specify something like a transport it exists anyway - that's why full configuration generally has to be provided like that.

By: Joshua C. Colp (jcolp) 2018-07-25 04:31:41.636-0500
What is the dialplan involved for this? Are you using the PJSIP_HEADER dialplan function at all?

By: Asterisk Team (asteriskteam) 2018-08-08 12:00:03.089-0500
Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].
[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

By: laszlovl (lvl) 2018-08-24 09:23:32.619-0500
I no longer have the logs to replicate the exact dialplan used for these two cases. But yes, we use PJSIP_HEADER in nearly every call, both in "read" mode and in "add" mode.

By: Asterisk Team (asteriskteam) 2018-08-24 09:23:32.967-0500
This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.

By: Benjamin Keith Ford (bford) 2018-08-27 11:32:39.399-0500
There isn't much to go off of here as far as replication goes since we don't have a scenario, and the backtrace looks like it's missing some information. Is Asterisk compiled with DONT_OPTIMIZE and BETTER_BACKTRACES [1], or is this something you will be able to run on your system? It will help pinpoint the problem if it occurs in the future. Make sure you run Asterisk with the -g option. If the problem pops up again, be sure to include the core files you get from ast_coredumper and the scenario that caused it.
[1]: https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

By: Richard Mudgett (rmudgett) 2018-08-27 11:42:02.629-0500
The patch for ASTERISK-27966 might fix this too.

By: laszlovl (lvl) 2018-09-03 05:13:04.065-0500
The Asterisk was compiled with DONT_OPTIMIZE, but not with BETTER_BACKTRACES. I'll look into that. I understand that it's a tricky issue to comment on. Will pull in 15.6 as soon as possible after it has landed, and report back here when the problem reappears.

By: Asterisk Team (asteriskteam) 2018-09-18 12:00:01.921-0500
Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].
[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines |