Summary: ASTERISK-25274: A11 SIGSEGV 'Double free or corruption' in backtrace from pj_pool_release (sip_destroy -> pj_ice_sess_destroy)
Reporter: Dade Brandon (dade)
Labels:
Date Opened: 2015-07-22 12:37:40
Date Closed: 2020-01-14 11:13:45.000-0600
Priority: Major
Regression?
Status: Closed/Complete
Components:
Versions: 11.18.0
Frequency of Occurrence: Frequent
Related Issues: is related to ASTERISK-25275 A11 SIGSEGV from pjnpath check_cached_response (ast_rtcp_read -> pj_stun_session_on_rx_pkt)
Environment: Ubuntu 14.04.2; Linux 3.13.0-24-generic SMP; Intel E3-1231 Openssl 1.0.1f-1ubuntu2.15 (Jun 11 2015; most recent available) libsrtp0 / libsrtp0-dev 1.4.5~20130609~dfsg-1
Attachments:
( 0) 7-2-phx-debug-aug18c.txt.gz
( 1) 7-2-phx-fullbt-aug18c.txt
( 2) fenrir-debug-july23.txt.gz
( 3) fenrir-fullbt-jul23.txt
( 4) narvi-backtrace-july_22_2015.txt
( 5) Narvi_debug_log_jul_22_917.p.txt.gz
Description: We have the patch from ASTERISK-25103 added to trunk 11, along with a few custom patches (mostly just debug messages). The following crash occurs infrequently: 1-5 times per week, usually batched together and on the same server(s). Based on the pattern, I imagine that a remote factor, such as a slow peer, affects whether or not the crash occurs.

The full backtrace (with some added print *var output) and the debug log will be attached shortly after I create this issue. Below is the top chunk of the backtrace to assist with reviewing this issue.

{noformat}
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6)
#1  __GI_abort ()
#2  __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7f548a7b6b28 "*** Error in `%s': %s: 0x%s ***\n")
#3  malloc_printerr (ptr=<optimized out>, str=0x7f548a7b6c58 "double free or corruption (out)", action=1)
#4  _int_free (av=<optimized out>, p=<optimized out>, have_lock=0)
#5  default_block_free ()
#6  pj_pool_destroy_int ()
#7  cpool_release_pool ()
#8  pj_pool_release ()
#9  destroy_tdata ()
#10 pj_stun_session_destroy ()
{noformat}
Comments:

By: Asterisk Team (asteriskteam) 2015-07-22 12:37:42.389-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Dade Brandon (dade) 2015-07-22 12:47:09.800-0500

Gzip of the debug log. This is the last five minutes before the crash (identified by Asterisk starting back up on the last line). The spam of "No remote address on RTP instance '....' so dropping frame" is unique to this issue; note that the call IDs and RTP instances are different. We occasionally see this on one RTP instance, but lately we've been getting it across multiple RTP instances right before a crash.

By: Nicole McIntosh (atna99) 2015-07-23 16:05:36.681-0500

Another crash, looks like the same source issue.

Debug and full backtrace added.

By: Rusty Newton (rnewton) 2015-07-23 17:37:49.064-0500

In the case of potential memory corruption we typically need Valgrind or MALLOC_DEBUG output to make any progress.

If the issue only occurs on a production system then MALLOC_DEBUG may be your only option.

https://wiki.asterisk.org/wiki/display/AST/MALLOC_DEBUG+Compiler+Flag



By: Dade Brandon (dade) 2015-07-23 19:49:05.165-0500

We will need to sleep this issue for about a week until we can get MALLOC_DEBUG onto all servers, and then from there until the crash is reproduced.

By: Rusty Newton (rnewton) 2015-07-24 09:05:08.873-0500

I'll ask a developer to take a look at it in the meantime as well.

By: Mark Michelson (mmichelson) 2015-07-24 09:56:04.132-0500

I'm going to jump in here and say that MALLOC_DEBUG is not going to help here since the malloc error is down inside PJLib. MALLOC_DEBUG does not intercept those allocations.
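
To illustrate the point about interception, here is a minimal sketch in C. It assumes a debug allocator that only records allocations routed through its own wrapper. All names (debug_malloc, app_alloc, library_pool_alloc, run_demo) are hypothetical and are not taken from Asterisk or PJLIB; the sketch only shows why an allocation made on a library's own path is invisible to such a tracker.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical debug allocator: counts every allocation routed through it. */
size_t tracked_allocs = 0;

void *debug_malloc(size_t size)
{
    tracked_allocs++;
    return malloc(size);
}

/* Application code is built to allocate through the debug wrapper. */
void *app_alloc(size_t size)
{
    return debug_malloc(size);
}

/* Stand-in for a third-party library with its own allocation path.
 * It calls malloc() directly, so the debug allocator never sees it. */
void *library_pool_alloc(size_t size)
{
    return malloc(size);
}

/* Performs one allocation on each path, frees both, and returns
 * the number of allocations the debug allocator has recorded. */
size_t run_demo(void)
{
    void *from_app = app_alloc(32);
    void *from_lib = library_pool_alloc(32);

    free(from_app);
    free(from_lib);
    return tracked_allocs;
}
```

On its first call, run_demo() returns 1: only the application-side allocation is counted. Under this assumption, a corruption in memory obtained on the library's own path would not appear in the tracker's records.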

By: Rusty Newton (rnewton) 2015-07-24 18:51:18.732-0500

Dade, can you post new logs the next time the issue occurs, including a SIP trace (sip set debug on)?

By: Asterisk Team (asteriskteam) 2015-08-15 12:00:23.010-0500

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

By: Nicole McIntosh (atna99) 2015-08-18 17:16:03.310-0500

Same issue: "Double free or corruption" in backtrace.

Full debug log with SIP tracing on and full backtrace attached.