Summary: ASTERISK-25099: res_rtp_asterisk: Crash when using DTLS
Reporter: Cy Sly (themrrobert)
Labels:
Date Opened: 2015-05-18 10:59:18
Date Closed: 2015-07-07 15:07:35
Priority: Major
Regression?
Status: Closed/Complete
Components: Resources/res_rtp_asterisk
Versions: 11.17.1
Frequency of Occurrence: Frequent
Related Issues: is related to ASTERISK-25103 Roundup - investigate Asterisk DTLS crashes
Environment: [Amazon AWS] Linux 3.14.23-22.44.amzn1.x86_64 #1 SMP Tue Nov 11 23:07:48 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
libsrtp-1.4.4-10.20101004cvs.el6.x86_64
glibc-2.17-55.142.amzn1.x86_64
openssl-devel.x86_64 1:1.0.1k-1.82.amzn1
libuuid-2.23.2-16.22.amzn1.x86_64
uuid-1.6.2-27.22.amzn1.x86_64
libxml2-2.9.1-3.1.35.amzn1.x86_64
mysql Ver 14.14 Distrib 5.5.40, for Linux (x86_64) using readline 5.1
Attachments:
( 0) backtrace518-1049a.txt
( 1) backtrace518-10a.txt
( 2) backtrace518-1513.txt
( 3) backtrace518b.txt
( 4) backtrace519-1701.txt
( 5) backtrace-malloc.txt
( 6) backtrace-malloc-907.txt
( 7) backtrace-malloc-907-cli.txt
( 8) mydebug.log.gz
Description: After running for a while (sometimes 5 minutes, sometimes a few hours, but typically multiple times per day), Asterisk segfaults with no hint in the logs as to why. I've checked many backtraces, and they are almost all in malloc. One was in another libc module, so I reinstalled glibc and haven't seen that one since.

The most recent one segfaulted in free().

I have uploaded the backtrace here: http://hackrr.com/files/backtrace518b.txt (7:29am)
I have uploaded the debug log here:
http://hackrr.com/files/mydebug.log
and here:
http://hackrr.com/files/mydebug.log.gz
Here is a screenshot of the segfault in the terminal (same tty that started amportal)
http://hackrr.com/files/segfault.png

Update: Here is the 2nd crash of the day. Segmentation fault, address out of bounds. http://hackrr.com/files/backtrace518-10a.txt (9:29am)
Update: Here is the 3rd crash of the day. "Aborted" http://hackrr.com/files/backtrace518-1049a.txt (10:49am) This time it crashed in OpenSSL, so it is probably different from the segfaults above, but I'm not sure. (OpenSSL version: OpenSSL 1.0.1k-fips 8 Jan 2015.) I'm guessing a NULL or invalid pointer was passed to OpenSSL, which points back to memory corruption.

I am using FreePBX; however, I compiled Asterisk myself, configured with-srtp, with-uuid, with-ssl, and with-crypto. This is for WebRTC, and everything works fine except for the segfaults. Yes, I deleted /usr/lib/asterisk/modules before installing the new version.

Sorry, SIP/RTP debugging is not in this file; if you think it will help, I can do it again. I did enable it for other core dumps taken without BETTER_BACKTRACES and DONT_OPTIMIZE, but there was no useful info there, and I forgot to enable it this morning when I had those flags set up.

This is an AWS / Amazon instance. I have run "memtester" in userspace to test the RAM and it found nothing, although I know it's not as good as a kernel-level memtest. I also haven't noticed any issues in other software.

I am going to fire up a new instance and start from scratch and see if that helps in any way.

Update: I attached the backtrace from the 5th crash of the day (@15:13 PDT). The 4th one was very similar to another, so I left it out.

It might help to know that this is running between the Philippines and Japan. In packet traces I've often seen duplicate packets, especially for DTLS/OpenSSL, and frequently out-of-order RTP packets, so this may be a contributing factor.
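
For anyone who wants to quantify the duplicate and out-of-order packets mentioned above from a capture, here is a minimal sketch. It assumes the RTP sequence numbers have already been extracted from a packet trace into a list; the function name `classify_rtp_sequence` and the label strings are hypothetical and not part of Asterisk. It only relies on RTP sequence numbers being 16-bit values that wrap around, and it is meant for short captures (it remembers every number it has seen, so it would misreport duplicates once the sequence space wraps fully).

```python
def classify_rtp_sequence(seqs):
    """Label each 16-bit RTP sequence number from a capture.

    Hypothetical helper for offline analysis of a packet trace;
    labels are "in-order", "duplicate", or "out-of-order".
    """
    seen = set()      # every sequence number observed so far
    highest = None    # most advanced sequence number observed so far
    labels = []
    for seq in seqs:
        if seq in seen:
            labels.append("duplicate")
            continue
        seen.add(seq)
        if highest is None:
            # First packet: nothing to compare against yet.
            labels.append("in-order")
            highest = seq
            continue
        # Distance modulo 2**16, so 65535 -> 0 counts as one step forward.
        delta = (seq - highest) & 0xFFFF
        if delta < 0x8000:
            labels.append("in-order")
            highest = seq
        else:
            # Arrived after a packet with a later sequence number.
            labels.append("out-of-order")
    return labels


if __name__ == "__main__":
    sample = [65534, 65535, 0, 65535, 1, 3, 2]
    for seq, label in zip(sample, classify_rtp_sequence(sample)):
        print(seq, label)
```

A high share of "duplicate" or "out-of-order" labels would support the idea that the network path is a contributing factor, though it would not by itself explain the memory corruption.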
Comments:

By: Cy Sly (themrrobert) 2015-05-18 13:01:33.653-0500

2nd backtrace of the day, 9:29am

By: Cy Sly (themrrobert) 2015-05-18 13:02:04.758-0500

3rd crash, 10:49am

By: Cy Sly (themrrobert) 2015-05-18 13:08:41.091-0500

Sorry, those comments were supposed to be tied to the attachments I uploaded.

@Joshua Colp: Nice work tracing that to DTLS. I haven't used backtraces for debugging much, but now that you've pointed it out I can definitely see the correlation. I will definitely use that insight in the future, thank you :)

By: Rusty Newton (rnewton) 2015-05-19 13:41:35.843-0500

Your backtrace appears to contain a memory corruption. We need one or both of the following items to continue investigation of the issue:
1. Valgrind output. See https://wiki.asterisk.org/wiki/display/AST/Valgrind for instructions on how to use Valgrind with Asterisk.
2. MALLOC_DEBUG output. See https://wiki.asterisk.org/wiki/display/AST/MALLOC_DEBUG+Compiler+Flag for instructions on how to use the MALLOC_DEBUG option.

Note that MALLOC_DEBUG and Valgrind are mutually exclusive options. Valgrind output is preferable, but will be more system resource intensive and may be difficult to get on a production system. In such a case, you may have better luck getting the necessary output from MALLOC_DEBUG.



By: Rusty Newton (rnewton) 2015-05-19 13:42:50.856-0500

You might also test the patch on ASTERISK-24832 since it sounds like you were getting a crash in openssl as well.

By: Rusty Newton (rnewton) 2015-05-19 13:49:12.475-0500

Please update your environment with the versions of all your dependent libraries (srtp, ssl, etc.).

By: Cy Sly (themrrobert) 2015-05-20 06:26:42.264-0500

I am working on getting the info with MALLOC_DEBUG

In the meantime, here is a crash that seems to have originated outside the res_rtp_asterisk module so it may shed some light (hopefully it's not an unrelated crash).

So should I just get another backtrace with MALLOC_DEBUG, or do I need to do something else?

By: Cy Sly (themrrobert) 2015-05-20 11:01:39.025-0500

Here is a good core dump with MALLOC_DEBUG enabled.

I'll post another one if I can get one that crashes in free or malloc.

By: Cy Sly (themrrobert) 2015-05-20 11:13:38.589-0500

These malloc-907 files are from a crash that seems to have happened in malloc. It left a backtrace on the console, which I've included as -cli.

By: Joshua C. Colp (jcolp) 2015-07-06 06:01:23.731-0500

A change that fixes this problem is now up for review at the following addresses. While our code review process is pretty fast these days, if anyone would like to test the change and provide feedback on this issue, it would be welcome:

11: https://gerrit.asterisk.org/#/c/786/
13: https://gerrit.asterisk.org/#/c/787/
master: https://gerrit.asterisk.org/#/c/788/

The patch can be downloaded by clicking the "Download" dropdown and selecting the method you wish.