Summary: ASTERISK-19923: Asterisk crashing due to memory corruptions in chan_sip/voicemail
Reporter: Dan Delaney (drdelaney)
Labels:
Date Opened: 2012-05-29 14:52:45
Date Closed: 2012-06-21 12:38:30
Priority: Major
Regression?
Status: Closed/Complete
Components: Applications/app_voicemail
Versions: 1.8.13.0
Frequency of Occurrence: Frequent
Related Issues:
must be completed before resolving ASTERISK-20018 Asterisk 1.8.14.0 Blockers
must be completed before resolving ASTERISK-20019 Asterisk 10.6.0 Blockers
is related to ASTERISK-19889 Asterisk crashes due to memory corruption
is related to ASTERISK-20052 Security Vulnerability: remote crash vulnerability in app_voicemail
is related to ASTERISK-19837 Asterisk crashing regularly in 1.8.11.1 due to memory corruption
Environment: CentOS 5.4, Intel(R) Xeon(R) CPU E31220 @ 3.10GHz, 4 GB RAM, Asterisk 1.8.13.0-RC1
Attachments:
( 0) core.10435.backtrace.txt
( 1) core.10435.console.txt
( 2) core.32003.backtrace.txt
( 3) core.32003.console.txt
( 4) core.3467.backtrace.txt
( 5) core.3467.console.txt
( 6) core.3698.backtrace.txt
( 7) core.5849.backtrace.txt
( 8) core.7477.backtrace.txt
( 9) core.7477.console.txt
(10) dlstest-valgrind.txt
(11) skip_sip.conf
(12) skip_voicemail.conf
(13) skipper-valgrind.txt
(14) vm_alloc_fix.diff
Description: Intermittent crashes while using voicemail. I have been unable to reproduce them intentionally. Backtraces and console logs will be attached.

Comments:

By: Dan Delaney (drdelaney) 2012-05-29 15:19:33.270-0500

All backtraces are attached, along with console output for some of the matching backtraces.

By: Rusty Newton (rnewton) 2012-05-30 16:01:12.148-0500

Please provide any applicable configuration files: voicemail.conf, plus the configuration files for any channels that have related mailboxes, such as sip.conf.



By: Dan Delaney (drdelaney) 2012-05-30 18:05:21.296-0500

skip_sip.conf is the sip configuration for known problematic extensions.
skip_voicemail.conf is the vm configuration.

Entries for the PINs, domains, IPs, and SIP secrets have been changed for security reasons.

By: Dan Delaney (drdelaney) 2012-05-30 18:08:18.870-0500

Attached the requested information.

By: Rusty Newton (rnewton) 2012-06-04 09:27:32.336-0500

Your backtrace appears to show memory corruption, and we require Valgrind output in order to move this issue forward. Please see https://wiki.asterisk.org/wiki/display/AST/Valgrind for more information about how to produce debugging information. Thanks!



By: Dan Delaney (drdelaney) 2012-06-04 11:23:27.957-0500

Using 1.8.13.0-rc2 on a different machine, I was not able to get it to crash under Valgrind. I was able to reproduce it on the same machine with RC1; however, the crash was sporadic.

I will attempt to gather data on the production system today and see if I can capture the memory information from a crash.

Attached is the Valgrind data from a run that did not crash.

By: Dan Delaney (drdelaney) 2012-06-04 22:32:29.291-0500

Attached is a copy of the Valgrind data.
I was not able to get the process to crash; however, my test call did drop once (I am not sure whether that was related).

This is with 1.8.13.0-rc2.

The only known way to reproduce this is to listen to a large number of voicemail messages, and Asterisk will crash at random. In this instance, I listened to about 36 voicemails.

I will try to obtain crash data again if needed. Since this is a production system, I have to test after hours because of the strain Valgrind puts on the system.

By: Dan Delaney (drdelaney) 2012-06-04 22:33:22.257-0500

Attached are a few Valgrind data dumps. Asterisk was not crashing during these tests; however, the attached data may still help.

By: Dan Delaney (drdelaney) 2012-06-12 15:40:04.700-0500

I have been attempting to replicate this further. Using Valgrind makes this almost impossible, as Asterisk can barely process the calls. I am following the instructions from the link above, and CPU usage goes to 100% because of Memcheck.

One symptom is that a large number of new messages is reported when only two exist; Asterisk then crashes afterwards.

At this time, all I can provide is more crash reports, not Valgrind memory dumps (unless the dumps from the non-crashing runs are helpful).

If there are any known tweaks to get this working without a high CPU load, I can attempt to run it in the production environment to gather the needed data.

By: Kinsey Moore (kmoore) 2012-06-13 13:43:36.588-0500

Would you mind giving the attached patch a try?  It was created for another issue, but this seems to be very similar.

By: Dan Delaney (drdelaney) 2012-06-13 14:26:17.820-0500

Something else to note: we have upgraded to 1.8.13.0-rc2 and then to 1.8.14.0-rc1, and we are still seeing the same issue. This patch applies cleanly to 1.8.14.0-rc1. I will roll it out to the system and check whether it resolves the issue.

By: Kinsey Moore (kmoore) 2012-06-15 13:13:40.008-0500

Hi Dan,
How are things looking on your end with the patch applied?

By: Dan Delaney (drdelaney) 2012-06-15 14:07:46.182-0500

So far, so good. The patch has been applied cleanly to 1.8.14.0-rc1 and to 1.8.13.0 stable (we had to downgrade because of a bug with the parking groups).

There have been no crashes yet. I would like to keep this open over the weekend, as the affected customer stated they do more voicemail work on Mondays. I have not been able to reproduce this on demand on another system. If there are no updates by Monday, I would say we are good.

By: Matt Jordan (mjordan) 2012-06-15 15:32:26.411-0500

Dan:

Reading your last comment: is there a bug in 1.8.14.0-rc1 with respect to parking? If so, would you mind opening another JIRA issue?

Matt

By: Dan Delaney (drdelaney) 2012-06-15 16:10:26.161-0500

Opened as ASTERISK-20012.

By: Dan Delaney (drdelaney) 2012-06-19 11:03:15.256-0500

So far, this appears to have fixed the issue. I am going to keep an eye on it a while longer, as the customer sometimes went a day or two without issues.

By: Julian Yap (jyap) 2012-06-21 00:31:53.063-0500

I have fully tested this patch on a system that had this issue on certified-asterisk-1.8.11-cert2 as well as on Asterisk 1.8.12.0. The patch is working fine on a production system running certified-asterisk-1.8.11-cert2.

By: Dan Delaney (drdelaney) 2012-06-21 11:24:30.123-0500

It has been working for almost a week with no issues or crashes. I would say this can be closed.

By: Kinsey Moore (kmoore) 2012-06-21 12:21:26.763-0500

Added a link to the relevant issue.

By: Dan Delaney (drdelaney) 2012-06-21 12:38:30.992-0500

The patch resolved the issue.