Summary: ASTERISK-27447: MOH: Crash scanning MOH files using realtime.
Reporter: Sebastian Gutierrez (sum)
Labels:
Date Opened: 2017-11-28 06:48:42.000-0600
Date Closed: 2020-09-09 10:49:00
Priority: Major
Regression?:
Status: Closed/Complete
Components: Resources/res_musiconhold
Versions: 13.17.0, 13.18.3
Frequency of Occurrence:
Related Issues: duplicates ASTERISK-28927 Asterisk crash in music on hold
Environment: ubuntu 16.04
Attachments:
( 0) core.uc-tennant4-2017-11-28T09-13-39+0100-brief.txt
( 1) core.uc-tennant4-2017-11-28T09-13-39+0100-full.txt
( 2) core.uc-tennant4-2017-11-28T09-13-39+0100-locks.txt
( 3) core.uc-tennant4-2017-11-28T09-13-39+0100-thread1.txt
( 4) core-brief.txt
( 5) core-brief.txt
( 6) core-full.txt
( 7) core-full.txt
( 8) core-locks.txt
( 9) core-locks.txt
(10) core-thread1.txt
(11) core-thread1.txt
(12) mmlog
Description: Asterisk crashes with a "double free or corruption" error.

Comments:

By: Asterisk Team (asteriskteam) 2017-11-28 06:48:43.634-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Benjamin Keith Ford (bford) 2017-11-28 12:25:49.967-0600

[~sum], can you try upgrading to the latest version of 13 (13.18) and see if the issue persists? If it does, please attach your relevant configuration files (pjsip, extensions, musiconhold) and describe the steps you took to produce the issue. Thanks!

By: Sebastian Gutierrez (sum) 2017-11-28 12:49:48.770-0600

This is chan_sip. Is there a relevant commit for this, or is it just to try out the latest?


By: Benjamin Keith Ford (bford) 2017-11-28 13:49:06.577-0600

Using chan_sip or pjsip shouldn't be an issue since it appears to be a memory issue, but 13.18 had many changes go into it, including memory-related patches. If upgrading to 13.18 does not fix your issue, we will still need steps on how you are producing the issue, as well as debug output from Valgrind. To do this, you can follow the steps on the wiki page here: https://wiki.asterisk.org/wiki/display/AST/Valgrind

By: Sebastian Gutierrez (sum) 2017-12-05 11:00:21.993-0600

I will close this issue by the end of the week, but so far so good.

By: Joshua C. Colp (jcolp) 2017-12-05 12:38:37.772-0600

Assigning to you until then.

By: Sebastian Gutierrez (sum) 2017-12-07 08:19:27.096-0600

It is crashing again. I have attached a dump.

By: Richard Mudgett (rmudgett) 2017-12-07 10:32:43.649-0600

Your backtrace appears to contain a memory corruption. We need one or both of the following items to continue investigation of the issue:
1. Valgrind output. See https://wiki.asterisk.org/wiki/display/AST/Valgrind for instructions on how to use Valgrind with Asterisk.
2. MALLOC_DEBUG output. See https://wiki.asterisk.org/wiki/display/AST/MALLOC_DEBUG+Compiler+Flag for instructions on how to use the MALLOC_DEBUG option.

Note that MALLOC_DEBUG and Valgrind are mutually exclusive options. Valgrind output is preferable, but will be more system resource intensive and may be difficult to get on a production system. In such a case, you may have better luck getting the necessary output from MALLOC_DEBUG.

Both of your collected backtrace files show memory corruption.  The memory corruption may be the same as ASTERISK-27238 and ASTERISK-27412.  The fix for those issues is not in a release yet.  The patch for ASTERISK-27413 may be needed to find the cause of the memory corruption if the fix for the other issues doesn't work here.

By: Sebastian Gutierrez (sum) 2017-12-07 10:44:46.527-0600

Valgrind will be impossible: in this environment the queues keep a lot of clients on hold for a long time.

Should I try compiling the latest 13 branch to get those fixes? Later on, if this is still happening, I can try MALLOC_DEBUG together with the option from ASTERISK-27413.

By: Richard Mudgett (rmudgett) 2017-12-07 11:54:11.788-0600

Yes, please try the 13 branch for those patches.  I saw in your backtrace that you have mixmonitor in use.  Mixmonitor uses audio hooks.  Audio hooks are where the memory corruption that those patches fix was happening.

By: Sebastian Gutierrez (sum) 2017-12-11 07:52:00.200-0600

Attached a zip file containing new logs from a crash with the latest 13 branch. I'm going to enable MALLOC_DEBUG for tomorrow's crash.

By: Sebastian Gutierrez (sum) 2017-12-11 07:54:03.642-0600

Is there any problem I could run into using cache_media_frames=no together with MALLOC_DEBUG?


By: Richard Mudgett (rmudgett) 2017-12-11 11:31:44.753-0600

Today's backtrace files have no symbols and are useless.  Please do not include core files unless specifically asked.  Core files are large and for the most part only usable by the machine that generated them.

No.  I don't know of any problems disabling cache_media_frames.  Caches simply trade memory for speed.  cache_media_frames=no was specifically created to allow valgrind or MALLOC_DEBUG to track media frame buffer use better.  Since the MOH crash you experience seems to be related to media frame buffer use, you need to set cache_media_frames=no when running with MALLOC_DEBUG.
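
As a minimal sketch of that setting, assuming the option is read from the [options] section of asterisk.conf (the file is not named in this thread):

{noformat}
[options]
; Assumed location of the option. Disables the media frame cache so that
; MALLOC_DEBUG (or valgrind) can track media frame buffer use.
cache_media_frames = no
{noformat}

Since the cache trades memory for speed, the default can be restored once debugging is finished.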

By: Sebastian Gutierrez (sum) 2017-12-12 05:57:04.508-0600

Attached all logs, including mmlog.



By: Richard Mudgett (rmudgett) 2017-12-12 14:12:27.206-0600

Today's backtrace and mmlog are showing useful information.

Please supply the MOH configuration you are using.  (musiconhold.conf)
Also how many MOH files are in your music classes?

By: Sebastian Gutierrez (sum) 2017-12-12 14:22:16.587-0600

Music on hold is realtime:
{noformat}
musiconhold => odbc,RESData,musiconhold
{noformat}

{noformat}
name     directory                    application  mode   digit  sort    format
060      /var/lib/asterisk/moh/060/                files         random  wav
default  /var/lib/asterisk/moh/                    files         random  wav
{noformat}

default: 12 files
060: 2 files (the same recording in wav and alaw)

musiconhold.conf
{noformat}
[general]
cachertclasses=yes ; use 1 instance of moh class for all users who are using it,
                   ; decrease consumable cpu cycles and memory
                   ; disabled by default
[default]
mode=files
directory=/var/lib/asterisk/moh
random=yes
{noformat}


By: Richard Mudgett (rmudgett) 2017-12-12 14:46:30.202-0600

There is something strange going on with the class->filearray when {{cachertclasses=yes}} is set.  ASTERISK-25974 may have something to do with this problem.

I'm opening up the issue as there should be enough information for someone to find the problem.

By: Sebastian Gutierrez (sum) 2018-02-06 09:36:22.712-0600

I've moved from realtime MOH to file-based MOH, so this problem is no longer an issue for me. If you don't want to track this, you can close it.

By: Sean Bright (seanbright) 2020-09-09 10:49:01.048-0500

Technically ASTERISK-28927 is the duplicate, but the problem should be fixed and released regardless.