[Home]

Summary:ASTERISK-22083: res_musiconhold segfault in free, in moh_scan_files
Reporter:Sébastien Couture (sysreq)Labels:
Date Opened:2013-07-16 15:15:18Date Closed:2020-08-26 16:02:54
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Resources/res_musiconhold
Versions:13.11.2 14.0.2 Frequency of
Occurrence
Occasional
Related
Issues:
is duplicated byASTERISK-26474 res_musiconhold: Crash when scanning files with realtime
is duplicated byASTERISK-28927 Asterisk crash in music on hold
is related toASTERISK-21775 FPE during MOH playback
Environment:Linux vgw1 2.6.32-35-generic #78-Ubuntu SMP Tue Oct 11 16:11:24 UTC 2011 x86_64 GNU/LinuxAttachments:( 0) backtrace.txt
Description:We've had Asterisk segfault with an error in res_musiconhold.c; more specifically in the 'moh_scan_files' function (line 1066):
{code}
for (i = 0; i < class->total_files; i++)
 ast_free(class->filearray[i]);
{code}

I've attached a backtrace of the core dump. I'm wondering if this could be related to ASTERISK-21775. I haven't yet tried the patch mentioned in that issue.
Comments:By: Rusty Newton (rnewton) 2013-07-25 16:03:32.423-0500

The crash looks different than the one in ASTERISK-21775.

The segfault happens in free. There is likely memory corruption.

Please follow the instructions on https://wiki.asterisk.org/wiki/display/AST/Valgrind and provide Valgrind debug.

Get an Asterisk log leading up to the crash if possible. (with VERBOSE and DEBUG turned up to 5)

Attach it to the issue and hit Send Back when you've got those items.

By: Rusty Newton (rnewton) 2013-08-14 09:42:04.088-0500

Suspended due to lack of activity. Please request a bug marshal in #asterisk-bugs on the IRC network irc.freenode.net to reopen the issue should you have the additional information requested.  Further information can be found at http://www.asterisk.org/developers/bug-guidelines



By: Shlomi Gutman (voicenter) 2014-04-22 03:47:00.208-0500

Have this issues, it happens very rarely, so it's hard to reproduce.
Asterisk 1.8.23.0
Not sure if i would be able to use high debug and verbose and valgrind on productive server.
I have few core dumps i can attach if it would help

By: Shlomi Gutman (voicenter) 2014-04-27 06:07:15.199-0500

occurred on 11.6-cert1 as well

By: Rusty Newton (rnewton) 2014-05-22 08:29:42.568-0500

Please attach the additional traces in .txt format. Though if you can't provide the MALLOC_DEBUG output or valgrind output there may not be much we can do.

By: Walter Doekes (wdoekes) 2014-05-22 08:48:38.509-0500

Okay. So that backtrace from July was useful after all:
{noformat}
Thread 101 (Thread 17685):
#0  0x00007faf5d6de16e in ?? () from /lib/libc.so.6
#1  0x00007faf5d6673dc in ?? () from /lib/libc.so.6
#2  0x00007faf5d665e78 in free () from /lib/libc.so.6
#3  0x00007faf4876099f in moh_scan_files (class=0x353c898) at res_musiconhold.c:1066
#4  0x00007faf4876265e in local_ast_moh_start (chan=0x7faf4d526f18, mclass=0x7faf3fc19268 "vascomain", interpclass=0x0) at res_musiconhold.c:1516
#5  0x00000000004894cd in ast_moh_start (chan=0x7faf4d526f18, mclass=0x7faf3fc19268 "vascomain", interpclass=0x0) at channel.c:7964
#6  0x00007faf4772485f in queue_exec (chan=0x7faf4d526f18, data=0x7faf3fc1c9a0 "vascomain-premium,i") at app_queue.c:6228
#7  0x000000000050e1fb in pbx_exec (c=0x7faf4d526f18, app=0x7faf54ac10e0, data=0x7faf3fc1c9a0 "vascomain-premium,i") at pbx.c:1446
#8  0x00000000005184f5 in pbx_extension_helper (c=0x7faf4d526f18, con=0x0, context=0x7faf4d527470 "func-queue", exten=0x7faf4d5274c0 "s", priority=15, label=0x0, callerid=0x7faf4cf34bb0 "4504315181", action=E_SPAWN, found=0x7faf3fc1ec8c, combined_find_spawn=1) at pbx.c:4489
...
{noformat}

{noformat}
Thread 57 (Thread 17682):
#0  0x00007faf5d6de16e in ?? () from /lib/libc.so.6
#1  0x00007faf5d6673dc in ?? () from /lib/libc.so.6
#2  0x00007faf5d665e78 in free () from /lib/libc.so.6
#3  0x00007faf4876099f in moh_scan_files (class=0x353c898) at res_musiconhold.c:1066
#4  0x00007faf4876265e in local_ast_moh_start (chan=0x7faf4d164618, mclass=0x7faf3fb9d268 "vascomain", interpclass=0x0) at res_musiconhold.c:1516
#5  0x00000000004894cd in ast_moh_start (chan=0x7faf4d164618, mclass=0x7faf3fb9d268 "vascomain", interpclass=0x0) at channel.c:7964
#6  0x00007faf4772485f in queue_exec (chan=0x7faf4d164618, data=0x7faf3fba09a0 "vascomain-premium,i") at app_queue.c:6228
#7  0x000000000050e1fb in pbx_exec (c=0x7faf4d164618, app=0x7faf54ac10e0, data=0x7faf3fba09a0 "vascomain-premium,i") at pbx.c:1446
#8  0x00000000005184f5 in pbx_extension_helper (c=0x7faf4d164618, con=0x0, context=0x7faf4d164b70 "func-queue", exten=0x7faf4d164bc0 "s", priority=15, label=0x0, callerid=0x7faf4d4377d0 "5143527328", action=E_SPAWN, found=0x7faf3fba2c8c, combined_find_spawn=1) at pbx.c:4489
#9  0x000000000051a26d in ast_spawn_extension (c=0x7faf4d164618, context=0x7faf4d164b70 "func-queue", exten=0x7faf4d164bc0 "s", priority=15, callerid=0x7faf4d4377d0 "5143527328", found=0x7faf3fba2c8c, combined_find_spawn=1) at pbx.c:5127
...
{noformat}

{noformat}
Thread 1 (Thread 17758):
#0  0x00007faf5d61ba75 in raise () from /lib/libc.so.6
#1  0x00007faf5d61f5c0 in abort () from /lib/libc.so.6
#2  0x00007faf5d6554fb in ?? () from /lib/libc.so.6
#3  0x00007faf5d65f5b6 in ?? () from /lib/libc.so.6
#4  0x00007faf5d665e83 in free () from /lib/libc.so.6
#5  0x00007faf4876099f in moh_scan_files (class=0x353c898) at res_musiconhold.c:1066
#6  0x00007faf4876265e in local_ast_moh_start (chan=0x3923dd8, mclass=0x7faf3ee0d268 "vascomain", interpclass=0x0) at res_musiconhold.c:1516
#7  0x00000000004894cd in ast_moh_start (chan=0x3923dd8, mclass=0x7faf3ee0d268 "vascomain", interpclass=0x0) at channel.c:7964
#8  0x00007faf4772485f in queue_exec (chan=0x3923dd8, data=0x7faf3ee109a0 "vascomain-premium,i") at app_queue.c:6228
#9  0x000000000050e1fb in pbx_exec (c=0x3923dd8, app=0x7faf54ac10e0, data=0x7faf3ee109a0 "vascomain-premium,i") at pbx.c:1446
#10 0x00000000005184f5 in pbx_extension_helper (c=0x3923dd8, con=0x0, context=0x3924330 "func-queue", exten=0x3924380 "s", priority=15, label=0x0, callerid=0x3874a40 "4505158115", action=E_SPAWN, found=0x7faf3ee12c8c, combined_find_spawn=1) at pbx.c:4489
#11 0x000000000051a26d in ast_spawn_extension (c=0x3923dd8, context=0x3924330 "func-queue", exten=0x3924380 "s", priority=15, callerid=0x3874a40 "4505158115", found=0x7faf3ee12c8c, combined_find_spawn=1) at pbx.c:5127
...
{noformat}

Observe how class is the same in all cases. Three threads are attempting to free the same memory at once.

{noformat}
       for (i = 0; i < class->total_files; i++) {
               ast_free(class->filearray[i]);
       }
       class->total_files = 0;
{noformat}

This doesn't look like the same bug as ASTERISK-21775 because that memory isn't used in multiple threads at once there, according to the backtraces. Although I'm not sure.
{noformat}
#0  0x00002b9ac0a708e0 in moh_files_alloc (chan=0x2b9ae0071fc8, params=0x2b9ae0071268) at res_musiconhold.c:432
432 state->pos = ast_random() % class->total_files;
{noformat}

The fix:
- add a mutex in class so we can lock the scanning/reinitialization
- explain why the same memory can be used and don't do it

By: Shlomi Gutman (voicenter) 2014-05-22 09:39:52.148-0500

Is there any relation to that all crashes of music on hold is happening when using from queue application?




{noformat}
#0  0x0000003a5ca30285 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003a5ca30285 in raise () from /lib64/libc.so.6
#1  0x0000003a5ca31d30 in abort () from /lib64/libc.so.6
#2  0x0000003a5ca6971b in __libc_message () from /lib64/libc.so.6
#3  0x0000003a5ca711df in _int_free () from /lib64/libc.so.6
#4  0x0000003a5ca7163b in free () from /lib64/libc.so.6
#5  0x00002aaace58669d in moh_scan_files (class=0x2aaab12ed278) at res_musiconhold.c:1066
#6  0x00002aaace58add7 in local_ast_moh_start (chan=0x2aab680ac3b8, mclass=0x42692668 "moh_389", interpclass=0x0) at res_musiconhold.c:1518
#7  0x00002aaab40176d2 in say_periodic_announcement (qe=0x42692660, ringing=0) at app_queue.c:3602
#8  0x00002aaab40201c7 in wait_our_turn (chan=0x2aab680ac3b8, data=<value optimized out>) at app_queue.c:4334
#9  queue_exec (chan=0x2aab680ac3b8, data=<value optimized out>) at app_queue.c:6325
#10 0x00000000004ee7bf in pbx_exec (c=0x2aab680ac3b8, app=0x2aaaac0ddb70, data=0x42695990 "queue_10131,t,,blank,1800,,,sub-queue-onanswer") at pbx.c:1446
#11 0x00000000004fe99c in pbx_extension_helper (c=0x2aab680ac3b8, con=<value optimized out>, context=0x2aab680ac910 "asterisk-queue-blc", exten=0x2aab680ac960 "036476661", priority=17, label=0x0,
   callerid=0x2aab680a7310 "05000000", action=E_SPAWN, found=0x4269800c, combined_find_spawn=1) at pbx.c:4489
#12 0x000000000050198b in ast_spawn_extension (c=0x2aab680ac3b8, args=0x0) at pbx.c:5127
#13 __ast_pbx_run (c=0x2aab680ac3b8, args=0x0) at pbx.c:5230
#14 0x0000000000503e6b in pbx_thread (data=0x52e7) at pbx.c:5571
#15 0x000000000053beac in dummy_start (data=<value optimized out>) at utils.c:1075
#16 0x0000003a5d20677d in start_thread () from /lib64/libpthread.so.0
#17 0x0000003a5cad3c1d in clone () from /lib64/libc.so.6
(gdb) bt full
#0  0x0000003a5ca30285 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000003a5ca31d30 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x0000003a5ca6971b in __libc_message () from /lib64/libc.so.6
No symbol table info available.
#3  0x0000003a5ca711df in _int_free () from /lib64/libc.so.6
No symbol table info available.
#4  0x0000003a5ca7163b in free () from /lib64/libc.so.6
No symbol table info available.
#5  0x00002aaace58669d in moh_scan_files (class=0x2aaab12ed278) at res_musiconhold.c:1066
       files_DIR = 0x2aab2802e070
       files_dirent = <value optimized out>
       dir_path = "/var/lib/asterisk/mohmp3/moh_389/\000\000B\000\000\000\000@\021iB\000\000\000\000`#iB\000\000\000\000\035XK\000\000\000\000\000\070?DS\000hC\024 \313\304\364f\366\304\364\212\250\304\372r\306\304\354\272\347?\002\301\304\020p\"\305\034\232^\305P;\004\305@\256\033\303\030H@D\000\236A¼\201\275D\351\205ZE\216\017\260E\260\365\315E\357\370\300E\211v\205E<\005?D\323\064\227?\344\067\305D\352U\305\376\321+\305$\201\262\304V*\243\304 8_\302=qmD\274\310\361D\233\333fD\320a\025½\004\314\304\000\244\357\304!\027:\305\000m>\305,|N?\301\244\304?\254\272\304D\372\026\304X\346\307\303\360\327\020CP\370\332BC\b\260C\344\375-EhshE\317&\233E\250\022\237E,\367\262ER\343\312E<\003\304E4\363\346E\202\316\bF&\363\017F-\207\fF<\203\322Esiren14\000\334\000\n \000\000\000\000\334\000\n \000\000\000\000"...
       path = "/var/lib/asterisk/mohmp3/moh_389\000L9,\253*\000\000\001\000\000\000\000\000\000\000\200\002iB\000\000\000\000;\026\247\\:\000\000\000/tmp\000\000\000\200\370\016iB\000\000\000\000A\000\000\000\000\000\000\000\377\037\000\000\000\000\000\000\260L9,\253*\000\000\370\016iB", '\000' <repeats 12 times>"\266, \346B\000\000\000\000\000\300\026iB\000\000\000\000\060\025iB\000\000\000\000 \000\000\000\060\000\000\000\002\000\000\000\000\000\000\000\320\027iB\000\000\000\000\240\002iB\000\000\000\000\000\000\000\000\376\377\377\377\v\000\000\000\000\000\000\000\060\025iB\000\000\000\000\246\376\243]:", '\000' <repeats 17 times>, "`A\000\000\000\000\000\000\000\000"...
       filepath = "/var/lib/asterisk/mohmp3/moh_389//26905_389_12032013215341\000g729\000/var/lib/asterisk/mohmp3/moh_389//26905_389_12032013215341\000g729\000\220\362hB", '\000' <repeats 36 times>"\226, \000\000\000\n", '\000' <repeats 11 times>"\360, \361hB", '\000' <repeats 12 times>"\357, \023iB\000\000\000\000)\364hB\000\000\000\000)\364hB\000\000\000\000\200$iB\000\000\000\000b\037\002\264\252*\000\000\306"...
       statbuf = {st_dev = 1114186256, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, pad0 = 53, st_rdev = 227633266688, st_size = 46912605442680, st_blksize = 46913094705280, st_blocks = 0, st_atim = {tv_sec = 0,
           tv_nsec = 0}, st_mtim = {tv_sec = 525996840, tv_nsec = 0}, st_ctim = {tv_sec = 8394054, tv_nsec = 1114187368}, __unused = {0, 0, 1114186256}}
       i = 1
       __PRETTY_FUNCTION__ = "moh_scan_files"
#6  0x00002aaace58add7 in local_ast_moh_start (chan=0x2aab680ac3b8, mclass=0x42692668 "moh_389", interpclass=0x0) at res_musiconhold.c:1518
       mohclass = 0x2aaab12ed278
       var = 0x0
       res = -1
       realtime_possible = 1
       __PRETTY_FUNCTION__ = "local_ast_moh_start"
       __FUNCTION__ = "local_ast_moh_start"
#7  0x00002aaab40176d2 in say_periodic_announcement (qe=0x42692660, ringing=0) at app_queue.c:3602
       res = 0
       now = 1397039323
       __PRETTY_FUNCTION__ = "say_periodic_announcement"
#8  0x00002aaab40201c7 in wait_our_turn (chan=0x2aab680ac3b8, data=<value optimized out>) at app_queue.c:4334
No locals.
#9  queue_exec (chan=0x2aab680ac3b8, data=<value optimized out>) at app_queue.c:6325
       res = 0
       ringing = 0
       user_priority = <value optimized out>
       max_penalty_str = <value optimized out>
       min_penalty_str = <value optimized out>
       prio = 0
       qcontinue = 0
       max_penalty = 0
       min_penalty = 0
       reason = QUEUE_UNKNOWN
       tries = 0
       noption = 0
       makeannouncement = <value optimized out>
       position = <value optimized out>
       args = {argc = 8, argv = 0x426937f8, queuename = 0x42692610 "queue_10131", options = 0x4269261c "t", url = 0x4269261e "", announceoverride = 0x4269261f "blank", queuetimeoutstr = 0x42692625 "1800", agi = 0x4269262a "",
         macro = 0x4269262b "", gosub = 0x4269262c "sub-queue-onanswer", rule = 0x0, position = 0x0}
---Type <return> to continue, or q <return> to quit---
       qe = {parent = 0x2aaad4334fd8, moh = "moh_389", '\000' <repeats 72 times>, announce = "/var/lib/asterisk/sounds/blc/user_rec/26905_04032013170704", '\000' <repeats 4037 times>, context = '\000' <repeats 79 times>,
         digits = '\000' <repeats 79 times>, valid_digits = 0, pos = 2, prio = 0, last_pos_said = 0, ring_when_ringing = 0, last_periodic_announce_time = 1397039293, last_periodic_announce_sound = 0, last_pos = 0, opos = 4,
         handled = 0, pending = 0, max_penalty = 0, min_penalty = 0, linpos = 0, linwrapped = 0, start = 1397039113, expire = 1397040913, cancel_answered_elsewhere = 0, chan = 0x2aab680ac3b8, qe_rules = {first = 0x0, last = 0x0},
         pr = 0x0, next = 0x47586660}
       __PRETTY_FUNCTION__ = "queue_exec"
#10 0x00000000004ee7bf in pbx_exec (c=0x2aab680ac3b8, app=0x2aaaac0ddb70, data=0x42695990 "queue_10131,t,,blank,1800,,,sub-queue-onanswer") at pbx.c:1446
       res = <value optimized out>
       u = 0x2aab2c134c60
       saved_c_appl = 0x0
       saved_c_data = 0x0
       __PRETTY_FUNCTION__ = "pbx_exec"
#11 0x00000000004fe99c in pbx_extension_helper (c=0x2aab680ac3b8, con=<value optimized out>, context=0x2aab680ac910 "asterisk-queue-blc", exten=0x2aab680ac960 "030000000000", priority=17, label=0x0,
   callerid=0x2aab680a7310 "0500000000", action=E_SPAWN, found=0x4269800c, combined_find_spawn=1) at pbx.c:4489
       e = <value optimized out>
       app = 0x2aaaac0ddb70
       res = <value optimized out>
       q = {incstack = {0x0 <repeats 128 times>}, stacklen = 0, status = 5, swo = 0x0, data = 0x0, foundcontext = 0x2aab680ac910 "asterisk-queue-blc"}
       passdata = "queue_10131,t,,blank,1800,,,sub-queue-onanswer\000/FDcwXzuO\000e89\000\000\000\000``iB\000\000\000\000\340giB\000\000\000\000\340giB\000\000\000\000\340giB", '\000' <repeats 12 times>"\377, \377\377\377\377\377\377\377\377giB\023\000\000\000\026\242X", '\000' <repeats 13 times>, "\030_iB\000\000\000\000\000\000\000\000\001\000\000\000\v", '\000' <repeats 31 times>, " [iB", '\000' <repeats 52 times>, " \000\000\000\000\000\000\000P\000\000\000\000\000\000\000\377\377\377s", '\000' <repeats 12 times>"\377, \377\377\377\060\000\000\000`\004\325\\:", '\000' <repeats 11 times>"\340"...
       matching_action = 0
       __PRETTY_FUNCTION__ = "pbx_extension_helper"
#12 0x000000000050198b in ast_spawn_extension (c=0x2aab680ac3b8, args=0x0) at pbx.c:5127
No locals.
#13 __ast_pbx_run (c=0x2aab680ac3b8, args=0x0) at pbx.c:5230
       digit = <value optimized out>
       invalid = <value optimized out>
       dst_exten = "\000\200iB\000\000\000\000\020\200iB", '\000' <repeats 12 times>, "\030\200iB", '\000' <repeats 20 times>, "\030", '\000' <repeats 47 times>, " \200iB\000\000\000\000(\200iB\000\000\000\000\060\200iB\000\000\000\000\070\200iB\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\004\000\000\000P\200iB\000\000\000\000\244\323 ]:", '\000' <repeats 27 times>"\340, )\325\\:", '\000' <repeats 11 times>"\240, \367\236 \000\000\000\000`\b\002\000\000\000\000\000\340)\325\\:\000\000\000\030\000\000\000\000\000\000\000-8\247\\:", '\000' <repeats 11 times>, "@d\030h\253*\000"
       pos = 5440934
       found = 1
       res = 0
       error = 0
       __PRETTY_FUNCTION__ = "__ast_pbx_run"
#14 0x0000000000503e6b in pbx_thread (data=0x52e7) at pbx.c:5571
No locals.
#15 0x000000000053beac in dummy_start (data=<value optimized out>) at utils.c:1075
       __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {46915674203200, 4875765709081475290, 0, 3, 4096, 46915674089632, 4875765708039747770, 4875765709076383304}, __mask_was_saved = 0}}, __pad = {0x426981a0, 0x0,
           0x3a5cd4d890, 0x3a5cd4d898}}
       __cancel_arg = 0x42698940
       not_first_call = <value optimized out>
       ret = <value optimized out>
#16 0x0000003a5d20677d in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#17 0x0000003a5cad3c1d in clone () from /lib64/libc.so.6
No symbol table info available.
{noformat}

By: Walter Doekes (wdoekes) 2014-05-22 09:48:34.464-0500

Shlomi are there other threads referencing the same mem? {{class=0x2aaab12ed278}}

By: Shlomi Gutman (voicenter) 2014-05-25 09:22:42.308-0500

I do not have that core file anymore. But i have 3-4 others i saved. Do you mind to tell how do i check it or/and link me to good manual for "dummies" :) Thanks

By: Shlomi Gutman (voicenter) 2014-05-25 09:54:52.597-0500

As i understand, the moh that is used is realtime and from cache. And it everytime rescans the files before playing. The problem is when there are a lot of calls to the same Q, it tries to rescan (free and readd ) files at the same time from several threads/calls?
Maybe there should be lock added or maybe the best practice is to change array of files to save thread list ?

By: Joshua C. Colp (jcolp) 2016-10-17 14:23:32.314-0500

Still applicable to later versions as well.

By: Niek Klein Kromhof (nkkromhof) 2016-11-23 03:59:39.410-0600

I'm currently experiencing this issue as well on 13.8-cert2. I've disabled 'cachertclasses=yes' and restarted Asterisk but was still experiencing segfaults on initialization of MoH. It may or may not be pertinent that storage used for MoH files is a GlusterFS volume.

By: Sean Bright (seanbright) 2020-08-26 16:02:54.494-0500

Technically this is the original bug and ASTERISK-28927 is the duplicate, but it is easier to just mark this as such and move on.