
Summary: ASTERISK-25718: file: Use after free during shutdown
Reporter: Badalian Vyacheslav (slavon)
Date Opened: 2016-01-22 15:40:06.000-0600
Date Closed: 2016-02-12 10:39:09.000-0600
Priority: Minor
Status: Closed/Complete
Components: Core/General
Versions: 13.7.0
Description: Occurs on Ctrl+C exit while calls are active.

{code}
==28264==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d00016b350 at pc 0x00000065c33d bp 0x7feb7cd23c30 sp 0x7feb7cd23c20
WRITE of size 4 at 0x60d00016b350 thread T131
   #0 0x65c33c in ast_atomic_fetchadd_int /root/asterisk-13.7.0/include/asterisk/lock.h:685
   #1 0x665265 in __ast_module_unref /root/asterisk-13.7.0/main/loader.c:1564
   #2 0x61b355 in filestream_destructor /root/asterisk-13.7.0/main/file.c:428
   #3 0x492fdb in internal_ao2_ref /root/asterisk-13.7.0/main/astobj2.c:445
   #4 0x4932fa in __ao2_ref /root/asterisk-13.7.0/main/astobj2.c:516
   #5 0x61de42 in ast_closestream /root/asterisk-13.7.0/main/file.c:1054
   #6 0x61918e in ast_stopstream /root/asterisk-13.7.0/main/file.c:194
   #7 0x61ffe6 in waitstream_core /root/asterisk-13.7.0/main/file.c:1418
   #8 0x621042 in ast_waitstream /root/asterisk-13.7.0/main/file.c:1601
   #9 0x7feb9b745f58 in playback_exec /root/asterisk-13.7.0/apps/app_playback.c:489
   #10 0x6c5a4d in pbx_exec /root/asterisk-13.7.0/main/pbx.c:1722
   #11 0x6dc083 in pbx_extension_helper /root/asterisk-13.7.0/main/pbx.c:4994
   #12 0x6e20cf in ast_spawn_extension /root/asterisk-13.7.0/main/pbx.c:6216
   #13 0x6e483c in __ast_pbx_run /root/asterisk-13.7.0/main/pbx.c:6633
   #14 0x6e6e70 in pbx_thread /root/asterisk-13.7.0/main/pbx.c:6953
   #15 0x7d981c in dummy_start /root/asterisk-13.7.0/main/utils.c:1237
   #16 0x7febb03cedc4 in start_thread (/lib64/libpthread.so.0+0x7dc4)
   #17 0x7febaf6ae21c in clone (/lib64/libc.so.6+0xf621c)

ASAN:SIGSEGV
==28264==AddressSanitizer: while reporting a bug found another one. Ignoring.

{code}
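Read from the bottom up, the trace shows the channel thread still inside {{Playback}} when {{ast_stopstream}} closed its file stream. {{filestream_destructor}} then called {{__ast_module_unref}}, whose {{ast_atomic_fetchadd_int}} wrote 4 bytes to an address ASan reports as already freed. That suggests the stream was releasing a reference on its module after the module's memory had gone away. The sketch below is a simplified, hypothetical model of that relationship, not Asterisk's code: the {{module}} and {{filestream}} types and every function in it are invented for illustration. It shows the invariant the crash violated: a stream's destructor may only touch the module's use count while the module is still allocated.

{code:c}
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a loaded format module with a use count. */
struct module {
    atomic_int usecount;
};

/* Hypothetical stand-in for an open file stream that pins its module. */
struct filestream {
    struct module *mod;
};

static struct module *module_new(void)
{
    struct module *mod = malloc(sizeof(*mod));
    if (mod) {
        atomic_init(&mod->usecount, 0);
    }
    return mod;
}

static void module_ref(struct module *mod)
{
    atomic_fetch_add(&mod->usecount, 1);
}

/* Comparable to the write ASan flagged: this read-modify-write is only
 * valid while *mod is still allocated. */
static void module_unref(struct module *mod)
{
    int old = atomic_fetch_sub(&mod->usecount, 1);
    assert(old > 0); /* never drop a reference that was not taken */
    (void)old;
}

/* Freeing is only legal once no stream holds a reference. */
static void module_free(struct module *mod)
{
    if (!mod) {
        return;
    }
    assert(atomic_load(&mod->usecount) == 0);
    free(mod);
}

static struct filestream *filestream_open(struct module *mod)
{
    struct filestream *fs = malloc(sizeof(*fs));
    if (!fs) {
        return NULL;
    }
    fs->mod = mod;
    module_ref(mod); /* the open stream keeps its module pinned */
    return fs;
}

/* Plays the role of a stream destructor: drop the module reference,
 * then free the stream itself. */
static void filestream_close(struct filestream *fs)
{
    module_unref(fs->mod);
    free(fs);
}

/* Correct ordering: close the stream first, free the module last.
 * Returns the use count seen just before the module is freed,
 * or -1 if an allocation failed. */
static int demo_close_then_free(void)
{
    struct module *mod = module_new();
    struct filestream *fs;
    int remaining;

    if (!mod) {
        return -1;
    }
    fs = filestream_open(mod);
    if (!fs) {
        module_free(mod);
        return -1;
    }
    filestream_close(fs);
    remaining = atomic_load(&mod->usecount);
    module_free(mod);
    return remaining;
}
{code}

If the module were released with a plain {{free()}} while a stream was still open, the later {{filestream_close()}} would decrement freed memory, which is the same shape as the reported heap-use-after-free.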
Comments:

By: Asterisk Team (asteriskteam) 2016-01-22 15:40:08.493-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Badalian Vyacheslav (slavon) 2016-01-23 01:43:55.320-0600

Looks like a thread race.

By: Corey Farrell (coreyfarrell) 2016-01-25 11:41:34.773-0600

It looks like this caused a segmentation fault. Do you still have the core dump? I think we need the output of 'thread apply all bt' from GDB to look into this. Your backtrace shows only the thread that is running the still-active channel; I need to see the thread that is shutting down Asterisk to know how far it got.

By: Joshua C. Colp (jcolp) 2016-01-26 18:04:59.586-0600

Per [~coreyfarrell], it would help if you could attach a full backtrace. He's familiar with the shutdown process and can provide some insight.

By: Badalian Vyacheslav (slavon) 2016-01-28 11:40:59.265-0600

Under ASan you can't create a core file because the allocated virtual memory size is > 1 TB. I will try to catch it interactively in GDB.

By: Badalian Vyacheslav (slavon) 2016-02-08 05:40:41.498-0600

It is very difficult to reproduce when running under GDB.

Steps to reproduce:
1. Create a context with 10 x Playfile.
2. Use SIPp to send 100+ channels to this context.
3. Run asterisk -gvvvvc ......, wait 10 seconds (until everything is loaded and calls begin), then press Ctrl+C.
4. Repeat step 3 10-30 times. Without GDB, catching it takes about 1-2 minutes.


By: Corey Farrell (coreyfarrell) 2016-02-12 10:39:09.266-0600

[~slavon]: It is likely that many races exist when using Ctrl+C or 'core stop now' (both are actually the same thing). The idea of this shutdown mode is to exit Asterisk as fast as possible. Databases and logs are flushed, but active threads are not shut down and most components are left running. Any issue with this style of shutdown requires a backtrace showing all threads; otherwise it's not possible to determine how it happened.

Even with a backtrace of all threads, it might not be possible or desirable to fix. The problem is that fast shutdown purposely ignores the normal rule of cleaning up after yourself, and this can cause a race between threads that are still running and things done by libc after the call to {{exit}}. In general, I recommend using 'core stop gracefully' for all testing of this kind.
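The difference between the two shutdown modes comes down to ordering. The following generic pthreads sketch, which is not Asterisk's shutdown code, shows the ordering a graceful shutdown relies on: ask worker threads to stop, join them, and only then free the state they use. {{shared_module}}, {{worker}} and {{graceful_shutdown_demo}} are hypothetical names; each worker stands in for a channel thread that takes and drops a module reference. A fast shutdown as described above skips the stop-and-join steps, which is how a still-running channel thread can end up touching memory that is being torn down.

{code:c}
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_WORKERS 16

/* Hypothetical shared state that worker threads reference, standing in
 * for a module that channel threads hold references on. */
struct shared_module {
    atomic_int usecount;
};

static atomic_bool stop_requested;

/* Stand-in for a channel thread: repeatedly takes and drops a
 * reference until asked to stop. */
static void *worker(void *arg)
{
    struct shared_module *mod = arg;

    while (!atomic_load(&stop_requested)) {
        atomic_fetch_add(&mod->usecount, 1); /* "open a stream" */
        atomic_fetch_sub(&mod->usecount, 1); /* "close the stream" */
    }
    return NULL;
}

/* Returns the use count seen just before the shared state is freed
 * (0 when every reference was released), or -1 on bad input or if
 * not all workers could be started. */
static int graceful_shutdown_demo(int nworkers)
{
    pthread_t threads[MAX_WORKERS];
    struct shared_module *mod;
    int started = 0;
    int remaining;

    if (nworkers < 1 || nworkers > MAX_WORKERS) {
        return -1;
    }
    mod = malloc(sizeof(*mod));
    if (!mod) {
        return -1;
    }
    atomic_init(&mod->usecount, 0);
    atomic_store(&stop_requested, false);

    for (int i = 0; i < nworkers; i++) {
        if (pthread_create(&threads[i], NULL, worker, mod) != 0) {
            break;
        }
        started++;
    }

    /* 1. Ask every worker to stop. */
    atomic_store(&stop_requested, true);

    /* 2. Wait until each one has actually returned. */
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }

    /* 3. No thread can touch mod any more, so freeing it is safe.
     *    A fast shutdown effectively skips steps 1 and 2. */
    remaining = atomic_load(&mod->usecount);
    assert(remaining == 0); /* each worker balanced its ref and unref */
    free(mod);

    return started == nworkers ? remaining : -1;
}
{code}

Setting the flag alone would not make the final {{free()}} safe, because a worker could be midway through an iteration when the flag changes; the join is what guarantees that no thread still holds a pointer into the shared state.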

Note that if you run across any bug where this fast shutdown mode can produce persistent corruption (such as in a database), that would be a different type of bug.