[Home]

Summary:ASTERISK-27412: core: Audiohook freeing interpolated frame when it shouldn't.
Reporter:Mikhail (j3st3r)Labels:patch pjsip
Date Opened:2017-11-10 11:47:30.000-0600Date Closed:2017-11-16 07:59:50.000-0600
Priority:CriticalRegression?
Status:Closed/CompleteComponents:Channels/chan_sip/General Core/General
Versions:15.1.0 15.1.1 Frequency of
Occurrence
Frequent
Related
Issues:
duplicatesASTERISK-27238 Bridging: Crash freeing a frame that's already been freed
is related toASTERISK-27321 Asterisk Crashing with FRACK Errors and Serious Network Trouble
is related toASTERISK-27300 Asterisk crashes randomly (FRACK!, chan_sip)
Environment:ubuntu server 16.04 x64Attachments:( 0) autosupport.tar.gz
( 1) ba.tar.gz
( 2) debug_log_123456
( 3) jira_asterisk_27412_v15.1.patch
( 4) jira_asterisk_27413_v15.1.patch
( 5) valgrind.txt
( 6) valgrind3.txt
( 7) valgrind4.txt
Description:I've installed Asterisk 15.1.0 on Ubuntu server 16.04. I configured realtime sippeers (tried via odbc and res_config_mysql - all the same). And asterisk core crashes after 20-40 minutes of working.

System has 4gb of RAM and 4 gb swap:
                   total        used        free      shared  buff/cache   available
Mem:           3849         438        1079          24        2330        3094
Swap:          3993         466        3527


processor is Intel(R) Core(TM) i3 CPU         560  @ 3.33GHz

backtraces:
https://drive.google.com/file/d/179L3aN2iey2nN_zdGKyJZ5EGNGGOpEHh/view?usp=sharing
https://drive.google.com/file/d/1RuLZ-eqyhEyFYHc1bLANof1COuXfYKUq/view?usp=sharing
Comments:By: Asterisk Team (asteriskteam) 2017-11-10 11:47:31.300-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Richard Mudgett (rmudgett) 2017-11-10 12:15:20.518-0600

Please follow the Asterisk Issue Guidelines [1] and attach backtraces to the issue.  Links to third party locations tend to vaporize and become invalid when someone eventually gets around to looking at them.  That being said, the backtraces show memory corruption so you need to either run under valgrind [2] or with the MALLOC_DEBUG flag [3] enabled in the menuselect compiler options section.

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines
[2] https://wiki.asterisk.org/wiki/display/AST/Valgrind
[3] https://wiki.asterisk.org/wiki/display/AST/MALLOC_DEBUG+Compiler+Flag

By: Mikhail (j3st3r) 2017-11-10 13:56:15.027-0600

Hello, Richard.
I've started asterisk with valgrand, and the issue repeated, I attach valgrind.txt

By: Richard Mudgett (rmudgett) 2017-11-10 16:41:46.670-0600

Realtime has nothing to do with this problem as the problem is in the media pathway.

From the valgrind output it looks like someone is freeing a media frame when they shouldn't.  The freed media frame is then cached in a thread local cache.  However, since the frame is still being used it might get passed to another thread and freed in the other thread's frame cache.  As a result all the frame caches can get poisoned.

Another thing from the valgrind output is that it appears to be affecting a two party call with mixmonitor recording the call.  Therefor, a simplified setup appears possible to replicate the problem as it could just be between chan_sip and mixmonitor.

We need to know how your system is setup (Attach config files, etc.) [1] as someone will have to replicate this problem.  Also debug logs [2] captured when the problem happens could be used to help simplify the setup.

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines
[2] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

By: Mikhail (j3st3r) 2017-11-11 01:47:57.362-0600

I attach debug log and autosupport script info. Asterisk is used for website callbacks (customer inserts his phone number into widget on a website, this information goes via AMI into Asterisk, and asterisk originates two legs of the call)

By: Mikhail (j3st3r) 2017-11-11 04:46:13.720-0600

I've upgraded Asterisk to version 15.1.1 and have the same problem

By: Richard Mudgett (rmudgett) 2017-11-11 14:02:09.930-0600

I created ASTERISK-27413 to help with finding the memory corruption bug you are having.  That issue describes how to disable the media frame caching after applying the patch.

[^jira_asterisk_27413_v15.1.patch] - This patch adds the cache_media_frames option to help with finding use after free of media frames.  This patch applies to v15.1.x.

Use {{patch -p1 -i jira_asterisk_27413_v15.1.patch}} to apply.

Once you have done so, please reinstall asterisk, disable the media frame cache, recreate the issue under valgrind and attach the new output.

By: Mikhail (j3st3r) 2017-11-12 12:49:38.522-0600

Surprisingly, but after applying the patch - Asterisk works (although I assumed that the patch is needed to find the problem, but does not solve it).

ps aux |grep asterisk |grep -v grep
asterisk 10530 12.6 10.4 1335724 410940 pts/3  Sl+  12:16  72:27 /usr/bin/valgrind.bin --suppressions=/usr/src/asterisk-15.1.1/contrib/valgrind.supp --log-fd=9 asterisk -vvvvcg
asterisk 10531  0.0  0.0   4224   660 pts/3    S+   12:16   0:00 astcanary /var/run/asterisk/alt.asterisk.canary.tweet.tweet.tweet 10530

cat /etc/asterisk/asterisk.conf |grep cache_media
cache_media_frames=no

asterisk -rx 'core show uptime'
System uptime: 9 hours, 31 minutes, 37 seconds
Last reload: 9 hours, 31 minutes, 37 seconds

asterisk -rx 'core show calls'
0 active calls
560 calls processed

I  continue to observe.

By: Richard Mudgett (rmudgett) 2017-11-12 16:41:35.950-0600

valgrind can prevent crashes by how it operates and still find things.

The patch adds a new option to disable the media frames caches.  The default of the new option has the caches still enabled so you need to explicitly set the option off in asterisk.conf as indicated by ASTERISK-27413.

By: Mikhail (j3st3r) 2017-11-12 23:58:36.870-0600

Asterisk died after ~22 hours of work. At the end of CLI output - I have seen "FRACK!, Failed assertion bad magic number 0x0 for object 0xd206110 (0) at line 5522 in action_originate of manager.c Killed"

valgrind3.txt attached.

By: Mikhail (j3st3r) 2017-11-13 04:11:50.296-0600

here is one more output:
{noformat}
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 157 in __frame_free of frame.c                                                                                       [0/1921]
[2017-11-13 13:04:35] #3: [0x4651e0] asterisk(__ao2_cleanup_debug+0x48) [0x4651e0]
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 347 in ast_frdup of frame.c
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 347 in ast_frdup of frame.c
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 157 in __frame_free of frame.c
[2017-11-13 13:04:35] #4: [0x53fc99] asterisk() [0x53fc99]
[2017-11-13 13:04:35] #5: [0x53fcf9] asterisk(ast_frame_free+0x2e) [0x53fcf9]
[2017-11-13 13:04:35] #6: [0x5336a4] asterisk() [0x5336a4]
[2017-11-13 13:04:35] #7: [0x533866] asterisk() [0x533866]
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 157 in __frame_free of frame.c
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 157 in __frame_free of frame.c
[2017-11-13 13:04:35] #8: [0x4c7ff5] asterisk() [0x4c7ff5]
[2017-11-13 13:04:35] #9: [0x4ca714] asterisk(ast_read+0x22) [0x4ca714]
[2017-11-13 13:04:35] #10: [0x5356c2] asterisk() [0x5356c2]
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 146 in format_cap_framed_destroy of format_cap.c
[2017-11-13 13:04:35] #11: [0x535c44] asterisk(ast_waitstream+0x40) [0x535c44]
[2017-11-13 13:04:35] #12: [0x1aff5f65] /usr/lib/asterisk/modules/app_playback.so(+0x2f65) [0x1aff5f65]
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 157 in __frame_free of frame.c
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 347 in ast_frdup of frame.c
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 6071 in _dtor_tmp_fmt of channel.c
[2017-11-13 13:04:35] #13: [0x5b0a9d] asterisk(pbx_exec+0x119) [0x5b0a9d]
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 347 in ast_frdup of frame.c
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 347 in ast_frdup of frame.c
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 347 in ast_frdup of frame.c
[2017-11-13 13:04:35] #14: [0x59b7db] asterisk() [0x59b7db]
[2017-11-13 13:04:35] #15: [0x59f6b6] asterisk(ast_spawn_extension+0x50) [0x59f6b6]
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 347 in ast_frdup of frame.c
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 347 in ast_frdup of frame.c
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 5616 in ast_set_write_format of channel.c
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 157 in __frame_free of frame.c
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 157 in __frame_free of frame.c
[2017-11-13 13:04:35] #16: [0x5a03ec] asterisk() [0x5a03ec]
[2017-11-13 13:04:35] #17: [0x5a1e6b] asterisk(ast_pbx_run_args+0x9a) [0x5a1e6b]
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 347 in ast_frdup of frame.c
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 347 in ast_frdup of frame.c
[2017-11-13 13:04:35] #18: [0x5a1ea9] asterisk(ast_pbx_run+0x1d) [0x5a1ea9]
[2017-11-13 13:04:35] #19: [0x5aabdf] asterisk() [0x5aabdf]
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 347 in ast_frdup of frame.c
[2017-11-13 13:04:35] #20: [0x6447a6] asterisk() [0x6447a6]
[2017-11-13 13:04:24] ERROR[27986][C-00000204]: frame.c:347 ast_frdup: FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0)
[2017-11-13 13:04:35] Got 19 backtrace records
[2017-11-13 13:04:35] #0: [0x6478e8] asterisk(__ast_assert_failed+0x8d) [0x6478e8]
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 157 in __frame_free of frame.c
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 157 in __frame_free of frame.c
[2017-11-13 13:04:35] #1: [0x464121] asterisk() [0x464121]
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 347 in ast_frdup of frame.c
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 157 in __frame_free of frame.c
[2017-11-13 13:04:35] #2: [0x4649c6] asterisk(__ao2_ref+0xa1) [0x4649c6]
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 157 in __frame_free of frame.c
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 347 in ast_frdup of frame.c
[2017-11-13 13:04:35] #3: [0x5403ea] asterisk(ast_frdup+0x291) [0x5403ea]
[2017-11-13 13:04:35] #4: [0x4c043c] asterisk() [0x4c043c]
FRACK!, Failed assertion bad magic number 0x0 for object 0xd24a180 (0) at line 347 in ast_frdup of frame.c

[2017-11-13 13:04:35] #5: [0x4c0a7a] asterisk(ast_queue_frame+0x2a) [0x4c0a7a]
{noformat}

valgrind4.txt is attached

By: Richard Mudgett (rmudgett) 2017-11-13 18:22:56.481-0600

[^valgrind3.txt] and [^valgrind4.txt] are showing three things:
# 0x5BDFEC: normalise_history (plc.c:99) is using memcpy() when it should use memmove() because the memory blocks overlap.  This is minor and not really causing a problem.
# 0x46E17A: audio_audiohook_write_list (audiohook.c:953) is freeing a frame it shouldn't.  Apparently middle_frame == start_frame.  In that case it shouldn't free it.  This is the main memory corruption mechanism seen in the backtraces.  The regression change was introduced by ASTERISK-26926.
# Chan_sip is destroying its pvt data structure before it is time due to a ref counting error.  This is causing the FRACKs and many invalid reads/writes after the FRACKs.  Be aware that chan_sip is extended support so someone from the community needs to address this problem.

The first two I'm going to consider as what needs to be fixed to complete this issue.  The third issue with chan_sip is reported in other issues and is a duplicate.  However, the valgrind output is very useful for the chan_sip problem.

By: Richard Mudgett (rmudgett) 2017-11-13 18:58:18.204-0600

Oops.  The FRACKs you pasted are more likely associated with the bad free of frames and not from chan_sip.  However, there is a ref counting problem in chan_sip shown by the valgrind output.

By: Mikhail (j3st3r) 2017-11-14 01:29:00.445-0600

Hi, Richard. Thank you for your answers and giving time to my problem. I'll change the channel's driver to pjsip. Can you orient in time when the remaining bugs will be fixed and what do I need to do?

By: Mikhail (j3st3r) 2017-11-14 03:04:17.733-0600

Today I've seen this behavior:

Asterisk is working, but calls can not be originated, and in the CLI (asterisk is working via valgrind) I see:
{noformat}
*CLI> [2017-11-14 12:02:08]   == Manager 'callback' logged on from 127.0.0.1
FRACK!, Failed assertion bad magic number 0x0 for object 0xd2a9b70 (0) at line 5522 in action_originate of manager.c
[2017-11-14 12:02:09] ERROR[9919]: manager.c:5522 action_originate: FRACK!, Failed assertion bad magic number 0x0 for object 0xd2a9b70 (0)
FRACK!, Failed assertion bad magic number 0x0 for object 0xd2a9b70 (0) at line 146 in format_cap_framed_destroy of format_cap.c
[2017-11-14 12:02:11] Got 15 backtrace records
*CLI> [2017-11-14 12:02:11] #0: [0x6478e8] asterisk(__ast_assert_failed+0x8d) [0x6478e8]
[2017-11-14 12:02:11] #1: [0x464121] asterisk() [0x464121]
[2017-11-14 12:02:11] #2: [0x4649c6] asterisk(__ao2_ref+0xa1) [0x4649c6]
[2017-11-14 12:02:11] #3: [0x53d2ef] asterisk(__ast_format_cap_append+0xa7) [0x53d2ef]
[2017-11-14 12:02:11] #4: [0x56ffd0] asterisk() [0x56ffd0]
[2017-11-14 12:02:11] #5: [0x574371] asterisk() [0x574371]
[2017-11-14 12:02:11] #6: [0x574d66] asterisk() [0x574d66]
[2017-11-14 12:02:11] #7: [0x575210] asterisk() [0x575210]
*CLI> [2017-11-14 12:02:11] #8: [0x62ec71] asterisk() [0x62ec71]
[2017-11-14 12:02:11] #9: [0x6447a6] asterisk() [0x6447a6]
[2017-11-14 12:02:10] ERROR[9919]: translate.c:1329 ast_translator_best_choice: Cannot determine best translation path since one capability supports no formats
[2017-11-14 12:02:10] WARNING[9919]: channel.c:6093 request_channel: No translator path exists for channel type Local (native (codec2|g723|ulaw|alaw|gsm|g726|g726aal2|adpcm|(null)|slin|slin|slin|slin|slin|slin|slin|slin|lpc10|g729|speex|speex|speex|ilbc|g722|siren7|siren14|testlaw|g719|opus|jpeg|png|h261|h263|h263p|h264|mpeg4|vp8|vp9|red|t140|t38|silk|silk|silk|silk)) to ((null))
[2017-11-14 12:02:11] ERROR[9919]: format_cap.c:146 format_cap_framed_destroy: FRACK!, Failed assertion bad magic number 0x0 for object 0xd2a9b70 (0)
[2017-11-14 12:02:11] Got 20 backtrace records
[2017-11-14 12:02:11] #0: [0x6478e8] asterisk(__ast_assert_failed+0x8d) [0x6478e8]
[2017-11-14 12:02:11] #1: [0x464121] asterisk() [0x464121]
[2017-11-14 12:02:11] #2: [0x4649c6] asterisk(__ao2_ref+0xa1) [0x4649c6]
[2017-11-14 12:02:11] #3: [0x4651e0] asterisk(__ao2_cleanup_debug+0x48) [0x4651e0]
[2017-11-14 12:02:11] #4: [0x53ce2d] asterisk() [0x53ce2d]
[2017-11-14 12:02:11] #5: [0x464f97] asterisk(__ao2_ref+0x672) [0x464f97]
[2017-11-14 12:02:11] #6: [0x53cb6e] asterisk() [0x53cb6e]
[2017-11-14 12:02:11] #7: [0x464f97] asterisk(__ao2_ref+0x672) [0x464f97]
[2017-11-14 12:02:11] #8: [0x4651e0] asterisk(__ao2_cleanup_debug+0x48) [0x4651e0]
[2017-11-14 12:02:11] #9: [0x571f7d] asterisk() [0x571f7d]
[2017-11-14 12:02:11] #10: [0x574371] asterisk() [0x574371]
[2017-11-14 12:02:11] #11: [0x574d66] asterisk() [0x574d66]
[2017-11-14 12:02:11] #12: [0x575210] asterisk() [0x575210]
[2017-11-14 12:02:11] #13: [0x62ec71] asterisk() [0x62ec71]
[2017-11-14 12:02:11] #14: [0x6447a6] asterisk() [0x6447a6]
[2017-11-14 12:02:11]   == Manager 'callback' logged off from 127.0.0.1
[2017-11-14 12:02:12]   == Manager 'callback' logged on from 127.0.0.1
{noformat}


By: Richard Mudgett (rmudgett) 2017-11-14 08:34:42.719-0600

The issue is accepted and we have enough information to fix the first two issues.  There should be enough information for the chan_sip issue as well.  I cannot give a time frame on when the issue will be fixed.

The FRACK backtraces would be more useful if you had BETTER_BACKTRACES enabled in menuselect and debug symbols available.  From the valgrind output these FRACKs would appear to be associated with the ref counting issue with chan_sip.

By: Mikhail (j3st3r) 2017-11-14 08:38:27.537-0600

No problem, I'll recompile asterisk with these options. How should I start it after recompiling? Via valgrind or as usual?

By: Richard Mudgett (rmudgett) 2017-11-14 08:43:52.395-0600

You have already collected the information needed from valgrind so you don't need to run under valgrind.

By: Mikhail (j3st3r) 2017-11-14 08:49:32.172-0600

Got it, thank you for clarification

By: Richard Mudgett (rmudgett) 2017-11-14 18:12:22.760-0600

[^jira_asterisk_27412_v15.1.patch] - This patch should fix the first two issues found by valgrind.  The overlapping blocks in plc.c and the incorrect fame free in the audiohook code.

patch -p1 -i jira_asterisk_27412_v15.1.patch

Let me know if it works.

By: Mikhail (j3st3r) 2017-11-15 04:27:28.437-0600

>The issue is accepted and we have enough information to fix the first two >issues. There should be enough information for the chan_sip issue as well. I >cannot give a time frame on when the issue will be fixed.
>The FRACK backtraces would be more useful if you had >BETTER_BACKTRACES enabled in menuselect and debug symbols >available. From the valgrind output these FRACKs would appear to be >associated with the ref counting issue with chan_sip.


Backtrace is ataached - ba.tar.gz

By: Mikhail (j3st3r) 2017-11-15 04:28:56.859-0600

> Let me know if it works.

Richard, thanks for patch.  I'll notify you if the issue persists after patching

By: Richard Mudgett (rmudgett) 2017-11-15 07:59:28.613-0600

The backtrace you attached today [^ba.tar.gz] has no symbols so it is not usable.  However, it looks like you have MALLOC_DEBUG enabled because it is trying to access address 0xdeaddeaddeaddexx.

By: Mikhail (j3st3r) 2017-11-15 08:00:48.222-0600

Could you please explain me how to enable those symbols?

By: Richard Mudgett (rmudgett) 2017-11-15 08:17:29.488-0600

Strange.  All the other backtraces and valgrind output you have posted has had symbols.  Why not this one?  The symbols should be there if you built and installed it yourself.  Did you build and install this version yourself?

By: Mikhail (j3st3r) 2017-11-15 08:21:03.928-0600

Yes, I did. The difference between configurations is choosing options DEBUG_THREADS, DEBUG_FD_LEAKS and MALLOC_DEBUG in menuselect

By: Mikhail (j3st3r) 2017-11-15 14:28:08.228-0600

surprisingly, but after parching 15.1.1 I've got this error:

rtp_engine.c:447 ast_rtp_instance_new: No RTP engine was found. Do you have one loaded?

but rtp modules are loaded...
Module                         Description                              Use Count  Status      Support Level
bridge_native_rtp.so           Native RTP bridging module               0          Running              core
chan_rtp.so                    RTP Media Channel                        0          Running              core
res_rtp_multicast.so           Multicast RTP Engine                     0          Running              core
res_srtp.so                    Secure RTP (SRTP)                        0          Running              core
4 modules loaded

By: Richard Mudgett (rmudgett) 2017-11-15 14:46:21.879-0600

DEBUG_THREADS, DEBUG_FD_LEAKS and MALLOC_DEBUG alter the public API to add debugging parameters.  Any externally compiled modules like codec_opus or res_digium_phone cannot be used as they can result in crashes.

By: Richard Mudgett (rmudgett) 2017-11-15 16:04:11.386-0600

We have confirmation that the memory corruption caused by incorrectly freeing the frame in audiohook.c is a duplicate of ASTERISK-27238.

By: Friendly Automation (friendly-automation) 2017-11-16 07:59:51.454-0600

Change 7231 merged by Joshua Colp:
audiohook.c: Fix freeing a frame and still using it.

[https://gerrit.asterisk.org/7231|https://gerrit.asterisk.org/7231]

By: Friendly Automation (friendly-automation) 2017-11-16 08:17:18.865-0600

Change 7232 merged by Joshua Colp:
audiohook.c: Fix freeing a frame and still using it.

[https://gerrit.asterisk.org/7232|https://gerrit.asterisk.org/7232]

By: Friendly Automation (friendly-automation) 2017-11-16 08:27:58.024-0600

Change 7233 merged by Joshua Colp:
audiohook.c: Fix freeing a frame and still using it.

[https://gerrit.asterisk.org/7233|https://gerrit.asterisk.org/7233]