Summary: | ASTERISK-27862: ARI: Crashing on json_mem_free (after sending event) | ||
Reporter: | Stefan Repke (stefffan@gmx.de) | Labels: | |
Date Opened: | 2018-05-18 08:27:45 | Date Closed: | 2018-06-14 09:00:58 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | Applications/app_stasis |
Versions: | 13.21.0 | Frequency of Occurrence | Occasional |
Related Issues: | |||
Environment: | Attachments: | ( 0) backtrace_1.txt ( 1) backtrace_2.txt ( 2) backtrace_3.txt ( 3) cli.txt | |
Description: | ARI is activated and there are several apps connected with "subscribeAll=true".
Then, Asterisk crashes occasionally when delivering ARI events! With normal workload, this happens once or twice a day. It can be provoked by adding many (eg. 10) ARI subscribers and producing lots of events (eg. PeerStatusChange by registering/unregistering a peer). I coredumped three crashes, all having a memory corruption in {{json_mem_free}} at {{json.c:84}}; cf. attached backtraces. * Backtrace 2 and 3 have similar problems: {{double free or corruption (fasttop)}}, * Backtrace 1 looks like wrong memory access. I've also attached the last logger messages from the CLI (debug=5 / verbose=5). | ||
Comments: | By: Asterisk Team (asteriskteam) 2018-05-18 08:27:47.038-0500 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. By: Richard Mudgett (rmudgett) 2018-05-18 16:35:29.299-0500 Your backtrace appears to contain a memory corruption. We need one or both of the following items to continue investigation of the issue: 1. Valgrind output. See https://wiki.asterisk.org/wiki/display/AST/Valgrind for instructions on how to use Valgrind with Asterisk. 2. MALLOC_DEBUG output. See https://wiki.asterisk.org/wiki/display/AST/MALLOC_DEBUG+Compiler+Flag for instructions on how to use the MALLOC_DEBUG option. Note that MALLOC_DEBUG and Valgrind are mutually exclusive options. Valgrind output is preferable, but will be more system resource intensive and may be difficult to get on a production system. In such a case, you may have better luck getting the necessary output from MALLOC_DEBUG. By: Stefan Repke (stefffan@gmx.de) 2018-05-22 09:00:23.957-0500 It's a memory corruption for sure! I did some extensive testing with the two options of memory debugging: 1. With valgrind, I wasn't able to reproduce the crash! 2. With {{MALLOC_DEBUG}} (and {{memory backtrace off}}), it crashed again (after a few seconds of stress testing): {noformat:title=mmlog} WARNING: Memory corrupted after free of 0x7f92c0020480 allocated at json.c ast_json_malloc() line 155 WARNING: Memory corrupted after free of 0x7f92ec0066c0 allocated at json.c ast_json_malloc() line 155 WARNING: Memory corrupted after free of 0x7f92c00381f0 allocated at json.c ast_json_malloc() line 155 WARNING: Freeing unregistered memory 0x7f92ac041cf0 by json.c json_mem_free() line 84 {noformat} {noformat:title=core-thread1.txt} !@!@!@! thread1.txt !@!@!@! $1 = {si_signo = 11, si_errno = 0, si_code = 128, _sifields = {_pad = {0 <repeats 28 times>}, _kill = {si_pid = 0, si_uid = 0}, _timer = {si_tid = 0, si_overrun = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 0, si_uid = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 0, si_uid = 0, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0x0}, _sigpoll = {si_band = 0, si_fd = 0}}} Signal Stop Print Pass to program Description SIGSEGV Yes Yes Yes Segmentation fault Thread 1 (Thread 0x7f92b3d17700 (LWP 9121)): #0 0x0000000000536992 in ast_json_unref (json=0x7f92ac04adc8) at json.c:253 __cur = 0xdeaddeaddeaddead free_list = 0x7f92ac010150 mem = 0x7f92ac041cf0 #1 0x00007f92f37cc6c5 in sub_default_handler (data=0x7f935000cf70, sub=0x7f935000f650, message=0x7f92ac045d80) at stasis/app.c:357 app = 0x7f935000cf70 json = 0x7f92ac04adc8 #2 0x00000000005deb0e in router_dispatch (data=0x7f935000f510, sub=0x7f935000f650, message=0x7f92ac045d80) at stasis_message_router.c:201 router = 0x7f935000f510 route = {message_type = 0x0, callback = 0x7f92f37cc616 <sub_default_handler>, data = 0x7f935000cf70} #3 0x00000000005cd840 in subscription_invoke (sub=0x7f935000f650, message=0x7f92ac045d80) at stasis.c:434 __PRETTY_FUNCTION__ = "subscription_invoke" #4 0x00000000005ce319 in dispatch_exec_async (local=0x7f92b3d16da0) at stasis.c:733 sub = 0x7f935000f650 message = 0x7f92ac045d80 #5 0x00000000005eb2e9 in ast_taskprocessor_execute (tps=0x7f935000f7c0) at taskprocessor.c:970 local = {local_data = 0x7f935000f650, data = 0x7f92ac045d80} t = 0x7f92b80131e0 size = 0 __PRETTY_FUNCTION__ = "ast_taskprocessor_execute" #6 0x00000000005e957c in default_tps_processing_function (data=0x7f9350002100) at taskprocessor.c:191 listener = 0x7f9350002100 tps = 0x7f935000f7c0 pvt = 0x7f9350001fe0 sem_value = 6701170 res = 0 __PRETTY_FUNCTION__ = "default_tps_processing_function" #7 0x0000000000600b19 in dummy_start (data=0x7f935000fb20) at utils.c:1239 __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {0, -1581999541431752663, 0, 140270822379616, 0, 140268058801920, -1581999541406586839, 1526435149223897129}, __mask_was_saved = 0}}, __pad = {0x7f92b3d16ef0, 0x0, 0x0, 0x0}} __cancel_routine = 0x451d2a <ast_unregister_thread> __cancel_arg = 0x7f92b3d17700 __not_first_call = 0 ret = 0x0 a = {start_routine = 0x5e94e6 <default_tps_processing_function>, data = 0x7f9350002100, name = 0x7f935000fbe0 "default_tps_processing_function started at [ 208] taskprocessor.c default_listener_start()"} __PRETTY_FUNCTION__ = "dummy_start" #8 0x00007f9356986064 in start_thread (arg=0x7f92b3d17700) at pthread_create.c:309 __res = <optimized out> pd = 0x7f92b3d17700 now = <optimized out> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140268058801920, 1526434874704078889, 0, 140270822379616, 0, 140268058801920, -1581999541429655511, -1582925802170013655}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = <optimized out> pagesize_m1 = <optimized out> sp = <optimized out> freesize = <optimized out> __PRETTY_FUNCTION__ = "start_thread" #9 0x00007f935592962d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 No locals. {noformat} 3. With {{MALLOC_DEBUG}} and {{memory backtrace on}}, I wasn't able to reproduce the crash either (even after hours of stress testing)! But this gives interessting insights into the other memory corruptions "at json.c ast_json_malloc() line 155": {noformat:title=mmlog} WARNING: Memory corrupted after free of 0x7f894c088400 allocated at json.c ast_json_malloc() line 155 Memory allocation backtrace: #0: [0x45cc17] main/astmm.c:523 __ast_malloc() (0x45cbdb+3C) #1: [0x53674f] main/json.c:155 ast_json_malloc() (0x536724+2B) #2: [0x7f89a1cd72ee] libjansson.so.4 json_integer() (0x7f89a1cd72e0+E) #3: [0x7f89a1cd5c76] libjansson.so.4 <unknown>() #4: [0x7f89a1cd5afb] libjansson.so.4 <unknown>() #5: [0x7f89a1cd5f5f] libjansson.so.4 json_vpack_ex() (0x7f89a1cd5ed0+8F) #6: [0x537b6d] main/json.c:859 ast_json_vpack() (0x537b1c+51) #7: [0x537b0c] main/json.c:850 ast_json_pack() (0x537a65+A7) #8: [0x7f890b49ebb7] channels/chan_iax2.c:9082 update_registry() #9: [0x7f890b4ae9ed] channels/chan_iax2.c:11555 socket_process_helper() #10: [0x7f890b4b26e0] channels/chan_iax2.c:11974 socket_process() #11: [0x7f890b4b2dba] channels/chan_iax2.c:12086 iax2_process_thread() #12: [0x600b19] main/utils.c:1239 dummy_start() {noformat} 4. Further, I've observed some occurances of the following line during stress testing: {noformat} [2018-05-22 13:30:17] ERROR[21499]: ari/ari_websockets.c:175 ast_ari_websocket_session_write: Failed to encode JSON object {noformat} All this looks like a fatal race condition on the json objects ... By: Richard Mudgett (rmudgett) 2018-05-22 18:03:03.444-0500 Which version of libjansson are you using? A change was made https://gerrit.asterisk.org/#/c/8186/ that removed some locking to take advantage of some thread safe changes done in libjansson version 2.11. By: Stefan Repke (stefffan@gmx.de) 2018-05-23 04:05:39.569-0500 We use the Debian package libjansson of version 2.7-1+deb8u1. Further, I was curious about the advantages you mentioned and moved to Debian package 2.11-1. Then, all of the above problems are gone: * no crashes! * no "Memory corrupted after free of 0x... allocated at json.c ast_json_malloc() line 155"! * no "ERROR\[...\]: ari/ari_websockets.c:175 ast_ari_websocket_session_write: Failed to encode JSON object"! Hence, Asterisk has threadding issues when using an older version of libjansson (like Debian stable) ... |