Summary: | ASTERISK-24683: Crash in PBX ast_hashtab_lookup_internal during core restart now | ||||
Reporter: | Peter Katzmann (pk16208) | Labels: | |||
Date Opened: | 2015-01-14 08:36:13.000-0600 | Date Closed: | 2015-03-26 17:17:56 | ||
Priority: | Major | Regression? | |||
Status: | Closed/Complete | Components: | Core/PBX | ||
Versions: | 11.12.1 11.13.1 11.14.2 11.15.0 | Frequency of Occurrence | Frequent | ||
Related Issues: |
| ||||
Environment: | Ubuntu precise | Attachments: | ( 0) backtrace030115.txt ( 1) backtrace040115.txt ( 2) backtrace050115.txt ( 3) bt03.txt ( 4) bt04.txt ( 5) bt05.txt ( 6) extensions.tar.bz2 | ||
Description: | Hello,
we see several core dumps through our nightly restarts. It occurs only during restart phase | ||||
Comments: | By: Matt Jordan (mjordan) 2015-01-14 09:27:24.716-0600 Which version of Asterisk are you running? IIRC, there were some improvements made to the reload operation in {{pbx.c}}, and I'd like to make sure that the version you are reproducing this on has (or does not have) those fixes. By: Peter Katzmann (pk16208) 2015-01-15 02:56:11.921-0600 This particular server is running with 11.13 By: Michael L. Young (elguero) 2015-01-15 07:20:15.503-0600 Me thinks these might be related... memory issues on the system? I see this in all the backtraces attached: {noformat} Thread 36 (Thread 0xb70ad6c0 (LWP 18819)): #0 0xb772a424 in __kernel_vsyscall () #1 0xb7658170 in poll () from /lib/i386-linux-gnu/libc.so.6 #2 0x08084b67 in ast_el_read_char (editline=0x9fe1be0, cp=0xbf89f97f "") at asterisk.c:2531 #3 0xb710664b in el_getc () from /usr/lib/i386-linux-gnu/libedit.so.2 #4 0xb71077f4 in el_gets () from /usr/lib/i386-linux-gnu/libedit.so.2 #5 0x0806968e in main (argc=<optimized out>, argv=<error reading variable: Cannot access memory at address 0x6>) at asterisk.c:4401 {noformat} By: Peter Katzmann (pk16208) 2015-01-15 07:53:44.471-0600 This is our biggest production server with over 2000 active users. The server logs did not show anything related to memory issues By: Michael L. Young (elguero) 2015-01-15 08:07:31.692-0600 I am just "speaking out loud". I am probably mis-reading something above - it was prompted from the other issue where the cli is outputting messages about not having memory. By: Matt Jordan (mjordan) 2015-01-15 20:14:33.018-0600 If you have the core file, can you print out the following in {{gdb}}: {noformat} # frame 2 # print *contexts_table {noformat} By: Peter Katzmann (pk16208) 2015-01-16 03:51:50.921-0600 Program terminated with signal 11, Segmentation fault. {noformat} #0 ast_format_cmp (format1=0x0, format2=0xb3f55314) at format.c:335 335 format.c: No such file or directory. (gdb) frame 2 #2 0x0811309a in ast_parse_allow_disallow (pref=0x8db4a10, cap=0x8db4488, list=0x912d796 "h263p", allowing=1) at frame.c:846 846 frame.c: No such file or directory. (gdb) print *contexts_table $1 = {array = 0x90b68f8, tlist = 0x8fe5138, compare = 0x814b0f0 <ast_hashtab_compare_contexts>, newsize = 0x8114f20 <ast_hashtab_newsize_java>, resize = 0x8114d30 <ast_hashtab_resize_java>, hash = 0x8149ea0 <ast_hashtab_hash_contexts>, hash_tab_size = 197, hash_tab_elements = 84, largest_bucket_size = 2, resize_count = 4, do_locking = 0, lock = {lock = {__data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 0, __nr_readers_queued = 0, __nr_writers_queued = 0, __flags = 0 '\000', __shared = 0 '\000', __pad1 = 0 '\000', __pad2 = 0 '\000', __writer = 0}, __size = '\000' <repeats 31 times>, __align = 0}, track = 0x0, tracking = 0}} (gdb) {noformat} By: Matt Jordan (mjordan) 2015-01-16 09:20:49.257-0600 What you've just provided is different than your other crashes. Can you provide the backtrace.txt from whatever core file you just loaded into gdb? By: Peter Katzmann (pk16208) 2015-01-16 09:34:36.683-0600 Yup you are right, here are now the right one: {noformat} for core.lnx06-asterisk1-2015-01-03T03:01:14+0100 (gdb) frame 2 548 hashtab.c: No such file or directory. #2 ast_hashtab_lookup (tab=0xb2739b8, obj=0xb3652b5c) at hashtab.c:548 (gdb) print *contexts_table $1 = {array = 0x9ea95b8, tlist = 0x977c990, compare = 0x814a6d0 <ast_hashtab_compare_contexts>, newsize = 0x8114a60 <ast_hashtab_newsize_java>, resize = 0x8114870 <ast_hashtab_resize_java>, hash = 0x8149480 <ast_hashtab_hash_contexts>, hash_tab_size = 163, hash_tab_elements = 69, largest_bucket_size = 3, resize_count = 3, do_locking = 0, lock = {lock = {__data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 0, __nr_readers_queued = 0, __nr_writers_queued = 0, __flags = 0 '\000', __shared = 0 '\000', __pad1 = 0 '\000', __pad2 = 0 '\000', __writer = 0}, __size = '\000' <repeats 31 times>, __align = 0}, track = 0x0, tracking = 0}} {noformat} {noformat} for core.lnx06-asterisk1-2015-01-04T03\:01\:28+0100 (gdb) frame 2 548 hashtab.c: No such file or directory. #2 ast_hashtab_lookup (tab=0xb6f016e0, obj=0xb3f8ec1c) at hashtab.c:548 (gdb) print *contexts_table $1 = {array = 0xa9cb5450, tlist = 0xb6f00040, compare = 0x814a6d0 <ast_hashtab_compare_contexts>, newsize = 0x8114a60 <ast_hashtab_newsize_java>, resize = 0x8114870 <ast_hashtab_resize_java>, hash = 0x8149480 <ast_hashtab_hash_contexts>, hash_tab_size = 163, hash_tab_elements = 68, largest_bucket_size = 3, resize_count = 3, do_locking = 0, lock = {lock = {__data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 0, __nr_readers_queued = 0, __nr_writers_queued = 0, __flags = 0 '\000', __shared = 0 '\000', __pad1 = 0 '\000', __pad2 = 0 '\000', __writer = 0}, __size = '\000' <repeats 31 times>, __align = 0}, track = 0x0, tracking = 0}} (gdb) {noformat} {noformat} for core.lnx06-asterisk1-2015-01-05T03\:01\:15+0100 (gdb) frame 2 6948 pbx.c: No such file or directory. #2 0x08147315 in find_context (context=<optimized out>) at pbx.c:6948 (gdb) print *contexts_table $1 = {array = 0xb3e00090, tlist = 0xb3e00090, compare = 0x814a6d0 <ast_hashtab_compare_contexts>, newsize = 0x8114a60 <ast_hashtab_newsize_java>, resize = 0x8114870 <ast_hashtab_resize_java>, hash = 0x8149480 <ast_hashtab_hash_contexts>, hash_tab_size = 163, hash_tab_elements = 68, largest_bucket_size = 3, resize_count = 3, do_locking = 0, lock = {lock = {__data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 0, __nr_readers_queued = 0, __nr_writers_queued = 0, __flags = 0 '\000', __shared = 0 '\000', __pad1 = 0 '\000', __pad2 = 0 '\000', __writer = 0}, __size = '\000' <repeats 31 times>, __align = 0}, track = 0x0, tracking = 0}} (gdb) {noformat} By: Matt Jordan (mjordan) 2015-01-16 09:49:29.064-0600 Please attach the full backtrace as a text file to the issue. By: Peter Katzmann (pk16208) 2015-01-16 10:40:11.229-0600 Made backtraces , hope the info will be enough if not, please give detailed order for stupid minded By: Matt Jordan (mjordan) 2015-01-21 07:47:13.057-0600 Can you attach your {{extensions.conf}} as well? By: Peter Katzmann (pk16208) 2015-01-21 08:51:26.435-0600 extensions attached By: Corey Farrell (coreyfarrell) 2015-02-09 18:04:55.211-0600 I just looked over your backtraces. One thread shows logger_close() is currently running. This means that pbx_shutdown() has already completed, contexts_table has already been destroyed. Setting {{contexts_table = NULL}} after it is destroyed would likely help, but not eliminate the risk of crash during a restart or stop 'now'. By: Matt Jordan (mjordan) 2015-02-09 20:10:19.593-0600 That's odd. If we're that far along in our shutdown sequence, why is {{chan_sip}} still running? How are you initiating the restart/shutdown? By: Corey Farrell (coreyfarrell) 2015-02-09 20:23:37.500-0600 The backtraces show {{core restart now}} being run through a netconsole. This means that modules are not shutdown, but most parts of the core are. By: Peter Katzmann (pk16208) 2015-02-10 03:13:17.368-0600 It's an cron job doing rasterisk -rx "core restart now" In the night at 3 o Clock , when there shoudl nothing going on By: Corey Farrell (coreyfarrell) 2015-02-11 08:59:40.775-0600 The backtrace shows a SIP SUBSCRIBE caused the crash by trying to lookup a context when the PBX core is already shutdown. You may be right about the lack of calls, but that doesn't mean that 'nothing' is going on. |