ASTERISK-24683: Crash in PBX ast_hashtab_lookup_internal during core restart now

[Home]

Summary: ASTERISK-24683: Crash in PBX ast_hashtab_lookup_internal during core restart now

Reporter: Peter Katzmann (pk16208) Labels:

Date Opened: 2015-01-14 08:36:13.000-0600 Date Closed: 2015-03-26 17:17:56

Priority: Major Regression?

Status: Closed/Complete Components: Core/PBX

Versions: 11.12.1 11.13.1 11.14.2 11.15.0 Frequency of
Occurrence Frequent

Related
Issues:
is related to ASTERISK-24684 crash during sip reload

Environment: Ubuntu precise Attachments: ( 0) backtrace030115.txt
( 1) backtrace040115.txt
( 2) backtrace050115.txt
( 3) bt03.txt
( 4) bt04.txt
( 5) bt05.txt
( 6) extensions.tar.bz2

Description: Hello,
we see several core dumps through our nightly restarts.
It occurs only during restart phase

Comments: By: Matt Jordan (mjordan) 2015-01-14 09:27:24.716-0600

Which version of Asterisk are you running? IIRC, there were some improvements made to the reload operation in {{pbx.c}}, and I'd like to make sure that the version you are reproducing this on has (or does not have) those fixes.
By: Peter Katzmann (pk16208) 2015-01-15 02:56:11.921-0600

This particular server is running with 11.13
By: Michael L. Young (elguero) 2015-01-15 07:20:15.503-0600

Me thinks these might be related... memory issues on the system?

I see this in all the backtraces attached:

{noformat}
Thread 36 (Thread 0xb70ad6c0 (LWP 18819)):
#0 0xb772a424 in __kernel_vsyscall ()
#1 0xb7658170 in poll () from /lib/i386-linux-gnu/libc.so.6
#2 0x08084b67 in ast_el_read_char (editline=0x9fe1be0, cp=0xbf89f97f "") at asterisk.c:2531
#3 0xb710664b in el_getc () from /usr/lib/i386-linux-gnu/libedit.so.2
#4 0xb71077f4 in el_gets () from /usr/lib/i386-linux-gnu/libedit.so.2
#5 0x0806968e in main (argc=<optimized out>, argv=<error reading variable: Cannot access memory at address 0x6>) at asterisk.c:4401
{noformat}
By: Peter Katzmann (pk16208) 2015-01-15 07:53:44.471-0600

This is our biggest production server with over 2000 active users.

The server logs did not show anything related to memory issues
By: Michael L. Young (elguero) 2015-01-15 08:07:31.692-0600

I am just "speaking out loud". I am probably mis-reading something above - it was prompted from the other issue where the cli is outputting messages about not having memory.
By: Matt Jordan (mjordan) 2015-01-15 20:14:33.018-0600

If you have the core file, can you print out the following in {{gdb}}:

{noformat}
# frame 2
# print *contexts_table
{noformat}
By: Peter Katzmann (pk16208) 2015-01-16 03:51:50.921-0600

Program terminated with signal 11, Segmentation fault.

{noformat}
#0 ast_format_cmp (format1=0x0, format2=0xb3f55314) at format.c:335
335 format.c: No such file or directory.
(gdb) frame 2
#2 0x0811309a in ast_parse_allow_disallow (pref=0x8db4a10, cap=0x8db4488, list=0x912d796 "h263p", allowing=1) at frame.c:846
846 frame.c: No such file or directory.
(gdb) print *contexts_table
$1 = {array = 0x90b68f8, tlist = 0x8fe5138, compare = 0x814b0f0 <ast_hashtab_compare_contexts>, newsize = 0x8114f20 <ast_hashtab_newsize_java>,
resize = 0x8114d30 <ast_hashtab_resize_java>, hash = 0x8149ea0 <ast_hashtab_hash_contexts>, hash_tab_size = 197, hash_tab_elements = 84, largest_bucket_size = 2, resize_count = 4,
do_locking = 0, lock = {lock = {__data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 0, __nr_readers_queued = 0, __nr_writers_queued = 0,
__flags = 0 '\000', __shared = 0 '\000', __pad1 = 0 '\000', __pad2 = 0 '\000', __writer = 0}, __size = '\000' <repeats 31 times>, __align = 0}, track = 0x0, tracking = 0}}
(gdb)
{noformat}
By: Matt Jordan (mjordan) 2015-01-16 09:20:49.257-0600

What you've just provided is different than your other crashes. Can you provide the backtrace.txt from whatever core file you just loaded into gdb?
By: Peter Katzmann (pk16208) 2015-01-16 09:34:36.683-0600

Yup you are right, here are now the right one:
{noformat}
for core.lnx06-asterisk1-2015-01-03T03:01:14+0100
(gdb) frame 2
548 hashtab.c: No such file or directory.
#2 ast_hashtab_lookup (tab=0xb2739b8, obj=0xb3652b5c) at hashtab.c:548
(gdb) print *contexts_table
$1 = {array = 0x9ea95b8, tlist = 0x977c990, compare = 0x814a6d0 <ast_hashtab_compare_contexts>, newsize = 0x8114a60 <ast_hashtab_newsize_java>,
resize = 0x8114870 <ast_hashtab_resize_java>, hash = 0x8149480 <ast_hashtab_hash_contexts>, hash_tab_size = 163, hash_tab_elements = 69, largest_bucket_size = 3, resize_count = 3,
do_locking = 0, lock = {lock = {__data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 0, __nr_readers_queued = 0, __nr_writers_queued = 0,
__flags = 0 '\000', __shared = 0 '\000', __pad1 = 0 '\000', __pad2 = 0 '\000', __writer = 0}, __size = '\000' <repeats 31 times>, __align = 0}, track = 0x0, tracking = 0}}
{noformat}
{noformat}
for core.lnx06-asterisk1-2015-01-04T03\:01\:28+0100
(gdb) frame 2
548 hashtab.c: No such file or directory.
#2 ast_hashtab_lookup (tab=0xb6f016e0, obj=0xb3f8ec1c) at hashtab.c:548
(gdb) print *contexts_table
$1 = {array = 0xa9cb5450, tlist = 0xb6f00040, compare = 0x814a6d0 <ast_hashtab_compare_contexts>, newsize = 0x8114a60 <ast_hashtab_newsize_java>,
resize = 0x8114870 <ast_hashtab_resize_java>, hash = 0x8149480 <ast_hashtab_hash_contexts>, hash_tab_size = 163, hash_tab_elements = 68, largest_bucket_size = 3, resize_count = 3,
do_locking = 0, lock = {lock = {__data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 0, __nr_readers_queued = 0, __nr_writers_queued = 0,
__flags = 0 '\000', __shared = 0 '\000', __pad1 = 0 '\000', __pad2 = 0 '\000', __writer = 0}, __size = '\000' <repeats 31 times>, __align = 0}, track = 0x0, tracking = 0}}
(gdb)
{noformat}
{noformat}
for core.lnx06-asterisk1-2015-01-05T03\:01\:15+0100

(gdb) frame 2
6948 pbx.c: No such file or directory.
#2 0x08147315 in find_context (context=<optimized out>) at pbx.c:6948
(gdb) print *contexts_table
$1 = {array = 0xb3e00090, tlist = 0xb3e00090, compare = 0x814a6d0 <ast_hashtab_compare_contexts>, newsize = 0x8114a60 <ast_hashtab_newsize_java>,
resize = 0x8114870 <ast_hashtab_resize_java>, hash = 0x8149480 <ast_hashtab_hash_contexts>, hash_tab_size = 163, hash_tab_elements = 68, largest_bucket_size = 3, resize_count = 3,
do_locking = 0, lock = {lock = {__data = {__lock = 0, __nr_readers = 0, __readers_wakeup = 0, __writer_wakeup = 0, __nr_readers_queued = 0, __nr_writers_queued = 0,
__flags = 0 '\000', __shared = 0 '\000', __pad1 = 0 '\000', __pad2 = 0 '\000', __writer = 0}, __size = '\000' <repeats 31 times>, __align = 0}, track = 0x0, tracking = 0}}
(gdb)
{noformat}
By: Matt Jordan (mjordan) 2015-01-16 09:49:29.064-0600

Please attach the full backtrace as a text file to the issue.
By: Peter Katzmann (pk16208) 2015-01-16 10:40:11.229-0600

Made backtraces , hope the info will be enough if not, please give detailed order for stupid minded
By: Matt Jordan (mjordan) 2015-01-21 07:47:13.057-0600

Can you attach your {{extensions.conf}} as well?
By: Peter Katzmann (pk16208) 2015-01-21 08:51:26.435-0600

extensions attached
By: Corey Farrell (coreyfarrell) 2015-02-09 18:04:55.211-0600

I just looked over your backtraces. One thread shows logger_close() is currently running. This means that pbx_shutdown() has already completed, contexts_table has already been destroyed. Setting {{contexts_table = NULL}} after it is destroyed would likely help, but not eliminate the risk of crash during a restart or stop 'now'.
By: Matt Jordan (mjordan) 2015-02-09 20:10:19.593-0600

That's odd. If we're that far along in our shutdown sequence, why is {{chan_sip}} still running?

How are you initiating the restart/shutdown?
By: Corey Farrell (coreyfarrell) 2015-02-09 20:23:37.500-0600

The backtraces show {{core restart now}} being run through a netconsole. This means that modules are not shutdown, but most parts of the core are.
By: Peter Katzmann (pk16208) 2015-02-10 03:13:17.368-0600

It's an cron job doing
rasterisk -rx "core restart now"

In the night at 3 o Clock , when there shoudl nothing going on
By: Corey Farrell (coreyfarrell) 2015-02-11 08:59:40.775-0600

The backtrace shows a SIP SUBSCRIBE caused the crash by trying to lookup a context when the PBX core is already shutdown. You may be right about the lack of calls, but that doesn't mean that 'nothing' is going on.