[Home]

Summary:ASTERISK-22208: Extremely high CPU load and segfaults (in ast_hashtab_start_traversal) on a reload
Reporter:David Brillert (aragon)Labels:
Date Opened:2013-07-29 16:31:07Date Closed:2013-11-15 08:32:50.000-0600
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Channels/General Core/Configuration
Versions:11.5.0 Frequency of
Occurrence
Constant
Related
Issues:
Environment:CentOS 5.9 32 bitAttachments:( 0) 7_26_2013_gdb_thread_apply_all_bt_full.txt
( 1) core_31T09_gdb_bt_full.txt
( 2) core_show_locks.txt
( 3) core_12843_segfault_bt_full.txt
( 4) core_show_locks_8192013.txt
( 5) extensions.rar
( 6) features.conf
( 7) gdb_thread_apply_all_bt_8192013.txt
( 8) lab_7_31_2013_gdb_thread_apply_all_bt_full.txt
( 9) messages.rar
(10) valgrind_july_31.rar
(11) valgrind.rar
Description:CPU load in htop will register in the 60's and or 80's after a reload until I restart Asterisk.

Ticket is open at the request of Rusty Newton in relationship to ASTERISK-22093
Comments:By: David Brillert (aragon) 2013-07-31 09:32:07.845-0500

After more lab testing I have produced some high loads and some segfaults with DONT_OPTIMIZE AND DEBUG_THREADS compiled.

The load gets very high during the Asterisk reload and the server is not processing any calls, 100% idle.

lab_7_31_2013 gdb thread apply all bt full.txt is a gdb attach to the Asterisk PID.  core_show_locks.txt is the output

By: David Brillert (aragon) 2013-07-31 09:54:18.949-0500

Since I am seeing segfaults now I am raising the priority.

By: David Brillert (aragon) 2013-07-31 12:54:37.752-0500

If I look at the first few lines of the backtrace
#0  0x08116809 in ast_hashtab_start_traversal ()
(gdb) bt
#0  0x08116809 in ast_hashtab_start_traversal ()
#1  0x08161341 in ast_merge_contexts_and_delete ()
#2  0x00abda9c in pbx_load_module () at pbx_config.c:1855
#3  0x081210fa in ast_module_reload ()
#4  0x080c812e in handle_core_reload ()
#5  0x080c9949 in ast_cli_command_full ()

And I watch the reload the suspicious part is this:


[2013-07-26 13:12:26]     -- Registered extension context 'parkedcalls'; registrar: features
[2013-07-26 13:12:26]     -- merging incls/swits/igpats from old(parkedcalls) to new(parkedcalls) context, registrar = pbx_config
[2013-07-26 13:12:26]     -- Added extension '7000' priority 1 to parkedcalls
[2013-07-26 13:12:26]     -- merging incls/swits/igpats from old(parkedcalls_default) to new(parkedcalls_default) context, registrar = pbx_config
[2013-07-26 13:12:26] WARNING[1474]: pbx.c:9619 add_priority: Unable to register extension '7000' priority 1 in 'parkedcalls_default', already in use
[2013-07-26 13:12:26]     -- Dropping old dialplan item parkedcalls_default/7000/1 [Park()] (registrar=features) due to conflict with new dialplan
[2013-07-26 13:12:26]     -- merging incls/swits/igpats from old(parkedcalls_tenant) to new(parkedcalls_tenant) context, registrar = pbx_config
[2013-07-26 13:12:26] WARNING[1474]: pbx.c:9619 add_priority: Unable to register extension '7000' priority 1 in 'parkedcalls_tenant', already in use
[2013-07-26 13:12:26]     -- Dropping old dialplan item parkedcalls_tenant/7000/1 [Park()] (registrar=features) due to conflict with new dialplan

Attaching features.conf and a new valgrind trace during today's testing.


By: Rusty Newton (rnewton) 2013-08-06 16:34:51.420-0500

Thanks for all the debug!

David can you also attach an Asterisk full log (including VERBOSE and DEBUG) showing from start of Asterisk through reload and to shutdown?

Can you also attach extensions.conf? Sanitized if need be, but still verify you can use it to reproduce the issue.

By: Rusty Newton (rnewton) 2013-08-06 16:38:56.046-0500

If you can additionally get another backtrace with the compiler flag "BETTER_BACKTRACES" enabled that would be extremely helpful as well.

By: David Brillert (aragon) 2013-08-06 18:17:10.074-0500

extensions.conf attached
messages.txt file attached
Can't compile with BETTER_BACKTRACES without some tools and libraries we don't yet support.  So I hope that isn't a showstopper.

By: David Brillert (aragon) 2013-08-19 08:44:56.386-0500

I ran core show locks and gdb thread commands when I thought Asterisk was locking during a reload.  In other words when I see Asterisk reload hanging just after  

[2013-08-19 09:34:24]   == Parsing '/etc/asterisk/users.conf': Found
[2013-08-19 09:34:24]     -- Registered extension context 'parkedcalls'; registrar: features
[2013-08-19 09:34:24]     -- merging incls/swits/igpats from old(parkedcalls) to new(parkedcalls) context, registrar = pbx_config
[2013-08-19 09:34:24]     -- Added extension '7000' priority 1 to parkedcalls
[2013-08-19 09:34:24]     -- merging incls/swits/igpats from old(parkedcalls_default) to new(parkedcalls_default) context, registrar = pbx_config
[2013-08-19 09:34:24] WARNING[4007]: pbx.c:9619 add_priority: Unable to register extension '7000' priority 1 in 'parkedcalls_default', already in use
[2013-08-19 09:34:24]     -- Dropping old dialplan item parkedcalls_default/7000/1 [Park()] (registrar=features) due to conflict with new dialplan
[2013-08-19 09:34:24]     -- merging incls/swits/igpats from old(parkedcalls_tenant) to new(parkedcalls_tenant) context, registrar = pbx_config
[2013-08-19 09:34:24] WARNING[4007]: pbx.c:9619 add_priority: Unable to register extension '700' priority 1 in 'parkedcalls_tenant', already in use
[2013-08-19 09:34:24]     -- Dropping old dialplan item parkedcalls_tenant/700/1 [Park()] (registrar=features) due to conflict with new dialplan
demo*CLI> core show locks


I had also compiled with BETTER_BACKTRACES DONT_OPTIMIZE DEBUG_THREADS and I'm waiting for another segfault to provide the requested core dump output.

Core and gdb output is
core show locks 8192013.txt
gdb thread apply all bt 8192013.txt

By: David Brillert (aragon) 2013-11-15 08:32:50.338-0600

This was caused by subscription monitoring of lots of extensions.  The fix was to change to dynamic hints.