
Summary: ASTERISK-26054: Asterisk crashes (core dump)
Reporter: B. Davis (just4fun07)
Labels:
Date Opened: 2016-05-24 12:36:48
Date Closed: 2016-06-07 12:11:16
Priority: Critical
Regression?
Status: Closed/Complete
Components: CDR/cdr_custom
Versions: 13.9.1
Frequency of Occurrence: Frequent
Related Issues:
Environment:
cat /etc/*release*
SHMZ release 6.6 (Final)
SHMZ release 6.6 (Final)
SHMZ release 6.6 (Final)
SHMZ release 6.6 (Final)
cpe:/o:schmooze:linux:6:GA

lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Stepping: 2
CPU MHz: 2399.943
BogoMIPS: 4799.33
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
NUMA node0 CPU(s): 0-5,12-17
NUMA node1 CPU(s): 6-11,18-23

vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff    cache   si   so    bi    bo   in   cs us sy  id wa st
 1  0      0 224716 241680 30706788    0    0     0     3    1    1  0  0 100  0  0

Attachments: (0) backtrace.txt
Description: New installation. CDR records are stored on an external database over a dedicated network interface. The system appears to run fine and then, at random, dumps core once every day or two.

System operates with about 100-180 active calls w/ about 250-300 channels open.

System has 32GB RAM, and the RAM shows as mostly cached.

Attached is the backtrace.
Comments:

By: Asterisk Team (asteriskteam) 2016-05-24 12:36:49.090-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: B. Davis (just4fun07) 2016-05-24 12:38:32.050-0500

Backtrace

By: Richard Mudgett (rmudgett) 2016-05-24 12:54:26.703-0500

Thank you for the crash report. However, we need more information to investigate the crash. Please provide:

1. A backtrace generated from a core dump using the instructions provided on the Asterisk wiki [1].
2. Specific steps taken that lead to the crash.
3. All configuration information necessary to reproduce the crash.

Thanks!

[1]: https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

The backtrace you supplied has little in the way of symbolic information.  However, it is showing an abort in memory allocation which tends to indicate memory corruption.

By: Richard Mudgett (rmudgett) 2016-05-24 12:57:36.730-0500

Your backtrace appears to contain a memory corruption. We need one or both of the following items to continue investigation of the issue:
1. Valgrind output. See https://wiki.asterisk.org/wiki/display/AST/Valgrind for instructions on how to use Valgrind with Asterisk.
2. MALLOC_DEBUG output. See https://wiki.asterisk.org/wiki/display/AST/MALLOC_DEBUG+Compiler+Flag for instructions on how to use the MALLOC_DEBUG option.

Note that MALLOC_DEBUG and Valgrind are mutually exclusive options. Valgrind output is preferable, but will be more system resource intensive and may be difficult to get on a production system. In such a case, you may have better luck getting the necessary output from MALLOC_DEBUG.



By: Etienne Lessard (hexanol) 2016-05-25 12:33:15.067-0500

Hello,

we started having a similar issue (i.e. having random asterisk process termination caused by an ABRT signal raised by libc after detecting memory corruption) starting with Asterisk 13.8.0 (it was working fine with Asterisk 13.7.2). I'm currently trying to isolate the problem, but haven't been able to precisely pinpoint it since I'm having some trouble reproducing it in a systematic way.

That said, the problem seems to be from one of the "odbc components", i.e. most likely in either res_odbc, res_config_odbc, cel_odbc, or in unixodbc or the ODBC driver (I'm using psqlodbc). I say that because:

* on our "load test" system, we are using res_odbc both to store CEL (via cel_odbc) and the queue_log (via res_config_odbc). This is a 32-bit Debian 8 system.
* after upgrading to asterisk 13.8, the asterisk process started crashing once or twice a day; same thing with asterisk 13.9
* after disabling all the odbc related stuff in asterisk, it stopped crashing
* I've been able to make asterisk crash with a simple module that calls "ast_store_realtime" repeatedly from multiple threads (it took 1,200,000 tries before crashing with a memory corruption error the first time)
* I've tried to reproduce it on another system but have not been able to yet
* I'm currently trying to run it under valgrind, but I've not seen anything interesting yet
* If you look at B. Davis's backtrace, you'll see that he's also using odbc in asterisk:

{code}
Thread 303 (Thread 0x7fc95643e700 (LWP 6759)):
#0  0x00007fc95869857d in write () from /lib64/libc.so.6
#1  0x00007fc95862ead3 in _IO_new_file_write () from /lib64/libc.so.6
#2  0x00007fc958630085 in _IO_new_do_write () from /lib64/libc.so.6
#3  0x00007fc958630df3 in _IO_flush_all_lockp () from /lib64/libc.so.6
#4  0x00007fc9585f0eb9 in abort () from /lib64/libc.so.6
#5  0x00007fc95862d537 in __libc_message () from /lib64/libc.so.6
#6  0x00007fc958632f4e in malloc_printerr () from /lib64/libc.so.6
#7  0x00007fc958635cf0 in _int_free () from /lib64/libc.so.6
#8  0x00007fc8e58c41fe in my_SQLFreeEnv () from /usr/lib64/libmyodbc5.so
#9  0x00007fc954c3fc38 in ?? () from /usr/lib64/libodbc.so.2
#10 0x00007fc954c40829 in ?? () from /usr/lib64/libodbc.so.2
#11 0x00007fc954c4527a in SQLDisconnect () from /usr/lib64/libodbc.so.2
#12 0x00007fc954e9e64c in ?? () from /usr/lib64/asterisk/modules/res_odbc.so
#13 0x00007fc954e9c5ce in ?? () from /usr/lib64/asterisk/modules/res_odbc.so
#14 0x000000000045cc3a in ?? ()
#15 0x000000000045cf1d in __ao2_ref ()
#16 0x00007fc954e9e40a in ast_odbc_release_obj () from /usr/lib64/asterisk/modules/res_odbc.so
#17 0x00007fc8c0770b03 in ?? () from /usr/lib64/asterisk/modules/cel_odbc.so
#18 0x00000000004a82c4 in ?? ()
#19 0x000000000045de00 in ?? ()
#20 0x000000000045e133 in __ao2_callback ()
#21 0x00000000004a844c in ?? ()
#22 0x00000000004a99fa in ?? ()
#23 0x00000000004a9c8f in ?? ()
#24 0x00000000005dae29 in ?? ()
#25 0x00000000005c97f1 in ?? ()
#26 0x00000000005ca38d in ?? ()
#27 0x00000000005e72f6 in ast_taskprocessor_execute ()
#28 0x00000000005e58ab in ?? ()
#29 0x00000000005fb85d in ?? ()
#30 0x00007fc95931daa1 in start_thread () from /lib64/libpthread.so.0
#31 0x00007fc9586a593d in clone () from /lib64/libc.so.6
{code}
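
The frames above that point at ODBC can be picked out mechanically. The following is a minimal sketch for scanning a gdb backtrace for frames that come from ODBC-related shared objects. The function name `odbc_frames`, the `ODBC_HINTS` substring list, and the frame regular expression are assumptions made for illustration, and the sample is a subset of the frames quoted above.

```python
import re

# Substrings treated as "ODBC-related"; this list is an illustrative assumption.
ODBC_HINTS = ("odbc",)

# Matches gdb frame lines such as:
#   #8  0x00007fc8e58c41fe in my_SQLFreeEnv () from /usr/lib64/libmyodbc5.so
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+0x[0-9a-f]+ in (?P<func>\S+) \(\)(?: from (?P<lib>\S+))?"
)

def odbc_frames(backtrace):
    """Return (frame number, function, library) for frames in ODBC-related libraries."""
    hits = []
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line.strip())
        if not m or not m.group("lib"):
            continue  # not a frame line, or a frame with no library information
        lib = m.group("lib")
        if any(hint in lib.lower() for hint in ODBC_HINTS):
            hits.append((int(m.group("num")), m.group("func"), lib))
    return hits

# A few frames copied from the backtrace in this issue.
SAMPLE = """\
#7  0x00007fc958635cf0 in _int_free () from /lib64/libc.so.6
#8  0x00007fc8e58c41fe in my_SQLFreeEnv () from /usr/lib64/libmyodbc5.so
#11 0x00007fc954c4527a in SQLDisconnect () from /usr/lib64/libodbc.so.2
#14 0x000000000045cc3a in ?? ()
#16 0x00007fc954e9e40a in ast_odbc_release_obj () from /usr/lib64/asterisk/modules/res_odbc.so
"""

for num, func, lib in odbc_frames(SAMPLE):
    print(num, func, lib)
```

On the sample this reports frames 8, 11 and 16, i.e. the driver, the driver manager and res_odbc, all sitting directly beneath the failing `_int_free ()` call.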

Note that I'm not running the latest version of unixodbc (using 2.3.1, latest is 2.3.4), nor the latest version of psqlodbc (using 09.03.0300, latest is 09.05.0210). I do plan on trying these out. I've not enabled connection pooling in unixodbc yet either.

By: Michael L. Young (elguero) 2016-05-26 15:59:38.777-0500

@Etienne

Since it sounds like your issue may be related to odbc on Debian, take a look at this issue: ASTERISK-25891

By: Etienne Lessard (hexanol) 2016-05-30 08:28:38.689-0500

Indeed, updating both unixodbc and psqlodbc to the latest available versions has fixed the issue, i.e. it is not crashing with memory corruption anymore. I don't know if updating both unixodbc and psqlodbc was strictly necessary, or just updating unixodbc would have been sufficient. Anyway, sorry for the noise.

That said, my tests seem to show that there's a new memory leak between Asterisk 13.7 and Asterisk 13.9. I'm already aware of ASTERISK-25262, but when I compare the memory usage from our 13.7 tests with our 13.9 tests, the rate at which memory is consumed by asterisk has grown between these 2 versions. Now, the question is, who's the culprit... (I hope it's not the updated ODBC libraries...). Is anyone aware of it?

@Davis since you are also using ODBC, I'm assuming you have a similar issue. You should try either to update unixodbc and your mysql ODBC driver, or stop using ODBC from asterisk and see if it still crashes.

By: B. Davis (just4fun07) 2016-07-27 19:29:36.181-0500

Sorry I was away for a little while, I am going to take some of these suggestions and see if we have some improvements and maybe get some more details.

This issue is seen on a FreePBX distro. Under the hood we have updated unixODBC, but have only seen a reduction in the occurrences; we still crash about once a week.

Last week we replaced some RAM because we were seeing some disk swapping.

After replacing the RAM we have not had a core dump yet, but I will keep a close eye on it and post an update if anything changes.

Running unixODBC-2.3.4, Server version: 5.1.73 Source distribution, FreePBX Distro 13.


By: Asterisk Team (asteriskteam) 2016-07-27 19:29:36.324-0500

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.

By: Joshua C. Colp (jcolp) 2016-07-27 19:34:56.999-0500

The usage of UnixODBC for pooling (which exposed problems in its implementation and the database connectors) was discontinued as of Asterisk 13.10.0, and our own pooling implementation was put into place instead. This has resolved the issue with no upgrades of UnixODBC or the connector required.
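
To illustrate the general idea of application-level pooling (as opposed to relying on the driver manager's pooling), here is a minimal sketch. It is not Asterisk's implementation, which lives in C inside res_odbc; the `ConnectionPool` class, the `connect` factory and the `max_connections` parameter are hypothetical names used only for this example.

```python
import queue
from contextlib import contextmanager

class ConnectionPool:
    """Hand out at most max_connections connections, reusing idle ones."""

    def __init__(self, connect, max_connections):
        self._connect = connect        # hypothetical factory returning a new connection
        self._slots = queue.Queue()
        for _ in range(max_connections):
            self._slots.put(None)      # None marks a slot whose connection is not open yet

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while every slot is in use; raises queue.Empty if timeout expires.
        conn = self._slots.get(timeout=timeout)
        try:
            if conn is None:
                conn = self._connect()  # open lazily on first use of this slot
            yield conn
        finally:
            # Return the slot for reuse. A real pool would also check that the
            # connection is still healthy before handing it out again.
            self._slots.put(conn)

# Usage with a stand-in factory instead of a real database connection.
opened = []

def fake_connect():
    opened.append(object())
    return opened[-1]

pool = ConnectionPool(fake_connect, max_connections=2)
with pool.connection() as a:
    with pool.connection() as b:
        pass                            # two connections held at once
with pool.connection() as c:
    pass                                # reuses an idle connection

print(len(opened))  # 2
```

Because the application owns the connections for their whole lifetime, connects and disconnects no longer happen on every request from many threads, which in this sketch is the only point being shown.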