Summary: ASTERISK-27149: Segfault on reload
Reporter: amirali (amirali)
Labels:
Date Opened: 2017-07-22 09:40:05
Date Closed: 2020-01-14 11:13:48.000-0600
Priority: Major
Regression?
Status: Closed/Complete
Components:
Versions: 14.6.0
Frequency of Occurrence:
Related Issues:
Environment: Debian 8 64bit
Attachments:
( 0) backtrace.txt
( 1) backtrace.txt
( 2) backtrace-threads.txt
( 3) core-brief.txt
( 4) core-brief.txt
( 5) core-full.txt
( 6) core-full.txt
( 7) core-locks.txt
( 8) core-locks.txt
( 9) core-thread1.txt
(10) core-thread1.txt
(11) Debug.txt
(12) full.txt
Description: Hello,
Dear Asterisk team, we are facing an issue on one of our machines, which runs Debian 8 64-bit.
The issue hit us on both Asterisk 13 and 14.
On Asterisk 14, once the core reload command is issued for the second time, Asterisk aborts with a segfault.
On version 13, it just segfaulted randomly.
I have created a core dump for version 14, taken when the crash happens on issuing core reload for the second time.
I may have tracked it down to the cdr_mysql.so module, since the segfault also happens when trying to unload it.
Regards
Comments:

By: Asterisk Team (asteriskteam) 2017-07-22 09:40:06.861-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Rusty Newton (rnewton) 2017-07-24 18:16:35.904-0500

{quote}
I have created a core dump for version 14, taken when the crash happens on issuing core reload for the second time.
I may have tracked it down to the cdr_mysql.so module, since the segfault also happens when trying to unload it.
{quote}

It sounds like the traces you included were from the scenario where you perform a core reload.

Can you also attach a trace from when you manually reload only cdr_mysql.so?

Please also attach a debug log from both scenarios: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

That is, one debug log captured during a core reload and another captured during the cdr_mysql.so reload.

Thanks!

By: amirali (amirali) 2017-07-25 05:36:03.742-0500

Dear Newton
I was not able to clearly reproduce it while loading/unloading cdr_mysql.so; it just seems random, but somehow related to the MySQL module.
I created a new core dump along with the full and debug logs for Asterisk.
The issue tends to always happen upon core reload.


By: Rusty Newton (rnewton) 2017-07-27 08:11:05.981-0500

I'm no developer, but the two crashes look a little different. Yet both traces run through the ast_merge_contexts_and_delete and context_merge functions. One seems to happen during pbx_lua configuration initialization, if I'm reading it right.

Are you using a Lua dialplan? If not, can you remove the default Lua configs and noload the Lua modules in modules.conf?
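
For reference, the noload request above might look like the following fragment. This is a minimal sketch: the file path and the `autoload` line are assumptions about a typical installation, and only `pbx_lua.so` is named in this thread.

```ini
; /etc/asterisk/modules.conf (assumed default location)
[modules]
; Assumed: modules are autoloaded, as in the sample configuration.
autoload = yes
; Keep the Lua dialplan module from being loaded at startup.
noload => pbx_lua.so
```

A noload entry only takes effect the next time Asterisk starts, so a restart is needed before retesting the reload.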

By: amirali (amirali) 2017-08-03 13:46:25.911-0500

Well, we are not using any Lua dialplan. I will noload the module and report back.
We have too many strange issues with Asterisk on this machine that I am not able to pinpoint.
When running Asterisk 13, this same machine keeps segfaulting on new calls from the Sangoma TDM card, which uses the Sangoma/DAHDI drivers.
With version 14, core reload causes a segfault, and on a daily basis there are too many stuck channels in Asterisk, which eventually make Asterisk unresponsive.
My only guess is that the DAHDI/Sangoma drivers are not compatible with Linux kernel version 3 and are somehow causing some internal lock in Asterisk.
Also, may I know whether any kind of strange, old, or somehow malformed dialplan can cause such things in Asterisk?
I will try to pinpoint the issue and report back if possible.



By: Joshua C. Colp (jcolp) 2017-08-04 08:55:31.195-0500

Bad dialplan shouldn't cause these kinds of problems, but do noload the Lua module and report back if you are able to isolate things further.

By: amirali (amirali) 2017-08-05 02:44:56.576-0500

I unloaded pbx_lua.so, but the segfault continues to happen.
This time it tends to always happen in astobj2.c:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000464d06 in __ao2_ref (user_data=0xdeaddeaddeaddead, delta=-1, tag=0x663480 "", file=0x663460 "format_cap.c", line=91, func=0x663570 <__PRETTY_FUNCTION__.8746> "format_cap_destroy") at astobj2.c:468
468             struct astobj2 *obj = __INTERNAL_OBJ_CHECK(user_data, file, line, func);

I uploaded a new stack trace.
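
The `user_data=0xdeaddeaddeaddead` argument in the frame above is a repeating `0xdeaddead` word, a pattern that debug allocators commonly write over freed memory. If that is what happened here (an assumption; nothing in this thread confirms it), `format_cap_destroy` was releasing a reference read from memory that had already been freed. The sketch below scans a gdb frame line for pointer arguments that consist only of that pattern. The helper name `poisoned_args` and the choice of poison word are illustrative and not part of Asterisk or gdb.

```python
import re

# Assumed poison word: a fill pattern some debug allocators use for freed memory.
POISON_WORD = "deaddead"

# Matches "name=0x<hex>" arguments in a gdb frame line.
ARG_RE = re.compile(r"(\w+)=0x([0-9a-fA-F]+)")


def poisoned_args(frame_line, poison=POISON_WORD):
    """Return (name, value) pairs whose hex value is only repeats of `poison`."""
    hits = []
    for name, hex_digits in ARG_RE.findall(frame_line):
        digits = hex_digits.lower()
        repeats, remainder = divmod(len(digits), len(poison))
        if repeats >= 1 and remainder == 0 and digits == poison * repeats:
            hits.append((name, "0x" + digits))
    return hits


# Frame #0 as posted in the comment above.
frame = (
    '#0  0x0000000000464d06 in __ao2_ref (user_data=0xdeaddeaddeaddead, '
    'delta=-1, tag=0x663480 "", file=0x663460 "format_cap.c", line=91, '
    'func=0x663570 <__PRETTY_FUNCTION__.8746> "format_cap_destroy") '
    'at astobj2.c:468'
)

print(poisoned_args(frame))
```

Running the same check over every frame of a full backtrace can show whether the poisoned pointer appears only at the crash site or is passed down from a caller.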

By: Rusty Newton (rnewton) 2017-08-14 18:39:52.865-0500

Amirali, if I understand correctly - you have other machines running a different distribution of Linux, but with a similar Asterisk configuration (same version?) that do not experience the problem?

I'm trying to better understand the scope of the issue in your environment.

By: amirali (amirali) 2017-08-15 00:47:40.306-0500

Dear Rusty,
We first faced this issue on a production server running Debian 8 64-bit. My first guess was that it was caused by the Sangoma TDM card drivers, which are not fully compatible with kernel 3.
But upon further tests, I found that our development machine suffers from the same issue upon core reload.
The configuration is the same on both machines.
I still think this may be caused by some wrong configuration or somehow faulty AGIs, but I cannot find a clear way to continue troubleshooting it.
One test I did was moving our /etc/asterisk configuration files aside and using the Asterisk samples, and it seems to work fine, so I guess it is a configuration problem that causes the segfault.
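
Since the sample configuration works and the custom one crashes, one way to continue troubleshooting is to bisect the custom files. The sketch below assumes that exactly one file triggers the crash on its own, and that a hypothetical `crashes(subset)` callback exists which installs the given files on top of the samples, issues core reload twice, and reports whether Asterisk died. Neither the callback nor the file list comes from this thread.

```python
def find_culprit(files, crashes):
    """Bisect a non-empty list of files down to the one that triggers the crash.

    Assumes exactly one file is responsible and that `crashes(subset)`
    returns True whenever that file is part of `subset`.
    """
    candidates = list(files)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Keep whichever half still reproduces the crash.
        candidates = half if crashes(half) else candidates[len(half):]
    return candidates[0]


# Illustration only: a stand-in predicate instead of a real reload test.
config_files = ["extensions.conf", "sip.conf", "cdr_mysql.conf", "chan_dahdi.conf"]
culprit = find_culprit(config_files, lambda subset: "extensions.conf" in subset)
print(culprit)
```

If the crash depends on a combination of files rather than a single one, this simple halving will not isolate it, and the files would have to be tested in groups instead.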


By: Rusty Newton (rnewton) 2017-08-16 09:04:22.094-0500

How extensive is your Asterisk configuration?

Can you attach the complete configuration that reproduces the issue with a reload?

Feel free to sanitize out any private information, of course.

By: Asterisk Team (asteriskteam) 2017-08-30 12:00:02.643-0500

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened, please create a new issue instead. If the new issue is related to this one, a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines