
Summary: ASTERISK-25370: res_corosync segfaults at startup with corosync version > 2.x
Reporter: mdu113 (mdu113)
Labels:
Date Opened: 2015-09-03 16:43:55
Date Closed: 2017-07-13 15:53:11
Priority: Major
Regression?:
Status: Closed/Complete
Components: Resources/res_corosync
Versions: 11.19.0
Frequency of Occurrence: Constant
Related Issues: is related to ASTERISK-24343 res_corosync segfaults in dispatch_thread_handler
Environment: linux 64bit kernel 3.10.17 distro slackware64 14.1 corosync 2.3.5
Attachments:
( 0) backtrace.txt
( 1) backtrace2.txt
( 2) menuselect.makeopts
( 3) valgrind.txt
Description:
Asterisk crashes at startup when trying to load res_corosync.so. The content of res_corosync.conf doesn't matter. As long as file exists (even empty one) asterisk segfaults.
The issue seems identical to the one described in ASTERISK-24343, but since that issue was closed due to no feedback, I'm opening a new one.
Backtrace is attached.
I've also tested it in asterisk 13.5.0 and got the same result - asterisk segfaults. The backtrace is taken from 11.19.0 though, as that is the one I'm currently using.
Thanks
Comments:

By: Asterisk Team (asteriskteam) 2015-09-03 16:43:56.339-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Russell Bryant (russell) 2015-09-03 17:34:14.164-0500

If this is trivial to reproduce, can you run asterisk under valgrind and post the output?

By: mdu113 (mdu113) 2015-09-08 16:29:34.580-0500

sorry for the late response - for some reason i didn't get a notification that the issue was waiting for feedback.
anyway, i followed the instructions here: https://wiki.asterisk.org/wiki/display/AST/Valgrind. asterisk didn't crash under valgrind (it crashes 100% of the time without it). i've attached valgrind.txt.


By: Rusty Newton (rnewton) 2015-09-09 07:49:47.148-0500

[~mdu113] can you reproduce this with a default installation of Asterisk and sample config files?

That is, can you reproduce this issue without touching anything on a fresh installation?

It'll help to narrow it down to something in your configuration/build or something in the environment (if that is possible - it could be a combination).

By: mdu113 (mdu113) 2015-09-09 09:47:52.442-0500

yes, i didn't make any alterations. it's a default installation. the only config file i left in /etc/asterisk is modules.conf with "autoload=yes".
as before, if i do "touch /etc/asterisk/res_corosync.conf" then asterisk segfaults.
if it helps, asterisk is compiled as follows:

VERSION=11.19.0
PREFIX=/usr/local/asterisk-${VERSION}
./configure \
 --prefix=$PREFIX \
 --libdir=$PREFIX/lib \
 --sysconfdir=$PREFIX/etc \
 --localstatedir=$PREFIX/var \
 --with-postgres=/usr/local/pgsql \
 --with-cap \
 --with-cpg

i've also attached menuselect.makeopts
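
The minimal setup described in this comment (only modules.conf with autoload enabled, plus an empty res_corosync.conf) can be sketched as a small script. This is only an illustration: `CONF_DIR` is a hypothetical scratch directory rather than a path from the report, and placing `autoload=yes` under a `[modules]` section is assumed from the usual modules.conf layout. The script stages the files and does not start Asterisk.

```shell
#!/bin/sh
# Sketch: stage the minimal config set described in the comment above.
# CONF_DIR is a hypothetical scratch directory, not a path from the report.
set -eu
CONF_DIR=$(mktemp -d)

# modules.conf with autoload enabled: the only config file the reporter kept.
printf '[modules]\nautoload=yes\n' > "$CONF_DIR/modules.conf"

# An empty res_corosync.conf; per the report, its mere presence is enough
# for res_corosync to load and for the crash to occur.
touch "$CONF_DIR/res_corosync.conf"

echo "staged minimal config in $CONF_DIR"
```

Copying the two files into the configuration directory of the affected build (here `$PREFIX/etc/asterisk`, given the `--sysconfdir` above) and starting Asterisk should then match the reported scenario.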

By: Rusty Newton (rnewton) 2015-09-17 11:34:21.813-0500

I've tried reproducing this issue in a variety of ways and I can't get it to crash.

bq. The content of res_corosync.conf doesn't matter. As long as file exists (even empty one) asterisk segfaults.

Are you sure about this? I've tried with the sample file, blank file, no file, etc and I can't get it to crash.

Something else is perhaps triggering the issue. Since you can't reproduce the issue with valgrind running (did you try MALLOC_DEBUG?) and we can't reproduce the issue then we'll probably close this out unless you can narrow down how to reproduce the problem.

If you can reproduce the issue with MALLOC_DEBUG enabled, that output may be helpful.

By: Russell Bryant (russell) 2015-09-17 12:08:19.549-0500

Make sure you test with the same corosync version, at least.

Related: https://github.com/corosync/corosync/issues/57#issuecomment-137867703

By: mdu113 (mdu113) 2015-09-18 13:47:01.343-0500

backtrace taken with MALLOC_DEBUG enabled

By: mdu113 (mdu113) 2015-09-18 13:51:04.417-0500

it still crashes if res_corosync.conf file exists.
Will it help if I provide you with access to my testing machine?
Please note, I'll be out for a week. I'll reply when I return. thanks.

By: mdu113 (mdu113) 2015-09-18 13:55:09.095-0500

BTW, not sure if it matters, but my testing machine is actually a virtual machine running in virtualbox. I've tested it before with asterisk 1.8 and res_ais.so and it worked (meaning it didn't crash; there were other issues with res_ais.so though).

By: Rusty Newton (rnewton) 2015-09-24 18:03:03.171-0500

Please attach the mmlog file from your crash with MALLOC_DEBUG enabled.

https://wiki.asterisk.org/wiki/display/AST/MALLOC_DEBUG+Compiler+Flag

By: mdu113 (mdu113) 2015-09-28 12:23:42.861-0500

Unfortunately, it's not being created in the crash scenario. To verify that I compiled asterisk properly, I disabled res_corosync.so loading (to avoid the crash), and then the mmlog file is created just fine, so I guess I did compile it correctly. When res_corosync loading is enabled and asterisk crashes, the file is not created, so I guess the crash happens before asterisk has a chance to create it.

By: Rusty Newton (rnewton) 2015-10-01 07:32:45.465-0500

I'm not sure there is anywhere to go from here.

You can't get valgrind or mmlog debug output so that the issue can be examined further. We can't reproduce it, and the issue only happens on your system as far as we know.

I'm going to ping [~russell] to see if he has any further ideas.

By: mdu113 (mdu113) 2015-10-01 15:17:39.340-0500

Like I said I can give you access to the machine where it happens for further investigation.
The issue was reported by at least two more people, so I guess it's not unique to me:
https://github.com/corosync/corosync/issues/57
ASTERISK-24343

By: Asterisk Team (asteriskteam) 2015-10-16 12:00:21.210-0500

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

By: mdu113 (mdu113) 2015-10-17 18:25:49.878-0500

Well, the issue still exists and is still a problem. I'm willing to help any way I can, and my offer to provide access to the machine exposing the problem still stands. I'd really appreciate it if some of the developers could take a closer look at the issue.

By: Asterisk Team (asteriskteam) 2015-10-17 18:25:50.330-0500

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.

By: mdu113 (mdu113) 2015-10-20 14:29:50.846-0500

An additional piece of information: I downgraded corosync to 1.4.7 and recompiled asterisk against it.
Now asterisk loads res_corosync just fine and device_state sharing also works perfectly.
So the problem exists only if asterisk is compiled against the latest corosync (2.3.5), or maybe the whole 2.x corosync family.
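
A quick way to tell which side of that split a given corosync version falls on can be sketched as follows. The helper names `corosync_major` and `is_possibly_affected` are hypothetical, and treating every 2.x release as affected is an assumption: the thread only confirms a crash with 2.3.5 and a working setup with 1.4.7.

```shell
# Sketch: classify a corosync version string by its major number.
# Both helper names are hypothetical, and "major >= 2 means affected"
# is an assumption based on the two versions tested in this thread.

# Print the major component of a dotted version string, e.g. 2.3.5 -> 2.
corosync_major() {
    printf '%s\n' "${1%%.*}"
}

# Succeed (exit status 0) when the version is 2.x or later.
is_possibly_affected() {
    [ "$(corosync_major "$1")" -ge 2 ]
}

for v in 1.4.7 2.3.5; do
    if is_possibly_affected "$v"; then
        echo "corosync $v: possibly affected"
    else
        echo "corosync $v: reported working"
    fi
done
```

The version string to pass in can be read from the output of `corosync -v` on the build machine.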

By: Russell Bryant (russell) 2015-10-20 14:43:39.480-0500

Sounds right.  That's the same issue reported in the github corosync issue I linked to earlier (works on older corosync, but not newer).

This code was certainly originally developed against much older corosync.

By: Matt Jordan (mjordan) 2015-10-20 21:27:07.221-0500

Not that it is surprising, but I confirmed as well that 1.4.7 still works:

{code}
*CLI> corosync show members

=============================================================
=== Cluster members =========================================
=============================================================
===
=== Node 1
=== --> Group: asterisk
=== --> Address 1: 192.168.0.102
=== Node 2
=== --> Group: asterisk
=== --> Address 1: 192.168.0.105
===
=============================================================
{code}

Hilarious that 1.4.7 is still the default package on CentOS 6.7. Oh well... people in glass houses and what-not.