Summary: ASTERISK-21847: Segfault due to dahdi_restart and round robin
Reporter: Ivo Andonov (ivo.andonov)
Labels:
Date Opened: 2013-05-30 05:27:13
Date Closed: 2013-07-03 18:35:03
Priority: Minor
Regression?
Status: Closed/Complete
Components: Channels/chan_dahdi
Versions: 1.8.22.0, 11.4.0
Frequency of Occurrence: Constant
Related Issues:
Environment: Not relevant
Attachments: (0) jira_asterisk_21847_v1.8.patch
Description:

Hello everyone,

I did not specify a version as I think this one matches any version.

I'm using Asterisk 1.6.2.20 in a production environment. Every morning I restart the PRI interface using the "dahdi restart" application. The first call that the system gets after that using the PRI generates a segfault. The dial string uses the round robin channel search.

After generating a core dump and a bt full, I traced the problem down to the dahdi_request function, in which the struct dahdi_pvt *p gets set to an invalid pointer (0x38 in my case). Looking into the code, I think I spotted the possible problem. I leave it to the developers' judgment, but here are my findings:

1. (minor) The round_robin array is defined with size 32, while the comments / description say "Dial(DAHDI/(g|G|r|R)<group#(0-63)>"... This might be confusing, as the array holds 32 entries and not 64. While recent versions check the group parameter against the array length, 1.6.2.20 has no such check.

2. dahdi_restart destroys all channels and thus invalidates any pointers in the round_robin array, however the latter is not reset to null, and I think this is the cause of the segfault I'm getting. round_robin is memset to 0 at load_module only.
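
The array length check mentioned in point 1 can be sketched as follows. This is a minimal illustration, not the chan_dahdi code: `RR_SLOTS`, `struct pvt` and `rr_slot()` are hypothetical names, and only the array size of 32 comes from the report.

```c
#include <stddef.h>

#define RR_SLOTS 32   /* array size reported in this issue */

struct pvt;           /* opaque, hypothetical stand-in for struct dahdi_pvt */

/* One "last used channel" pointer per group. */
static struct pvt *round_robin[RR_SLOTS];

/*
 * Return the round-robin slot for a group number, or NULL when the
 * group does not fit in the array (for example group 32..63 from a
 * dial string that follows the documented 0-63 range).
 */
static struct pvt **rr_slot(int group)
{
    size_t slots = sizeof(round_robin) / sizeof(round_robin[0]);

    if (group < 0 || (size_t)group >= slots) {
        return NULL;
    }
    return &round_robin[group];
}
```

With a check like this, a group number above 31 is rejected instead of indexing past the end of the array.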

Best,
Ivo

Comments:

By: Richard Mudgett (rmudgett) 2013-05-30 11:30:11.581-0500

Per the Asterisk maintenance timeline page at http://www.asterisk.org/asterisk-versions maintenance (bug) support for the 1.4 and 1.6.x branches has ended. For continued maintenance support please move to the 1.8 branch which is a long term support (LTS) branch. For more information about branch support, please see https://wiki.asterisk.org/wiki/display/AST/Asterisk+Versions.  After testing with Asterisk 1.8, if you find this problem has not been resolved, please open a new issue against Asterisk 1.8.



By: Ivo Andonov (ivo.andonov) 2013-05-30 12:00:05.205-0500

"I did not specify a version as I think this one matches any version."

Anyway, should be better now probably.

By: Michael L. Young (elguero) 2013-05-30 12:35:08.028-0500

Ivo,

{quote}
I did not specify a version as I think this one matches any version.
{quote}

Please help prove that statement.

As Richard stated, please test on a supported branch.  It helps save time not chasing down a bug that is already fixed in our supported releases.

If you can reproduce on 1.8 or 11, please attach the backtrace and any pertinent debug information as txt files.

https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

Thanks

By: Ivo Andonov (ivo.andonov) 2013-05-31 01:12:14.133-0500

Please trust me, the problem is there. Read again my 2 points in this report.

The first point in the report cannot be proved by compiling any version - the problem there is just a matter of inconsistency between code and docs / comments in chan_dahdi.c. It would take some 10 minutes max to have a look at it.

As I stated, it is a production environment and I cannot migrate easily from 1.6.2.20 to 1.8. Yes, I can use a different machine, but I do not have a bunch of 2k USD priced PRI cards and a spare PRI trunk for testing. The problem is there and I fixed it for myself by memsetting the round_robin array in the setup_dahdi function. It might not be the best place, but it solved my problem. I am just pointing this out to the devs in order to make a better Asterisk. I am not complaining or expecting any help, so if you want to close the issue due to not enough data (cores, stack traces etc.) without having a minimal look into the pointed places in the code - I cannot argue.



By: Rusty Newton (rnewton) 2013-05-31 14:09:51.866-0500

From talking with Richard, this is confirmed to exist in 1.8 and 11 as well. I'll go ahead and open the issue.

By: Richard Mudgett (rmudgett) 2013-05-31 14:17:02.837-0500

[^jira_asterisk_21847_v1.8.patch] - This patch clears the round_robin pointer array in dahdi_restart().

There is a very similar place in the v1.6.2 code.

Thank you for the analysis.  I can see not clearing the round_robin array causing a crash if you are using the round robin channel allocation method and do a restart.
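
The failure mode and the fix discussed in this issue can be modelled in a few lines of C. This is only a sketch under stated assumptions, not the attached patch: `struct pvt`, `iflist`, `create_channels()`, `destroy_channels()` and `restart_channels()` are hypothetical stand-ins for the channel list handling in chan_dahdi. The point it shows is that the array of cached pointers has to be cleared whenever the channels it points to are destroyed.

```c
#include <stdlib.h>
#include <string.h>

#define RR_SLOTS 32   /* array size reported in this issue */

/* Hypothetical stand-in for struct dahdi_pvt. */
struct pvt {
    int channel;
    struct pvt *next;
};

static struct pvt *iflist;                /* all configured channels */
static struct pvt *round_robin[RR_SLOTS]; /* last channel used, per group */

/* Build a list of channels 1..n; returns 0 on success, -1 on failure. */
static int create_channels(int n)
{
    for (int i = n; i >= 1; i--) {
        struct pvt *p = calloc(1, sizeof(*p));
        if (!p) {
            return -1;
        }
        p->channel = i;
        p->next = iflist;
        iflist = p;
    }
    return 0;
}

/* Free every channel; any pointer into the list is invalid afterwards. */
static void destroy_channels(void)
{
    while (iflist) {
        struct pvt *next = iflist->next;
        free(iflist);
        iflist = next;
    }
}

/* Model of a restart: destroy, forget cached pointers, rebuild. */
static int restart_channels(int n)
{
    destroy_channels();
    /* Without this line round_robin would still point at freed memory. */
    memset(round_robin, 0, sizeof(round_robin));
    return create_channels(n);
}
```

In this model, a round-robin search that finds a NULL slot after a restart can fall back to the head of the list instead of dereferencing a stale pointer.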