Summary: ASTERISK-25888: Frequent segfaults in function can_ring_entry() of app_queue.c
Reporter: Sébastien Couture (sysreq)
Labels:
Date Opened: 2016-04-01 09:54:44
Date Closed: 2016-04-25 05:26:30
Priority: Critical
Regression? Yes
Status: Closed/Complete
Components: Applications/app_queue
Versions: 11.22.0 13.8.0
Frequency of Occurrence: Frequent
Related Issues:
is duplicated by ASTERISK-25953 Segfault in Queue
is duplicated by ASTERISK-25975 Asterisk 11.22.0 crashes due to error in app_queue.so
is duplicated by ASTERISK-26733 asterisk queue segfault
is duplicated by ASTERISK-25920 Asterisk 13.8.0 segfaults using app.queue when ringinuse set to yes and another call comes in.
is duplicated by ASTERISK-25973 Asterisk crashes when call busy agent is enabled
is duplicated by ASTERISK-26037 Asterisk crashed repeatedly - Segmentation fault
is duplicated by ASTERISK-26182 A second call into inbound queue causes all calls that came in via the queue to go silent.
is related to ASTERISK-26877 app_queue: Crash when seeing if a member can be rung
Environment: Linux 3.2.0-86-generic #124-Ubuntu SMP Wed Jun 17 21:40:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Attachments:
( 0) backtrace1.txt
( 1) backtrace2.txt
( 2) backtrace3.txt
Description:
After upgrading from 11.19.0 to 11.22.0, we suddenly started experiencing frequent segmentation faults in the can_ring_entry() function of the app_queue module.

I've attached the backtraces of 3 crashes that happened in the space of two hours on a system with a load of about 1000 peers and 80 simultaneous calls.

We use both realtime queues and queue_members. We've reverted to 11.19.0 on this system for the time being.
Comments:
By: Asterisk Team (asteriskteam) 2016-04-01 09:54:44.844-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Rusty Newton (rnewton) 2016-04-01 16:22:16.774-0500

Thanks for the report! Can you provide whatever logs you had running at the time of the crash? We only typically need the last couple thousand lines before the crash.

By: ibercom (ibercom) 2016-04-02 12:05:32.739-0500

I think this is a regression.
The commit (Asterisk11) 1943cfc53cadff36e64704257c06590bceca459b is problematic.
I experienced the same problem and reverted this commit.
Now everything is ok.

[~sysreq] you can try this change, line 3669 in apps/app_queue.c
{noformat}
if (call->member->in_call && call->lastqueue->wrapuptime) {
{noformat}
to:
{noformat}
if (call->member->in_call && call->lastqueue && call->lastqueue->wrapuptime) {
{noformat}

By: Carlos Oliva (coliva) 2016-04-04 07:22:15.982-0500

I see the same issue on version 13.8.0. The change suggested by ibercom in the previous comment (apps/app_queue.c line 4167 in 13.8.0) seems to solve the issue, but I'm not 100% sure. I will keep testing for a while longer.

By: Carlos Oliva (coliva) 2016-04-06 10:22:03.990-0500

I can confirm that, after 2 days of testing, the change suggested by ibercom solves the issue in my test environment.

By: Misha Vodsedalek (vmisha) 2016-04-12 09:14:25.797-0500

I can also confirm that the suggested fix is good. I examined the core dump and found that call->lastqueue is NULL, so dereferencing call->lastqueue->wrapuptime results in a segmentation fault.
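
To illustrate why the extra call->lastqueue test is enough, here is a minimal sketch of the guarded condition. The struct and function names (queue_sketch, member_sketch, attempt_sketch, wrapup_check_applies) are hypothetical simplifications for illustration, not the real app_queue.c definitions; only the fields mentioned in this issue are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the app_queue.c structures;
 * only the fields named in this issue are modelled. */
struct queue_sketch {
    int wrapuptime;                   /* 0 means no wrap-up time configured */
};

struct member_sketch {
    int in_call;                      /* non-zero while the member is on a call */
};

struct attempt_sketch {
    struct member_sketch *member;
    struct queue_sketch *lastqueue;   /* may be NULL, as seen in the core dump */
};

/* Returns 1 when the guarded condition holds, 0 otherwise.
 * && evaluates left to right and stops at the first false operand, so
 * lastqueue->wrapuptime is read only after lastqueue is known to be non-NULL. */
int wrapup_check_applies(const struct attempt_sketch *call)
{
    /* Assumption of this sketch: the attempt and its member are always set. */
    assert(call != NULL && call->member != NULL);
    return call->member->in_call
        && call->lastqueue
        && call->lastqueue->wrapuptime;
}
```

With lastqueue set to NULL the function returns 0 instead of crashing, because the third operand is never evaluated. Without the middle operand, the same input would dereference a NULL pointer, which matches the crash seen in the backtraces.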

By: Rusty Newton (rnewton) 2016-04-14 08:50:24.129-0500

If someone wants to submit the fix to Gerrit that will get more eyes on it and move it through the process to see if this is the best fix.

https://wiki.asterisk.org/wiki/display/AST/Gerrit+Usage

By: Andrew Nagy (tm1000) 2016-04-19 15:52:53.745-0500

https://github.com/asterisk/asterisk/commit/3b9d8b60b211377f2023ebfbfdd157cfb668de6e

By: sdolloff (sdolloff) 2016-05-03 11:11:32.033-0500

Is this also being fixed in the 11 branch?  I only see a release for 13.9

By: Joshua C. Colp (jcolp) 2016-05-03 11:14:56.190-0500

The fix went into 11 but a release has not yet happened.

By: Daniel Denson (dandenson) 2016-07-27 17:00:15.128-0500

Is the fix in 11.23.0? I just got hit hard by this at a customer site...

By: Joshua C. Colp (jcolp) 2016-07-27 17:03:39.230-0500

Yes, the fix is in 11.23.0.

By: Daniel Denson (dandenson) 2016-07-27 17:05:59.154-0500

great! thanks