Summary: ASTERISK-26755: app_queue: Random queues disappear on "core reload queue all"
Reporter: Kirill Katsnelson (kkm)
Labels:
Date Opened: 2017-01-25 21:51:08.000-0600
Date Closed: 2017-01-30 11:29:25.000-0600
Priority: Major
Regression?:
Status: Closed/Complete
Components: Applications/app_queue
Versions: 13.13.1
Frequency of Occurrence: Frequent
Related Issues:
Environment: $ uname -a Linux qa1-asterisk1 3.13.0-100-generic #147-Ubuntu SMP Tue Oct 18 16:48:51 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Attachments:
Description: We have 500+ queues, and the "core reload queue all" command is sent every 2 minutes. Sometimes a queue disappears on reload: it is defined in queues.conf, but it is missing from Asterisk until the next reload.

----

The issue is very easy to reproduce in a matter of seconds. First, create 1000 queues:

{code}
#!/bin/bash
ASTROOT=~/asterisk/myroot
 (
   cat << EOF
[general]
persistentmembers = no
autofill = yes
updatecdr = no
EOF
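   # Emit 1000 section headers, [Q000] through [Q999]
   seq -f "[Q%03.0f]" 0 999
   # The settings below follow the last header, so they apply to [Q999] only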
   cat << EOF
timeout = 1
retry = 1
autopause = no
ringinuse = no
setqueuevar = yes
strategy = random
announce-frequency = 0
EOF
 ) > ${ASTROOT}/etc/asterisk/queues.conf
{code}

Then set up two torturously tight loops. The first, in extensions.ael, keeps trying to enter the queue:

{code}
context from-sip {
 796 => {
   Queue(Q999,,,,0.01);
   jump ${EXTEN};
 }
}
{code}

The second loop keeps reloading the queue configuration:

{code}
#!/bin/bash
ASTROOT=~/asterisk/myroot
while :; do
 # Reload queues
 touch ${ASTROOT}/etc/asterisk/queues.conf
 ${ASTROOT}/sbin/asterisk -rx "queue reload parameters"
done
{code}

Call the first, run the second, and Queue() will report a lot of failures complaining that the queue Q999 does not exist.

-----

This is a race condition in app_queue.c. On reload, all queues are first marked dead, and each one is resurrected as soon as it is loaded from the config. At the same time, the dead flag is checked on a queue whenever the Queue() app returns, so that a deleted queue can be lame-ducked out of service: the queue is unlinked once it has no calls, which is our case. Both pieces of code hold locks... but these are different locks!

-----

I am sending a patch against the 13 branch that fixed the problem for us (under the above artificial test conditions). It is in QA now, not yet under a production load. I'll post updates as testing progresses.
Comments:

By: Asterisk Team (asteriskteam) 2017-01-25 21:51:09.133-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Friendly Automation (friendly-automation) 2017-01-30 11:29:26.476-0600

Change 4822 merged by zuul:
app_queue: Fix queues randomly disappearing on reload

[https://gerrit.asterisk.org/4822|https://gerrit.asterisk.org/4822]

By: Friendly Automation (friendly-automation) 2017-01-30 11:29:50.892-0600

Change 4826 merged by zuul:
app_queue: Fix queues randomly disappearing on reload

[https://gerrit.asterisk.org/4826|https://gerrit.asterisk.org/4826]

By: Friendly Automation (friendly-automation) 2017-01-30 11:40:20.845-0600

Change 4829 merged by zuul:
app_queue: Fix queues randomly disappearing on reload

[https://gerrit.asterisk.org/4829|https://gerrit.asterisk.org/4829]