[Home]

Summary:ASTERISK-09813: Queue timeouts are considerably higher than expected
Reporter:Nick Barnes (bcnit)Labels:
Date Opened:2007-07-05 20:31:20Date Closed:2007-07-24 11:25:27
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Applications/app_queue
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 10127_more_debug.patch
( 1) 10127_realtime_fix.patch
( 2) 10127_with_debug.patch
( 3) 10127.patch
( 4) queue_log.txt
( 5) queue-more-debug.txt
Description:A queue is defined as follows:

----------------------------
[testq]
timeout=10
retry=5
strategy=ringall
wrapuptime=0
maxlen=0
joinempty=no
leavewhenempty=yes
eventmemberstatus=no
eventwhencalled=no
reportholdtime=no
memberdelay=0
weight=0
timeoutrestart=0
autopause=no
member => SIP/100
member => SIP/101
member => SIP/102
----------------------------

When called with 'Queue(testq,n)':
1 - The phones ring for 10 seconds.
2 - There is 5 seconds silence
3 - The phones ring for 10 seconds.
4 - There is 5 seconds silence
5 - The call exits the queue on the time-out cycle

So, despite the queue having a timeout of 10 seconds, the call is in the queue for 30.

When called with 'Queue(testq,n,,,10):
1 - The phones ring for 10 seconds.
2 - There is 5 seconds silence
3 - The call exits the queue (no mention is made of the time-out cycle).

So, despite coding an explicit 10 seconds timeout, the call is in the queue for 15.



****** ADDITIONAL INFORMATION ******

This is repeatable in both static and realtime configurations.
Comments:By: Jason Parker (jparker) 2007-07-06 10:27:25

This is correct (and documented) behavior.  The timeout option is how long to ring the phone, not how long callers stay in the queue.

By: Nick Barnes (bcnit) 2007-07-09 08:56:44

I take your point.

However, if what you say is the case, the first example should ring the handsets for ten seconds.

It doesn't - it rings them for 20 (two lots of 10).

Since the queue is called with the 'n' paramter (no retry on timeout) and the timeout does occur and a retry does take place, there is clearly something wrong.

Also, shouldn't 'retry' be ignored if the call is going to timeout? i.e. if timeout=10, retry=5 and the call is queued with 'Queue(testq,n)', then surely it should be 10 seconds before the call bounces out and not 15. If this is not the case, we waste 5 seconds of the caller's life during which we can do nothing with the call.

As an aside (and I appreciate that this is not the ideal place to ask), just how is one supposed to dump a call to a queue for a specific amount of time, regardless of what that queue is doing with the call?

By: Mark Michelson (mmichelson) 2007-07-09 13:11:48

I just did some thorough investigation on this and here's what I've found:

The n option is dependent on the timeout parameter given to it in the Queue() app call. The n option is ignored until the timeout has expired. The thing is, app_queue exits due to the timeout, NOT the n option. The only way to actually make app_queue exit due to the n option being in effect is to not specify a timeout, in which case it defaults to fifteen seconds.

Now, since the check for the n parameter is where it is in code, it means that if the start of the first cycle of calling the queue members and waiting for the retry time occurs before the default timeout of fifteen seconds has elapsed, then the n option will be ignored and so a second round of calling the queue members and waiting through the retry time will occur. This is the behavior you're seeing in your first scenario.

Since the intention of the n option is to only ring the queue members once before moving on, this seems like broken behavior. A fix will be committed shortly.

By: Digium Subversion (svnbot) 2007-07-09 13:16:06

Repository: asterisk
Revision: 74120

------------------------------------------------------------------------
r74120 | mmichelson | 2007-07-09 13:16:05 -0500 (Mon, 09 Jul 2007) | 6 lines

The n option for Queue should make the queue exit immediately after failure to reach any members and should not
be dependent on the timeout value passed to Queue

(closes issue ASTERISK-9813, reported by bcnit, repaired by me)


------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2007-07-09 13:20:25

Repository: asterisk
Revision: 74121

------------------------------------------------------------------------
r74121 | mmichelson | 2007-07-09 13:20:24 -0500 (Mon, 09 Jul 2007) | 14 lines

Merged revisions 74120 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r74120 | mmichelson | 2007-07-09 13:32:50 -0500 (Mon, 09 Jul 2007) | 6 lines

The n option for Queue should make the queue exit immediately after failure to reach any members and should not
be dependent on the timeout value passed to Queue

(closes issue ASTERISK-9813, reported by bcnit, repaired by me)


........

------------------------------------------------------------------------

By: Leif Madsen (lmadsen) 2007-07-17 15:11:22

I'm reopening this issue because this patch has broken expected behaviour on my system.

With the 'n' option enabled, and using the round-robin w/ memory strategy, the Queue() now exits after the first member doesn't answer. Expected behaviour would be that this should really try all the members once in the strategy, and then continue on.

By: Leif Madsen (lmadsen) 2007-07-17 15:24:52

The real issue I think is that it should not actually exit after failure to reach _any_ member, but rather if it can't reach any member(s) depending on the strategy involved.

Exiting right away makes sense when the strategy is ringall, like in the example given, but exiting after the first called member when you've got a strategy of rrmemory seems wrong to me.

By: Mark Michelson (mmichelson) 2007-07-17 18:26:31

I've written a new patch to take blitzrage's considerations into account. If the ring strategy is roundrobin or rrmemory, then instead of ringing just one phone, we'll ring every phone once. If the strategy is any other, then we'll only ring one phone (or all phones once if the strategy is ringall).

The reason I decided to do it this way is that roundrobin and rrmemory are the only strategies that guarantee that each retry will ring a different member. Leastrecent and fewestcalls have the potential to keep ringing the same member over and over.

The patch I have uploaded was made against an SVN version later than 74120, so in order to apply it, you'll need to first make sure you are at an SVN revision later than 74120 so that my original fix is still intact.

Also, as another side effect of my changes, since I added a member_count field to the call_queue struct, I also changed the dialplan function QueueMemberCount to use this new field. This may be committed as a separate patch later, but for now it's all in this one.

Let me know what you think.

By: Leif Madsen (lmadsen) 2007-07-18 11:29:17

Just did some testing with this patch, and the very first time I called into the Queue, it called all three members (using the round robin w/ memory strategy), then exited and continued on in the dialplan. This is exactly what was wanted.

However, on subsequent calls into the Queue, only one member was rung, then we'd exit and continue on in the dialplan, which was the original problem reported upon re-opening the ticket.

If I restart Asterisk it appears to work the first try, but then falls back to original behaviour. Also, I noticed in the 2nd test that 2 phones were rung, but when the system would go to ring the 3rd phone it just sat there, then reported back "Nobody picked up in 10000ms" and then continued on (like it tried to call the 3rd member, but never actually did... ?)

I'm on IRC if you need more info or testing from me. This patch seems close though!

By: Mark Michelson (mmichelson) 2007-07-18 12:46:56

New patch uploaded. This one has a line of debug added so that I can see which of two possibly incorrect variables is incorrect.

By: Mark Michelson (mmichelson) 2007-07-18 15:16:58

I've uploaded another patch which will hopefully fix your problem, blitzrage. Admittedly, since I don't have realtime set up here this is untested, but tracing it in my brain it seems to work. Give this a try and let me know how it turns out.

The new patch is called 10127_realtime_fix.patch.

By: Leif Madsen (lmadsen) 2007-07-18 16:55:46

I tested the latest patch and it is very close to working.

On the first call, 2 of 3 members are rang, but where the third member would be called, there is just a delay, then the Queue() exits and continues on in the dialplan.

On the 2nd call, all 3 members are rang and the Queue() exits to the dialplan (works as desired).

All calls beyond the first call seem to work fine, it's just the first call that doesn't work right. If I restart Asterisk, then I can reproduce as stated above.

By: Leif Madsen (lmadsen) 2007-07-24 07:49:21

OK, tested again with the more-debug patch. Seems to ring 2 interfaces fine, then quits before it rings the 3rd interface for some reason. I added the queue-more-debug.txt file which contains the logging for the test call.

I removed everything I didn't think was necessary in order to help aid the reader :)

By: Leif Madsen (lmadsen) 2007-07-24 10:36:24

I had problems making this work before, but maybe I was just a nub and wasn't applying this correctly or something.

With the 10127-more-debug.patch file everything works as I would expect. Essentially this 'n' option becomes a "single iteration" flag that continues on in the dialplan upon calling all available members in the queue.

It works!

By: Digium Subversion (svnbot) 2007-07-24 11:09:49

Repository: asterisk
Revision: 76801

------------------------------------------------------------------------
r76801 | mmichelson | 2007-07-24 11:09:48 -0500 (Tue, 24 Jul 2007) | 13 lines

Added a membercount variable to call_queue struct which keeps track of the number of logged in members in a particular queue.
This makes it so that the 'n' option for Queue() can act properly depending on which strategy is used. If the strategy is
roundrobin, rrmemory, or ringall, we want to ring each phone once before moving on in the dialplan. However, if any other strategy is
used, we will only ring one phone since it cannot be guaranteed that a different phone will ring on subsequent attempts to ring a phone.

As a side effect of this, the QUEUE_MEMBER_COUNT dialplan function now just reads the membercount variable instead of traversing through
the member list to figure out how many members there are.

Special thanks to blitzrage for helping to test this out.

(closes issue ASTERISK-9813, reported by bcnit, patched by me, tested by blitzrage)


------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2007-07-24 11:25:27

Repository: asterisk
Revision: 76804

------------------------------------------------------------------------
r76804 | mmichelson | 2007-07-24 11:25:25 -0500 (Tue, 24 Jul 2007) | 21 lines

Merged revisions 76801 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r76801 | mmichelson | 2007-07-24 11:26:58 -0500 (Tue, 24 Jul 2007) | 13 lines

Added a membercount variable to call_queue struct which keeps track of the number of logged in members in a particular queue.
This makes it so that the 'n' option for Queue() can act properly depending on which strategy is used. If the strategy is
roundrobin, rrmemory, or ringall, we want to ring each phone once before moving on in the dialplan. However, if any other strategy is
used, we will only ring one phone since it cannot be guaranteed that a different phone will ring on subsequent attempts to ring a phone.

As a side effect of this, the QUEUE_MEMBER_COUNT dialplan function now just reads the membercount variable instead of traversing through
the member list to figure out how many members there are.

Special thanks to blitzrage for helping to test this out.

(closes issue ASTERISK-9813, reported by bcnit, patched by me, tested by blitzrage)


........

------------------------------------------------------------------------