[Home]

Summary:ASTERISK-16947: Asterisk do not respect priority between queued calls.
Reporter:Thomas Foerster (tomtom87)Labels:
Date Opened:2010-11-11 17:19:35.000-0600Date Closed:
Priority:MajorRegression?No
Status:Open/NewComponents:Applications/app_queue
Versions:1.8.8.2 13.18.4 Frequency of
Occurrence
Constant
Related
Issues:
Environment:Attachments:( 0) Bug_queue_priority
( 1) extensions.conf
( 2) messages.log
( 3) queues.conf
Description:I am running an Asterisk server in a call center environment.
Since months, when there are many calls (usually when the system goes over 15 -20 incoming calls), the priority between queues is not respected anymore for some calls randomly. The priority inside each queue is well respected, but not between several queues.
All queues concerned have the same weight

There is no announcement in the queues that could avoid the caller to be bridged with a member at the right moment.

I AM RUNNING ASTERISK 1.6.2.13

What I found strange is that at the moment the following caller has to be bridged, the moh is stopped and restarted on the channel that should normally be bridged now. But then Asterisk bridges another channel.


****** ADDITIONAL INFORMATION ******

I have included the following exemple where this bug happens :


[Nov  9 16:59:35] VERBOSE[3094] Agent 105 (Local/0105) is a member of queues "science" and "maths_1"

[Nov  9 19:24:35] VERBOSE[5435] Agent 105 is busy

[Nov  9 19:24:34] VERBOSE[5455] Caller "4183253025"(DAHDI/17-1) enters queue science
[Nov  9 19:25:10] VERBOSE[5463] Caller "4186550000"(DAHDI/4-1)enters queue science
[Nov  9 19:27:51] VERBOSE[5516] Caller "8198406787"(DAHDI/3-1) enters queue maths_1

[Nov  9 19:28:51] VERBOSE[5435] Agent 105 Hangup and becomes available

[Nov  9 19:29:07] VERBOSE[5455]Asterisk stops moh on caller "4183253025" (DAHDI/17-1)

[Nov  9 19:29:07] VERBOSE[5455]Asterisk starts moh on caller "4183253025" (DAHDI/17-1)

[Nov  9 19:29:12] VERBOSE[5539] Asterisk connects 8198406787 to member (Local/0105)


Everything happens here : (around line 330 of the included file messages.log)

[Nov  9 19:27:51] VERBOSE[5516] pbx.c:     -- Executing [1@level2:3] Queue("DAHDI/3-1", "maths_1,c,,,600") in new stack
[Nov  9 19:27:51] VERBOSE[5516] res_musiconhold.c:     -- Started music on hold, class 'default', on DAHDI/3-1

[Nov  9 19:28:51] VERBOSE[5435] chan_dahdi.c:     -- Hungup 'DAHDI/15-1'
[Nov  9 19:28:51] VERBOSE[5466] pbx.c:   == Spawn extension (profs, 0105, 2) exited non-zero on 'Local/0105@profs-0dfd;2'

[Nov  9 19:29:07] VERBOSE[5539] pbx.c:     -- Executing [0105@profs:1] Set("Local/0105@profs-4b85;2", "AGENT_SIP=304") in new stack
[Nov  9 19:29:07] VERBOSE[5539] pbx.c:     -- Executing [0105@profs:2] Dial("Local/0105@profs-4b85;2", "SIP/304,30") in new stack
[Nov  9 19:29:07] VERBOSE[5539] netsock.c:   == Using SIP RTP TOS bits 184
[Nov  9 19:29:07] VERBOSE[5539] netsock.c:   == Using SIP RTP CoS mark 5
[Nov  9 19:29:07] VERBOSE[5539] app_dial.c:     -- Called 304
[Nov  9 19:29:07] VERBOSE[5516] app_queue.c:     -- Called Local/0105@profs/n

[Nov  9 19:29:07] VERBOSE[5455] res_musiconhold.c:     -- Stopped music on hold on DAHDI/17-1
[Nov  9 19:29:07] VERBOSE[5455] res_musiconhold.c:     -- Started music on hold, class 'default', on DAHDI/17-1
[Nov  9 19:29:07] VERBOSE[5539] app_dial.c:     -- SIP/304-0000015e is ringing
[Nov  9 19:29:07] VERBOSE[5516] app_queue.c:     -- Local/0105@profs-4b85;1 is ringing

[Nov  9 19:29:12] VERBOSE[5539] app_dial.c:     -- SIP/304-0000015e answered Local/0105@profs-4b85;2
[Nov  9 19:29:12] VERBOSE[5516] app_queue.c:     -- Local/0105@profs-4b85;1 answered DAHDI/3-1
Comments:By: Leif Madsen (lmadsen) 2010-12-06 13:01:14.000-0600

Can you elaborate on "priorities between queues"? I don't understand what the actual problem here. Can you elaborate on what is happening in the call flow, and what you're expecting to happen?

I'm guessing that the usage of the Agent channel within a Local channel may be causing some sort of issues for you, especially in a busy system, because the device state with Agent and Local channels is best effort. Only with with SIP channels is the device state more likely to be guaranteed.

By: Thomas Foerster (tomtom87) 2010-12-07 11:45:59.000-0600

Hi !

Thanks for taking care of this issue.

In order to elaborate on what is the actual problem, here is what is happening and what is expected in terms of call flow, for the example taken in "Additionnal information" :

Expected :

1) Local/0105 is a member of queues "science" and "maths_1"
2) Member Local/0105 is busy
3) Caller "Caller1"(DAHDI/17-1) enters queue science
4) Caller "Caller2"(DAHDI/4-1)enters queue science
5) Caller "Caller3"(DAHDI/3-1) enters queue maths_1
6) Member Local/0105 Hangup and becomes available
7) Asterisk connects Caller1 to member (Local/0105)


Happening :

1) Local/0105 is a member of queues "science" and "maths_1"
2) Member Local/0105 is busy
3) Caller "Caller1"(DAHDI/17-1) enters queue science
4) Caller "Caller2"(DAHDI/4-1)enters queue science
5) Caller "Caller3"(DAHDI/3-1) enters queue maths_1
6) Member Local/0105 Hangup and becomes available
7) Asterisk connects Caller3 to member (Local/0105)


The difference is in step7: Asterisk is bridging a caller that has been waiting less time than other callers. This is what i meant by "not respecting the priority between queues".

This is just an example, I have dozens of other cases each day over about an hundred calls. Those cases happen when there are a lot of callers in the queues.

Nevertheless, I noticed in all cases that Asterisk always respects the priority between callers of the same queue.

About the fact that this might be related to the use of Local channels as queue members, I am loging the members using for example :

exten => s,n,addQueueMember(science,Local/0105@profs/n,,,105-John Smith,SIP/410)

Therefore, shouldn't Asterisk monitor the SIP channel due to the last argument of "addqueuemember" ?

Thank You

Thomas



By: Thomas Foerster (tomtom87) 2012-01-21 16:36:54.612-0600

Hi,

I have updated to Asterisk 1.8.8.1

This issue still happens. Now it seems to be reproducable easily.


Here are my actions :

1) Using the console to log the phones into the queues :
Queue add member SIP/411 to primaire
Queue add member SIP/411 to maths_2
Queue add member SIP/411 to sciences

Queue add member SIP/406 to primaire
Queue add member SIP/406 to maths_2
Queue add member SIP/406 to sciences

Queue add member SIP/403 to primaire
Queue add member SIP/403 to maths_2
Queue add member SIP/403 to sciences

2) I call extension 706 on each phone, which is Musiconhold(), to put all phones on "In Use" state.

3) Using cell phone 5146908662, I call my asterisk server DID and choose queue "primaire"
[Jan 21 17:12:33] VERBOSE[2042] pbx.c:     -- Executing [1@entrants_ouvert:1] Queue("DAHDI/i1/5146908662-1", "primaire,c,,,600") in new stack

4) Using cell phone 5145212424, I call my asterisk server DID and choose queue "maths_2"
[Jan 21 17:13:07] VERBOSE[2048] pbx.c:     -- Executing [2@level2:1] Queue("DAHDI/i1/5145212424-2", "maths_2,c,,,600") in new stack

5) Using cell phone 4383917461, I call my asterisk server DID and choose queue "sciences"
[Jan 21 17:13:39] VERBOSE[2058] pbx.c:     -- Executing [4@level2:1] Queue("DAHDI/i1/4383917461-3", "sciences,c,,,600") in new stack

6) I hangup the call on SIP/411, to change SIP/411 to state "Not In Use".
[Jan 21 17:13:52] DEBUG[2032] channel.c: Hanging up channel 'SIP/411-00000001'
[Jan 21 17:13:52] DEBUG[1130] devicestate.c: Changing state for SIP/411 - state 1 (Not in use)



7) Asterisk connects me to 5145212424
[Jan 21 17:13:53] DEBUG[2048] app_queue.c: DAHDI/i1/5145212424-2 is trying to call a queue member.



8) I hang up SIP /411

9) Asterisk connects me to 4383917461
[Jan 21 17:14:08] DEBUG[2058] app_queue.c: DAHDI/i1/4383917461-3 is trying to call a queue member.


10) I hang up SIP /411

11) Asterisk connects me to 5146908662
[Jan 21 17:14:25] DEBUG[2042] app_queue.c: DAHDI/i1/5146908662-1 is trying to call a queue member.

12) I hang up SIP /411


As you see the priority is not respected. This happens on all calls I made. You can reproduce it.
I attached the complete debug logs (Bug_queue_priority). All queues have same weight

Also, please note that priority between callers inside the same queue is respected.



By: David Woolley (davidw) 2012-01-23 06:28:55.679-0600

This is how it has always worked.  Members don't actually look up queued calls, but rather, every second, the top queued call in each queue looks up free members.  If there is a weight difference between the queues, it considers the member unavailable if it is also in a higher weight queue and there are at least as many ready to run calls in the higher weight queue as available members in that queue.  The calls in the higher weight queue have to wait until they poll for members before they actually claim a member.

Changing the behaviour would be non-trivial.  If the queues were of equal weight, Asterisk would have to evaluate the total number of runnable jobs in other queues with the same member that were higher priority, or were the same priority, but had been waiting at least as long as the call that is currently seeking an agent.  I suspect making it age sensitive might be considered a feature request. However, I think the more general problem in the next comment indicates that there is also a bug.

By: David Woolley (davidw) 2012-02-08 11:44:52.027-0600

Actually, even when simply giving precedence to higher weight queues, this looks to be broken to me (up to and including trunk), unless I've completely misread the code.

The problem is that it tries each higher weight queue in isolation, but forgets that the same members may be in two higher weight queues, each with one less calls waiting than available agents.

Also, there may be an even higher weight queue that doesn't have the original member as a candidate to handle the call, but does have a member in common with the intermediate weight queue, possibly making that member unavailable for the intermediate queue, and therefore requiring the candidate to satisfy the waiting calls in that queue.

At the moment, I'm beginning to think that getting this to work correctly, even without acounting for wait time, may have non-polynomial complexity.

By: Thomas Foerster (tomtom87) 2012-02-08 13:26:35.438-0600

Hi David,

I think you identified the bug I was trying to explain

"The problem is that it tries each higher weight queue in isolation, but forgets that the same members may be in two higher weight queues, each with one less calls waiting than available agents.
"

If this is fixed, my problem of "priorities between queues not respected" will be fixed too.

By: David Woolley (davidw) 2012-02-17 13:23:11.693-0600

Enhancing this, at least without a rethink of the algorithm, may have serious performance consequences.  We are already getting a situation, with lots of queues and agents, and I supect thread debugging adding to the processing, where it is taking over 300ms (milli-seconds) to do ring_one, excluding the ast_call. (We're not sure whether it is the compare_weight part or the ast_request part which is the rate determining step of the loop over the queue members, when using AgentLogin agents.)

This time isn't currently covered by autoservice, which means it has consequences beyond the queue scheduling delays.  Unfortunately, we are working with a dead version (1.6.1.0) so I cannot raise that here.  (If the problem that is causing all my mail to asterisk-dev to silently vanish (even though we have now confirmed that lists.digium.com is accepting it), I might follow that side of it through on that list.)

By: Thomas Foerster (tomtom87) 2012-03-09 12:39:25.367-0600

Hi David,

Do you think there is any chance this issue will be addressed one day ?

By: David Woolley (davidw) 2012-03-12 05:37:27.763-0500

That's not for me to say.  At the moment, I can't see us submitting a patch.  I have no idea how important this is to Digium or to any other code contributor.

I would say it was a sufficiently large amount of work that it will only get done if someone with the right programming skills, and time, feels that the rework is the only solution to their problem.