Summary: ASTERISK-16115: [patch] problem with ringinuse=no, queue members receive sometimes two calls
Reporter: nik600 (nik600)
Labels:
Date Opened: 2010-05-19 02:52:17
Date Closed: 2016-04-25 16:32:42
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Applications/app_queue
Versions:
Frequency of Occurrence:
Related Issues:
causes ASTERISK-20801 Non-SIP queue members get no calls when ringinuse=no.
is duplicated by ASTERISK-23378 [patch] Queue with 'ringinuse=no' and members in realtime can get several calls at the same time (with patch)
is duplicated by ASTERISK-25064 Members (ringinuse disabled) of multiple queues ringing with other queue calls.
is duplicated by ASTERISK-26013 Multiple queued calls sent to agent
is related to ASTERISK-21574 Queue is sending multiple calls to the available agents at once when autofill is enabled
is related to ASTERISK-22189 Wrap up time is ignored for queue members who are members in multiple queues
is related to ASTERISK-24310 CLONE - Wrap up time is ignored for queue members who are members in multiple queues
Environment:
Attachments:
( 0) app_queue.c.patch
( 1) app_queue.c-1.6.2.10.patch
( 2) app_queue.c-svn-r368404.patch
( 3) app_queue.c-svn-r370418.patch
( 4) app_queue.c-svn-r375015.patch
( 5) debug_.txt
( 6) debug.filtered.gz
( 7) jira_asterisk_16115_revert_r370418_v1.8.patch
( 8) jira_asterisk_16115_single_q_v1.8.patch
Description:
Dear all

On a Debian amd64 system I've installed Asterisk 1.4.31 from source.

The system averages 50 concurrent calls in queue and 40 SIP members.

I'm experiencing an apparently random problem:
sometimes a user receives 2 calls from Asterisk at once, apparently
ignoring the ringinuse=no setting.
It happens to users who are members of many queues.

As you can see from the log, the user goes into the Ring+Inuse status.

Any idea?
Why is the call still dispatched to the user if the user is not in the "Not
in use" status?

I've added some custom logging to the ring_entry function, and this is the result:
{noformat}
[May 18 14:13:04] DEBUG[24945] app_queue.c: KUMBELOG: queue=queue_1        count=1,membercount=13,ringinuse=0,device=SIP/PL1009,status=1
[May 18 14:13:04] DEBUG[24945] app_queue.c: Found matching member SIP/PL1009 in queue 'queue_2'
[May 18 14:13:04] VERBOSE[24945] logger.c:     -- Called SIP/PL1009
[May 18 14:13:05] VERBOSE[24945] logger.c:     -- SIP/PL1009-00001807 is ringing
[May 18 14:13:06] DEBUG[25098] app_queue.c: KUMBELOG: queue=queue_2        count=2,membercount=15,ringinuse=0,device=SIP/PL1009,status=1
[May 18 14:13:06] DEBUG[25098] app_queue.c: Found matching member SIP/PL1009 in queue 'queue_1'
[May 18 14:13:06] DEBUG[25098] app_queue.c: Found matching member SIP/PL1009 in queue 'queue_3'
[May 18 14:13:06] VERBOSE[25098] logger.c:     -- Called SIP/PL1009
[May 18 14:13:07] VERBOSE[25098] logger.c:     -- SIP/PL1009-00001808 is ringing
[May 18 14:13:07] DEBUG[25312] app_queue.c: KUMBELOG: queue=queue_3        count=1,membercount=18,ringinuse=0,device=SIP/PL1009,status=6
[May 18 14:13:08] DEBUG[25382] app_queue.c: KUMBELOG: queue=queue_4        count=1,membercount=18,ringinuse=0,device=SIP/PL1009,status=6
[May 18 14:13:08] DEBUG[25224] app_queue.c: KUMBELOG: queue=queue_2        count=2,membercount=15,ringinuse=0,device=SIP/PL1009,status=6
[May 18 14:13:12] VERBOSE[25098] logger.c:     -- SIP/PL1009-00001808 answered SIP/192.168.55.32-000017e6
[May 18 14:13:13] VERBOSE[25098] logger.c:     -- Native bridging SIP/192.168.55.32-000017e6 and SIP/PL1009-00001808
[May 18 14:13:14] DEBUG[25224] app_queue.c: KUMBELOG: queue=queue_2        count=1,membercount=15,ringinuse=0,device=SIP/PL1009,status=7
{noformat}
It seems that the system does not update the user's status after calling it, and so it schedules a second call.

After that the status is updated and goes to Ring+Inuse (7).
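
For reference, the status= values in the log above appear to be Asterisk device state codes. The lookup below is an illustrative Python sketch, not Asterisk code. Values 1, 6 and 7 are confirmed by this report; the remaining names follow the AST_DEVICE_* constants and should be checked against the Asterisk version in use. `DEVICE_STATES` and `describe` are hypothetical names.

```python
# Device state codes as they appear in the status= field of the log.
# 1, 6 and 7 are confirmed by this report; the rest follow Asterisk's
# AST_DEVICE_* constants and should be verified against your version.
DEVICE_STATES = {
    0: "Unknown",
    1: "Not in use",
    2: "In use",
    3: "Busy",
    4: "Invalid",
    5: "Unavailable",
    6: "Ringing",
    7: "Ring+Inuse",
    8: "On hold",
}


def describe(status):
    """Return a readable name for a numeric device state."""
    return DEVICE_STATES.get(status, "Unrecognised (%d)" % status)


# The sequence of states logged for SIP/PL1009 in the excerpt above.
seen = [1, 1, 6, 6, 6, 7]
names = [describe(s) for s in seen]
```

Read this way, the second call at 14:13:06 was placed while the member was still reported as "Not in use", one second after the first call started ringing.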

Do you have any idea what could cause this?

This is an example of my config
{noformat}
[PL1009]
context=mycontext
callerid=PhoneLine1009 <1009>
secret=pwd1009
type=peer
host=dynamic
call-limit=3
disallow=all
allow=ulaw

queues:
[queue_1]
weight=10
wrapuptime=0
strategy=leastrecent
joinempty=no
retry=0
autopause=yes
setinterfacevar=yes
eventwhencalled=yes
eventmemberstatus=yes
ringinuse=no

member => SIP/PL1009

[queue_2]
weight=10
wrapuptime=0
strategy=leastrecent
joinempty=no
retry=0
autopause=yes
setinterfacevar=yes
eventwhencalled=yes
eventmemberstatus=yes
ringinuse=no

member => SIP/PL1009


[queue_3]
weight=10
wrapuptime=0
strategy=leastrecent
joinempty=no
retry=0
autopause=yes
setinterfacevar=yes
eventwhencalled=yes
eventmemberstatus=yes
ringinuse=no

member => SIP/PL1009
{noformat}

****** ADDITIONAL INFORMATION ******

I've tried:

Asterisk 1.4.31
Asterisk 1.4.30

running the system on ESXi on a DL380
running the system on ESXi on an HP blade
running the system directly on hardware, without virtualization
using Slackware 13.0 instead of Debian amd64
changing the kernel timer frequency to 1000 Hz instead of 250 Hz
adding DAHDI to improve timing

On the client side, I've tested:
SJphone on Windows
Cisco 7940 phone

In all of these test cases the problem occurred, at a rate of about 100 in every 4000 calls.
Comments:
By: Cristian Dimache (cristiandimache) 2010-05-19 04:46:09

Same thing here with 1.6.2.7.
I have not tested as much as nik600, but it seems to be the same issue.

By: nik600 (nik600) 2010-05-19 06:06:09

Hi cristiandimache, can you give us some information?

How many calls?
How many users?
What kind of user agent do you use?
Do you have dynamic or static members?

Just to see whether our configurations are similar.

By: Cristian Dimache (cristiandimache) 2010-05-19 07:31:13

About 20 to 30 calls, outbound or inbound (the inbound are on the problem queue).
About 50-60 users at a time.
We use a custom app on top of pjsip
I have dynamic members.

By: nik600 (nik600) 2010-05-19 16:24:48

I'm thinking about the log...

If the logger prints:
[May 18 14:13:05] VERBOSE[24945] logger.c: -- SIP/PL1009-00001807 is ringing
why is the status at the next step
[May 18 14:13:06] DEBUG[25098] app_queue.c: KUMBELOG: queue=queue_2 count=2,membercount=15,ringinuse=0,device=SIP/PL1009,status=1

still 1 (Not in use) instead of 6 (Ringing)?

What event updates the member status?
Could this be related to SIP UDP messages from the client that are not received?

Tomorrow I'll try to record some SIP debug messages and let you know.

Any ideas for debugging this behaviour are appreciated.

Thanks

By: nik600 (nik600) 2010-05-20 06:37:21

I've attached a debug file.

As you can see, Asterisk receives the RINGING message from the client

<--- SIP read from 10.192.37.122:5060 --->
SIP/2.0 180 Ringing
Via: SIP/2.0/UDP 192.168.55.32:5060;branch=z9hG4bK37352f9d;rport=5060;received=192.168.55.32
From: "queue_1" <sip:0461XXXXXX@192.168.55.32>;tag=as35ea7787
To: "unknown" <sip:PL1012@10.192.37.122>;tag=3280131324f
Contact: <sip:PL1012@10.192.37.122>
Call-ID: 28a4b3446e0b0c870e542b365f665909@192.168.55.32
CSeq: 102 INVITE
Content-Length: 0
Server: SJphone/1.65.377a (SJ Labs)

and prints it in the log

[May 20 13:06:19] VERBOSE[11620] logger.c:     -- SIP/PL1012-00001215 is ringing


But the ring_entry function in app_queue still sees the interface as Not in use

[May 20 13:06:19] DEBUG[13250] app_queue.c: KUMBELOG: queue=queue_2 count=2,membercount=9,ringinuse=0,device=SIP/PL1012,status=1


and places the second call

[May 20 13:06:19] DEBUG[13250] chan_sip.c: Call to peer 'PL1012' is 2 out of 3

The question is:

If Asterisk prints that the device is ringing, why does it not update the status?

By: nik600 (nik600) 2010-05-21 06:51:58

cristiandimache, have you tried the

internal_timing = yes

option in asterisk.conf ?

By: Cristian Dimache (cristiandimache) 2010-05-21 08:14:00

No, I have not. I fail to see the link between failing to properly identify the user state in a queue and the timing of RTP...

By: nik600 (nik600) 2010-05-21 10:36:09

As you can see from the log, Asterisk receives the ringing notification from the called device but doesn't change the device status, and places a second call.
After that, the device status is changed and goes to state 7 (Ringing + InUse).

So maybe there is a timing problem in the handling of internal messages.

I changed the setting today and (so far) I haven't experienced the problem.

Can you try it and see what happens on your system?

By: nik600 (nik600) 2010-05-24 02:03:02

Today I experienced the problem with the internal_timing = yes option set.

I've noticed that with call-limit = 1 the problem is not present.

I'm planning to move the configuration of the SIP users to realtime, so that this value can be changed from an application server when needed.

Do you have any suggestions about what else to debug?

I can put some custom traces in the code, but where?

By: nik600 (nik600) 2010-06-16 15:41:45

I just want to report that I have the problem on 1.6.0.28 as well.

With call-limit set to 1 I can avoid it, but the underlying problem remains.

Any ideas about other kinds of debugging or tests I can do?

By: nik600 (nik600) 2010-07-24 02:04:28

I've seen that
https://issues.asterisk.org/view.php?id=16035
https://issues.asterisk.org/view.php?id=16472
are similar to this issue.

What do you think?

By: Fernando Lujan (flujan) 2010-08-26 14:20:45

Have you tried downgrading to 1.4.29? I am experiencing the same issue here. I have set call-limit=1 and am seeing whether it fixes this issue. I tried 1.4.35 and nothing changed.

By: nik600 (nik600) 2010-08-27 02:59:50

I've tried downgrading to 1.4.26.2, but I still have the problem.

I've seen that setting call-limit = 1 fixes the problem about 90% of the time, but it still happens sometimes.

Any ideas about other kinds of debugging or tests I can do?

By: nik600 (nik600) 2010-08-27 03:32:22

OK flujan, cristiandimache, RoadKill, it seems that we are the only ones with this problem.

But it affects all of these versions of Asterisk:
1.4.x
1.6.0.x
1.6.2.x

This kind of problem is quite severe in a production call-center environment, so there is probably some configuration that triggers it at a random frequency.

I propose that we share our configurations and look for similar custom settings.

For example, my installation uses this:

I use both the manager and HTTP interfaces, on port 5038 and on port 8088 with mxml.

I use port 5038 to read events, and port 8088 to view queue status and originate new calls.

I use dynamic members in queues; my other particular settings are:

eventwhencalled = vars
eventmemberstatus = yes
setinterfacevar=yes
setqueueentryvar=yes

flujan, cristiandimache, RoadKill, can you post your configurations?

Thanks for your help

By: Stefan Schmidt (schmidts) 2010-08-27 04:09:11

I am not sure, but do you use the Answer application in your extension before calling the queue? If so, this could be the problem that klaus3000 and I found and solved: the wrong indication state of a dialed user when Answer is used before the Dial. See issue 17641 for this.

By: nik600 (nik600) 2010-08-27 05:07:13

Thanks for your note.

In my case I have an AGI script that answers the call; then, after some audio messages and DTMF input, the call is put into the specific queue using a Goto.

When using queues you have to answer the call before joining the queue.

But maybe the problem arises if the dialplan contains several Answer calls, for example:

***********************

[context_A]

exten => 1,1,Answer
exten => 1,n,Background(something)
exten => 1,n,Goto(context_C,1,1)

[context_B]

exten => 1,n,Goto(context_C,1,1)


[context_C]

exten => 1,1,Answer
exten => 1,n,Queue(2000)

***********************

In this example I can reach extension 1 in context_C from two places; one answers the call and the other does not.

So I put an Answer in context_C as well.

Is that the problem?

Looking at the code of the Answer application, it seems to me that this situation is handled.

By: Stefan Schmidt (schmidts) 2010-08-27 08:05:10

The problem isn't Answer itself, and answering a call within an AGI could maybe cause this too. The problem was that when a SIP call is answered (really by the client, not by the application), the indication isn't set to the right value. This would also cause problems when the call is transferred. The patch I mean just sets the indication to the right value after the client answers, so this could also depend on the status of the client.

By: nik600 (nik600) 2010-08-27 08:19:50

I'm not sure about it, because in our case the second call is forwarded to the agent even when the channel is in the ringing state (not yet answered), but in any case I'll try applying your patch and let you know.

Thanks

By: Fernando Lujan (flujan) 2010-08-27 09:16:14

Have you guys tried downgrading to version 1.4.29.1? nik600, please tell us the result of the patch.

nik600, sometimes you need an Answer before the Queue command. I have problems with the caller not hearing the MOH if I do not answer before the queue. I don't believe it is related.

By: Charles Moye (chazzam) 2010-08-28 13:48:43

In Asterisk 1.4.X have you tried setting 'limitonpeers=yes' in sip.conf under general, and then on the user setting 'call-limit' to anything (2 or above should work) with ringinuse=no set on the queue?

In 1.6.2 limitonpeers is replaced by 'callcounter=yes'.

By: Fernando Lujan (flujan) 2010-08-28 21:41:11

This issue does not happen with autofill=no in queues.conf.

By: nik600 (nik600) 2010-08-30 02:30:48

The test with the patch suggested by schmidts failed (no significant change).

I'll try autofill=no and let you know ASAP.

Thanks

By: Bruce Spidel (bms) 2010-08-31 10:20:26

My client is running 1.6.2.10 on a CentOS 5.5 call center with 60+ queues and peak call volumes of 400+ concurrent calls. autofill is on.

I have noticed in the AMI log that when an agent is a member of multiple queues, sometimes he will be assigned multiple calls within 10 microseconds. I do not yet understand the queuing implementation; however, it looks as though a caller is "assigned" to a member (Newchannel), then another caller from a different queue is "assigned" to the same member (another Newchannel) before any flag is set on the interface indicating intention to use. This is well before the ringing on the devices. One caller gets answered and the other times out.
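
One way to spot these cases in a log is to group call attempts by interface and flag any two that start within a short window. The following is a minimal Python sketch under assumptions: the events are taken to be already extracted as (timestamp in seconds, interface) pairs, for example one per Newchannel event for a queue member, and `find_close_calls` is a hypothetical helper. Parsing the AMI stream itself is not shown.

```python
from collections import defaultdict


def find_close_calls(events, window=1.0):
    """Flag calls to the same interface that start within `window` seconds.

    events: iterable of (timestamp_seconds, interface) pairs.
    Returns a list of (interface, earlier_ts, later_ts) tuples.
    """
    by_interface = defaultdict(list)
    for timestamp, interface in events:
        by_interface[interface].append(timestamp)

    hits = []
    for interface, stamps in by_interface.items():
        stamps.sort()
        # Compare each attempt with the next one on the same interface.
        for earlier, later in zip(stamps, stamps[1:]):
            if later - earlier <= window:
                hits.append((interface, earlier, later))
    return hits
```

This is only a heuristic: two attempts close together are suspicious when ringinuse=no, but the window has to be chosen to suit the site's call lengths.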

This happens hundreds of times per day. Calls get handled ok from the caller perspective but the agents are angry for receiving call waiting messages. Finally managers are confused about how to manage workforce with so many apparent call timeouts.

Does anyone know if there is a flag on the interface that says someone is or is about to use the channel and so it should not be "assigned" to another caller?

This is my first note so please be gentle.

By: nik600 (nik600) 2010-08-31 10:34:22

Thanks bms, this is exactly the problem that I'm experiencing too, and I agree: it is very frustrating for agents, and it creates many problems, especially if you have autopause=yes, because the agent is automatically paused when he loses the 2nd call.

I still have to test the autofill=no setting. I hope to do it by the end of the week and give you feedback next week (I can't touch production).

If you can test autofill=no let us know, thanks.

Bye



By: Bruce Spidel (bms) 2010-08-31 10:46:19

I am afraid that autofill=no will upset our call processing capability. With 60 calls in a queue, waiting for just 1 second on each member to answer may run average call times up half a minute. Autofill is a good thing if it is aware that another queue is assigned to one of its members.
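
The "half a minute" figure can be checked with a little arithmetic. This is a rough Python sketch under a simplifying assumption: waiting calls are offered strictly one after another, each attempt costing a fixed number of seconds. `average_extra_wait` is a hypothetical name, and real queue behaviour depends on strategy and timeouts.

```python
def average_extra_wait(calls_waiting, seconds_per_attempt):
    """Mean added delay when waiting calls are offered one at a time.

    Call number k (counting from 0) waits k * seconds_per_attempt
    before it is offered to a member.
    """
    total = sum(k * seconds_per_attempt for k in range(calls_waiting))
    return total / calls_waiting


# 60 waiting calls, 1 second per attempt, as in the comment above.
estimate = average_extra_wait(60, 1.0)
```

With 60 calls and 1 second per attempt the mean added delay is (0 + 1 + ... + 59) / 60 = 29.5 seconds, which matches the estimate of about half a minute.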

By: Fernando Lujan (flujan) 2010-09-01 08:57:25

Yep, I hope someone looks at this issue. It is probably related to the autofill timing. Here, I am experiencing this issue with a single queue and 60 agents.

By: nik600 (nik600) 2010-09-01 09:09:20

i have about 30 agents and 10 queues.

By: Cristian Dimache (cristiandimache) 2010-09-01 10:25:38

I've got 40 agents and about 15 queues. It looks like a race condition in the handling of the agent's state.

By: nik600 (nik600) 2010-09-09 01:43:00

I confirm that with autofill=no the problem doesn't appear.

I've tried to look in the code, but the autofill behaviour is not very clear to me. Is there documentation more specific than the comments in the queues.conf file?

By: Bruce Spidel (bms) 2010-09-09 14:29:23

I found that in 1.6.2.11 with 10 calls in each of 5 queues;
 queues.conf
   autofill=yes
   ringinuse=no (per queue)
 sip.conf
   call-limit=1 (per member)

members get only one call at a time. We are testing under load tonight.
call-limit allows n incoming and n outgoing calls when limit set to n.

I found out from developers that device state is not granular enough to preclude multiple calls. Call-limit is intended to choke off calls at the device level. There may be a problem if you need more than one outgoing call at a time.

By: Bruce Spidel (bms) 2010-09-10 15:21:36

call-limit set to one does choke off multiple rings but also precludes agent putting caller on hold to make another call. bummer.

By: Charles Moye (chazzam) 2010-09-10 16:54:54

bms, have you tried with it set to a higher value, like 50? Just having it set at all should do something. And what about having callcounter=yes set in sip.conf?

By: Bruce Spidel (bms) 2010-09-15 09:11:54

Please consider the attached patch to 1.6.2.10, with the understanding that I will patch the head pending some initial feedback on this patch. This is my first submission so please be gentle. My analysis of this issue brought me to the conclusion that in very short intervals the queue can think a member is not-in-use when in fact it is preparing to dial. This is because each queue has a different member list and the same member can be part of multiple queues. Under certain loads it is not possible to suppress the attempt to dial on the same interface simultaneously. Phones with call waiting will accept this but callers will fall out of the queue unnecessarily.

To resolve this apparent problem, I created a queue view of interfaces that it consults to see if it is already using one interface. I added a linked list of structures containing the interface name and last time it was used. I added a pointer to the structure in member. Each time an attempt to try a member is performed I make sure that the member points to the structure for the particular interface. This way if two queues each have a member pointing to the same interface, only one will get access. And no one else will gain access for 2 seconds after which the device status will probably be correct.
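
The idea described above can be modeled in a few lines. This is a Python sketch of the approach as described in this comment, not the attached C patch: a dict stands in for the linked list, and `InterfaceTable`, `try_reserve` and the injectable `clock` are hypothetical names added for illustration.

```python
import threading
import time


class InterfaceTable:
    """One shared record per interface name, holding the last attempt time.

    Every queue consults the same table, so two queues with the same
    member cannot both dial it inside the hold period.
    """

    def __init__(self, hold_seconds=2.0, clock=time.monotonic):
        self.hold_seconds = hold_seconds
        self.clock = clock              # injectable so the logic can be tested
        self.last_used = {}             # interface name -> time of last attempt
        self.lock = threading.Lock()    # protects last_used

    def try_reserve(self, interface):
        """Return True if the caller may dial `interface` now."""
        with self.lock:
            now = self.clock()
            last = self.last_used.get(interface)
            if last is not None and now - last < self.hold_seconds:
                return False            # another queue used it too recently
            self.last_used[interface] = now
            return True
```

The check and the update happen under one lock, so only one caller can win for a given interface. The 2 second hold bridges the gap until the device state has had time to catch up.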

Not being very familiar with most of the code and being tasked to fix a specific problem I did not know where to put any garbage collection. I have the code to do it if someone can point me to where it should be called.

I tested the patch with 1 member associated with 5 queues and dumping several calls in each queue with a script. 1.6.2.10 and 1.6.2.11 both would periodically ring the agent with 2-4 calls simultaneously. The attached patch single threads the calls. My environment requires autofill and multiple outgoing calls. I would be grateful for constructive notes.

By: Leif Madsen (lmadsen) 2010-09-20 14:58:23

Patch attached. Ready for Testing.

By: Bruce Spidel (bms) 2010-09-24 13:43:53

Just curious, has anyone looked into the attached patch?

By: Juan Sebastian Hernandez (juanshh) 2010-09-27 12:09:58

Hello,

The patch doesn't work; I'm applying it to Asterisk 1.6.0.19.

here is the log:

Hunk #1 succeeded at 351 (offset -279 lines).
Hunk #2 FAILED at 417.
Hunk #3 FAILED at 560.
Hunk #4 succeeded at 2312 (offset -259 lines).
2 out of 4 hunks FAILED -- saving rejects to file apps/app_queue.c.rej

Any Idea???

By: Bruce Spidel (bms) 2010-09-27 13:47:37

Juanshh: Thanks for trying. I think there are too many source code differences to overcome between 1.6.0.19 and 1.6.2.10 for which this patch was intended. The queue changed quite a bit in intervening releases.

By: Leif Madsen (lmadsen) 2010-10-04 12:11:21

juanshh: the patch probably just needs to be applied manually then. Look at the rejected hunks, and try to figure out where they fit in the code.

By: Bruce Spidel (bms) 2010-10-08 14:32:47

We have patched a call center that sees peak call volumes of 400 simultaneous calls, and it has run for 1 week without any multi-ring problems. We were seeing 25 per day.



By: Fernando Lujan (flujan) 2010-11-05 08:14:47

lmadsen: Did you review the code? Any chance of putting it on the 1.4.x branch?

By: JoshE (n8ideas) 2012-03-07 22:31:40.222-0600

It appears this is still an issue with Asterisk 1.8.8.1... has anyone seen this occurring?  Anyone tried to move the patch forward?

By: William luke (greenlightcrm) 2012-03-15 18:22:08.333-0500

I can confirm this is still an issue with 10.1.2.

Observed exactly the same behaviour as Bruce Spidel initially described, via the AMI. This happens when calls from multiple queues are delivered at almost the same microsecond.
I can post logs from AMI with timestamps if it would help?

Setting autofill off massively reduces the problem, but it still happens a few times a day for us. I think with autofill on it could happen with two calls from the same queue, but with autofill off it's limited to different queues.

We're using Local channels as the queue members.

Are there plans to commit this patch any time soon - it's been a while?

By: Marco Aurelio (aureliors123) 2012-05-17 12:06:36.026-0500

Any update about this issue?

By: art (art) 2012-05-22 06:24:57.698-0500

I applied Bruce Spidel's patch to Asterisk svn r341437 and it seems to fix the issue for me. A vote from me to commit this patch.

By: William luke (greenlightcrm) 2012-05-22 06:30:18.893-0500

Let's commit the patch, finally!

By: art (art) 2012-06-11 05:11:14.093-0500

I edited Bruce Spidel's patch so that it applies against svn revision 368404. We have used it on a production server with about 5000 calls per day for one week. It seems to be working.

By: Italo Rossi (italorossi) 2012-07-18 15:46:20.238-0500

Has this patch already been applied? Which versions will receive the fix?

By: Italo Rossi (italorossi) 2012-07-24 11:17:35.640-0500

Hello all,

I've been running into this problem with Asterisk versions 1.8.7.1 and 1.8.10.1, and
the most recent version probably has this issue too.

Follow these steps in order to reproduce:

Create a queue testqueue with one dynamic member. I've used X-lite softphone.

Create this context:
{code}
[testcall]
exten => 10,1,Answer()
exten => 10,n,Wait(12)
exten => 10,n,Hangup()
{code}

Call file test:
{code}
cat <<EOF > /tmp/test
Channel: Local/10@testcall
Application: Queue
Data: testqueue
EOF
{code}
Create 101 concurrent calls and place them in the outgoing directory:
{code}
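cd /var/spool/asterisk/

# Make 101 call files (test plus test1..test100), stamp them 2 seconds in the
# future so they all fire together, then move them into the outgoing directory.
cp /tmp/test .; for i in $(seq 1 100); do cp test "test$i"; done; touch -d "$(date --date='+2 second' +%T)" test*; mv test* outgoing/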
{code}

This command schedules 101 concurrent calls to the context [testcall] within the next 2 seconds.
These calls join testqueue at (almost) the same time, and this *may* reproduce the problem.

These are my test results:

Without my patch: in 3 rounds of 101 concurrent calls, I was able to reproduce the problem 6 times (freq 20%)

With my patch applied: in 3 rounds of 101 concurrent calls, I couldn't reproduce it, so I guess that the problem is fixed.

The explanation:

In app_queue.c, the code that checks the state of the queue member sits before the call to ast_call, which makes the
request to the interface. *HERE IS THE PROBLEM*: at this point two or more running threads may reach ast_call
at the same time (a race condition), which causes this behavior:
{noformat}
thread A:                           thread B
   request member state = free         request member state = free
   place the call                      place the call
{noformat}
My suggestion is to add a lock on the queue member just before ast_call, so that the call is not placed without a guarantee that
the member is definitely free, and so that only one thread calls the interface, respecting the ringinuse parameter, obviously.
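
The suggestion amounts to making "check the state" and "claim the member" a single locked step. Below is a minimal Python model of that pattern, not the attached patch: `Member`, `try_ring` and `simulate` are hypothetical names, and the boolean `in_use` stands in for the real device state.

```python
import threading


class Member:
    """Hypothetical stand-in for a queue member; not an Asterisk structure."""

    def __init__(self, interface):
        self.interface = interface
        self.in_use = False
        self.lock = threading.Lock()


def try_ring(member):
    """Check the state and claim the member in one locked step."""
    with member.lock:
        if member.in_use:
            return False       # another thread already claimed this member
        member.in_use = True   # claim before the call is placed
        return True


def simulate(callers=101):
    """Start `callers` threads that all try to ring the same member."""
    member = Member("SIP/agent")
    results = []

    def worker():
        results.append(try_ring(member))

    threads = [threading.Thread(target=worker) for _ in range(callers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(results)  # number of threads that were allowed to ring
```

With the lock in place, `simulate(101)` lets exactly one caller through. Without it, two threads could both read `in_use` as False before either sets it, which is the interleaving shown in the diagram above.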

I've attached the patch for further reading; suggestions are welcome.

PS: I've already signed the license, and it's awaiting approval.

By: Italo Rossi (italorossi) 2012-07-24 11:23:13.554-0500

This patch is running in production environment with ~2000 calls per day.

By: Italo Rossi (italorossi) 2012-07-27 09:21:00.396-0500

I'm attaching my patch again now that my license has been approved. So far my environment is running without problems and is very stable.

By: Italo Rossi (italorossi) 2012-07-27 09:23:32.285-0500

Can anyone remove the old patches?

By: Richard Mudgett (rmudgett) 2012-07-27 09:51:00.531-0500

Deleted old patch.

By: Maciej Szmigiero (mhej) 2012-10-14 08:04:00.711-0500

Italo's patch only fixes the case where a member is logged in to only one queue and
the race is between multiple calls queued to that queue.

It does not fix the original problem, the race between multiple queues containing the same member. This is because (as Bruce has said) the same member in different queues is in fact a different member structure in each, and all of them have to be locked while checking the status and calling the channel.
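
A small Python illustration of why a per-member lock is not enough across queues. This is a model only: `QueueMember` is a hypothetical stand-in for the per-queue member structure, not the Asterisk type.

```python
import threading


class QueueMember:
    """Hypothetical per-queue member record with its own lock."""

    def __init__(self, queue, interface):
        self.queue = queue
        self.interface = interface
        self.lock = threading.Lock()


# The same phone is a member of two queues, so there are two records.
m1 = QueueMember("queue_1", "SIP/PL1009")
m2 = QueueMember("queue_2", "SIP/PL1009")

with m1.lock:
    # A thread serving queue_2 is not held up: it sees a different lock.
    second_lock_free = m2.lock.acquire(blocking=False)
    if second_lock_free:
        m2.lock.release()
```

Here `second_lock_free` ends up True: holding the lock on the queue_1 record does nothing to stop queue_2 from ringing the same interface.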

Please reopen the bug.

By: Italo Rossi (italorossi) 2012-10-15 12:34:00.681-0500

You're right. I'm attaching another patch that locks all members with the same interface. I need someone to review and see if this is really the best solution.

By: David Brillert (aragon) 2012-11-05 08:57:13.062-0600

I'm a little confused by the closed/fixed status of this report... due to Italo's last comment regarding another attached patch; was this patch also committed to branch?  Or was another report opened to resolve that issue?

I checked the svn log and do not see Italo's second patch = app_queue.c-svn-r375015.patch committed.
I only see this:

Revision 372088 - Directory Listing
Modified Thu Aug 30 19:24:57 2012 UTC (2 months ago) by root

Merged revisions 372049 via svnmerge from
file:///srv/subversion/repos/asterisk/branches/10

................
 r372049 | mmichelson | 2012-08-30 13:33:37 -0500 (Thu, 30 Aug 2012) | 16 lines
 
 Help prevent ringing queue members from being rung when ringinuse set to no.
 
 Queue member status would not always get updated properly when the member
 was called, thus resulting in the member getting multiple calls. With this
 change, we update the member's status at the time of calling, and we also
 check to make sure the member is still available to take the call before
 placing an outbound call.
 
 (closes issue ASTERISK-16115)
 reported by nik600
 Patches:
  app_queue.c-svn-r370418.patch uploaded by Italo Rossi (license #6409)

By: Matt Jordan (mjordan) 2012-11-05 09:09:28.399-0600

Yes, this issue needs to be reopened - it just hadn't been noticed by a bug marshal yet.  Reopening.

By: David Brillert (aragon) 2012-11-20 12:56:23.930-0600

Italo,

Have you tested your latest patch?
Have you put it up on the reviewboard for review?

By: Italo Rossi (italorossi) 2012-11-22 11:43:40.443-0600

Sorry for the delay.

It was tested in my environment, but I don't have many members in more than one queue in production.

Running the same test described earlier (see comments above), everything worked OK.

It's not on the reviewboard. Can any bug marshal take a quick look and see if it's really necessary to submit it for review?


By: Mikhail Lundberg (mvlbrn) 2012-12-03 02:41:27.447-0600

On Asterisk 11.0.1 i have the same issue.

I've tested the latest patch (app_queue.c-svn-r375015) on Asterisk 11.0.1 and experienced crashes of Asterisk within ~5 minutes of Asterisk being started.
20 members, 5 separate queues with a varying number of dynamic members, up to 18 waiting calls in one of the queues, and up to 40 simultaneous incoming calls in total.

By: Matt Jordan (mjordan) 2012-12-03 08:23:13.706-0600

Mikhail - if you're seeing a crash, it's probably a very different issue.  If it's crashing, you need to open a separate Jira issue, generate a backtrace, and attach it to the issue.

By: Mikhail Lundberg (mvlbrn) 2012-12-04 00:20:29.666-0600

Tested with a clean rebuild and saw no crashes, so I'm sorry for the last comment; the information about the crash is not true.

Today I made some tests with the latest 11.1.0_rc1 from SVN patched with app_queue.c-svn-r375015, using the method with many .call files, and I am still getting multiple calls from the queue.

{code}
>cat debug |grep -v pbx.c |grep 33099
[2012-12-04 12:49:20] DEBUG[32420][C-0000012c] app_queue.c: Trying 'SIP/33099' with metric 1000
[2012-12-04 12:49:20] DEBUG[32420][C-0000012c] app_queue.c: Locked SIP/33099 in queue 3333
[2012-12-04 12:49:20] DEBUG[32420][C-0000012c] chan_sip.c: Outgoing Call for 33099
[2012-12-04 12:49:20] DEBUG[31211] chan_sip.c: Checking device state for peer 33099
[2012-12-04 12:49:20] DEBUG[31211] devicestate.c: Changing state for SIP/33099 - state 6 (Ringing)
[2012-12-04 12:49:20] DEBUG[31211] devicestate.c: device 'SIP/33099' state '6'
[2012-12-04 12:49:20] DEBUG[32420][C-0000012c] app_queue.c: Unlocked SIP/33099 in queue 3333
[2012-12-04 12:49:20] DEBUG[32421][C-0000012d] app_queue.c: Trying 'SIP/33099' with metric 1000
[2012-12-04 12:49:20] DEBUG[32421][C-0000012d] app_queue.c: Locked SIP/33099 in queue 3333
[2012-12-04 12:49:20] DEBUG[32421][C-0000012d] chan_sip.c: Outgoing Call for 33099
[2012-12-04 12:49:20] DEBUG[31211] chan_sip.c: Checking device state for peer 33099
[2012-12-04 12:49:20] DEBUG[31211] devicestate.c: Changing state for SIP/33099 - state 6 (Ringing)
[2012-12-04 12:49:20] DEBUG[31211] devicestate.c: device 'SIP/33099' state '6'
[2012-12-04 12:49:20] DEBUG[32421][C-0000012d] app_queue.c: Unlocked SIP/33099 in queue 3333
[2012-12-04 12:49:20] DEBUG[32422][C-0000012e] app_queue.c: Trying 'SIP/33099' with metric 1000
[2012-12-04 12:49:20] DEBUG[32422][C-0000012e] app_queue.c: Locked SIP/33099 in queue 3333
[2012-12-04 12:49:20] DEBUG[32422][C-0000012e] chan_sip.c: Outgoing Call for 33099
[2012-12-04 12:49:20] DEBUG[31211] chan_sip.c: Checking device state for peer 33099
[2012-12-04 12:49:20] DEBUG[31211] devicestate.c: Changing state for SIP/33099 - state 6 (Ringing)
[2012-12-04 12:49:20] DEBUG[31211] devicestate.c: device 'SIP/33099' state '6'
{code}

I see locks and then unlocks here, in separate threads?
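
To quantify this from a debug log, the "Outgoing Call for" lines can be counted per peer and per timestamp. This is a rough Python sketch: the regular expression is written against the line format shown in the excerpt above and may need adjusting for other logger settings. `outgoing_calls_per_second` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches lines such as:
# [2012-12-04 12:49:20] DEBUG[32420][C-0000012c] chan_sip.c: Outgoing Call for 33099
OUTGOING = re.compile(
    r"^\[(?P<ts>[^\]]+)\] DEBUG\[\d+\](?:\[[^\]]+\])? "
    r"chan_sip\.c: Outgoing Call for (?P<peer>\S+)$"
)


def outgoing_calls_per_second(lines):
    """Count outgoing call attempts per (timestamp, peer)."""
    counts = Counter()
    for line in lines:
        match = OUTGOING.match(line.strip())
        if match:
            counts[(match.group("ts"), match.group("peer"))] += 1
    return counts
```

Any (timestamp, peer) pair with a count above 1 means the peer was dialed more than once within the same second, as happens three times for 33099 in the excerpt.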

In sip.conf for 33099:
{code}
callconter=yes
busylevel=1
call-limit=3
{code}

In queues.conf:
{code}
autofill=yes
[3333](StandardQueue)
weight=90
strategy = rrmemory
servicelevel = 60
timeout = 15
retry = 3
wrapuptime=30
autopause=no
ringinuse=no
{code}


By: Italo Rossi (italorossi) 2012-12-04 07:04:01.984-0600

Mikhail - Are you seeing any debug messages regarding state change coming from app_queue.c?

Ex:
{code}
app_queue.c: Device 'xxx' changed to state '6' (Ringing)
{code}

By: Jonathan Rose (jrose) 2012-12-04 14:48:44.018-0600

If I understand this patch correctly, it locks all members of the same name across all queues. In order to do this, it has to lock a queue, run through all the members locking the ones that match the name along the way. It unlocks the queue after it has run through all the members, and then it does the same for another queue.

Here's the problem.  We have to do this again shortly down the road with the individual members locked while we lock queues, iterate through them unlocking members on the way and then unlocking the queues.

The first time we took the queues lock and then took individual member locks.
The second time we had member locks already and took queue locks which owned those members.

In other words, one of these operations (locking or unlocking) is absolutely going to be locking order inversion which means it will likely be able to cause a deadlock.
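
The inversion can be made visible by recording which lock was already held each time another lock is taken. The following is a small single-threaded Python model, not Asterisk code: `LockOrderTracker` is a hypothetical helper, and "queue" and "member" stand for one queue lock and one member lock.

```python
class LockOrderTracker:
    """Records 'held -> acquired' pairs and reports pairs seen both ways."""

    def __init__(self):
        self.held = []       # lock names currently held, in acquisition order
        self.edges = set()   # (already_held, newly_acquired)

    def acquire(self, name):
        for held_name in self.held:
            self.edges.add((held_name, name))
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)

    def inversions(self):
        """Return lock pairs that were taken in both orders."""
        return sorted(
            (a, b) for (a, b) in self.edges if a < b and (b, a) in self.edges
        )


tracker = LockOrderTracker()
# Pass 1 (locking): queue lock first, then the member lock; the member stays locked.
tracker.acquire("queue")
tracker.acquire("member")
tracker.release("queue")
# Pass 2 (unlocking): the member lock is still held when the queue lock is taken.
tracker.acquire("queue")
tracker.release("member")
tracker.release("queue")
```

After both passes `tracker.inversions()` reports the pair ("member", "queue"): the two locks were taken in both orders, which is the precondition for two threads to deadlock against each other.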

We can't give up the queue locks like that if we want to be able to reclaim them again while holding all the locks we currently have. I know a way around this though... we just need to build a list of the locks taken by the function that locks all members with the same interface, and then have a separate function which releases them all without having to iterate through individual queues.  I'll work on this and post it to reviewboard.

By: Jonathan Rose (jrose) 2012-12-04 16:17:54.490-0600

I'm going to have to cancel on the above statement about fixing it with a list of members that need to be unlocked.  While it would technically work, the problem here is really with the design where the interface is retrieved from the member while the individual interfaces remain more or less free agents. Trying to lock all these various members is going to cause performance issues.

I'd suggest approaching this problem in the following way:

* Break the interface out from the member structure so that instead of having the name of the interface, the member has a reference to an ao2 object containing an interface which can be locked/flagged/etc independently from the queue member.
* As the members are being populated, you maintain a global container of interfaces similar to how we currently have the global container of queues. When adding a new member, you check the name of the interface to see if there is already an instance of it within the interface container. If there is, you take a reference for it as the new member's interface. If there isn't, you create a new interface object and add it to the interface container before grabbing the reference.
* When handling the call as we are right now, we lock/unlock the interface to keep other members from using it rather than trying to lock/unlock all the various members within individual queues.

Some effort will probably need to be made to ensure that interfaces are properly destroyed as members are removed and whatnot. If anyone wants to work on this, feel free to chime in.
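
Editorial sketch of the find-or-create step described in the bullets above. It is only an illustration under stated assumptions: it uses a plain linked list, a manual reference count and a pthread mutex as stand-ins for an ao2 container and ao2 references, and the names {{struct iface}}, {{iface_get}} and {{iface_put}} are hypothetical and do not exist in app_queue.c.

```c
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

/* Hypothetical shared interface object: one per device name, shared by
 * every queue member that refers to that device. */
struct iface {
    char name[80];
    int refcount;          /* number of members holding this interface */
    int in_use;            /* flag a queue thread could set while dialing */
    struct iface *next;
};

static struct iface *iface_list;   /* stand-in for a global ao2 container */
static pthread_mutex_t iface_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Find the interface by name, or create it. Always takes a new reference. */
struct iface *iface_get(const char *name)
{
    struct iface *i;

    pthread_mutex_lock(&iface_list_lock);
    for (i = iface_list; i; i = i->next) {
        if (strcmp(i->name, name) == 0) {
            break;
        }
    }
    if (!i) {
        i = calloc(1, sizeof(*i));   /* zeroed, so name stays terminated */
        if (i) {
            strncpy(i->name, name, sizeof(i->name) - 1);
            i->next = iface_list;
            iface_list = i;
        }
    }
    if (i) {
        i->refcount++;
    }
    pthread_mutex_unlock(&iface_list_lock);
    return i;
}

/* Drop one reference; unlink and free the interface when no member uses it. */
void iface_put(struct iface *i)
{
    struct iface **p;

    pthread_mutex_lock(&iface_list_lock);
    if (--i->refcount == 0) {
        for (p = &iface_list; *p; p = &(*p)->next) {
            if (*p == i) {
                *p = i->next;
                break;
            }
        }
        free(i);
    }
    pthread_mutex_unlock(&iface_list_lock);
}
```

In this sketch, adding a member would call {{iface_get()}} with the member's interface name and removing a member would call {{iface_put()}}, so two members named SIP/33099 in different queues end up pointing at the same object.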

By: Mikhail Lundberg (mvlbrn) 2012-12-05 01:34:05.126-0600

Filtered log with 100 concurrent queue calls

By: Mikhail Lundberg (mvlbrn) 2012-12-05 01:34:37.104-0600

Italo, yes, I see the messages; all of them are from one test with 100 calls. Sometimes I pressed the Hangup button while the phone was ringing.

> cat debug |grep "app_queue.c: Device 'SIP/"
{code}
[2012-12-05 14:23:24] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '6' (Ringing)
[2012-12-05 14:23:24] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '6' (Ringing)
[2012-12-05 14:23:24] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '6' (Ringing)
[2012-12-05 14:23:24] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '6' (Ringing)
[2012-12-05 14:23:25] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '6' (Ringing)
[2012-12-05 14:23:25] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '6' (Ringing)
[2012-12-05 14:23:25] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '6' (Ringing)
[2012-12-05 14:23:26] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '6' (Ringing)
[2012-12-05 14:23:26] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '6' (Ringing)
[2012-12-05 14:23:27] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '6' (Ringing)
[2012-12-05 14:23:27] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '6' (Ringing)
[2012-12-05 14:23:29] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '1' (Not in use)
[2012-12-05 14:23:29] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '1' (Not in use)
[2012-12-05 14:23:32] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '6' (Ringing)
[2012-12-05 14:23:32] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '6' (Ringing)
[2012-12-05 14:23:37] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '1' (Not in use)
[2012-12-05 14:23:37] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '1' (Not in use)
[2012-12-05 14:23:37] DEBUG[1163] app_queue.c: Device 'SIP/33099' changed to state '1' (Not in use)
{code}

An almost complete log (with some useless info filtered out by 'cat debug |grep "2012-12-05 14:23" |grep -v "pbx.c" >debug.filtered') is attached: debug.filtered.gz

By: Italo Rossi (italorossi) 2012-12-06 13:39:43.430-0600

I've talked today with Jonathan Rose on #asterisk-bugs.

Just to let you know that I'll try to implement the suggested changes in app_queue. I'll come back with the results as soon as possible.

By: Richard Mudgett (rmudgett) 2012-12-14 15:19:30.637-0600

[^jira_asterisk_16115_revert_r370418_v1.8.patch] - Patch reverts the [^app_queue.c-svn-r370418.patch] because it causes ASTERISK-20801.

By: Richard Mudgett (rmudgett) 2012-12-17 12:04:06.467-0600

[^jira_asterisk_16115_single_q_v1.8.patch] - This patch is a replacement for [^app_queue.c-svn-r370418.patch] and will work for any channel type.  Like the r370418 patch, it prevents multiple calls to a member that is in *only one* queue.

By: Richard Mudgett (rmudgett) 2013-01-08 18:08:15.766-0600

Committed a modified [^jira_asterisk_16115_single_q_v1.8.patch] to appropriate branches.

By: David Brillert (aragon) 2013-01-09 15:40:26.204-0600

Are we to expect a forthcoming patch to fix multiple calls to an in use member when the member belongs to multiple queues as well?

By: Matt Jordan (mjordan) 2013-01-09 16:18:34.039-0600

{quote}
Are we to expect a forthcoming patch to fix multiple calls to an in use member when the member belongs to multiple queues as well?
{quote}

We aren't working on this at this moment in time. If someone from the community would like to propose a patch for this bug, that would be appreciated.

By: Leif Madsen (lmadsen) 2013-01-23 16:29:58.096-0600

Closing issue. As for this issue, it is now resolved. Please open a new issue with a patch for any further refinement in this area.

By: Leif Madsen (lmadsen) 2013-01-23 17:13:29.701-0600

Actually, was a bit quick on the draw to close this. Leaving this open in case someone else wants to complete the multi-queue issues. The patches committed by Richard don't show a "closes issue" in the commit message, just a related link.

By: Alexandre Keller (alexandrekeller) 2013-04-12 08:05:55.697-0500

Hi there. Any news on that? I've been having the same problem, and had to disable autofill on queue setup to make it work.

By: WRP (wrp) 2013-05-17 12:42:56.055-0500

We have also experienced this issue with agents that exist in more than one queue.

Asterisk: 1.8.20.1
ringinuse=no


By: vbcrlfuser (vbcrlfuser) 2013-11-05 13:34:40.802-0600

Confirming that I am seeing the same issue: 2 calls are randomly delivered to an agent at the same time when the agent is assigned to multiple queues and the queues come under load, resulting in a false RINGNOANSWER.


1383577277|hspbx01-1383576644.57611|Q1|SIP/0004F233F52F|CONNECT|579|hspbx01-1383577273.57837|1

1383578052|hspbx01-1383576644.57611|Q1|SIP/0004F233F52F|COMPLETECALLER|579|775|4


1383578074|hspbx01-1383577301.57846|Q1|SIP/0004F233F52F|CONNECT|723|hspbx01-1383578068.58119|3

# THIS EVENT WAS 2ND CALL DELIVERED TO AGENT A
1383578084|hspbx01-1383577726.58005|Q2|SIP/0004F233F52F|RINGNOANSWER|15000

1383578479|hspbx01-1383577301.57846|Q2|SIP/0004F233F52F|COMPLETECALLER|723|405|2


Asterisk 1.6.2.19
80 agents
10-15 queues assigned to each agent
5000 calls a day

sip.conf
limitonpeers=yes
call-limit=2
ringinuse=no



By: vbcrlfuser (vbcrlfuser) 2013-11-05 16:34:20.249-0600

OK, having read the code: is the answer simply to create a global list of interfaces and their status (in one place), so that each queue does not have a member and a status that has to be updated? In other words, just because interface SIP/1234 is a member of 10 queues does not mean it has 20 statuses to maintain. SIP/1234 has one status to which all queues and threads should adhere.

I know I'm making that overly simple, but is it something along the lines of the below? Any roadblocks I'm not seeing, other than a billion places needing a touch-up in app_queue.c and the ever-present threat of getting locks wrong?

struct member_status
{
    char interface[80];
    char state_interface[80];
    int status;
};

static struct ao2_container *member_statuses;

Update handle_statechange to modify the global list, not the member list.

Update ring_one and ring_entry to check the global list, not the member list.

Change everywhere there is a q->member->status to use the global list.



By: vbcrlfuser (vbcrlfuser) 2013-11-05 16:47:53.985-0600

Has anyone thought about using some sort of reservation approach too? So in addition to state a thread agrees not to attempt a call for another queue while the interface has been reserved for 1-2 seconds?
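
Editorial sketch of the reservation idea raised above, under stated assumptions: a fixed-size table guarded by one mutex, a 2-second hold time, and the hypothetical names {{try_reserve}}, {{struct reservation}} and {{RESERVE_SECONDS}}. None of this is taken from app_queue.c. The current time is passed in as a parameter so the expiry logic can be checked without sleeping.

```c
#include <string.h>
#include <time.h>
#include <pthread.h>

#define MAX_IFACES 64
#define RESERVE_SECONDS 2   /* assumed hold time, from the "1-2 seconds" idea */

struct reservation {
    char interface[80];
    time_t expires;         /* reservation is live while now < expires */
};

static struct reservation table[MAX_IFACES];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 1 if the caller now holds the reservation for `interface`,
 * 0 if another thread still holds it or the table is full. */
int try_reserve(const char *interface, time_t now)
{
    int i, free_slot = -1, ok = 0;

    pthread_mutex_lock(&table_lock);
    for (i = 0; i < MAX_IFACES; i++) {
        int live = table[i].interface[0] != '\0' && table[i].expires > now;

        if (live && strcmp(table[i].interface, interface) == 0) {
            pthread_mutex_unlock(&table_lock);
            return 0;       /* still reserved by another queue thread */
        }
        if (!live && free_slot < 0) {
            free_slot = i;  /* remember the first empty or expired slot */
        }
    }
    if (free_slot >= 0) {
        struct reservation *r = &table[free_slot];

        strncpy(r->interface, interface, sizeof(r->interface) - 1);
        r->interface[sizeof(r->interface) - 1] = '\0';
        r->expires = now + RESERVE_SECONDS;
        ok = 1;
    }
    pthread_mutex_unlock(&table_lock);
    return ok;
}
```

A queue thread would call {{try_reserve(interface, time(NULL))}} before dialing and skip the member when it returns 0. Letting the reservation expire on its own, rather than requiring an explicit release, is a design choice of this sketch: a thread that fails mid-dial cannot leave the interface blocked indefinitely.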






By: David Brillert (aragon) 2014-02-25 12:48:57.572-0600

I'm bringing this up again because:
1. The ticket is still open
2. We still have problems with agents receiving multiple calls while in use when logged in to multiple queues.
3. Any chance this issue was fixed by the patch in ASTERISK-22189?

By: Matt Jordan (mjordan) 2014-02-25 12:58:55.342-0600

Nope, this was not solved by ASTERISK-22189.

This issue is still open because no one has proposed a patch that solves the issue. The issue here, fundamentally, is that the {{ringinuse}} option requires the global state of the queue member across all queues to be known. That requires locking all queues in order to get that state accurately, which (a) has massive performance implications, and (b) requires very careful usage of the locking in {{app_queue}} to avoid deadlocking the whole thing.

Until someone tackles that problem, this issue will remain open.

By: Italo Rossi (italorossi) 2014-02-25 13:14:48.920-0600

As I remember, the correct solution is to extract the interface from the queue-member-specific struct and replace it with a pointer to a global struct holding all unique interfaces. This is the safe way to lock the member's interface.

I've started to write something but, to be honest, I don't have much time RIGHT NOW to do this, and since I'm not an ao2_container expert, it'll take some time...

By: Matt Jordan (mjordan) 2014-02-25 13:26:10.097-0600

That wouldn't be "safe".

Say I had a queue member struct 'bob_1' in queue 'sales', and bob has a pointer back to some global state.

Now say I have the same queue member - but with a different struct allocated in memory, we'll say 'bob_2' - in queue 'support'. This bob has a pointer back to some global state as well.

Now if we want to determine if bob_2 is available, we have to follow his pointer back to the global state. Unfortunately, at the same time, bob picks up his phone to answer a call in sales. This has to go back and update his state in the global state.

And now we have a race condition.

You can solve this with judicious use of locking and other funness, but it isn't trivial and the locking will still result in essentially providing a 'global lock' across all queues.

By: Italo Rossi (italorossi) 2014-02-25 13:51:21.397-0600

Yeah, you're right.

Maybe assigning a new state to the interface can help.

On the global struct, just after acquiring the lock and before creating the channel to dial bob_2, we can set state = offering.

The call from the sales queue only gets delivered to bob if his state_interface is available and his state is not offering.

{noformat}
acquire global interface lock;
give up if state_interface is not available or state is offering;
set state to offering if state is waiting;
dial if not offering;
release lock;
{noformat}

This new state needs to be updated when bob answers/rejects the call.

What do you think?
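
Editorial sketch of one reading of the "offering" pseudocode above, assuming a hypothetical per-interface record shared by all queues. The names {{struct global_iface}}, {{offer_begin}} and {{offer_end}} are illustrative only, and the sketch releases the lock before the dial would happen, which is an assumption rather than something the comment specifies.

```c
#include <pthread.h>

enum offer_state { OFFER_WAITING, OFFER_OFFERING };

/* Hypothetical per-interface global record shared by all queues. */
struct global_iface {
    pthread_mutex_t lock;
    int device_available;        /* 1 when the device state is "Not in use" */
    enum offer_state state;
};

/* Decide under the lock whether this thread may dial. Returns 1 to dial. */
int offer_begin(struct global_iface *g)
{
    int may_dial = 0;

    pthread_mutex_lock(&g->lock);
    if (g->device_available && g->state == OFFER_WAITING) {
        g->state = OFFER_OFFERING;   /* claim the interface before dialing */
        may_dial = 1;
    }
    pthread_mutex_unlock(&g->lock);
    return may_dial;
}

/* To be called when the member answers or rejects, or the dial fails. */
void offer_end(struct global_iface *g)
{
    pthread_mutex_lock(&g->lock);
    g->state = OFFER_WAITING;
    pthread_mutex_unlock(&g->lock);
}
```

Because the check and the state change happen under the same lock, two queue threads calling {{offer_begin()}} for the same interface cannot both get 1. The cost is that every queue touching that interface serializes on its lock, which is the concern raised earlier in the thread.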

By: Shlomi Gutman (voicenter) 2014-03-02 04:27:27.910-0600

Hello, I opened a similar bug report for versions 1.8 and 11, with a patch for 11.6-cert1. (I also have a patched 1.8 and can provide that patch; it is basically the same one.)
I found that the problem is caused when an agent who is on a call gets a reachable/unreachable status change or, in my case, when the devices are in realtime and I do a sip reload at a specific moment (between the SIP devices getting connected back and the Queue trying to call them). The checks that are done to make sure the agent is not busy then return his status as not in use, which is faulty. So basically what I did is check whether there are any channels open for the specific member device. (In my case this is done only if the device is a realtime one, so if you want to check the patch, pay attention to that; you may want to delete that from the if.)
I'll attach the patch here, but originally it was attached to my report here - ASTERISK-23378




By: Shlomi Gutman (voicenter) 2014-03-02 04:30:56.180-0600

This one is for 11.6-cert1; pay attention to the "call->member->realtime", as it works only for members from realtime.

By: David Brillert (aragon) 2014-03-03 11:27:24.575-0600

I'm not using realtime

By: Shlomi Gutman (voicenter) 2014-03-03 11:42:54.929-0600

Then just change " || (call->member->realtime && get_if_queue_member_has_chan(call->member->state_interface))) {" in the patch to
" || ( get_if_queue_member_has_chan(call->member->state_interface))) {"
and check if you can reproduce

By: David Brillert (aragon) 2014-03-03 11:49:11.070-0600

@Matt ASTERISK-23378 was closed out as a duplicate and the patch never committed.
Will you be reviewing Shlomi's proposed solution?

By: Shlomi Gutman (voicenter) 2014-03-04 02:19:59.346-0600

Actually, the patch here is the same one I attached there. I had not yet been approved to submit patches, so it was not attached as a contribution (which I mentioned in the comments). If it's important, I can attach it there again (but that issue was closed, wasn't it?).

By: Shlomi Gutman (voicenter) 2014-03-04 02:24:29.670-0600

It is also important to understand:
1) How this patch influences performance
2) If it's applied and used: there are several "double" checks in the code before mine that we could not rely on (otherwise there would be no need for the patch :P), so maybe it would be a better option not to add an extra check but to replace the old checks with this one



By: David Brillert (aragon) 2014-03-13 21:22:16.646-0500

@Shlomi, why don't you propose your patch on the reviewboard?
https://reviewboard.asterisk.org/r/

By: Shlomi Gutman (voicenter) 2014-03-16 05:29:39.080-0500

I don't mind trying (though I have never used rbtools before).
Also, there is still no feedback on whether it resolved anyone's problem or whether there was any performance regression, so should I propose it anyway or not?

By: David Brillert (aragon) 2014-03-16 12:16:53.841-0500

I think you should post your patch on the reviewboard.
There you will get feedback from the Digium devs and if you get a 'ship it' your contribution will be officially committed to the Asterisk code.

By: David Brillert (aragon) 2014-03-18 10:58:06.805-0500

@shlomi https://wiki.asterisk.org/wiki/display/AST/Code+Review


By: Shlomi Gutman (voicenter) 2014-03-30 10:01:07.220-0500

David, here you go - https://reviewboard.asterisk.org/r/3409/

By: David Brillert (aragon) 2014-03-31 08:09:45.153-0500

Good stuff Shlomi, now you will get a proper review of your code and others may be more willing to test your patch and confirm its effectiveness.  I plan to test your patch this week and even if it violates some coding guidelines, I'll be happy if it resolves my issues.

By: Shlomi Gutman (voicenter) 2014-04-13 03:09:08.190-0500

David, is there any news regarding the tests?

By: David Brillert (aragon) 2014-04-14 08:28:59.542-0500

Hi Shlomi, the end user hasn't agreed to insert the patch yet.

By: Mikhail Lundberg (mvlbrn) 2014-04-28 09:55:08.802-0500

I've patched Asterisk 11.8.1 in production and still get more than one call from the queue to the agent.
Members are not realtime, queues.conf:autofill=no

By: Krzysztof Chmielewski (kristoff) 2014-06-02 05:15:20.215-0500

Asterisk 11.9.0 has the same issue.

By: Matt Jordan (mjordan) 2014-06-02 07:52:55.103-0500

Folks - commenting that such and such version "still has this issue" isn't helpful. The issue is open, it is still a bug in Asterisk. Of course the latest version still has the bug: the bug has not yet been fixed.

Unfortunately, Shlomi's patch (https://reviewboard.asterisk.org/r/3409/) does not appear to be a correct solution to this problem. If you are still having issues with this patch, that would verify some of the review findings. If, however, the patch does resolve this issue for you, that is also useful information. Testing that patch some more would be helpful.

Based on Shlomi's patch, it is also possible that ASTERISK-18411 is contributing to this issue. The patch proposed on https://reviewboard.asterisk.org/r/3508/ may also be helpful.

Please test these patches out - independently and together. Comment on this issue with the results. That will help contribute to fixing and closing this issue, as opposed to merely spamming it with "me too".

By: Fernando Lujan (flujan) 2014-06-11 10:16:00.837-0500

https://reviewboard.asterisk.org/r/3508/ and asterisk 11.10.
Problem IS NOT fixed.


By: Maciej Krajewski (jamicque) 2014-12-03 03:46:42.350-0600

Hi Matt, I have tested all the patches contributed to this issue, both independently and together. However, none solves the issue when a member is in multiple queues. He can still get two, and sometimes even 3, calls under heavy load.

By: xCally Support (xcally.support) 2015-11-06 09:23:53.611-0600

Hi there. Any news on that?

By: Chet Stevens (cwstevens) 2015-11-06 09:35:57.637-0600

We have agents who belong to several queues and they receive many calls. They were constantly receiving 2-4 calls simultaneously. As a workaround we had to add some additional dialplan before ringing the agents to count the calls and then indicate Congestion on the additional calls. The call will just stay in the queue in this case. The context that we use for calls that are being sent to the agents is below. We are currently using GLOBAL variables but had previously been using the Asterisk database to track the calls. I've included the commented lines for the database tracking. The feeling was that GLOBALS would be able to update fast enough to keep up with the flood of calls.

This does "fix" the problem as far as the agents are concerned but reports will show a lot more unanswered calls by the agents as a result of the Congestion.

{noformat}
[agents]
exten => _X.,1,NoOp()
same => n,Set(__agent=${EXTEN})
same => n,Set(GLOBAL(calls_${agent})=${IF($[${ISNULL(${GLOBAL(calls_${agent})})}]?0:${GLOBAL(calls_${agent})})})
same => n,Set(GLOBAL(calls_${agent})=${IF($[${GLOBAL(calls_${agent})} < 0]?0:${GLOBAL(calls_${agent})})})
same => n,Set(GLOBAL(calls_${agent})=$[${GLOBAL(calls_${agent})} + 1])
;same => n,Set(DB(calls/${agent})=${IF($[${DB_EXISTS(calls/${agent})}]?${DB(calls/${agent})}:0)})
;same => n,Set(DB(calls/${agent})=${IF($[${DB(calls/${agent})} <= 0]?1:$[${DB(calls/${agent})} + 1])})
same => n,GotoIf($[${GLOBAL(calls_${agent})} <= 1]?continue)
;same => n,GotoIf($[${DB(calls/${agent})} <= 1]?continue)
same => n,NoOp(CALLS: ${GLOBAL(calls_${agent})}. AGENT PJSIP/${EXTEN}_line IS ${DEVICE_STATE(PJSIP/${EXTEN}_line)}. ${caller_number} from ${queue}. CONGESTION!)
same => n,Congestion()
same => n(continue),NoOp(CALLS: ${GLOBAL(calls_${agent})}. AGENT PJSIP/${EXTEN}_line IS ${DEVICE_STATE(PJSIP/${EXTEN}_line)}. ${caller_number} from ${queue}. SENDING.)
same => n,Set(__TRANSFER_CONTEXT=agent_transfer)
same => n,Set(CALLERID(num)=${caller_number})
same => n,Set(CALLERID(name)=${caller_name})
same => n,Dial(PJSIP/${EXTEN}_line,,A(${queue_file}))
same => n,Hangup()

exten => h,1,NoOp()
same => n,Set(GLOBAL(calls_${agent})=$[${GLOBAL(calls_${agent})} - 1])
same => n,NoOp(   CALLS: ${GLOBAL(calls_${agent})}. AGENT PJSIP/${agent}_line. ${DIALSTATUS})
;same => n,Set(DB(calls/${agent})=$[${DB(calls/${agent})} - 1])
{noformat}

We also zero out the count when the agent logs out to avoid any situations where new calls won't reach the agent because the system left the variable at something greater than 0.

{noformat}
same => n,Set(GLOBAL(calls_${agent_number})=0)
{noformat}
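
Editorial sketch of the counting logic in the workaround above, restated in C so the ordering is explicit: the call is counted first and tested afterwards, and every call that was counted is uncounted once on hangup. The names {{struct agent_gate}}, {{gate_enter}}, {{gate_leave}} and {{gate_reset}} are hypothetical. The use of C11 atomics is an assumption of this sketch and says nothing about how dialplan GLOBAL variables behave under concurrency.

```c
#include <stdatomic.h>

/* Hypothetical per-agent counter, standing in for GLOBAL(calls_<agent>). */
struct agent_gate {
    atomic_int calls;
};

/* Count the call first, then test the previous value. Returns 1 if this
 * is the only call for the agent (send it), 0 if another call is already
 * counted (the dialplan would indicate Congestion). */
int gate_enter(struct agent_gate *g)
{
    return atomic_fetch_add(&g->calls, 1) == 0;
}

/* Mirror of the h extension: every call that entered leaves exactly once,
 * whether it was sent to the agent or rejected. */
void gate_leave(struct agent_gate *g)
{
    atomic_fetch_sub(&g->calls, 1);
}

/* Mirror of the reset performed when the agent logs out. */
void gate_reset(struct agent_gate *g)
{
    atomic_store(&g->calls, 0);
}
```

With two calls arriving together, only the one that sees a previous count of 0 is sent; the other is rejected and decrements the counter again when it hangs up, which matches the extra unanswered calls mentioned for the reports.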

By: xCally Support (xcally.support) 2015-11-07 11:36:18.286-0600

Thank you very much Chet.
This seems to be a nice workaround... However it includes some key limitations for a call center environment, where the Analytics are really important (the suggested workaround will generate a lot of fake reports).
Therefore, I don't think it is a feasible solution for a real environment.

I suggest establishing a task force to focus on this issue at the "root level", so as to modify the queue application and fix it definitively.

As far as call center environments are concerned, this is really one of Asterisk's most relevant open issues, and it needs to be addressed.

Any new patch or contribution is welcome... We will be glad to check and test it.

By: Richard Miller (ulogic) 2016-06-03 16:31:20.010-0500

I downloaded the latest version of app_queue.c for Asterisk 11.22+ dated 2016-04-25 from gerrit.  Still seeing the problem.  The CEL shows two calls firing off (CHAN_START events) to a single agent within 50ms of each other, sometimes even from the same queue.  I am a pretty competent coder.  Do you want me to take a stab at fixing it, or is someone at Digium charged with maintaining this module?

By: Joshua C. Colp (jcolp) 2016-06-03 16:36:05.599-0500

11.22.0 does not include the fix which resolved this issue, no release has been made yet of 11 which includes it. You would need to pull it into the release manually or use the 11 branch from git.

By: Richard Miller (ulogic) 2016-06-03 16:52:12.235-0500

Doesn't the download link on the right-hand panel of this page retrieve the latest version?  That file is dated April 25, 2016, which is the day this issue was deemed resolved.

https://gerrit.asterisk.org/#/c/2679/1/apps/app_queue.c

By: Joshua C. Colp (jcolp) 2016-06-03 16:53:27.097-0500

Ah, I see. I'd suggest opening up an issue with details to track it. There is no one working on app_queue, so if you feel as though you can take a look then feel free to.

By: Joshua C. Colp (jcolp) 2016-06-03 16:55:04.666-0500

The download link downloads that specific change, not the latest version.

By: Richard Miller (ulogic) 2016-06-03 17:01:23.066-0500

Should a new (linked) issue be created, or should it be continued here?

By: Joshua C. Colp (jcolp) 2016-06-03 17:03:35.837-0500

A new issue should be created.