Summary: ASTERISK-18411: Queue members with hints for state_interface get stuck in "In Use" state.
Reporter: Steven Wheeler (swheeler)
Labels: patch
Date Opened: 2011-09-02 15:17:33
Date Closed: 2017-12-15 09:50:10.000-0600
Priority: Minor
Regression?:
Status: Closed/Complete
Components: Applications/app_queue
Versions: 1.8.5.0, 12.1.0
Frequency of Occurrence: Occasional
Related Issues: is duplicated by ASTERISK-23724 Agent stay "In use" after logoff
Environment: 64 bit CentOS
Attachments:
( 0) ASTERISK-18411_1.8.26.0.patch
( 1) ASTERISK-18411_11.8.0.patch
( 2) ASTERISK-18411_12.1.0.patch
Description: We have noticed a few times in the last week that queue members have been stuck in the "In use" state even though their state interface says "Idle".  When this happens the queue no longer routes calls to the agents.  The only workaround we have found so far is to delete the member entries from the realtime table and then add them back in.

For instance the queue says all the members are In use:
{noformat}
asterisk -rx 'queue show queue1'
queue1 has 0 calls (max unlimited) in 'linear' strategy (0s holdtime, 67s talktime), W:0, C:1, A:3, SL:100.0% within 86400s
  Members:
     Member 1 (Local/member1@queue_calling/n) (realtime) (In use) has taken no calls yet
     Member 2 (Local/member2@queue_calling/n) (realtime) (In use) has taken 1 calls (last was 6076 secs ago)
     Member 3 (Local/member3@queue_calling/n) (realtime) (In use) has taken no calls yet
     Member 4 (Local/member4@queue_calling/n) (realtime) (In use) has taken no calls yet
  No Callers
{noformat}

But their hints say otherwise:
{noformat}
asterisk -rx 'core show hints'
               member1@blf                 : SIP/member1_softphon  State:InUse           Watchers  0
               member2@blf                 : SIP/member2_softphon  State:Idle            Watchers  0
               member3@blf                 : SIP/member3_softphon  State:InUse           Watchers  0
               member4@blf                 : SIP/member4_softphon  State:Idle            Watchers  0
{noformat}

The queue member table configuration:
{noformat}
mysql> select * from Queue_Members;
+----------+------------+------------+-------------------------------+------------------+---------+--------+
| uniqueid | membername | queue_name | interface                     | state_interface  | penalty | paused |
+----------+------------+------------+-------------------------------+------------------+---------+--------+
|    31229 | Member 2   | queue1     | Local/member2@queue_calling/n | hint:member2@blf |    NULL |   NULL |
|    31230 | Member 3   | queue1     | Local/member2@queue_calling/n | hint:member3@blf |    NULL |   NULL |
|    31231 | Member 4   | queue1     | Local/member4@queue_calling/n | hint:member4@blf |    NULL |   NULL |
|    31232 | Member 1   | queue1     | Local/member1@queue_calling/n | hint:member1@blf |    NULL |   NULL |
+----------+------------+------------+-------------------------------+------------------+---------+--------+
{noformat}

This issue occurs infrequently so a debug log would be huge, but if there is more information you need I will try to get it.
Comments:

By: Leif Madsen (lmadsen) 2011-09-12 13:15:58.847-0500

From IRC after talking to Mark Michelson:

<putnopvut> I've never seen a hint used as a state interface before. That's clever. Um, the fact that the queue and the hint report different states *seems* buggy, but I can't say too much more without looking into it a lot more than I have time for.

Leif:  I've tested this with SIP peers (so instead of using a hint for the state_interface, using a SIP channel name) and I know that works, but someone will have to look into this.

By: Henry Fernandes (usinternet) 2011-12-14 15:55:46.677-0600

I am a colleague of the original poster, Steven.  I have information on how to reproduce this problem.

1. Call into a queue and have one of the members answer the call. At this point, the member is "in use".  
2. Do a "core reload".
3. Hang up the queue call.  You'll see that the member is stuck "in use".

Note that we can reproduce this on 1.8.7.1 and also 1.8.8.0-rc5.



By: Henry Fernandes (usinternet) 2011-12-14 16:46:53.435-0600

I have some more information.  It seems like this is a problem related to the hints.  

After I get the queue member state stuck "in use", I can clear this state by dialing out.  But this only works if I have a watcher for the hint for this phone.

By: Sébastien Couture (sysreq) 2013-04-11 11:45:33.196-0500

We are experiencing the same exact issue with 1.8.20.0, under the same configuration, namely Local channels as queue members (Realtime) with hints as state_interface.

By: Gregory Malsack (gmalsack) 2013-08-09 15:06:53.426-0500

I too am having the same issue... 1.8.22. This seems to have been an issue for nearly 2 years now. Is there any hope of someone fixing this issue???

By: Sébastien Couture (sysreq) 2014-01-14 18:58:33.007-0600

We are still experiencing the issue under 1.8.25.0.

By: Sébastien Couture (sysreq) 2014-01-20 19:02:16.825-0600

It turns out we are seeing the same issue with Asterisk 11.7.0.

The issue can be reproduced as easily as doing a 'dialplan reload' while an agent is either receiving a call ('Ringing') or is currently on a call ('In Use'). After the dialplan has been reloaded, agents remain in the state they were in at the time, regardless of whether the call they were originally on has ended.

Once again, we're using Realtime queues and queue members, and our agents are Local channels with dialplan hints as state_interface's.

The only difference we have noted between 1.8.25.0 and 11.7.0, as far as this issue goes, is that an agent stuck in the 'Ringing' state will not be given any more calls under 1.8.25.0 when 'ringinuse=yes', whereas under 11.7.0 he will. Under both versions, an agent stuck in the 'In Use' state will be given additional calls when 'ringinuse=yes'.

What would you guys recommend we do (i.e., what compile flag to enable, tools to use, etc.) to troubleshoot this issue once and for all?

By: Rusty Newton (rnewton) 2014-02-27 16:27:17.758-0600

Sebastien, if you can reproduce the issue reliably, feel free to add additional debug to the issue.

You can pretty much follow the logging guide here: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information and probably provide output of "queue show <queuename>" at various times to demonstrate the issue. It always helps to attach your configuration.

More data is always helpful, but it looks like this issue is just waiting on a developer to work it.

By: Paul Belanger (pabelanger) 2014-03-01 23:41:15.045-0600

As a workaround, you could set up custom device states for your local channels. We do this in 1.8.7.1.

By: Sébastien Couture (sysreq) 2014-03-02 10:16:57.086-0600

Paul,

If I understand correctly, you're creating/destroying a custom device state as part of your agents' queue login/logout routines, and you then toggle the custom device state (from INUSE to NOT_INUSE and back) in the code that gets called when a call is distributed to your Local channel agents, as well as upon hang up, instead of using the Local channel's actual state or using hints?

By: Sébastien Couture (sysreq) 2014-03-02 10:32:41.963-0600

Rusty,

I'll be doing some more debugging this week on my end, trying to pinpoint where the problem could come from (e.g., Realtime queues/queue_members vs. static config file, high number of contexts being loaded on a 'dialplan reload' vs. low number, etc.).

I'll keep you updated.

By: Paul Belanger (pabelanger) 2014-03-02 16:14:25.218-0600

Right, we needed to remove the dependency on realtime SIP and queues. Our SIP subscribers live in kamailio, and our queues / queue members live in an external application. We use AddQueueMember / RemoveQueueMember via dialplan / AMI and set up custom device states using local channels.  It works very well.
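
A minimal sketch of the custom device state workaround described above, assuming the agent is reached through a {{dial-agent}} context with a three-digit extension, as in the other examples in this issue. The {{AGENT}} variable, the {{Custom:agent100}} device name, and the 20-second Dial timeout are hypothetical choices for illustration, not the configuration actually used in the deployment described here:

{code}
[dial-agent]
; Remember the agent number, because ${EXTEN} becomes "h" in the hangup extension.
exten => _XXX,1,Set(AGENT=${EXTEN})
; Mark the agent busy before ringing the phone.
 same => n,Set(DEVICE_STATE(Custom:agent${AGENT})=INUSE)
 same => n,Dial(SIP/${AGENT},20)
 same => n,Hangup()

; Runs when this channel hangs up, whether or not the call was answered.
exten => h,1,Set(DEVICE_STATE(Custom:agent${AGENT})=NOT_INUSE)
{code}

The queue member would then point its {{state_interface}} at the custom device instead of a hint:

{code}
*CLI> queue add member Local/100@dial-agent/n to test penalty 0 as 100 state_interface Custom:agent100
{code}

Because the state is set explicitly from the dialplan, it should not depend on a hint surviving a {{dialplan reload}}.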

By: Sébastien Couture (sysreq) 2014-03-03 20:09:52.748-0600

All right, I think I may have figured it out. The issue has to do with dynamic hints with no watchers being lost following a {{dialplan reload}}, and queues hanging on to those hints' last states; a member can get stuck in either {{In use}} or {{Not in use}}.

Consider the following dynamic hint, where func_odbc function {{HINT_GET(100)}} returns {{SIP/100}}:
{code}
[agents]
exten => _XXX,hint,${HINT_GET(${EXTEN})}
{code}

Now, if I log agent 100 into queue {{test}} through the CLI (it could be through AddQueueMember, or by adding a SQL entry in the queue_members Realtime table.. makes no difference) and have it use {{hint:100@agents}} as its {{state_interface}}:
{code}
*CLI> queue add member Local/100@dial-agent/n to test penalty 0 as 100 state_interface hint:100@agents
{code}

.. the queue will then internally "follow" the hint's device state and apply it to the agent using app_queue.c's  {{get_queue_member_status()}} function, which gets called when a new agent is added to a queue. This actually triggers the dynamic hints, and creates a specific hint for '100'. Indeed, a {{core show hint 100}} will then show the following:
{code}
*CLI> core show hint 100
100@agents  :   SIP/100  State:Idle        Watchers  0
{code}

Let's have the agent take a call from the queue:
{code}
*CLI> queue show test
test has 0 calls (max unlimited) in 'ringall' strategy (1s holdtime, 3s talktime), W:0, C:1, A:0, SL:0.0% within 0s
  Members:
     100 (Local/100@dial-agent/n from hint:100@agents) (ringinuse disabled) (dynamic) (In use) has taken 1 calls (last was 11 secs ago)
{code}

Now, if I do a {{dialplan reload}}, hint {{100@agents}} disappears because Asterisk thinks it's not watched by anything, which isn't totally true because a queue now relies on it to determine the state of one of its agents. At that point, the queue has lost what it was using to determine agent 100's state, and when the call ends, the agent's state will be stuck {{In use}}. The fact that the queue has {{ringinuse=no}} here will prevent the agent from getting any more calls.

If I had had a phone subscribe to that hint, and thus had a "watcher", it wouldn't have disappeared and the queue would've still had a device state to use.

I would expect Asterisk to not destroy a hint (or to recreate it) following a {{dialplan reload}} if it has any watchers (which is already the case) or if some module "follows" that hint's state internally.

I hope my explanations are somewhat clear.

By: Rusty Newton (rnewton) 2014-03-03 21:29:02.849-0600

Sebastien, that looks accurate. A quick test following your summary shows the same results.

By: Sébastien Couture (sysreq) 2014-03-10 10:27:57.520-0500

The rationale behind this patch is that when a hint is used as a 'state_interface', app_queue should explicitly subscribe to it and become a watcher. That will prevent a dynamic hint from disappearing after a {{dialplan reload}} because Asterisk thinks it's not being "watched" by anything. We should also unsubscribe from the hint when the agent is removed from the queue.

This patch was mainly tested under Asterisk 11.7.0, but I have also ported it for 1.8.26.0, 11.8.0 and 12.1.0 (applies and compiles). I have tested this patch by using either a dynamic hint, a static hint, a local channel or a SIP device as a queue member's 'state_interface'. I have also added/removed the queue member both through Realtime and the CLI.

By: Sébastien Couture (sysreq) 2014-03-27 20:58:37.225-0500

We've had this patch on about half a dozen 1.8/11 production systems (all averaging a thousand users each) for a little more than two weeks with no apparent issues. So that's promising, but I would really like it if someone else could test it if possible.

Also, how do I go about submitting this patch to the reviewboard?

By: Matt Jordan (mjordan) 2014-05-08 12:25:49.008-0500

[~sysreq]: There's one last comment on the review. If a user is reloaded via {{reload_single_member}}, the subscription for the hint won't get purged. It should be relatively easy to fix - if you don't have time to tweak the patch, let me know and we can put a revised version up.

By: David Brillert (aragon) 2015-07-29 10:37:48.884-0500

The patch here https://reviewboard.asterisk.org/r/3508/ is waiting for updates and commit.  Can someone take a look or add the reviewboard patch to Gerrit and update the patch as per Matt Jordan?

By: Niccolò Belli (darkbasic) 2016-09-06 15:04:33.419-0500

Has the patch been merged? Is it fixed in latest Asterisk version?

By: Joshua C. Colp (jcolp) 2016-09-06 15:07:49.333-0500

The change has not been merged as of this time.

By: CGI.NET (nsnake) 2016-09-07 00:20:51.329-0500

Does Asterisk 13 have this issue?

By: Niccolò Belli (darkbasic) 2016-09-07 07:21:40.903-0500

I still have to upgrade and switch local channels to realtime, I wanted to know the exact same thing :)

By: Friendly Automation (friendly-automation) 2017-12-15 09:50:11.402-0600

Change 7542 merged by Jenkins2:
app_queue: Fix extension state subscriptions removed on dialplan reload

[https://gerrit.asterisk.org/7542|https://gerrit.asterisk.org/7542]

By: Friendly Automation (friendly-automation) 2017-12-15 09:56:14.676-0600

Change 7540 merged by Joshua Colp:
app_queue: Fix extension state subscriptions removed on dialplan reload

[https://gerrit.asterisk.org/7540|https://gerrit.asterisk.org/7540]

By: Friendly Automation (friendly-automation) 2017-12-15 10:19:02.872-0600

Change 7541 merged by George Joseph:
app_queue: Fix extension state subscriptions removed on dialplan reload

[https://gerrit.asterisk.org/7541|https://gerrit.asterisk.org/7541]