[Home]

Summary:ASTERISK-26133: app_queue: Queue members receive multiple calls
Reporter:Richard Miller (ulogic)Labels:
Date Opened:2016-06-21 10:51:06Date Closed:2016-07-15 11:59:44
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Applications/app_queue
Versions:11.22.0 13.9.1 Frequency of
Occurrence
Occasional
Related
Issues:
is related toASTERISK-26299 app_queue: Queue application sometimes stops calling members with Local interface
Environment:Attachments:( 0) app_queue.diff
Description:Issue ASTERISK-16115 was considered closed by
https://gerrit.asterisk.org/#/c/2677/
https://gerrit.asterisk.org/#/c/2679/
for versions 13.9 and 11.22

However, multiple calls are still happening occasionally.  I have determined the cause.  I will upload the patch to the issue tracker first.  The same fix applies to both versions 11 and 13.  I will post the fix to gerrit once I have figured out how to use it.

In the troubleshooting, I adjusted the debug level of many of the calls to ast_debug() because there was just way too much output with "core set debug 1 app_queue".  Should I include those changes when posting the fix?

The intent of the above patches was to keep track of which agents had a call pending, then clear the pending status when a device state change event arrives.  What is causing the problem is that with some SIP clients, there are multiple events firing.

This is the scenario.  A call ends for an agent and the device is no longer in use.  A queue entry finds that there is now an agent available, then dials that member.  Before the call begins ringing, a second device state event from the prior call ending arrives showing that the device is not in use.  That event clears the call pending state, so the next one in queue gets dispatched and finds the same agent is available.  This is more likely to happen when the agent is at a remote location so there is more latency than when they are on the same local area network as the Asterisk server.

I must say that I am not satisfied with the implementation of this fix.  Prior to the above patches, there was already in place a "callattempt" structure that can be probably be used for call coordination.  A lot of the contention arbitration is done at the time the call is placed.  I want to move a lot of that logic earlier to where the queue entry determines if it is next to go.  That will also make it easier to implement cross-queue servicing of calls with the longest hold time.  See the following issues which are still marked as being open:
ASTERISK-17570
ASTERISK-17732

Here is the log information that I used to determine the problem.
{noformat}
[06-16 15:11:02.636] DEBUG[8432] app_queue.c: Device 'SIP/6347' changed to state '1' (Not in use)
[06-16 15:11:02.663] DEBUG[10586][C-00000790] app_queue.c: There is 1 available member.
[06-16 15:11:02.663] DEBUG[10586][C-00000790] app_queue.c: It's our turn (SIP/L3CO811-00000db7).
[06-16 15:11:02.663] DEBUG[10586][C-00000790] app_queue.c: SIP/L3CO811-00000db7 is trying to call a queue member.
[06-16 15:11:02.664] DEBUG[10586][C-00000790] app_queue.c: Trying 'SIP/6347' with metric 1000000
[06-16 15:11:02.664] DEBUG[10586][C-00000790] app_queue.c: Found matching member SIP/6347 in queue 'xxxxx'
[06-16 15:11:02.665] VERBOSE[10586][C-00000790] netsock2.c: [06-16 15:11:02.665]   == Using SIP RTP TOS bits 184
[06-16 15:11:02.665] VERBOSE[10586][C-00000790] netsock2.c: [06-16 15:11:02.665]   == Using SIP RTP CoS mark 5
[06-16 15:11:02.665] VERBOSE[10586][C-00000790] app_queue.c: [06-16 15:11:02.665]     -- Called SIP/6347

*********** this is the event that causes the problem ************************
[06-16 15:11:02.669] DEBUG[8432] app_queue.c: Device 'SIP/6347' changed to state '1' (Not in use)

[06-16 15:11:02.688] DEBUG[10744][C-00000794] app_queue.c: There is 1 available member.
[06-16 15:11:02.688] DEBUG[10744][C-00000794] app_queue.c: It's our turn (SIP/L3CO811-00000dbe).
[06-16 15:11:02.688] DEBUG[10744][C-00000794] app_queue.c: SIP/L3CO811-00000dbe is trying to call a queue member.
[06-16 15:11:02.689] DEBUG[10744][C-00000794] app_queue.c: Trying 'SIP/6347' with metric 1000000
[06-16 15:11:02.689] DEBUG[10744][C-00000794] app_queue.c: Found matching member SIP/6347 in queue 'xxxxx'
[06-16 15:11:02.690] VERBOSE[10744][C-00000794] netsock2.c: [06-16 15:11:02.690]   == Using SIP RTP TOS bits 184
[06-16 15:11:02.690] VERBOSE[10744][C-00000794] netsock2.c: [06-16 15:11:02.690]   == Using SIP RTP CoS mark 5
[06-16 15:11:02.690] VERBOSE[10744][C-00000794] app_queue.c: [06-16 15:11:02.690]     -- Called SIP/6347
[06-16 15:11:02.706] DEBUG[8432] app_queue.c: Device 'SIP/6347' changed to state '6' (Ringing)
[06-16 15:11:02.823] DEBUG[8432] app_queue.c: Device 'SIP/6347' changed to state '6' (Ringing)
[06-16 15:11:02.828] DEBUG[8432] app_queue.c: Device 'SIP/6347' changed to state '6' (Ringing)
[06-16 15:11:02.828] VERBOSE[10586][C-00000790] app_queue.c: [06-16 15:11:02.828]     -- SIP/6347-00000dcb is ringing
[06-16 15:11:02.828] VERBOSE[10586][C-00000790] res_musiconhold.c: [06-16 15:11:02.828]     -- Stopped music on hold on SIP/L3CO811-00000db7
[06-16 15:11:02.953] VERBOSE[10744][C-00000794] app_queue.c: [06-16 15:11:02.953]     -- SIP/6347-00000dcc is ringing
[06-16 15:11:02.953] VERBOSE[10744][C-00000794] res_musiconhold.c: [06-16 15:11:02.953]     -- Stopped music on hold on SIP/L3CO811-00000dbe
{noformat}

Also the "Found matching member" debug message is showing a different queue than the one being serviced.
Comments:By: Asterisk Team (asteriskteam) 2016-06-21 10:51:07.516-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Richard Miller (ulogic) 2016-06-21 17:19:45.802-0500

Even with the change I just put in, there was still one instance of this occurring today 2016-06-21.  Unfortunately, I did not have debug turned on.  The verbose log did show a ringing event before the second call was placed.  I see that a status of ringing is checked by is_member_available() when seeing if the queue entry is next to go but is not checked by member_status_available() when it attempts to dial.  This is why I would link the queue entry to a call attempt earlier in the process.  I will turn debug back on and monitor it for as long as necessary for it to happen again.

By: Richard Miller (ulogic) 2016-06-22 14:57:55.079-0500

I was finding instances of multiple calls by looking in the CEL for two calls within 1 second of each other involving the same extension.

I have very good news.  This happened twice yesterday and once today, but they all have explanations.  Two of the three times, the call was abandoned at the very moment the agent was being called, so the next call was immediately dispatched.  The other time, it was an after hours agent who happened to be placing an outside call just at the same moment there was an incoming queue call.

I will upload the patch to the issue tracker first just so there is a record of it.  I have never used git previously, so you will have to give me a few days to learn how to use it before uploading the patch to gerrit.

However, I am still inviting comments on using the "callattempt" structure to know when an agent has initiated a call instead of using the pending_members container that was introduced in the April 25, 2016 patch to address the multiple call issue.

By: Richard Miller (ulogic) 2016-06-23 10:07:10.210-0500

Only remove member from the pending_members container when the device state actually changes.

By: Luis Aguirre (laar789) 2016-09-20 17:35:08.016-0500

Hi, I hope somebody read this. Just one question, does this patch will be submitted to an asterisk 11 version? I use asterisk 11 to my application now, and it will take me some time to upgrade to 13 or 14, can somebody please add this to asterisk 11? Or just applying the patch will work? someone has test the patch in asterisk 11? Thank you a lot.

By: Joshua C. Colp (jcolp) 2016-09-20 17:41:40.533-0500

This change went into Asterisk 11, but a release including it has not yet happened.

By: Friendly Automation (friendly-automation) 2017-01-29 14:06:54.965-0600

Change 4814 merged by zuul:
app_queue: Only remove queue member from pending when state changes.

[https://gerrit.asterisk.org/4814|https://gerrit.asterisk.org/4814]