[Home]

Summary:ASTERISK-29578: app_queue: Custom device state using included hints do not update
Reporter:N A (InterLinked)Labels:
Date Opened:2021-08-15 11:26:58Date Closed:2021-09-21 14:30:47
Priority:MinorRegression?
Status:Closed/CompleteComponents:Applications/app_queue
Versions:18.4.0 Frequency of
Occurrence
Frequent
Related
Issues:
causesASTERISK-29806 app_queue: extension state incorrect
Environment:Debian 10 Tested on Asterisk 18.4 with chan_pjsip and Asterisk 18.3 with chan_sip, stock queues.conf configuration. Bug appears in both.Attachments:( 0) cli.txt
( 1) debug_log_29578.txt
( 2) queues.conf
Description:I have been encountering an issue for several weeks now where the device state that app_queue has is consistently outdated.

I am using statically configured queue members in queues.conf - the queue members themselves are not realtime, dynamic, etc.

Things will work fine initially, but eventually - and especially if a device was temporarily unavailable and is back online later - the device state that app_queue has will become outdated and inaccurate. Here is a recent call, for instance, where I manually use ${DEVICE_STATE()} to check the state of a member in the queue:

[2021-08-12 23:03:00]     -- Executing [s@queue:2] NoOp("IAX2/8142", "NOT_INUSE") in new stack
[2021-08-12 23:03:00]     -- Executing [s@queue:3] Set("IAX2/8142", "__queuechannel=IAX2/8142") in new stack
[2021-08-12 23:03:00]     -- Executing [s@queue:4] Set("IAX2/8142", "QUEUENAME=test") in new stack
[2021-08-12 23:03:00] WARNING[6505][C-000002bc]: app_queue.c:8356 queue_exec: Unable to join queue 'test'

If I run "queue show", all the queues with that member show "Unavailable".

Running "queue reload all" alone will not fix the issue, because app_queue is smart enough to realize the config file has not changed.

Running "touch queues.conf" and *then* "queue reload members" will fix the issue, forcing app_queue to purge its stale device status and then things are finally updated... for the moment. Before long, everything is stale again.

The problem seems to be that app_queue is consistently wrong and stale to the point where the application is unusable, and unless I call System(touch queues.conf; asterisk -rx "queue reload members") before any caller enters a queue, there is currently no way around this.

This all was using PJSIP + Asterisk 18.4. To try to replicate this, I also tested on an Asterisk 18.3 system using SIP.

If I add devices directly as queue members, this issue does not seem to come up. However, if I use hints for device state, things start to become less reliable there:

[Aug 12 20:09:40]     -- Executing [223@local:1] NoOp("IAX2/14159", "NOT_INUSE") in new stack
PBX*CLI> core show hints

   -= Registered Asterisk Dial Plan Hints =-
231@test-hints      : SIP/ATAxOffice1       State:Unavailable     Presence:not_set         Watchers  1

So, again DEVICE_STATE = NOT_INUSE, but hint state is Unavailable.

This is the hint: exten => 2127,hint,PJSIP/ATAxLA2

So perhaps this isn't an issue with queues, really, but an issue with the hints that app_queue is using not aligning with the actual device state. That's what it's starting to look like to me.

2127@hints-cen: PJSIP/ATAxLA2         State:Idle            Presence:not_set         Watchers  6

test has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0%, SL2:0.0% within 12s
  Members:
     PJSIP/ATAxLA2 (ringinuse disabled) (Not in use) has taken no calls yet
     231 (Local/115@phreaknet-queue-ring from hint:2127@hints-all) with penalty 5 (ringinuse disabled) (Unavailable) has taken no calls yet
  No Callers

Okay, so now let's add a member to the queue and do "queue reload members":

test has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0%, SL2:0.0% within 12s
  Members:
     PJSIP/ATAxLA2 (ringinuse disabled) (Not in use) has taken no calls yet
     231 (Local/116@queue-ring from PJSIP/ATAxLA2) with penalty 5 (ringinuse disabled) (Not in use) has taken no calls yet
     231 (Local/115@queue-ring from hint:2127@hints-all) with penalty 5 (ringinuse disabled) (Not in use) has taken no calls yet
  No Callers

Aha! So you see, when app_queue reloads members, the stale hints are refreshed. In this case, I've added a local channel but with device state to a device rather than a hint, so we can now compare all 3 cases. Many hours later:

2127@hints-cen: PJSIP/ATAxLA2         State:Idle            Presence:not_set         Watchers  6

test has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0%, SL2:0.0% within 12s
  Members:
     PJSIP/ATAxLA2 (ringinuse disabled) (Not in use) has taken no calls yet
     231 (Local/116@queue-ring from PJSIP/ATAxLA2) with penalty 5 (ringinuse disabled) (Unavailable) has taken no calls yet
     231 (Local/115@queue-ring from hint:2127@hints-all) with penalty 5 (ringinuse disabled) (Unavailable) has taken no calls yet
  No Callers

The device is currently available. I have added the same endpoint 3 different ways. app_queue things one of them is available (specified with PJSIP as the direct tech), and the other two, which are local channels using a custom hint or device state, are unavailable.

So, it doesn't matter whether a device is specified directly or a hint extension is used. Both suffer from the same problem. All three queue members above correspond to literally the same endpoint. The end device state should be equivalent, but in fact, the device state is only accurate if a real channel type is used for the queue member. If a hint or device is specified, it becomes stale and inaccurate.

So as can be seen, this doesn't affect all types of queue members and it isn't a bug with hints themselves. Rather, it seems to be an issue specifically with the way that app_queue figures out device state from hints. Initially, it's right, but over time, it becomes stale and basically screws up everything if that is how you are determining if queue members are available or not.

I tried this on another server using Asterisk 18.3 and chan_sip with a stock queues.conf config (except for the members):
[test-queue]
member => SIP/ATAxOffice1
member => Local/419@public,0,Conference,SIP/ATAxOffice1
member => Local/418@public,0,Conference,hint:231@test-hints

I do notice some of the same issues. For instance, this after a peer becomes unavailable:

PBX*CLI> queue show
msr has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0%, SL2:0.0% within 0s
  No Members
  No Callers

test-queue has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0%, SL2:0.0% within 0s
  Members:
     SIP/ATAxOffice1 (ringinuse enabled) (Not in use) has taken no calls yet
     Conference (Local/419@public from SIP/ATAxOffice1) (ringinuse enabled) (Not in use) has taken no calls yet
     Conference (Local/418@public from hint:231@test-hints) (ringinuse enabled) (Not in use) has taken no calls yet
  No Callers

[Aug 15 12:11:39] NOTICE[805]: chan_sip.c:30516 sip_poke_noanswer: Peer 'ATAxOffice1' is now UNREACHABLE!  Last qualify: 12
[Aug 15 12:11:39] NOTICE[805]: chan_sip.c:30516 sip_poke_noanswer: Peer 'ATAxOffice2' is now UNREACHABLE!  Last qualify: 10
PBX*CLI> queue show
msr has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0%, SL2:0.0% within 0s
  No Members
  No Callers

test-queue has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0%, SL2:0.0% within 0s
  Members:
     SIP/ATAxOffice1 (ringinuse enabled) (Unavailable) has taken no calls yet
     Conference (Local/419@public from SIP/ATAxOffice1) (ringinuse enabled) (Not in use) has taken no calls yet
     Conference (Local/418@public from hint:231@test-hints) (ringinuse enabled) (Unavailable) has taken no calls yet
  No Callers

PBX*CLI> queue show
msr has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0%, SL2:0.0% within 0s
  No Members
  No Callers

test-queue has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0%, SL2:0.0% within 0s
  Members:
     SIP/ATAxOffice1 (ringinuse enabled) (Unavailable) has taken no calls yet
     Conference (Local/419@public from SIP/ATAxOffice1) (ringinuse enabled) (Not in use) has taken no calls yet
     Conference (Local/418@public from hint:231@test-hints) (ringinuse enabled) (Unavailable) has taken no calls yet
  No Callers

All three of these agents have the same device state, but right after one becomes unavailable, 2 of them correctly show "Unavailable" while one of them is still "Not in use" (which is, actually, the opposite problem of the above). If I touch the file, then do "queue reload all", the middle one then correctly changes to "Unavailable". But watch what happens once the peers come back online:

test-queue has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0%, SL2:0.0% within 0s
  Members:
     SIP/ATAxOffice1 (ringinuse enabled) (Unavailable) has taken no calls yet
     Conference (Local/419@public from SIP/ATAxOffice1) (ringinuse enabled) (Unavailable) has taken no calls yet
     Conference (Local/418@public from hint:231@test-hints) (ringinuse enabled) (Unavailable) has taken no calls yet
  No Callers

[Aug 15 12:17:40] NOTICE[805]: chan_sip.c:24984 handle_response_peerpoke: Peer 'ATAxOffice1' is now Reachable. (12ms / 2000ms)
[Aug 15 12:17:40] NOTICE[805]: chan_sip.c:24984 handle_response_peerpoke: Peer 'ATAxOffice2' is now Reachable. (11ms / 2000ms)
PBX*CLI> queue show
msr has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0%, SL2:0.0% within 0s
  No Members
  No Callers

test-queue has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0%, SL2:0.0% within 0s
  Members:
     SIP/ATAxOffice1 (ringinuse enabled) (Not in use) has taken no calls yet
     Conference (Local/419@public from SIP/ATAxOffice1) (ringinuse enabled) (Unavailable) has taken no calls yet
     Conference (Local/418@public from hint:231@test-hints) (ringinuse enabled) (Not in use) has taken no calls yet
  No Callers

So on this particular Asterisk, it seems that the middle agent is consistently getting "stuck" in a stale state. Channel driver does not seem to matter - all that it takes is specifying custom device states. Hence, it seems there are general problems with app_queue itself with using custom device states/hints for availability.

This kind of thing now happened dozens of times in the past month. Since I need to use local channels for my queue members, this effectively makes app_queue completely unusable for me.
Comments:By: Asterisk Team (asteriskteam) 2021-08-15 11:27:01.851-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: Joshua C. Colp (jcolp) 2021-08-15 12:18:21.015-0500

We require additional debug to continue with triage of your issue. Please follow the instructions on the wiki [1] for how to collect debugging information from Asterisk. For expediency, where possible, attach the debug with a '.txt' file extension so that the debug will be usable for further analysis.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

Additionally this issue is extremely confusing because you're mixing output of different hints amongst others hints and extensions. Please simplify this down to a description with a single member and a single hint showing it, alongside the complete configuration not just bits and pieces.

By: N A (InterLinked) 2021-08-15 14:57:34.692-0500

All right, here is the debug.

The hints context is simply this:
[test-hints]
exten => _23[1-8],hint,SIP/ATAxOffice${EXTEN:-1}


By: Joshua C. Colp (jcolp) 2021-08-15 16:58:13.423-0500

And what exactly is the debug showing? What happened, what were you expecting that didn't occur within the specific debug given.

By: N A (InterLinked) 2021-08-15 17:39:49.312-0500

I don't know if I was expecting anything in particular which the debug would show. The CLI trace makes it very clear what's going wrong, though that might not translate into things a debug would show. I don't have a good enough understanding of app_queue to narrow it down any further than that.

However, it's what is *NOT* in the debug that is noteworthy. There are 3 queue members. We only see app_queue process events for two of them. Somewhere, there must be logic that leads to eventual consistency, but there seems to be no immediate updates in many cases, this being one of them. I have to manually force an update to purge the stale state and make it pick up the current state, for that queue member.

Expected:
Queue availability follows the device availability

What happened:
Queue member #2 is stale, and is only updated when queues.conf is updated (e.g. via touch, to force app_queue to look at it again) and "queue reload members" is run.

By: Joshua C. Colp (jcolp) 2021-08-16 04:38:24.438-0500

You need to be specific in your description of the debug. I'm not you, and without that information I have to deduce and figure it out.

For example: "Queue member PJSIP/bob should be available but it's not."

That tell tells me what device I need to look for and follow. Without that I don't know.

And if app_queue isn't seeing that and outputting a message, then it's likely something in the PBX core instead.

By: N A (InterLinked) 2021-08-16 06:53:43.545-0500

There are two stages. This is for an ATA that I reboot, to cause it to go offline then online.

In stage 1 of things that are wrong, Local/419@public is "NOT IN USE" when it should be "UNAVAILABLE". This is while the ATA is offline.

Once the ATA comes back online, "Local/419@public" should be "NOT IN USE" when it's stuck at "UNAVAILABLE" still. In both cases, forcing a reload using the touch/reload method causes it to get back on track, but app_queue fails to do this automatically.

There are no issues with hints or device state in general, just when it intersects with app_queue.

By: Joshua C. Colp (jcolp) 2021-08-16 07:07:29.507-0500

So in the given debug I see chan_sip peer "ATAxOffice1" go unreachable which then results in:

{noformat}
[Aug 15 13:55:29] DEBUG[56889] app_queue.c: Removed SIP/ATAxOffice1 from pending_members
[Aug 15 13:55:29] DEBUG[56889] app_queue.c: Device 'SIP/ATAxOffice1' changed to state '5' (Unavailable)
{noformat}

Later when it becomes reachable:

{noformat}
[Aug 15 13:57:59] DEBUG[56889] app_queue.c: Removed SIP/ATAxOffice1 from pending_members
[Aug 15 13:57:59] DEBUG[56889] app_queue.c: Device 'SIP/ATAxOffice1' changed to state '1' (Not in use)
{noformat}

So you're saying that despite this debug output, when you do "queue show" it says otherwise?

You also mentioned "There are 3 queue members. We only see app_queue process events for two of them.". Are you referring to (from your initial description) the same device state being referenced by two members at once? Looking at the code[1] I think it skips out when the first one is found. I don't think anyone envisioned the same device state being referenced by different members in the same queue at once.

[1] https://github.com/asterisk/asterisk/blob/master/apps/app_queue.c#L2558

By: N A (InterLinked) 2021-08-16 07:13:34.817-0500

That same device is part of the queue 3 different times (yes, same device state, all 3 queue members).
For 2 of the members, it works properly. It's the third way it's added that demonstrates the issue here.

It's more than just duplicate members in a queue, because on my production system, I have the same queue member in many queues and and a few other duplicates with the same device state, and the *only* queue members that work properly is, in fact, just 1 queue member - which directly references a device as the member, rather than a a local channel + custom device state/hint:

test has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0%, SL2:0.0% within 12s
  Members:
     PJSIP/ATAxLA2 (ringinuse disabled) (Not in use) has taken no calls yet
     231 (Local/116@queue-ring from PJSIP/ATAxLA2) with penalty 5 (ringinuse disabled) (Unavailable) has taken no calls yet
     231 (Local/115@queue-ring from hint:2127@hints-all) with penalty 5 (ringinuse disabled) (Unavailable) has taken no calls yet
  No Callers

I've added what appear to be 3 different members, and PJSIP/ATAxLA2 is currently available, so the first member's state is correct and the last two are, again, stale.

By: Joshua C. Colp (jcolp) 2021-08-16 07:16:39.845-0500

A "hint" is not considered the same referenced device state. It is separate, the queue has no idea what devices are behind the hint or what is feeding it the device state. In the given output you only have 2 that reference the same device state.

I would suspect it is still the code I've provided. I'll accept this issue, there is no time frame on if or when it would get looked into.

By: N A (InterLinked) 2021-08-16 07:21:28.845-0500

2 directly, yes, but the hint extension points to the same device (literally), so it's still 3 identical device states indirectly.

On the small demo I set up, it seems that only that duplicate is getting stuck, while on another system, anything that isn't a direct device as the member is getting stuck in a stale state. The underlying issues seems to be the same with both, because forcing app_queue to reload all members results in the correct states across all queue members.

By: N A (InterLinked) 2021-08-17 07:49:26.632-0500

I just tried ignoring that part of the code you referenced and the same thing continues to occur:

[Aug 17 08:47:23] NOTICE[82688]: chan_sip.c:30516 sip_poke_noanswer: Peer 'ATAxOffice1' is now UNREACHABLE!  Last qualify: 10
[Aug 17 08:47:23] NOTICE[82688]: chan_sip.c:30516 sip_poke_noanswer: Peer 'ATAxOffice2' is now UNREACHABLE!  Last qualify: 10
PBX*CLI> queue show
msr has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0%, SL2:0.0% within 0s
  No Members
  No Callers

test-queue has 0 calls (max unlimited) in 'ringall' strategy (0s holdtime, 0s talktime), W:0, C:0, A:0, SL:0.0%, SL2:0.0% within 0s
  Members:
     SIP/ATAxOffice1 (ringinuse enabled) (Unavailable) has taken no calls yet
     Conference (Local/419@public from SIP/ATAxOffice1) (ringinuse enabled) (Not in use) has taken no calls yet
     Conference (Local/418@public from hint:231@test-hints) (ringinuse enabled) (Unavailable) has taken no calls yet

/* check every member until we find one NOT_INUSE */
                       if (!avail) {
                               avail = is_member_available(q, m);
                       }
                       if (avail && found_member && 0) {
                               /* early exit as we've found an available member and the member of interest */
                               ao2_ref(m, -1);
                               break;
                       }


By: N A (InterLinked) 2021-09-05 12:17:28.916-0500

Here is a debug from a production server where this problem is rampant and always occurs:

[2021-09-05 17:12:02] DEBUG[24413]: res_pjsip/pjsip_options.c:623 sip_options_update_endpoint_state_compositor_aor: Endpoint state compositor 'ATAxLA2' is online as AOR 'ATAxLA2' is available
[2021-09-05 17:12:02]   == Endpoint ATAxLA2 is now Reachable
[2021-09-05 17:12:02]     -- Contact ATAxLA2/sip:ATAxLA2@REDACTED:5063;transport=TLS;x-ast-orig-host=192.168.10.103:5063 is now Reachable.  RTT: 15.803 msec
[2021-09-05 17:12:02] DEBUG[24413]: res_pjsip/pjsip_options.c:777 sip_options_contact_status_notify_task: AOR 'ATAxLA2' now has 1 available contacts
[2021-09-05 17:12:02] DEBUG[21797]: devicestate.c:361 _ast_device_state: No provider found, checking channel drivers for PJSIP - ATAxLA2
[2021-09-05 17:12:02] DEBUG[21797]: devicestate.c:466 do_state_change: Changing state for PJSIP/ATAxLA2 - state 1 (Not in use)
[2021-09-05 17:12:02] DEBUG[21800]: app_queue.c:2670 extension_state_cb: Extension '4002702127@hints-centrex-1' changed to state '1' (Not in use) but we don't care because they're not a member of any queue.
[2021-09-05 17:12:02] DEBUG[21800]: app_queue.c:2670 extension_state_cb: Extension '4002702127@hints-centrex-1' changed to state '1' (Not in use) but we don't care because they're not a member of any queue.
[2021-09-05 17:12:02] DEBUG[21800]: app_queue.c:2670 extension_state_cb: Extension '4002702127@hints-centrex-1' changed to state '1' (Not in use) but we don't care because they're not a member of any queue.
[2021-09-05 17:12:02] DEBUG[21800]: app_queue.c:2670 extension_state_cb: Extension '4002702127@hints-centrex-1' changed to state '1' (Not in use) but we don't care because they're not a member of any queue.
[2021-09-05 17:12:02] DEBUG[21800]: app_queue.c:2670 extension_state_cb: Extension '4002702127@hints-centrex-1' changed to state '1' (Not in use) but we don't care because they're not a member of any queue.
[2021-09-05 17:12:02] DEBUG[21800]: app_queue.c:2670 extension_state_cb: Extension '4002702127@hints-centrex-1' changed to state '1' (Not in use) but we don't care because they're not a member of any queue.
[2021-09-05 17:12:02] DEBUG[21800]: app_queue.c:2670 extension_state_cb: Extension '4002702127@hints-centrex-1' changed to state '1' (Not in use) but we don't care because they're not a member of any queue.
[2021-09-05 17:12:02] DEBUG[21800]: app_queue.c:2670 extension_state_cb: Extension '4002702127@hints-centrex-1' changed to state '1' (Not in use) but we don't care because they're not a member of any queue.
[2021-09-05 17:12:02] DEBUG[21800]: app_queue.c:2670 extension_state_cb: Extension '4002702127@hints-centrex-1' changed to state '1' (Not in use) but we don't care because they're not a member of any queue.
[2021-09-05 17:12:02] DEBUG[21800]: app_queue.c:2670 extension_state_cb: Extension '4002702127@hints-centrex-1' changed to state '1' (Not in use) but we don't care because they're not a member of any queue.
[2021-09-05 17:12:02] DEBUG[21800]: app_queue.c:2670 extension_state_cb: Extension '4002702127@hints-centrex-1' changed to state '1' (Not in use) but we don't care because they're not a member of any queue.
[2021-09-05 17:12:02] DEBUG[21864]: app_queue.c:2589 device_state_cb: Device 'PJSIP/ATAxLA2' changed to state '1' (Not in use) but we don't care because they're not a member of any queue.
[2021-09-05 17:12:02] DEBUG[21828]: manager.c:6200 match_filter: Examining AMI event:
Event: DeviceStateChange
Privilege: call,all
Timestamp: 1630861922.899554
SystemName: REDACTED
Device: PJSIP/ATAxLA2
State: NOT_INUSE

So, app_queue is indeed getting the changes (one for each time it's in a different queue) In this case, I pulled the power on an ATA, waited a minute, and then plugged it back in. This is what we see as it comes back online.

app_queue is improperly ignoring some device state changes when it shouldn't. I think the issue may be that the device itself isn't a member, so app_queue thinks it should be ignored, but in fact, the device state is a custom hint for a device that is in the queue. If it is doing this already, the logic is faulty.

By: Joshua C. Colp (jcolp) 2021-09-05 12:20:14.402-0500

It should be ignored, because it is listening for updates for the custom hint - not the device. The PBX core takes the device updates, and creates an extension state update, which then app_queue is supposed to receive and act on.


By: N A (InterLinked) 2021-09-05 12:28:06.200-0500

So are you saying that app_queue is supposed to receive a state change trigger for the Local channel directly, as opposed to the actual device?

Because these are the only app_queue.c debugs that I see during the experiment. app_queue never hears about the local channel which is the actual member, only the raw devices that are being used for state for that member.

PJSIP/ATAxLA2 is never specified in my queues.conf anywhere. The relationship is that a Local channel agent uses a hint: that returns PJSIP/ATAxLA2 in the dialplan, in [hints-all].

It seems either app_queue is supposed to be hearing about Local channels and isn't, or it's supposed to be doing something with the actual device and isn't.

By: Joshua C. Colp (jcolp) 2021-09-05 12:33:50.469-0500

I'm not actively pursuing this issue or digging into it. A hint is treated differently in the code, and uses extension state instead[1]. The app_queue module does NOT directly receive updates for anything (devices) that the hint may comprise of. It receives extension state updates[2].

[1] https://github.com/asterisk/asterisk/blob/master/apps/app_queue.c#L2718
[2] https://github.com/asterisk/asterisk/blob/master/apps/app_queue.c#L2633

By: N A (InterLinked) 2021-09-05 12:34:31.553-0500

Actually, let me back up a second.

This is the queue member:
[2021-09-05 17:12:02] DEBUG[21800]: app_queue.c:2670 extension_state_cb: Extension '4002702127@hints-centrex-1' changed to state '1' (Not in use) but we don't care because they're not a member of any queue.

The only difference is that in queues.conf, I define the hint as 4002702127@hints-all.

In the dialplan, [hints-all] includes [hints-centrex-1].

It's possible that that app_queue isn't anticipating included hint contexts and it should be checking if the queue member would match any included contexts, as it's being too literal right now with the name.

I am 99% sure this line is to blame: https://github.com/asterisk/asterisk/blob/master/apps/app_queue.c#L2652

It's doing literal string comparisons, which isn't good enough.

By: Joshua C. Colp (jcolp) 2021-09-05 12:36:38.715-0500

Yes, that is entirely possible. As well your provided example did not specify an include.

By: N A (InterLinked) 2021-09-05 12:39:23.813-0500

Thanks, Josh, at least now I think I finally understand what the bug is and why it's happening.

If nobody at Sangoma is working on this already, you can assign the issue to me. May not happen immediately but this is a serious issue so I will have to look into this if nobody else is.

By: Friendly Automation (friendly-automation) 2021-09-21 14:30:48.312-0500

Change 16527 merged by Friendly Automation:
app_queue: Fix hint updates for included contexts

[https://gerrit.asterisk.org/c/asterisk/+/16527|https://gerrit.asterisk.org/c/asterisk/+/16527]

By: Friendly Automation (friendly-automation) 2021-09-21 14:32:47.197-0500

Change 16528 merged by Friendly Automation:
app_queue: Fix hint updates for included contexts

[https://gerrit.asterisk.org/c/asterisk/+/16528|https://gerrit.asterisk.org/c/asterisk/+/16528]

By: Friendly Automation (friendly-automation) 2021-09-21 17:22:18.050-0500

Change 16529 merged by Kevin Harwell:
app_queue: Fix hint updates for included contexts

[https://gerrit.asterisk.org/c/asterisk/+/16529|https://gerrit.asterisk.org/c/asterisk/+/16529]

By: Friendly Automation (friendly-automation) 2021-09-21 17:22:42.631-0500

Change 16411 merged by Kevin Harwell:
app_queue: Fix hint updates for included contexts

[https://gerrit.asterisk.org/c/asterisk/+/16411|https://gerrit.asterisk.org/c/asterisk/+/16411]

By: Steve Davies (one47) 2021-12-15 06:19:27.109-0600

Could this issue be reopened - There is a bug in the patch that was applied. I have a patch to submit, which:

1) Does not leak `extension_state_cb` states across queues - In the current patch, the `found` variable is set, and then remains set as the update progresses to another queue, causing the 'found' state to be applied to the first queue member of every subsequent queue checked.
2) Unlike `device_state_cb`, where device will definitely be unique in  a queue, extension states might be specified multiple times, so breaking out when a member matches is inappropriate.


By: Joshua C. Colp (jcolp) 2021-12-15 06:22:29.989-0600

[~one47] A new issue should be created.