[Home]

Summary:ASTERISK-29020: Regression: Fix for app_queue: track masquerades in app_queue to avoid leaked stasis subscriptions broke queue log
Reporter:Grzegorz Garlewicz (garlew)Labels:
Date Opened:2020-08-06 10:08:44Date Closed:
Priority:MajorRegression?Yes
Status:Open/NewComponents:Applications/app_queue
Versions:16.15.0 Frequency of
Occurrence
Constant
Related
Issues:
is caused byASTERISK-25844 app_queue: Ghost channels in "core show channels" output
Environment:Attachments:
Description:{code}handle_masquerade{code}'s
{code}ast_string_field_set(queue_data, caller_uniqueid, new_channel_id);{code}

makes all subsequent entries in the queue_log use the new uniqueid which is not in any way associated to the original one.
This way you can no longer rely on the log entries.


{code}
1596621010|1596620945.17555|Consultant|NONE|ENTERQUEUE||209899255|1
1596621016|1596620945.17555|Consultant|Sip/3873|CONNECT|6|1596621010.17570|5
1596621065|1596621064.17580|Specialist|NONE|ENTERQUEUE||3873|2
1596621066|1596621064.17580|Consultant|Sip/3873|ATTENDEDTRANSFER|APP|Queue|6|50|1
{code}

the last entry used to have original uniqueid
Comments:By: Asterisk Team (asteriskteam) 2020-08-06 10:08:45.831-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Joshua C. Colp (jcolp) 2020-08-06 10:12:54.918-0500

Can you provide an example queue_log showing this in action?

By: Grzegorz Garlewicz (garlew) 2020-08-06 10:24:54.131-0500

Sure I do (will follow) , but it's kind of obvious - as {code}caller_uniqueid{code} is used in queue_log event emission, once it's changed, next events will have a different value.


By: Grzegorz Garlewicz (garlew) 2020-08-06 10:28:51.268-0500

{code}
1596621010|1596620945.17555|Consultant|NONE|ENTERQUEUE||209899255|1
1596621016|1596620945.17555|Consultant|Sip/3873|CONNECT|6|1596621010.17570|5
1596621065|1596621064.17580|Specialist|NONE|ENTERQUEUE||3873|2
1596621066|1596621064.17580|Consultant|Sip/3873|ATTENDEDTRANSFER|APP|Queue|6|50|1
{code}

the last entry used to have original uniqueid

By: Grzegorz Garlewicz (garlew) 2020-08-07 05:58:42.821-0500

Would it be enough?

By: Kevin Harwell (kharwell) 2020-08-07 15:51:58.158-0500

I'm not seeing this behavior. Could you please attach any relevant configuration information (_queues.conf_, _sip.conf_, _extensions.conf_, etc...) that might help us replicate the issue. As well please describe the steps involved in the call.

Also please attach an Asterisk debug log that includes the SIP trace [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

By: Grzegorz Garlewicz (garlew) 2020-08-08 11:18:32.963-0500

I will try to gather some more data for this particular scenario. But just from looking at the new code only: in every scenario in which there is a queue log entry written after a masquerade, this queue log entry is going to have a different uniqueid than before.

By: Kevin Harwell (kharwell) 2020-08-10 14:08:59.176-0500

{quote}
But just from looking at the new code only: in every scenario in which there is a queue log entry written after a masquerade, this queue log entry is going to have a different uniqueid than before.
{quote}
I don't think this is necessarily always the case. If the stasis handler in question is even called the channel blob's {{uniqueid}} must match either the queue_data's {{caller_uniqueid}} or {{member_uniqueid}} before the value is overwritten.

As well the handler is only called on a channel that is "setup" to listen for the {{ast_channel_masquerade_type()}}. For example, in a scenario I ran where a caller enters a queue, the agent is dialed, and then the agent answers, the agent here is "setup" to listen for the event. That means in that scenario if the agent transfers the caller someplace the handler is _not_ called because the caller is not "setup" to listen for it. However the caller can potentially transfer the agent. If that happens the handler _may_ be called. I say _may_ because when I ran that scenario sometimes it was called, and sometimes it wasn't. Even when it was the queue_log value did not change as neither of the handler's checks passed. So it's important to understand the scenario where the conditions to overwrite are triggered.

Also, given the above I believe there might be a race condition where sometimes the stasis subscription is removed on the associated channel before that event gets triggered or completes propogation. I will say that's somewhat of a separate issue, but could explain if in your scenario you see it overwritten sometimes, and sometimes not.


By: Grzegorz Garlewicz (garlew) 2020-08-14 07:56:45.558-0500

It seems to be a race condition. Most of the time I'm getting the values the old way but if I manage to do the transfer just when the queue starts (I think so) then I'm getting them the new way.

{code}
1597409056|1597409054.52|test|NONE|ENTERQUEUE||916547207|1
1597409058|1597409054.52|test|Sip/103|CONNECT|2|1597409056.53|2
1597409068|1597409067.54|test2|NONE|ENTERQUEUE||103|1
1597409068|1597409054.52|test|Sip/103|ATTENDEDTRANSFER|APP|Queue|2|10|1
1597409077|1597409054.52|test2|Sip/101|CONNECT|9|1597409075.56|1
1597409078|1597409054.52|test2|Sip/101|COMPLETEAGENT|9|1|1
{code}

{code}
1597409085|1597409084.57|test|NONE|ENTERQUEUE||468126461|1
1597409089|1597409084.57|test|Sip/103|CONNECT|4|1597409085.58|3
1597409099|1597409098.59|test2|NONE|ENTERQUEUE||103|1
1597409099|1597409098.59|test|Sip/103|ATTENDEDTRANSFER|APP|Queue|4|10|1
1597409102|1597409084.57|test2|Sip/101|CONNECT|3|1597409099.60|3
1597409103|1597409084.57|test2|Sip/101|COMPLETEAGENT|3|1|1
{code}



The dialplan is:
{code}
exten => 666,1,NoOp
same => n,Playback(welcome)
same => n,Queue(test)

exten => 777,1,NoOp
same => n,Playback(welcome)
same => n,Queue(test2)
{code}

I call 666. Agent picks up and initiates an attended transfer to 777. After the welcome message, agent completes the transfer. Another agent picks up and after the call hangs up.

By: Grzegorz Garlewicz (garlew) 2020-08-14 09:12:25.137-0500

{code}
1597413714|1597413569.6|test2|Sip/101|CONNECT|34|1597413712.19|1
1597413724|1597413723.20|test2|NONE|ENTERQUEUE||101|1
1597413724|1597413723.20|test2|Sip/101|ATTENDEDTRANSFER|APP|Queue|34|10|1
{code}

{code}
[2020-08-14 16:02:04] DEBUG[20847] app_queue.c: Replacing caller channel 1597413569.6 with 1597413723.20 due to masquerade
...
[2020-08-14 16:02:04] DEBUG[20944] app_queue.c: Detected attended transfer in queue test2
{code}


for the  cases that were logged in the old way, there was no 'Replacing caller channel' line

By: Kevin Harwell (kharwell) 2020-08-14 14:10:29.950-0500

Opening this as there for sure seems to be a race condition.

Also, it's clear for the code that for some cases the value _is_ modified. It needs to be determined if, and why it would be valid to do so. And if valid then it should probably be documented somewhere (the wiki for instance).

By: Grzegorz Garlewicz (garlew) 2020-08-18 10:31:59.823-0500

This is my workaround until the case is resolved. It's probably inadequate as I did not check the events emitted by AMI for proper ids.

Edit: Removed inline patch

Unfortunately we cannot accept patches in comments. It can however be attached as a file after accepting the license agreement [1]. Or even better submitted to gerrit for inclusion [2].

[1] https://issues.asterisk.org/jira/secure/DigiumLicense.jspa
[2] https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process