Asterisk
  1. Asterisk
  2. ASTERISK-13626

Queue members associated with ZOMBIE channels (breaks wrapuptime and queue_log).

    Details

    • Type: Bug Bug
    • Status: Closed
    • Severity: Minor Minor
    • Resolution: Suspended
    • Affects Version/s: None
    • Target Release Version/s: None
    • Component/s: Applications/app_queue
    • Labels:
      None
    • Mantis ID:
      14527
    • Regression:
      No

      Description

      In some circumstances, queue members become associated with a ZOMBIE channel during the answer process.

      When the ZOMBIE channel is hung up under normal clearing as part of the answer process, an AgentComplete event is generated, and the wrapuptime counter is set, even though the agent is still on a call. The queue_log entry is also written at this time with just 1 second of talk time.

      No further AgentComplete event is generated when the call comes to an end. The wrapuptime is then counted from the start of the call, not the end of the call.

                • ADDITIONAL INFORMATION ******

      This seems to happen whenever multiple levels of Local channels and/or bridging are used to connect agent and caller. Here's an example dialplan that will reproduce this error.

      ----------------------------------
      [main]
      exten => 1,1,Answer
      exten => 1,2,Dial(Local/plainae@test)

      exten => 12345,1,Dial(SIP/12345)

      [test]
      exten => plainae,1,Queue(PlainAE)

      ----------------------------------
      We have one agent as a member of the queue:

      queue add member Local/12345@main to PlainAE

      When a call is placed from SIP/12344 to extension 1, a local channel dials the queue. The queue dials a second local channel to reach the agent, listening on SIP/12345. When the agent answers, both local channels are masqueraded out.

      When the agent is called, an AgentCalled event is generated:

      Event: AgentCalled
      Privilege: agent,all
      Queue: PlainAE
      AgentCalled: Local/12345@main
      AgentName: Local/12345@main
      ChannelCalling: Local/plainae@test-c6d0;2
      DestinationChannel: Local/12345@main-cc4e;1

      This is as it should be. When the call is answered, an AgentConnect event is generated:

      Event: AgentConnect
      Privilege: agent,all
      Queue: PlainAE
      Uniqueid: 1235323799.8
      Channel: Local/12345@main-cc4e;1
      Member: Local/12345@main
      MemberName: Local/12345@main

      This is also as it should be.

      As the local channels are masqueraded out, a complex sequence of Bridge, Masquerade and Rename events are generated (see attached log).

      Having had a look at the Manager stream of events, it appears that the Rename events are being followed, but not the Masquerade or Bridge events. The agent should be left as associated with the SIP channel for the agent's phone (SIP/12345-0825c248 in this case). Instead, it's eventually associated with Local/plainae@test-c6d0;1<ZOMBIE>, which is one of the leftover channels from the bridge/masquerade sequence.

      So, when this channel is hung up as part of the normal clearing process, the following event is generated:

      Event: AgentComplete
      Privilege: agent,all
      Queue: PlainAE
      Uniqueid: 1235323799.8
      Channel: Local/plainae@test-c6d0;1<ZOMBIE>
      Member: Local/12345@main
      MemberName: Local/12345@main
      HoldTime: 4
      TalkTime: 1
      Reason: caller

      This is coming way too early - we shouldn't get one of these until either the agent or caller hangs up the phone. The wrapuptime counter is also set at this time (instead of the end of the call as it should be). The call is also written as complete in the queue_log, with just one second of talk time.

      Finally, when the call ends, we get two Hangup events:

      Event: Hangup
      Privilege: call,all
      Channel: SIP/12345-0825c248
      Uniqueid: 1235323799.7
      CallerIDNum: 12345
      CallerIDName: <unknown>
      Cause: 16
      Cause-txt: Normal Clearing

      Event: Hangup
      Privilege: call,all
      Channel: SIP/12344-08242b40
      Uniqueid: 1235323799.6
      CallerIDNum: 12344
      CallerIDName: 08450045413
      Cause: 16
      Cause-txt: Normal Clearing

      This is when the AgentComplete event should be fired off, however there is no such event because the queue system thinks the call has already ended.

      I have attached a full capture of the Manager events associated with this broken process.

      This happens with both the 1.6 and 1.4 release groups (I haven't checked 1.2).

      A quick fix would be to prevent the system from following Rename events where the Newname: field ends with <MASQ>. This would leave the queue member associated with the correct channel at the end of the bridging process.

      I did take a look at the app_queue code, but couldn't work out where to start with implementing this (sorry!)...

        Activity

        Hide
        Matt King, M.A. Oxon. added a comment -

        Go on fix it... You know you want to...

        Please!

        Show
        Matt King, M.A. Oxon. added a comment - Go on fix it... You know you want to... Please!
        Hide
        Atis Lezdins added a comment -

        You could always follow UniqueID as workaround. That works perfectly for me with similar scenarios.

        Show
        Atis Lezdins added a comment - You could always follow UniqueID as workaround. That works perfectly for me with similar scenarios.
        Hide
        Matt King, M.A. Oxon. added a comment -

        Thanks for the helpful suggestion, but we're not having trouble following the Manager stream.

        It's the AgentComplete, queue_log and wrapuptime functions that are broken...

        Show
        Matt King, M.A. Oxon. added a comment - Thanks for the helpful suggestion, but we're not having trouble following the Manager stream. It's the AgentComplete, queue_log and wrapuptime functions that are broken...
        Hide
        Matt King, M.A. Oxon. added a comment -

        This is also broken for just regular dials to queue when Local Channels are used - we get two channels hanging up on answer:

        *CLI> – SIP/1002-090f25d8 answered Local/2002@main2-8d35;2
        – Local/2002@main2-8d35;1 answered SIP/1001-090efcf8
        [Dec 21 14:23:20] NOTICE[3522]: app_queue.c:3491 try_calling: Delaying member connect for 2 seconds
        [Dec 21 14:23:21] NOTICE[3523]: app_dial.c:1963 dial_exec_full: PEER context: outbound; PEER exten: ; PEER priority: 1
        – Executing [h@macro-agent-dial:1] NoOp("Local/2002@main2-8d35;1<ZOMBIE>", "DialHangup Local/2002@main2-8d35;1<ZOMBIE> CALLINGAGENT: AGENT:Local/2002@main2 INTERFACE: DIALSTATUS: ANSWEREDTIME:") in new stack
        – Executing [h@macro-agent-dial:2] GotoIf("Local/2002@main2-8d35;1<ZOMBIE>", "6?h-ignore,1") in new stack
        – Goto (macro-agent-dial,h-ignore,1)
        == Spawn extension (macro-agent-dial, dial, 5) exited non-zero on 'Local/2002@main2-8d35;2' in macro 'agent-dial'
        == Spawn extension (macro-agent-dial, dial, 5) exited non-zero on 'Local/2002@main2-8d35;2'
        – Executing [h@macro-agent-dial:1] NoOp("Local/2002@main2-8d35;2", "DialHangup Local/2002@main2-8d35;2 CALLINGAGENT: AGENT:Local/2002@main2 INTERFACE: DIALSTATUS:ANSWER ANSWEREDTIME:1") in new stack
        – Executing [h@macro-agent-dial:2] GotoIf("Local/2002@main2-8d35;2", "0?h-ignore,1") in new stack
        – Executing [h@macro-agent-dial:3] GotoIf("Local/2002@main2-8d35;2", "1?h-ignore,1") in new stack
        – Goto (macro-agent-dial,h-ignore,1)
        – Stopped music on hold on SIP/1001-090efcf8

        The second of these is problematic, as it is indistinguishable from the agent 'double-picking' the call.

        This means we can't use the DEVICE_STATE function to reliably set the device state for the Local Channels, which in turn means they cannot be used as a replacement for Agent Channels (because agent's phones are rung even if they are in use) - so what are we supposed to do instead now that Agent Channels are gone?

        Show
        Matt King, M.A. Oxon. added a comment - This is also broken for just regular dials to queue when Local Channels are used - we get two channels hanging up on answer: *CLI> – SIP/1002-090f25d8 answered Local/2002@main2-8d35;2 – Local/2002@main2-8d35;1 answered SIP/1001-090efcf8 [Dec 21 14:23:20] NOTICE [3522] : app_queue.c:3491 try_calling: Delaying member connect for 2 seconds [Dec 21 14:23:21] NOTICE [3523] : app_dial.c:1963 dial_exec_full: PEER context: outbound; PEER exten: ; PEER priority: 1 – Executing [h@macro-agent-dial:1] NoOp("Local/2002@main2-8d35;1<ZOMBIE>", "DialHangup Local/2002@main2-8d35;1<ZOMBIE> CALLINGAGENT: AGENT:Local/2002@main2 INTERFACE: DIALSTATUS: ANSWEREDTIME:") in new stack – Executing [h@macro-agent-dial:2] GotoIf("Local/2002@main2-8d35;1<ZOMBIE>", "6?h-ignore,1") in new stack – Goto (macro-agent-dial,h-ignore,1) == Spawn extension (macro-agent-dial, dial, 5) exited non-zero on 'Local/2002@main2-8d35;2' in macro 'agent-dial' == Spawn extension (macro-agent-dial, dial, 5) exited non-zero on 'Local/2002@main2-8d35;2' – Executing [h@macro-agent-dial:1] NoOp("Local/2002@main2-8d35;2", "DialHangup Local/2002@main2-8d35;2 CALLINGAGENT: AGENT:Local/2002@main2 INTERFACE: DIALSTATUS:ANSWER ANSWEREDTIME:1") in new stack – Executing [h@macro-agent-dial:2] GotoIf("Local/2002@main2-8d35;2", "0?h-ignore,1") in new stack – Executing [h@macro-agent-dial:3] GotoIf("Local/2002@main2-8d35;2", "1?h-ignore,1") in new stack – Goto (macro-agent-dial,h-ignore,1) – Stopped music on hold on SIP/1001-090efcf8 The second of these is problematic, as it is indistinguishable from the agent 'double-picking' the call. This means we can't use the DEVICE_STATE function to reliably set the device state for the Local Channels, which in turn means they cannot be used as a replacement for Agent Channels (because agent's phones are rung even if they are in use) - so what are we supposed to do instead now that Agent Channels are gone?
        Hide
        Leif Madsen added a comment -

        Per the Asterisk maintenance timeline page at http://www.asterisk.org/asterisk-versions maintenance (bug) support for the 1.4 and 1.6.x branches has ended. For continued maintenance support please move to the 1.8 branch which is a long term support (LTS) branch. For more information about branch support, please see https://wiki.asterisk.org/wiki/display/AST/Asterisk+Versions

        Show
        Leif Madsen added a comment - Per the Asterisk maintenance timeline page at http://www.asterisk.org/asterisk-versions maintenance (bug) support for the 1.4 and 1.6.x branches has ended. For continued maintenance support please move to the 1.8 branch which is a long term support (LTS) branch. For more information about branch support, please see https://wiki.asterisk.org/wiki/display/AST/Asterisk+Versions

          People

          • Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development