[Home]

Summary:ASTERISK-16340: [patch] After a blind transfer by the calling party the transferees peer cannot be dialed again within the same call
Reporter:Ramon Peek-Fares (ramonpeek)Labels:
Date Opened:2010-07-09 05:38:42Date Closed:2012-10-03 11:16:52
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Channels/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 0001-Erase-the-already-dialed-interface-channel-datastore.patch
( 1) app_dial_allow_blind_xfer_back_to_originally_called_extension.patch
( 2) debug_log.tar.gz
( 3) fixRetransfering.patch
( 4) fixRetransfering1.6.2.15.patch
Description:After a blind transfer by the calling party the transferees peer cannot be dialed again within the same call. This ONLY occurs when dialing through a Local channel.

Asterisk will show this warning on the CLI>

[Jul  9 12:18:37] WARNING[27865]: app_dial.c:1296 dial_exec_full: Skipping dialing interface 'SIP/401' again since it has already been dialed


NOTE: See steps to reproduce (in advanced view)

****** STEPS TO REPRODUCE ******

I use this small dialplan to test:
===================================
exten = 400,1,Dial(Local/400@local,15)

exten = 401,1,Dial(Local/401@local,15)

exten = 402,1,Dial(Local/402@local,15,g)
exten = 402,n,Dial(Local/401@local,15,g)
exten = 402,n,Dial(Local/404@local,15)

exten = 403,1,Dial(Local/403@local,15)

[local]
exten = _XXX,1,Dial(SIP/${EXTEN},14)


THEN DO THE FOLLOWING:
=======================
1) Call from peer 400 to 401 and answer the call at 401
2) On peer 400 blind transfer the call to peer 403 and answer the call at 403
3) On peer 401 blind transfer the call to peer 402, BUT DO NOT ANSWER and see what happens...

Due to a fallback programmed into the dialplan, the call to peer 402 will return to peer 401 (return on noanswer).
Asterisk however thinks the peer is already being dialed, but it really isn't.
So this result in peer 401 not being dialed and the next priority being executed, which is the dialing of peer 404.
Comments:By: Ramon Peek-Fares (ramonpeek) 2010-07-09 05:45:20

I've already done some extensive debugging in the code..
And although I'm not to familiar with this part of the code I think I've found the possible source of this problem:

It looks like the 'dialed_interfaces' are copied during the first blind transfer from channel SIP/401 to channel SIP/403 via the datastore inheritance. ( see steps to reproduce !)
This causes the channel of peer 403 to contain an entry for peer SIP/401 being dialed (from the first call). This entry still exists after the second transfer since the channel belonging to SIP/403 is not affected.

IMHO we should prevent the 'dialed_interfaces' to be copied in case of blind transfer by the calling party.

Offcourse I might be totally off here ;-)
As I said; I'm not too familiar with this part of the code.
So I could use some help...



By: Paul Belanger (pabelanger) 2010-07-09 06:34:06

We require a complete debug log to help triage the issue.

This document will provide instructions on how to collect debugging logs from an Asterisk machine for the purpose of helping bug marshals troubleshoot an issue:

http://svn.digium.com/svn/asterisk/trunk/doc/HOWTO_collect_debug_information.txt

By: David Woolley (davidw) 2010-07-09 06:39:12

In the mean time, can I "me too" this. We discovered the same problem, although either we found it was a duplicate, or investigating it enough for a report here has been too low a priority, up to now.  I'll try to find our (virtual) paperwork on it.

By: David Woolley (davidw) 2010-07-09 07:04:44

OK.  Looks like we deferred analyzing this in detail.  However, this is my initial internal analysis, which may give some clues, although it really needs more explanation of what we were doing (I'm not claiming this meets the bug reporting guidelines, yet):

Tester's Report (some application specifics replaced by lower level details).

Using AMI, I made an outbound call to Party A (6902).

Once the call was established I performed an AMI Redirect to Party B (6902 – using the same directory number as Party A).

Both Party A and Party B calls were terminated. The error show in Asterisk was...

NoOp('SIP/cisco1-08ef80e8', '') in new stack
-- Executing [6902@internal:52005] Dial('SIP/cisco1-08ef80e8', 'SIP/6902@cisco1,20') in new stack
[2009-07-09 12:06:48.356] WARNING[21619]: app_dial.c:1485 dial_exec_full: Skipping dialing interface 'SIP/6902@cisco1' again since it has already been dialed
== Spawn extension (internal, 6902, 52005) exited non-zero on 'SIP/cisco1-08ef80e8'

Developer's Analysis.

The message is referring to a mechanism that ensures that when you dial numbers in parallel, Asterisk will not try to set up more than one connection to a number, even if it is specified multiple times.

I haven't looked at the code, but the symptoms suggest that party A is being included in that list.

What is more worrying is that Dial operation appears to be returning a call complete indication! That means we lose control of the call. It appears to be deliberate, but isn't explained in the code. I think it may be intended to deal with cases where a local channel is in the list of channels to be tried. It may be trying to get the CHANUNAVAIL status back quickly.

I think this will really need running svn blame, to find out which edit put this in, then trying to understand why it is done, if we want to be sure that there is a bug, rather than an awkward new feature.

Specifically, this sets the completed return code:

if (!outgoing) {
strcpy(pa.status, 'CHANUNAVAIL');
if (fulldial == num_dialed) {
res = -1;
goto out;
}

fulldial is only incremented for the case of already dialling the interface and outgoing == 0 will be true, as it has rejected the only available outgoing interface.

It sets the datastore that stores the dialled interfaces as inheritable. I think that is done to ensure the local channel sees, it but is probably having the side effect of making it inherit from the call to party A, and more worryingly, it may be left over from a failed party B call. Can you call party B again if the timed out on the first attempt? It does look like it may forget failed interfaces, but maybe it doesn't forget successful ones until the call completes.

By: David Woolley (davidw) 2010-07-09 07:08:45

I think the exact scenarios may differ, but I suspect the underlying problem is the same.

By: Leif Madsen (lmadsen) 2010-07-09 13:08:10

Thanks for the information guys. I'm acknowledging this issue.

By: Ramon Peek-Fares (ramonpeek) 2010-07-12 02:19:27

Uploaded debug log as requested.
The steps used are described in "Steps to Reproduce"

By: Kai Hoerner (kaii) 2010-11-08 03:33:41.000-0600

Another way to reproduce, it hit me while testing several internal call scenarios:

simple dialplan:
exten => _XX,1(dial),Dial(SIP/${EXTEN})
exten => _XX,dial+101,Busy()

1. SIP/10 dials SIP/20:

   -- Executing [execute_call@macro-internanruf:7] Dial("SIP/10-00000018", "SIP/20") in new stack
   -- Called 20
   -- SIP/20-00000019 is ringing

2. SIP/20 then sends a 302 redirect ("sip blind transfer") and targets extension 30:

   -- Got SIP response 302 "Moved Temporarily" back from 10.0.0.149
   -- Now forwarding SIP/10-00000018 to 'Local/30@internal' (thanks to SIP/20-00000019)
   -- Executing [30@internal] Dial("Local/30@internal-1b8d,2", "SIP/30") in new stack
   -- Called 30
   -- SIP/30-00000020 is ringing
   -- Local/30@internal-1b8d,1 is ringing

3. SIP/30 now wants to redirect back to extension 20 via 302 redirect and hits the bug:

   -- Got SIP response 302 "Moved Temporarily" back from 10.0.0.150
   -- Now forwarding Local/30@internal-1b8d,2 to 'Local/20@internal' (thanks to SIP/30-00000020)
    -- Executing [20@internal] Dial("Local/20@internal-1bc4,2", "SIP/20") in new stack
[Nov  5 16:53:34] WARNING[27403] app_dial.c: Skipping dialing interface 'SIP/20' again since it has already been dialed

It seems, that in case of a SIP 302 "moved temporarily", the redirecting peer (here: SIP/20) is (incorrectly) not removed from the list of dialed interfaces.
after the 302 redirect the peer SIP/20 is "not_inuse" and should be able to receive this call again.

i quickly worked around this in my setup by disabling the check if a peer is in the list of dialed_interfaces. i know this is not a general solution.



By: cmaj (cmaj) 2010-11-11 16:28:14.000-0600

Problem seen on 1.6.0.13 and 1.6.0.28

Nasty patch uploaded to allow blind transfers back to the originally called extension, but leaves dangling SIP dialogs.  These eventually timeout and die.  And considering the rarity of such a transfer scenario, maybe this is not much of a concern to most users.

(Probably kaii you have something similar to this already in place?)

By: Kai Hoerner (kaii) 2010-11-12 08:24:09.000-0600

correct, that is exactly what i did to avoid that kind of problem.
i did not commit a patch, because i'm sure this is not the correct solution for the issue.

the check if a peer is already in the list of "dialed_interfaces" is intentional and of good use to prevent redirection loops that can easily eat memory and CPU resources of no use:

SIP/10 (e.g. snom phone) has a forward to "20" which is executed as "Local/20@context" by app_dial, the new local channel then dials SIP/20.
SIP/20 (e.g. snom phone) has a forward to "10" which is executed as "Local/10@context" by app_dial, the new local channel then dials SIP/10.
SIP/10 (e.g. snom phone) has a forward to "20" which is executed as "Local/20@context" by app_dial, the new local channel then dials SIP/20.
[...]

Without the check of "dialed_interfaces" this would produce an endless (fastly growing) count of Local and SIP channels, resulting in growing memory, growing CPU usage, and growing garbage in the log files.

This is an architectural problem:
Asterisk can not know if a User Agent is going to forward the call (because it has been locally set up in the phone) and thus can not detect call forwarding loops in advance.
To prevent loops, it needs some magic which was intended to be the list of "dialed_interfaces". This works pretty well for the use case of call forwarding loops like i described in this note, but fails for the use case where a user does a blind transfer.
Unfortunately, SIP UAs (eg. snom phones) use "302 Moved Temporarily" both for blind transfer ("forward" on demand) and call forward ("forward" statically).

A possibly good workaround for this would be to remove an interface from the list of dialed_interfaces when a call_forward is received, but only if the call forward was executed manually by the user and not automatically by a static call forward.
how can we differentiate between "automatic" and "manual" call forward / call deflection?
In SIP, if a device returns directly "302 Moved Temporarily" after receiving the INVITE it is an automatic call forward. Otherwise, the phone would respond with "180 Ringing" or something else first and then "302 Moved Temporarily" possibly. When the phone sends "180 Ringing" that would mean the call was not automatically forwarded.

Caveats:
- I dont know if all UAs do it this way, maybe there are UAs that answer an INVITE with 180+302 even if a direct call forward is set up. snom directly responds an INVITE with 302, when local call forwarding is set up.
- This is a slightly bigger and more complicated change, thus i haven't tried to implement this yet.
- What about other channel types?

Please post your comments on this, as i would like to provide a patch to solve this issue the right way.



By: cmaj (cmaj) 2010-12-15 19:26:00.000-0600

I can sort-of reproduce this in 1.8.1 on Polycoms as follows:

1. Extension 701 is watching buddy 702.
2. Buddy 702 starts ringing and extension 701 hits the "Pickup" soft key.
3. Extension 701 hits "Transfer" hard key then "Blind" soft key then enters "702".
4. Dial() immediately fails with DIALSTATUS=INVALIDARGS.  The CLI WARNING is the same as OP, "Skipping dialing interface 'SIP/401' again since it has already been dialed."

So I worked around it by adding the 'g' option to Dial() then checking the DIALSTATUS variable set internally by app_dial.c:

Dial(SIP/1234,30,g);
if("${DIALSTATUS}"="INVALIDARGS") {
Voicemail(1234);
}


I know this is not a solution addressing the original bug directly, but it might help somebody coming at this "blind transfer failure to a twice-dialed extension problem" from a slightly different error scenario.

By: cmaj (cmaj) 2010-12-15 19:42:38.000-0600

I just built 1.8.2-rc2 and this bug no longer exists.  I can repeatedly blind transfer a caller between the same two extensions.  I can also pickup a buddy, park them, blind transfer, and any combination in between.  Asterisk 1.8.2 is shaping up to be a great release!

UPDATE: Other SIP deadlocks encountered after more repeated transfer scenarios.



By: oelewapperke (oelewapperke) 2011-01-13 05:32:29.000-0600

We are also hitting this bug. I have made a patch that erases the channel datastore upon starting the Dial application, which seems to me the correct course of action.

By: oelewapperke (oelewapperke) 2011-01-13 06:47:00.000-0600

It would be great if this patch could get some more testing. It does seem to work for us.

By: Kai Hoerner (kaii) 2011-01-13 06:48:50.000-0600

I'm quite sure (from reading only, not testing) that your patch does render the reason why we have dialed_interfaces list useless:

it does not avoid the forwarding loop i described in my notes above.

For which version of asterisk is your patch?



By: oelewapperke (oelewapperke) 2011-01-13 07:20:48.000-0600

It's for 1.6.2.15, but it should translate rather directly to 1.8.0

By: oelewapperke (oelewapperke) 2011-01-13 07:39:06.000-0600

In the end what you want to detect is an actual *loop*, the fact that a call passes the same phone twice is not, by itself, reason enough to drop a call.

How about we check if we've seen 10 redirects, and if we see 10 redirects (or some configurable number), we drop the call ?

Nobody should transfer the same call 10 times. Or we could take 100. Some number that doesn't eat too much memory but large enough to make it unlikely someone would do this themselves.

/me back to vim

By: oelewapperke (oelewapperke) 2011-01-13 11:04:43.000-0600

I've created a patch that deletes the dialed interfaces list upon 2 events :
1) normal channel answer
2) channel bridge

In both cases, the channel transitions to a talking state and we can (I hope) safely assume that if there were any loops, the channel would never have transitioned.

By: oelewapperke (oelewapperke) 2011-01-14 04:06:50.000-0600

I've tested the second patch, and it does allow re-transferring, and it does still prevent the forwarding loop from occuring, at least on snom phones.

By: Kai Hoerner (kaii) 2011-01-14 04:51:15.000-0600

From reading only, this appears to me to be the right solution for the problem.

But i have doubt it solves the problem in all scenarios.

The list of dialed interfaces has two purposes:
1. Avoid calling the same interface twice within the same call
(imagine huge dialplan logic involving local channels, call forwards from a database and stuff: with call waiting (=on) you could end up ringing the same device multiple times within the same call)
2. Avoid the forwarding loop which goes round and round when two extensions setup a static forward (SIP 302) in the device / SIP UA directly.
(as described above)

We have a few different scenarios to keep in mind, hopefully i didn't oversee something:

1. user answers (directly or via pickup) call and then transfers the caller either attended or unattended
(should clear the list of dialed interfaces, fixed with your patch)
2. user has a static forward set up as described above
(i.e. via call forward menu, executed as SIP 302 on snom)
(here the list of dialed interfaces is important to avoid endless looping)
3. user does NOT answer but forwards the call manually
(i.e. via transfer button, executed as SIP 302 on snom)

Note that scenario 2 and 3 are not (easily) distinguishable from each other.
In scenario 2 we really want to avoid looping.
In scenario 3 this is not optimal, imagine two users in the same room.
A forwards to B, then says "please send the call back to me".
(-> call drops)

I assume with your patch, the scenario 3 will not work, so a user that manually forwarded a call cannot receive it again.

Locally, we have a patch in place (for testing) that removes a specific interface from the list of dialed interfaces when two conditions are true:
1. interface is already in ringing state -> shows that device doesnt directly forward and could receive the call.
2. interface has sent a call forward request.

This works pretty well here, all above scenarios work fine.
Pity is, the patch is against a 1.4.30 which already has several custom modifications applied and i didn't have time yet, to write and test this again with a vanilla 1.4.33.

Also look at cmaj's comment above. He says all of these scenarios work fine with 1.8.2-rc. Again, didn't find time yet to confirm that it works well in 1.8.2.



By: Kai Hoerner (kaii) 2011-05-10 04:13:49

UPDATE: did some re-tests of the scenario i described above with 1.8.4-rc3.

when manually forwarding the call (without answering it) to another extension via SIP "302 Moved Temporarily", the forwarder is still not removed from the list and the call is being hung up when trying to forward it back the initial forwarder:

use case:
SIP/10 calls SIP/20
 ->> SIP/20 rings
SIP/20 manually forwards call to SIP/30
 ->> SIP/20 is now free, SIP/30 rings
SIP/30 manually forwards back to SIP/20
 ->> call is being hung up "Skipping dialing interface 'SIP/20' again"

will attach patch for 1.8.4-rc3 which removes the forwarder from the list of dialed interfaces.

By: Mark Michelson (mmichelson) 2012-10-03 11:16:52.957-0500

*dusts the cobwebs off this issue*

Well, it's been a while, but I believe that this issue has been resolved independently.

In the 1.8 branch, revisions  315644, 319529, and 328663 should resolve this issue. I have conducted my own tests and have found that the transfer scenarios provided by the original reporter no longer occur in version 1.8 of Asterisk. If you are still on Asterisk 1.4 or 1.6.2 you should also find this issue fixed as well, since the 1.8 revisions I listed were merged from those branches.

The final two revisions I listed were merged in May and July of 2011, after the final comments on this issue were made. They are similar to the patches uploaded here, but not quite the same. A similar patch has also been applied to app_queue. I am going to close this issue since I believe it to be fixed.