Summary: ASTERISK-18975: Manager Redirect action on bridged channel pair causes intermittent hangup on second channel
Reporter: Ben Klang (bklang)
Labels:
Date Opened: 2011-12-06 17:24:46.000-0600
Date Closed: 2013-01-02 15:08:45.000-0600
Priority: Minor
Regression? No
Status: Closed/Complete
Components: Core/Channels
Versions: 1.8.7.1
Frequency of Occurrence: Frequent
Related Issues: is related to ASTERISK-19948 Asterisk 1.8 manager redirect command fails when redirecting multiple channels currently bridged together via dial command.
Environment:
Attachments: (0) broken.log
             (1) working.log
Description: We have an application where two channels are first bridged and then split back out, with the option of re-bridging.  When the split occurs, we use an AMI Redirect action with both channels (Channel and ExtraChannel) filled out so that both legs go somewhere in the dialplan.  Approximately 50% of the time, the ExtraChannel is hung up instead of being redirected.
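
For readers unfamiliar with the two-channel form of the action, the following is a minimal sketch of how such a Redirect message might be assembled before being written to the AMI socket. The helper name `build_redirect_action`, the first channel name, and the contexts and extensions are hypothetical placeholders; only SIP/grant-00000019 comes from the attached logs. The Extra* headers are assumed to behave as in Asterisk 1.8, where the second leg can be given its own destination.

```python
def build_redirect_action(channel, extra_channel,
                          context, exten, priority,
                          extra_context, extra_exten, extra_priority,
                          action_id=None):
    """Return the text of an AMI Redirect action that moves both legs.

    Hypothetical helper for illustration only; it formats the message
    and does not open a connection or talk to Asterisk.
    """
    fields = [("Action", "Redirect")]
    if action_id is not None:
        fields.append(("ActionID", action_id))
    fields += [
        ("Channel", channel),
        ("ExtraChannel", extra_channel),
        ("Context", context),
        ("Exten", exten),
        ("Priority", str(priority)),
        ("ExtraContext", extra_context),
        ("ExtraExten", extra_exten),
        ("ExtraPriority", str(extra_priority)),
    ]
    # AMI messages are "Key: Value" lines ending in CRLF, closed by a blank line.
    return "".join(f"{key}: {value}\r\n" for key, value in fields) + "\r\n"


if __name__ == "__main__":
    # All destinations below are made-up placeholders.
    message = build_redirect_action(
        channel="SIP/alice-00000018",        # hypothetical first leg
        extra_channel="SIP/grant-00000019",  # second leg, as named in the logs
        context="split-a", exten="s", priority=1,
        extra_context="split-b", extra_exten="s", extra_priority=1,
        action_id="redirect-1",
    )
    print(message)
```

The ExtraChannel leg in a message like this is the one that is intermittently hung up.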

Kevin Fleming was very helpful on IRC today working on the possible cause.  His last comment on the issue was:

kpfleming: so... theorizing here: if the Redirect action is creating a new PBX thread for the second channel to use after it has been pulled out of the bridge, and somehow that thread ends up on the second channel before the original thread has caused the masquerade to occur, things will get very messy

I have attached two DEBUG logs illustrating the issue.  In the first example, the app works as expected, where both channels are split and continue in the dialplan.  In the second example, the second channel (SIP/grant-00000019) *should* be masqueraded, but is instead hung up.

This feels like a race condition where, somehow, the AST_FLAG_ZOMBIE is getting set on the secondary channel, when it should not.
Comments:

By: Ben Klang (bklang) 2011-12-07 11:24:36.754-0600

On further research, I found that adding a sleep(1) to main/pbx.c on line 8051 makes the bug consistently reproducible.  This sleep goes into ast_async_goto() just before the channel masquerade occurs.  My theory is that this sleep delays the masquerade so that the other thread has a chance to hang it up first, causing the masquerade to fail.  I have not yet identified the other thread.

By: Ben Klang (bklang) 2011-12-07 17:04:39.230-0600

I think I now understand the cause.  These two channels are originally connected by app_dial.  When the Redirect occurs, both calls are sent to new locations in the dialplan.  The race is caused by the cleanup behavior in app_dial.  Looking on line 2829 of apps/app_dial.c, we can see where the bridging of the two channels occurs.  That function call returns when the Redirect occurs.  Later, on line 2860, there is a check for the app_dial option OPT_CALLEE_GO_ON, which is not set.  The else side of that condition is a call to ast_hangup(peer), which is what ultimately kills our call.

I'm not entirely clear why the channel masquerade prevents the above call flow from happening, but I suspect that is the intent of the masquerade.  When the masquerade is delayed and happens after app_dial completes, then the peer is hung up.  If the masquerade beats app_dial to it, then the Redirect functions as expected.

However, I still don't know what the correct fix to the issue is.
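
The ordering dependence described in this comment can be illustrated with a toy model. This is only a sketch of the theory above, not Asterisk code: the function `simulate`, the state names, and the dictionary standing in for the peer channel are all invented for illustration, and the real masquerade and hangup paths are far more involved. An event forces the order of the two steps so that each outcome is reproducible instead of left to chance.

```python
import threading


def simulate(masquerade_first):
    """Toy model: return the peer's final state for a given ordering."""
    peer = {"state": "bridged"}  # hypothetical stand-in for the peer channel
    lock = threading.Lock()
    first_step_done = threading.Event()

    def step(new_state, goes_first):
        # The step that goes second blocks until the first one has finished,
        # which pins down the ordering that a real race leaves to chance.
        if not goes_first:
            first_step_done.wait()
        with lock:
            # Only the first step to reach a still-bridged peer takes effect.
            if peer["state"] == "bridged":
                peer["state"] = new_state
        first_step_done.set()

    threads = [
        # Stand-in for the Redirect path (the masquerade in ast_async_goto()).
        threading.Thread(target=step, args=("redirected", masquerade_first)),
        # Stand-in for app_dial cleanup (ast_hangup(peer) when
        # OPT_CALLEE_GO_ON is not set).
        threading.Thread(target=step, args=("hung up", not masquerade_first)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return peer["state"]


if __name__ == "__main__":
    print(simulate(masquerade_first=True))
    print(simulate(masquerade_first=False))
```

In this model, adding a delay before the masquerade corresponds to always running with `masquerade_first=False`, which would match the observation that the sleep(1) makes the hangup consistently reproducible.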

By: Ben Klang (bklang) 2011-12-07 17:14:19.840-0600

And just to confirm the issue: commenting out the call to ast_hangup(peer) on line 2876 of apps/app_dial.c makes the Redirect work as expected, though obviously it's not a real solution.

By: Maciej Krajewski (jamicque) 2012-06-20 03:43:18.989-0500

It might be the same problem as mine - ASTERISK-19985

By: trol (trol) 2012-07-12 12:40:47.021-0500

I am also having this issue on 1.8.7.1. I used this feature on 1.2 for years without any problem. As far as I know, Redirect with an extra channel is the only way to send a bridged call to a MeetMe room.
Is there any development or workaround?

Thanks

By: Matt Jordan (mjordan) 2012-08-13 09:25:39.862-0500

A potentially related issue was resolved in 1.8.15.0.  You may want to try reproducing the problem using that version (or later) to see if your issue still occurs.

If it does, please note in the comments here and I'll unlink the issue.

By: Jeremy Betts (freevoice) 2012-10-23 16:48:42.083-0500

This issue still exists in 1.8.15.0.

By: Richard Mudgett (rmudgett) 2012-12-12 14:26:01.623-0600

For those waiting for a fix, a patch is available on Review Board:
https://reviewboard.asterisk.org/r/2243/