Summary: ASTERISK-18222: Pickupchan of a local channel segfaults if 2 users pickup at same time
Reporter: Alec Davis (alecdavis)
Labels:
Date Opened: 2011-08-03 03:28:34
Date Closed: 2011-08-31 10:20:48
Priority: Blocker
Regression?
Status: Closed/Complete
Components: Features
Versions:
Frequency of Occurrence:
Related Issues:
must be completed before resolving ASTERISK-18393 Asterisk 1.8.7.0 Blockers
is duplicated by ASTERISK-18269 Calls getting stuck when dialing *8
is duplicated by ASTERISK-18273 Orphaned channels after pickup
is related to ASTERISK-18650 Asterisk hangs after failed directed call pickup attempt, logs show "Fixup failed on channel SIP/xxx, strange things may happen."
is related to ASTERISK-18225 SIP channels are getting stuck after picking up calls
Environment:
Attachments:
( 0) bt-localpickup-trunk.txt
( 1) localpickup.diff.txt
( 2) localpickup-console.txt
( 3) localpickup-debug.diff.txt
Description:
The crash occurs with the minimal dialplan below.

The crash scenario:
Dial 801 from 1 phone.
From 2 phones, simultaneously dial 800.
Segfault!

{code}

exten => 801,1,NoOp(Local pickup debug: Ring Phones)
exten => 801,n,Dial(Local/823@en-phone&Local/824@en-phone)

exten => 800,1,NoOp(Local pickup: Pickup through Localchan call)
exten => 800,n,Dial(Local/824@en-pickup&Local/823@en-pickup)

[en-pickup]
exten => _[0-9*#]!, 1, PickupChan(Local/${EXTEN}@en-phone)

[en-phone]
exten => _[0-9*#]!, 1, Dial(SIP/gxp-${EXTEN},20,rwt)

{code}

The issue, I believe, is that ast_hangup on one thread removes the channel while a pickup is active on the same 'Local' channel.
Comments:

By: Alec Davis (alecdavis) 2011-08-03 03:31:11.097-0500

bt-localpickup-trunk.txt:

100% repeatable.

Crash after ringing 801 from a snom phone and picking up by dialling 800 from 2 gxp2000's.

By: Alec Davis (alecdavis) 2011-08-03 04:21:50.895-0500

localpickup-debug.diff.txt

Code that shows that a pickup is active when the hangup happens.

By: Alec Davis (alecdavis) 2011-08-03 04:25:06.957-0500

localpickup-console.txt

console output with localpickup-debug.diff.txt applied

By: Alec Davis (alecdavis) 2011-08-05 03:36:02.926-0500

uploaded localpickup.diff.txt

This has been tested with the sample dialplan in the description, but needs further testing.



By: Richard Mudgett (rmudgett) 2011-08-05 09:47:55.727-0500

I think what this scenario is describing is:
Party A calls in and causes two extensions to ring.
Party B and Party C attempt to pickup the call.
Since there are two extensions ringing, Party B picks up one extension and Party C picks up the other.
The subsequent collision of answers and hangups causes the crash.

This is likely more a problem with the Dial application handling multiple answers than call pickup.
This seems similar to irroot's issue with the Queue application.
See https://reviewboard.asterisk.org/r/1323/

By: Gregory Hinton Nietsky (irroot) 2011-08-05 10:36:57.576-0500

Yes indeed, and it's much improved. I still have some issues I'm working through; I have my spidey senses on full ...

By: Alec Davis (alecdavis) 2011-08-05 14:48:16.071-0500

patches are for Asterisk SVN-trunk-r330650, so include https://reviewboard.asterisk.org/r/1323


By: Paul Belanger (pabelanger) 2011-08-22 11:22:09.009-0500

What is the status of this issue?

By: Alec Davis (alecdavis) 2011-08-22 15:23:45.202-0500

It's up for review at https://reviewboard.asterisk.org/r/1353/

All reports are that it's working perfectly.



By: Richard Mudgett (rmudgett) 2011-08-22 15:54:30.346-0500

The patch may work, but it likely has many unexpected side effects, as also mentioned in the review.

By: Alec Davis (alecdavis) 2011-08-22 17:09:00.739-0500

Below is a simpler dialplan, not using localchan, that causes orphaned channels.
Fix this, and the original report may also be fixed.

Dial 801 from 1 phone.
From 2 phones, simultaneously dial *8.
Reports of NULL objects etc., and an orphaned channel.

{code}
exten => 801,1,NoOp(pickup debug: Ring Phones)
exten => 801,n,Dial(SIP/phone1&SIP/phone2)
{code}

By: Alec Davis (alecdavis) 2011-08-23 18:18:30.532-0500

It seems as though a similar technique is required, adding a 'pickup datastore' to the originating dialling channel of the multiple spawned calls, as is already done for the target pickup channel.

Then when the 2nd pickup (a split second later) is attempted on one of the spawned calls, ast_can_pickup should check the dialling channel of the target extension. If it finds that the dialling channel has a 'pickup datastore', the call is already being picked up elsewhere, and the pickup fails gracefully.

The above approach, to prevent orphaned channels and messages that "strange things may happen", is I believe better than fixing the Dial application since, as I understand it, queues ringing multiple phones have similar issues when concurrent pickup attempts happen.
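
The idea in this comment can be sketched with a toy model. This is not Asterisk code: `struct toy_channel`, `toy_can_pickup()` and `toy_begin_pickup()` are hypothetical names, and a plain boolean stands in for the 'pickup datastore'. Real code would also need the channels locked while the mark is checked and set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a channel; NOT the real struct ast_channel. */
struct toy_channel {
    const char *name;
    struct toy_channel *dialer;  /* channel whose Dial() spawned this leg, or NULL */
    bool pickup_in_progress;     /* stands in for the 'pickup datastore' */
};

/* A target may be picked up only if neither it nor its dialling
 * channel is already marked as being picked up. */
static bool toy_can_pickup(const struct toy_channel *target)
{
    if (target->pickup_in_progress) {
        return false;
    }
    if (target->dialer != NULL && target->dialer->pickup_in_progress) {
        return false;
    }
    return true;
}

/* Mark the target and its dialling channel; returns false when the
 * pickup is refused because another pickup got there first. */
static bool toy_begin_pickup(struct toy_channel *target)
{
    if (!toy_can_pickup(target)) {
        return false;
    }
    target->pickup_in_progress = true;
    if (target->dialer != NULL) {
        target->dialer->pickup_in_progress = true;
    }
    return true;
}
```

With one caller ringing two legs, the first `toy_begin_pickup()` on either leg succeeds and a second attempt on the other leg is refused, because the shared dialling channel is already marked.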


By: Richard Mudgett (rmudgett) 2011-08-26 17:52:15.903-0500

I think this is the scenario that is causing all the grief:
1) Pickup target is selected
2) target is marked as being picked up in ast_do_pickup()
3) target is unlocked by ast_do_pickup()
4) app dial or queue gets a chance to hang up losing calls and calls ast_hangup() on target
5) SINCE A MASQUERADE HAS NOT BEEN SET UP YET BY ast_do_pickup() with ast_channel_masquerade(), ast_hangup() completes successfully and the channel is no longer in the channels container.
6) ast_do_pickup() then calls ast_channel_masquerade() to schedule the masquerade on the dead channel.
7) ast_do_pickup() then calls ast_do_masquerade() on the dead channel
8) bad things happen while doing the masquerade and in the process ast_do_masquerade() puts the dead channel back into the channels container
9) The "orphaned" channel is visible in the channels list.
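
The sequence above can be replayed in a single thread with a toy model. This is only a sketch: the fixed array stands in for the channels container, and every `toy_` name is hypothetical rather than an Asterisk API. It shows only why an unguarded masquerade makes a hung-up channel reappear in the list.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHANNELS 8

/* Toy stand-in for a channel; NOT the real struct ast_channel. */
struct toy_channel {
    const char *name;
    bool hung_up;
};

/* Toy "channels container": a fixed array of pointers. */
static struct toy_channel *toy_channels[TOY_MAX_CHANNELS];

static bool toy_is_listed(const struct toy_channel *chan)
{
    for (size_t i = 0; i < TOY_MAX_CHANNELS; i++) {
        if (toy_channels[i] == chan) {
            return true;
        }
    }
    return false;
}

static void toy_link(struct toy_channel *chan)
{
    if (toy_is_listed(chan)) {
        return;
    }
    for (size_t i = 0; i < TOY_MAX_CHANNELS; i++) {
        if (toy_channels[i] == NULL) {
            toy_channels[i] = chan;
            return;
        }
    }
}

static void toy_unlink(struct toy_channel *chan)
{
    for (size_t i = 0; i < TOY_MAX_CHANNELS; i++) {
        if (toy_channels[i] == chan) {
            toy_channels[i] = NULL;
        }
    }
}

/* Steps 4-5: hangup completes and the channel leaves the container. */
static void toy_hangup(struct toy_channel *chan)
{
    chan->hung_up = true;
    toy_unlink(chan);
}

/* Steps 6-8: the masquerade never checks hung_up and relinks the
 * channel when it finishes, so the dead channel comes back. */
static void toy_do_masquerade_unguarded(struct toy_channel *chan)
{
    toy_unlink(chan);
    /* ... the swap of channel internals is omitted in this sketch ... */
    toy_link(chan);
}
```

Calling `toy_link()`, then `toy_hangup()`, then `toy_do_masquerade_unguarded()` on the same channel leaves it listed again with `hung_up` set, which corresponds to the orphan in step 9.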

By: Richard Mudgett (rmudgett) 2011-08-26 17:58:07.702-0500

A fix should do something like this:
1) ast_channel_masquerade() needs to detect that the original channel is already hung up and fail.
2) ast_hangup() needs some work to indicate that it has completed so ast_channel_masquerade() can detect it.  The ZOMBIE flag may be a good candidate for this.
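
A minimal sketch of these two points, assuming a single flag bit is enough to record that hangup has completed. `TOY_FLAG_ZOMBIE` and the `toy_` functions are hypothetical stand-ins, not the actual Asterisk flag or function signatures, and locking is left out.

```c
#include <stddef.h>

#define TOY_FLAG_ZOMBIE (1u << 0)  /* hypothetical "hangup has completed" marker */

/* Toy stand-in for a channel; NOT the real struct ast_channel. */
struct toy_channel {
    const char *name;
    unsigned int flags;
    struct toy_channel *masq;  /* channel scheduled to take over, or NULL */
};

/* Point 2: hangup records that it has completed. */
static void toy_hangup(struct toy_channel *chan)
{
    chan->flags |= TOY_FLAG_ZOMBIE;
}

/* Point 1: refuse to schedule a masquerade on a channel that has
 * already been hung up. Returns 0 on success, -1 on failure. */
static int toy_channel_masquerade(struct toy_channel *original,
                                  struct toy_channel *clone)
{
    if (original->flags & TOY_FLAG_ZOMBIE) {
        return -1;
    }
    original->masq = clone;
    return 0;
}
```

In this model, a pickup that loses the race gets -1 back and can fail gracefully, rather than going on to masquerade a dead channel.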

By: Richard Mudgett (rmudgett) 2011-08-29 18:52:15.948-0500

I have posted review https://reviewboard.asterisk.org/r/1400/, which implements the above fix.

By: Leif Madsen (lmadsen) 2011-09-19 11:24:34.559-0500

Revision 334009 in 1.8 branch.