[Home]

Summary:ASTERISK-17188: [patch] p->chan can disappear between test and lock in deadlock avoidance in local_hangup
Reporter:David Woolley (davidw)Labels:
Date Opened:2010-12-30 07:28:50.000-0600Date Closed:2015-03-13 21:15:48
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Channels/chan_local
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) Issue18558-patch1.diff.txt
( 1) Issue18558-patch2.diff.txt
Description:r 259899 introduces an unlock/lock sequence on p-> chan in the deadlock avoidance for a failed lock on p->owner, in local_hangup.  However, as noted in a comment added/modified in r 292867, and assumed by code which tests for NULL, p->chan may be nulled whilst the lock on p is off.

Depending on exactly where the NULL is set, as well as the case that is handled, this could result in:

1) deferencing NULL, when calling lock;
2) applying a lock to a structure that is being destroyed.

****** ADDITIONAL INFORMATION ******

This was noted whilst doing a code review of the conflicting changes whilst considering backporting r 292867, in order to try to avoid a crash due to a double free.

I have left this as minor simply because I don't have evidence that it is a significant problem in the wild, however, the worst outcomes include direct segmentation violations and may include indirect ones owing to manipulating free structures.
Comments:By: David Woolley (davidw) 2010-12-30 10:56:28.000-0600

Actually r 259899 doesn't introduce the problem for p->chan, that happened much earlier.  It does, however, introduce the equivalent problem for p->owner.

By: David Woolley (davidw) 2011-01-04 13:02:03.000-0600

I've uploaded a proposed patch.  I'd appreciate it if someone could check it for obvious mistakes, as theoretical race conditions are difficult to test.

Basically, it rearranges things, and uses try_lock, to ensure that references from local_pvt are only used when it is locked.

Note, in reality, I actually coded this against a backport, but then applied it against the latest 1.6.2 branch, to get correct line numbers.  Any testing I do is likely to be against the back version.

Also, note that I have some reservations about the change to using sched_yield that happened as the same time as the ao2_lock change.  As I see it, it will work well with real time priorities, but with normal priorities might result in going round the loop many times until ones CPU factor rises enough to lose you the processor.

By: David Woolley (davidw) 2011-01-05 04:52:56.000-0600

Oops! Really should have tried to compile it first.  Spurious ")" now removed.

By: David Vossel (dvossel) 2011-05-06 16:18:38

This is really difficult to fix in 1.6.2.

The right fix in 1.8 would be to grab a reference to the p->chan channel while p is locked to guarantee it won't go away.  The you have to lock the referenced channel and see if it is still equal to p->chan after p is locked again and all the stupid deadlock avoidance stuff is done.

By: Joshua C. Colp (jcolp) 2015-03-13 21:15:48.480-0500

This has been fixed in 1.8+ thanks to reference counting.