[Home]

Summary:ASTERISK-17917: [patch] app_dial may double free a channel datastore
Reporter:Mark Murawski (kobaz)Labels:
Date Opened:2011-05-24 18:13:05Date Closed:2011-07-18 15:55:27
Priority:MajorRegression?
Status:Closed/CompleteComponents:Applications/app_dial
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) dial_fix.patch
( 1) diff2.txt
Description:While a Dial() is running, a Bridge() is used to steal the call, there may be a datastore that's freed by app_bridge and then also freed in app_dial.



****** ADDITIONAL INFORMATION ******

in features.c for handling Bridge()
{code}
static void clear_dialed_interfaces(struct ast_channel *chan)
{
       struct ast_datastore *di_datastore;

       ast_channel_lock(chan);
       if ((di_datastore = ast_channel_datastore_find(chan, &dialed_interface_info, NULL))) {
               if (option_debug) {
                       ast_log(LOG_DEBUG, "Removing dialed interfaces datastore on %s since we're bridging\n", chan->name);
               }
               if (!ast_channel_datastore_remove(chan, di_datastore)) {
                       ast_datastore_free(di_datastore);
               }
       }
       ast_channel_unlock(chan);
}
{code}
in app_dial: dial_exec_full()
{code}
       ast_channel_lock(chan);
       if (!ast_channel_datastore_remove(chan, datastore)) {
               ast_datastore_free(datastore);
       }
{code}
app_dial does not check to make sure that the datastore still exists.
Comments:By: Michael L. Young (elguero) 2011-05-24 19:59:42

This would be related to the fix committed for issue ASTERISK-17874.

There was a patch for that issue uploaded limiting the clearing of the datastore to when a bridge did not happen (no answer, timeout, busy) although the channel locking was missing from it.

But, the patch was not used and instead the original change to app_dial.c that was made when adding the clearing of the dialed interfaces store to features.c, were reverted resulting is this double free that you are seeing.

Your patch is checking if it was freed no matter what and is probably safer.

I just wanted to comment here in order to get the relations setup for historical purposes.

I think this double-free was committed into other versions such as 1.4, 1.6.2 as well.

By: Mark Murawski (kobaz) 2011-05-24 20:04:56

Okay, I'll check into 1.4 and 1.6.2.  Marking relation.

By: Terry Wilson (twilson) 2011-07-14 16:14:26.126-0500

Have you actually seen a double free happen, or is this theoretical? ast_channel_datastore_remove() calls AST_LIST_REMOVE which should return NULL if the datastore has already been removed, causing ast_channel_datastore_remove to return 1 and thus not trying to free the datastore again.

By: Mark Murawski (kobaz) 2011-07-14 16:22:09.058-0500

The double free does happen, very often.  With my dialplan/scripts I crash often without this patch.

By: Mark Murawski (kobaz) 2011-07-14 16:28:35.482-0500

After Bridge() frees the datastore there is a dangling pointer.  You shouldn't call ast_channel_datastore_remove with a pointer to a datastore that may have been already freed.  There is no guarantee as to what data will be at that address.

By: Terry Wilson (twilson) 2011-07-14 16:46:08.255-0500

Well, AST_LIST_REMOVE *shouldn't* need to do anything with the pointer that is passed to it except use it for comparison, which should be entirely safe (unless it matches, in which case we can assume it hasn't been freed). The problem really appears to be that under all circumstances AST_LIST_REMOVE actually sets (elm)->field.next = NULL at the end. I *think* it should set (__res)\->field.next = NULL if __res != NULL instead. I'll see if I can reproduce the issue and see if that fixes it. Or, you can test that as a solution since you are reproducing it now.

By: Terry Wilson (twilson) 2011-07-14 19:21:43.635-0500

Please see if the diff2.txt also fixes the issue for you.

By: Terry Wilson (twilson) 2011-07-14 19:24:12.870-0500

I tried everything I could think of to reproduce the issue and can't get Bridge to happen between the Dial and Answer so that ast_bridge_call will delete the datastore before Dial() does. In any case, I *think* that the attached patch should fix the underlying problem with AST_LIST_REMOVE.

By: Mark Murawski (kobaz) 2011-07-15 08:00:17.468-0500

I'm still learning Jira.  Didn't quite know how to leave feedback properly.

Anyway...  I can try and put together some simple dialplan to reproduce this, but it may be difficult.  It's very timing dependent.

I agree that AST_LIST_REMOVE should have some safety built in to make sure you're not double removing, but I think just adding safety for AST_LIST_REMOVE isn't good practice, because you're relying on a comparison of a dangling pointer to freed data.  The higher level functions should make sure it's doing the-right-thing and not double free in the first place.


By: Terry Wilson (twilson) 2011-07-15 10:38:13.335-0500

I should have been more clear. Yes, I definitely don't think that we should be passing around a bad pointer and should fix that--I just want to fix AST_LIST_REMOVE as well since it *should* have protected us from ourselves in this case since we actually tried to protect the free() in a block that requires the element to be in the list.

I just wanted to be able to show that the AST_LIST_REMOVE change worked by reproducing the problem as well. It is obvious enough, though, that it isn't really necessary.

By: Mark Murawski (kobaz) 2011-07-18 15:20:13.852-0500

Good good.  I like the change to AST_LIST_REMOVE.

I'm going to commit the change to dial unless there are objections?



By: Terry Wilson (twilson) 2011-07-18 15:22:30.110-0500

Works for me.

By: Mark Murawski (kobaz) 2011-07-18 15:48:04.234-0500

r328663 | markm | 2011-07-18 16:47:04 -0400 (Mon, 18 Jul 2011) | 9 lines

app_dial may double free a channel datastore

When starting a call with originate, and having the callee channel run Bridge() on pickup, we will double free the dialed_interface_info datastore, causing a crash.  Make sure to check if the datastore still exists before trying to free it.

(closes issue ASTERISK-17917)
Reported by: Mark Murawski
Tested by: Mark Murawski


By: Mark Murawski (kobaz) 2011-07-18 15:49:16.247-0500

Not touching 1.4 and 1.6.2 due to them moving to security fixes only.

By: Mark Murawski (kobaz) 2011-07-18 15:51:59.452-0500

Merged to 1.10 (r328664)
Merged to trunk (r328665)


By: Mark Murawski (kobaz) 2011-07-18 15:55:06.526-0500

Terry, do you have a jira issue number for the AST_LIST_REMOVE changes?