[Home]

Summary:ASTERISK-06019: [patch] Segfault in pbx_builtin_setvar_helper. Looks like the channel is being closed down underneath us
Reporter:Steve Davies . (stevedavies)Labels:
Date Opened:2006-01-09 04:09:46.000-0600Date Closed:2009-08-23 09:39:10
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Applications/app_macro
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 20060121__bug6176.diff.txt
( 1) gdb_log_-_setvar_race
Description:Site had a segfault.  gdb backtraces attached.

I had a look at what happened...

Segfault was in pbx_builtin_setvar_helper, whilst executing in the AST_LIST_REMOVE macro.

pbx_builtin_setvar_helper was called by macro_exec, which was trying to set MACRO_DEPTH on the channel.

Guilty code at pbx.c around line 5751 (SVN trunk, r7839):

       AST_LIST_TRAVERSE (headp, newvariable, entries) {
               if (strcasecmp(ast_var_name(newvariable), nametail) == 0) {
                       /* there is already such a variable, delete it */
                       AST_LIST_REMOVE(headp, newvariable, entries);
                       ast_var_delete(newvariable);
                       break;
               }
       }

headp here points at varshead in the channel structure.

Apparently at the time the list was traversed, there must have been variables on the channel - given that an existing MACRO_DEPTH was found to delete.

However, by the time we got into the AST_LIST_REMOVE code, curelm was null (which comes from (head)->first, i.e. varshead.first).

(gdb) frame 0
#0  0x080900a2 in pbx_builtin_setvar_helper (chan=0x88d8a30, name=0x72734e "MACRO_DEPTH", value=0xb79ee2b0 "1") at pbx.c:5941
5941                            AST_LIST_REMOVE(headp, newvariable, entries);
(gdb) p curelm
$2 = (struct ast_var_t *) 0x0

Here's AST_LIST_REMOVE:

#define AST_LIST_REMOVE(head, elm, field) do {                          \
       if ((head)->first == (elm)) {                                   \
               (head)->first = (elm)->field.next;                      \
       }                                                               \
       else {                                                          \
               typeof(elm) curelm = (head)->first;                     \
               while (curelm->field.next != (elm))                     \
                       curelm = curelm->field.next;                    \
               curelm->field.next = (elm)->field.next;                 \
       }                                                               \
       if ((head)->last == elm)                                        \
               (head)->last = NULL;                                    \
} while (0)

So my theory is that the channel was busy being dismantled as we tried to remove the variable.

Here's the channel at the time of the segfault:

(gdb) frame 1
#1  0x00726701 in macro_exec (chan=0x88d8a30, data=0xb79f27e0) at app_macro.c:253
253             pbx_builtin_setvar_helper(chan, "MACRO_DEPTH", depthc);
(gdb) p *chan
$3 = {name = "Parking/Local/91001@from-internal-74b8,2<ZOMBIE>", '\0' <repeats 31 times>, tech = 0x80f4600, tech_pvt = 0x0,
 language = "en", '\0' <repeats 17 times>, type = 0x7ee627 "Local", fds = {-1, -1, -1, -1, -1, -1, 172, -1},
 musicclass = '\0' <repeats 19 times>, music_state = 0xb6602dd8, generatordata = 0x0, generator = 0x0, _bridge = 0x0, masq = 0x0,
 masqr = 0x0, cdrflags = 0, _softhangup = 1, whentohangup = 0, blocker = 3080698800, lock = {__m_reserved = 0, __m_count = 0,
   __m_owner = 0x0, __m_kind = 1, __m_lock = {__status = 0, __spinlock = 0}}, blockproc = 0x80f4a24 "ast_waitfor_nandfds",
 appl = 0xb7b437b0 "Macro", data = 0xb79f27e0 "dial|30|tTrh|91001", fdno = 6, sched = 0x8a3f2c8, streamid = -1, stream = 0x0,
 vstreamid = 0, vstream = 0x0, oldwriteformat = 64, timingfd = 172, timingfunc = 0, timingdata = 0x0, _state = 0, rings = 0,
 nativeformats = 8, readformat = 8, writeformat = 8, cid = {cid_dnid = 0x0, cid_num = 0x0, cid_name = 0x0, cid_ani = 0x0,
   cid_rdnis = 0x0, cid_pres = 0, cid_ani2 = 0, cid_ton = 0, cid_tns = 0},
 context = "macro-dial\000-vm\000able", '\0' <repeats 60 times>, macrocontext = "from-internal", '\0' <repeats 66 times>,
 macroexten = "91001", '\0' <repeats 74 times>, macropriority = 1, exten = "s\000001", '\0' <repeats 74 times>, priority = 10,
 dtmfq = '\0' <repeats 79 times>, dtmff = {frametype = 0, subclass = 0, datalen = 0, samples = 0, mallocd = 0, offset = 0, src = 0x0,
   data = 0x0, delivery = {tv_sec = 0, tv_usec = 0}, prev = 0x0, next = 0x0}, pbx = 0x890bcf8, amaflags = 3,
 accountcode = '\0' <repeats 19 times>, cdr = 0x88f6bc0, adsicpe = 0, call_forward = '\0' <repeats 79 times>, zone = 0x0,
 monitor = 0x0, insmpl = 0, outsmpl = 0, fin = 2969, fout = 1174, uniqueid = "1136789367.968", '\0' <repeats 17 times>,
 hangupcause = 16, varshead = {first = 0x0, last = 0x0}, callgroup = 0, pickupgroup = 0, flags = 528, transfercapability = 0,
 readq = 0x88a7a58, alertpipe = {-1, -1}, writetrans = 0x0, readtrans = 0x0, rawreadformat = 0, rawwriteformat = 0, spies = 0x88d9560,
 next = 0x88ad3c0}

Notice the name.  Notice, also, that the varshead.first is null.  Notice, lastly, that _softhangup == 1.

My proposed fix would be two-phase.

1) Make AST_LIST_REMOVE (and friends?) more defensive.
2) Investigate adding locking to avoid this potential race.

Regards,
Steve Davies


****** ADDITIONAL INFORMATION ******

Bug present in 1.2.1 and trunk.  Probably older versions too.
Comments:By: Peng Yong (ppyy) 2006-01-09 08:14:40.000-0600

can you provide your dial plan?

By: Steve Davies . (stevedavies) 2006-01-09 23:40:23.000-0600

ppyy:

I'm not sure why you feel you need to see the dialplan??

It's a customer's site.  The dialplan is hacked-on AMP.

As can been seen above, the channel was busy returning from the "macro-dial" context after finishing the Dial at step 10.  Its going back into "from-internal".  We were executing in macro_exec, trying to reset MACRO_DEPTH to the value it had on entry IIRC.

Anyway, here's macro-dial:

[macro-dial]
exten => s,1,GotoIf($[ "${MACRO_CONTEXT}" = "macro-rg-group" ]?4:2)  ; if this is from rg-group, don't strip prefix
exten => s,2,GotoIf($["${CALLERIDNAME:0:${LEN(${RGPREFIX})}}" != "${RGPREFIX}"]?5:4)  ; check for ring-group prefix
exten => s,3,SetCIDName(${CALLERIDNAME:${LEN(${RGPREFIX})}}) ; strip off prefix
exten => s,4,AGI,dialparties.agi
exten => s,5,NoOp(Returned from dialparties with no extensions to call)
exten => s,6,Set(DIALSTATUS=BUSY)
exten => s,10,Dial(${ds})                               ; dialparties will set the priority to 10 if $ds is not null
exten => s,20,NoOp(Returned from dialparties with hunt groups to dial )
exten => s,21,Set(HuntLoop=0)
exten => s,22,GotoIf($[$[${HuntMembers} >= 1]?30 )  ; if this is from rg-group, don't strip prefix
exten => s,23,NoOp(Returning there are no members left in the hunt group to ring)

exten => s,30,Set(HuntMember=HuntMember${HuntLoop})
exten => s,31,GotoIf($[$[${CALLTRACE_HUNT} != "" ] & $["${RingGroupMethod}" = "hunt" ]]?32:35 )  ; Set CAll Trace for Hunt member we are going to call
       exten => s,32,Cut(CT_EXTEN=ARG3,,$[${HuntLoop} + 1])
       exten => s,33,DBput(CALLTRACE/${CT_EXTEN}=${CALLTRACE_HUNT})
       exten => s,34,Goto(s,42)

exten => s,35,GotoIf($[$[${CALLTRACE_HUNT} != "" ] & $["${RingGroupMethod}" = "memoryhunt" ]]?36:50 )  ;Set Call Trace for each hunt member we are going
to call "Memory groups have multiple members to set CALL TRACE For hence the loop
       exten => s,36,Set(CTLoop=0)
       exten => s,37,GotoIf($[${CTLoop} > ${HuntLoop}]?42 )  ; if this is from rg-group, don't strip prefix
       exten => s,38,Cut(CT_EXTEN=ARG3,,$[${CTLoop} + 1])
       exten => s,39,DBput(CALLTRACE/${CT_EXTEN}=${CALLTRACE_HUNT})
       exten => s,40,Set(CTLoop=$[1 + ${CTLoop}])
       exten => s,41,Goto(s,37)
exten => s,42,Dial(${${HuntMember}}${ds} ) ; dialparties will set the priority to 20 if $ds is not null and its a hunt group
exten => s,43,Set(HuntLoop=$[1 + ${HuntLoop}])
exten => s,44,Set(HuntMembers=$[${HuntMembers} - 1])
exten => s,45,Goto(s,22)
exten => s,50,DBdel(CALLTRACE/${CT_EXTEN})
exten => s,51,Goto(s,42)


Steve

By: Tilghman Lesher (tilghman) 2006-01-20 23:13:31.000-0600

I disagree with the proposed solution.  The real problem is that after the channel starts its hangup procedure, Macro may still want to execute things on the channel.  So once it hangs up, Macro should take a hands-off approach to the channel -- don't access the channel structures at all at that point.

And it doesn't really need to be restoring the variables at that point, either.  The channel is gone; there's nothing else that is going to be seeing those variables.

By: Tilghman Lesher (tilghman) 2006-01-20 23:14:42.000-0600

Proposed fix uploaded.  Please test to ensure that it fixes the stated problem.

By: Steve Davies . (stevedavies) 2006-01-22 09:57:10.000-0600

Thanks for the patch.  I'll put it on the client's box and report back the results.

Steve

By: Steve Davies . (stevedavies) 2006-02-05 05:41:07.000-0600

Well,

There's been no repeat occurences, so this can be called fixed I think.

Steve

By: Tilghman Lesher (tilghman) 2006-02-05 10:27:32.000-0600

Committed to 1.2, merged to trunk.

By: Digium Subversion (svnbot) 2008-12-17 15:14:53.000-0600

Repository: asterisk
Revision: 165317

U   branches/1.4/apps/app_macro.c

------------------------------------------------------------------------
r165317 | tilghman | 2008-12-17 15:14:53 -0600 (Wed, 17 Dec 2008) | 4 lines

Reverse the fix from issue ASTERISK-6019 and add proper handling for that issue.
(Closes issue ASTERISK-13115, closes issue ASTERISK-12632)
Fixed by myself (license 14)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=165317

By: Digium Subversion (svnbot) 2008-12-17 15:18:52.000-0600

Repository: asterisk
Revision: 165319

_U  trunk/
U   trunk/apps/app_macro.c

------------------------------------------------------------------------
r165319 | tilghman | 2008-12-17 15:18:52 -0600 (Wed, 17 Dec 2008) | 11 lines

Merged revisions 165317 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
 r165317 | tilghman | 2008-12-17 15:14:37 -0600 (Wed, 17 Dec 2008) | 4 lines
 
 Reverse the fix from issue ASTERISK-6019 and add proper handling for that issue.
 (Closes issue ASTERISK-13115, closes issue ASTERISK-12632)
 Fixed by myself (license 14)
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=165319

By: Digium Subversion (svnbot) 2008-12-17 15:21:14.000-0600

Repository: asterisk
Revision: 165321

_U  branches/1.6.0/
U   branches/1.6.0/apps/app_macro.c

------------------------------------------------------------------------
r165321 | tilghman | 2008-12-17 15:21:14 -0600 (Wed, 17 Dec 2008) | 18 lines

Merged revisions 165319 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r165319 | tilghman | 2008-12-17 15:18:57 -0600 (Wed, 17 Dec 2008) | 11 lines
 
 Merged revisions 165317 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r165317 | tilghman | 2008-12-17 15:14:37 -0600 (Wed, 17 Dec 2008) | 4 lines
   
   Reverse the fix from issue ASTERISK-6019 and add proper handling for that issue.
   (Closes issue ASTERISK-13115, closes issue ASTERISK-12632)
   Fixed by myself (license 14)
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=165321

By: Digium Subversion (svnbot) 2008-12-17 15:22:34.000-0600

Repository: asterisk
Revision: 165323

_U  branches/1.6.1/
U   branches/1.6.1/apps/app_macro.c

------------------------------------------------------------------------
r165323 | tilghman | 2008-12-17 15:22:33 -0600 (Wed, 17 Dec 2008) | 18 lines

Merged revisions 165319 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
 r165319 | tilghman | 2008-12-17 15:18:57 -0600 (Wed, 17 Dec 2008) | 11 lines
 
 Merged revisions 165317 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/branches/1.4
 
 ........
   r165317 | tilghman | 2008-12-17 15:14:37 -0600 (Wed, 17 Dec 2008) | 4 lines
   
   Reverse the fix from issue ASTERISK-6019 and add proper handling for that issue.
   (Closes issue ASTERISK-13115, closes issue ASTERISK-12632)
   Fixed by myself (license 14)
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=165323

By: Digium Subversion (svnbot) 2009-02-11 14:54:22.000-0600

Repository: asterisk
Revision: 174885

U   branches/1.4/apps/app_macro.c
U   branches/1.4/main/pbx.c

------------------------------------------------------------------------
r174885 | tilghman | 2009-02-11 14:54:19 -0600 (Wed, 11 Feb 2009) | 13 lines

Restore a behavior that was recently changed, when we fixed issue ASTERISK-13115 and
issue ASTERISK-12632 (related to issue ASTERISK-6019).  When a hangup occurs during a Macro
execution in earlier 1.4, the h extension would execute within the Macro
context, whereas it was always supposed to execute only within the main context
(where Macro was called).  So this fix checks for an "h" extension in the
deepest macro context where a hangup occurred; if it exists, that "h" extension
executes, otherwise the main context "h" is executed.
(closes issue ASTERISK-13256)
Reported by: wetwired
Patches:
      20090210__bug14122.diff.txt uploaded by Corydon76 (license 14)
Tested by: andrew

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=174885

By: Digium Subversion (svnbot) 2009-02-11 14:55:50.000-0600

Repository: asterisk
Revision: 174886

_U  trunk/

------------------------------------------------------------------------
r174886 | tilghman | 2009-02-11 14:55:47 -0600 (Wed, 11 Feb 2009) | 19 lines

Blocked revisions 174885 via svnmerge

........
 r174885 | tilghman | 2009-02-11 14:54:18 -0600 (Wed, 11 Feb 2009) | 13 lines
 
 Restore a behavior that was recently changed, when we fixed issue ASTERISK-13115 and
 issue ASTERISK-12632 (related to issue ASTERISK-6019).  When a hangup occurs during a Macro
 execution in earlier 1.4, the h extension would execute within the Macro
 context, whereas it was always supposed to execute only within the main context
 (where Macro was called).  So this fix checks for an "h" extension in the
 deepest macro context where a hangup occurred; if it exists, that "h" extension
 executes, otherwise the main context "h" is executed.
 (closes issue ASTERISK-13256)
  Reported by: wetwired
  Patches:
        20090210__bug14122.diff.txt uploaded by Corydon76 (license 14)
  Tested by: andrew
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=174886