Summary: ASTERISK-28834: Segfault in taskprocessor_push
Reporter: laszlovl (lvl)
Labels:
Date Opened: 2020-04-15 06:46:10
Date Closed:
Priority: Major
Regression?:
Status: Open/New
Components: Core/General
Versions: 16.3.0
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
Description: I see occasional segfaults in taskprocessor_push. Specifically,

{code}
tps->listener->callbacks->task_pushed(tps->listener, was_empty);
{code}

will crash because {{tps->listener}} is NULL. All the traces I've seen relate to hangups, so my guess is that a race condition or a missing lock somewhere allows a task to be pushed onto a task processor while it is already being destroyed.

I don't know how to properly fix that underlying issue, but will propose a simple NULL check to prevent the segfault.
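For illustration only, a minimal sketch of what such a NULL check could look like. This is not the submitted patch and not the real Asterisk code: the struct layouts, {{guarded_push}}, {{count_push}} and {{counting_callbacks}} are simplified, hypothetical stand-ins, and rejecting the push with -1 is an assumption about the desired behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the Asterisk types.
 * These are NOT the real definitions from taskprocessor.c. */
struct listener;

struct listener_callbacks {
    void (*task_pushed)(struct listener *l, int was_empty);
};

struct listener {
    const struct listener_callbacks *callbacks;
    int notifications; /* how many times task_pushed ran */
};

struct taskprocessor {
    struct listener *listener; /* assumed to be NULL once teardown has begun */
    long queue_size;
};

/* Example callback: just counts notifications. */
static void count_push(struct listener *l, int was_empty)
{
    (void) was_empty;
    l->notifications++;
}

static const struct listener_callbacks counting_callbacks = {
    .task_pushed = count_push,
};

/* Returns 0 on success, or -1 if the processor has no listener,
 * instead of dereferencing a NULL pointer. */
static int guarded_push(struct taskprocessor *tps)
{
    int was_empty;

    assert(tps != NULL);

    if (!tps->listener) {
        return -1; /* processor is being torn down; refuse the task */
    }

    was_empty = (tps->queue_size == 0);
    tps->queue_size++;
    tps->listener->callbacks->task_pushed(tps->listener, was_empty);
    return 0;
}
```

Note that a check like this only narrows the window: without synchronization, the listener could in principle still be cleared between the check and the callback, so it does not address the suspected race itself.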

Two example backtraces:

{code}
#0  0x00000000005c3643 in taskprocessor_push (tps=0x7f29ec144b60, t=0x7f2a60007180) at taskprocessor.c:1122
#1  0x00000000005c369c in ast_taskprocessor_push (tps=0x7f29ec144b60, task_exe=0x7f2a476e7df7 <hangup>, datap=0x7f2a60068b50) at taskprocessor.c:1128
#2  0x00007f2a4eae31c2 in ast_sip_push_task (serializer=0x7f29ec144b60, sip_task=0x7f2a476e7df7 <hangup>, task_data=0x7f2a60068b50) at res_pjsip.c:4631
#3  0x00007f2a476e7faa in chan_pjsip_hangup (ast=0x7f2a60169d30) at chan_pjsip.c:2377
#4  0x00000000004a3f25 in ast_hangup (chan=0x7f2a60169d30) at channel.c:2628

(gdb) p *tps
$11 = {
 local_data = 0x0,
 tps_queue_size = 0,
 tps_queue = {
   first = 0x0,
   last = 0x0
 },
 listener = 0x0,
 thread = 18446744073709551615,
 executing = 0,
 suspended = 0,
 subsystem = 0x7f29ec144bf4 "pjsip",
 name = 0x7f29ec144bc0 "pjsip/outsess/proxy-001531b2"
}

{code}

{code}
// from asterisk 15
#0  0x000000000063688c in taskprocessor_push (tps=0x7fd918394b38, t=0x7fd81805bdf0) at taskprocessor.c:900
#1  0x00000000006368e5 in ast_taskprocessor_push (tps=0x7fd918394b38, task_exe=0x7fd871679980 <hangup>, datap=0x7fd8180253b8) at taskprocessor.c:906
#2  0x00007fd87167975a in chan_pjsip_hangup (ast=0x7fd818048968) at chan_pjsip.c:2332
#3  0x00000000004c69be in ast_hangup (chan=0x7fd818048968) at channel.c:2649

(gdb) p *tps
$3 = {name = 0x0, stats = 0x0, local_data = 0x0, tps_queue_size = 0, tps_queue_low = 2250, tps_queue_high = 2500, tps_queue = {first = 0x0, last = 0x0}, listener = 0x0, thread = 18446744073709551615,
 executing = 0, suspended = 0}
{code}
Comments:

By: Asterisk Team (asteriskteam) 2020-04-15 06:46:10.911-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: laszlovl (lvl) 2020-04-15 06:52:59.614-0500

Patch submitted. Another option might be to move {{ao2_unlock(tps);}} to after {{tps->listener->callbacks->task_pushed}} but I don't know how to estimate the impact of that.
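To make the alternative concrete, here is a rough sketch of keeping the lock held across the listener callback. It uses a plain pthread mutex in place of {{ao2_lock}}/{{ao2_unlock}}, and every name in it ({{locked_tps}}, {{push_holding_lock}}, {{detach_listener}}, {{bump_counter}}) is hypothetical. It also assumes that teardown clears the listener under the same lock, which I have not verified for the real code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of a task processor; not the Asterisk struct. */
struct locked_tps {
    pthread_mutex_t lock;
    void (*task_pushed)(void *listener, int was_empty);
    void *listener; /* cleared by detach_listener() during teardown */
    long queue_size;
};

/* Example callback: treats the listener as an int counter. */
static void bump_counter(void *listener, int was_empty)
{
    (void) was_empty;
    (*(int *) listener)++;
}

/* Queue a task and notify the listener while still holding the lock.
 * Returns 1 if the listener was notified, 0 if it was already gone. */
static int push_holding_lock(struct locked_tps *tps)
{
    int was_empty;
    int notified = 0;

    assert(tps != NULL);

    pthread_mutex_lock(&tps->lock);
    was_empty = (tps->queue_size == 0);
    tps->queue_size++;
    if (tps->listener) {
        /* The listener cannot be detached while we hold the lock. */
        tps->task_pushed(tps->listener, was_empty);
        notified = 1;
    }
    pthread_mutex_unlock(&tps->lock);

    return notified;
}

/* Teardown path: clears the listener under the same lock. */
static void detach_listener(struct locked_tps *tps)
{
    pthread_mutex_lock(&tps->lock);
    tps->listener = NULL;
    pthread_mutex_unlock(&tps->lock);
}
```

The cost is that the callback now runs under the lock: it is held for longer, and with a non-recursive mutex like the one in this sketch, a callback that tried to take the same lock again would deadlock. That is the kind of impact I can't estimate for the real code.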

By: Sean Bright (seanbright) 2020-04-15 10:45:46.240-0500

[~lvl], what version of Asterisk is the first backtrace from? Line numbers aren't matching up with 16.8.0.

By: laszlovl (lvl) 2020-04-15 11:15:20.085-0500

Apologies, this wasn't 16.8.0 but 16.3.0. The second one is from Asterisk 15.3.0.

By: Sean Bright (seanbright) 2020-04-15 11:17:23.602-0500

[~lvl], awesome. Can you reproduce with 16.9.0 and upload a backtrace?

By: laszlovl (lvl) 2020-04-15 11:24:30.548-0500

I haven't been able to reproduce this at will, so I'm dependent on production crashes. It will probably be at least a few months of testing before I'm able to run 16.9.0 there (by which time there will already be a newer 16.x, etc.).