Summary: ASTERISK-23755: SIGSEGV due to alignment bug on arm when destination callgroup/pickupgroup is set
Reporter: Peter Katzmann (pk16208)
Labels:
Date Opened: 2014-05-19 04:14:07
Date Closed:
Priority: Critical
Regression?:
Status: Open/New
Components: Channels/chan_sip/General, Core/Channels
Versions: 11.9.0, 13.18.4
Frequency of Occurrence: Constant
Related Issues: is related to ASTERISK-22572 Asterisk 11.5.1- SPARC don't start due to many ast_symbols not found
Environment: buildroot 2014.02, Marvel Kirkwood, linux 3.10
Attachments: ( 0) backtrace.txt
Description: I encountered a SIGSEGV while testing Asterisk 11 on a Kirkwood ARM platform. I tracked it down to the case where the user has pickupgroup/callgroup set; it only occurs then.
The problem does not exist with Asterisk 1.8.

On closer examination, I found that it seems to be an alignment problem, because when I set /proc/cpu/alignment to 2 I get plenty of misalignment messages but no SIGSEGV.
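
For readers unfamiliar with the setting: on 32-bit ARM Linux, the value written to /proc/cpu/alignment is a small bit mask (warn, fixup, signal), so a value of 2 asks the kernel to fix up unaligned accesses itself instead of delivering a signal. A minimal sketch of that decoding follows; the enum names and the helper describe_alignment_mode are made up for illustration and are not part of the kernel or of Asterisk.

```c
#include <stddef.h>
#include <stdio.h>

/* Bit meanings of the ARM /proc/cpu/alignment value.
 * The constant names are hypothetical; only the bit positions matter. */
enum {
    ALIGN_WARN   = 1 << 0, /* log a message for each unaligned access */
    ALIGN_FIXUP  = 1 << 1, /* kernel emulates the access and continues */
    ALIGN_SIGNAL = 1 << 2  /* deliver a signal to the process */
};

/* Format the three flags of `mode` into buf and return buf. */
static const char *describe_alignment_mode(int mode, char *buf, size_t len)
{
    snprintf(buf, len, "warn=%d fixup=%d signal=%d",
             (mode & ALIGN_WARN) != 0,
             (mode & ALIGN_FIXUP) != 0,
             (mode & ALIGN_SIGNAL) != 0);
    return buf;
}
```

Calling describe_alignment_mode(2, buf, sizeof buf) yields "warn=0 fixup=1 signal=0", which would fit the observation that the crash disappears once the kernel is allowed to repair the access.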

The relevant exception part is:

{noformat}
Core was generated by `asterisk -g'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00088d00 in ast_channel_inherit_variables (parent=0xb6514774, child=0xb61a74) at channel.c:6527
6527                                    AST_LIST_INSERT_TAIL(ast_channel_varshead(child), newvar, entries);
#0  0x00088d00 in ast_channel_inherit_variables (parent=0xb6514774, child=0xb61a74) at channel.c:6527
       vartype = 2
       current = 0xaf3f20
       newvar = 0xb76ad8
       varname = <optimized out>
       __PRETTY_FUNCTION__ = "ast_channel_inherit_variables"
#1  0xb54938d8 in ?? ()
No symbol table info available.
#2  0xb54938d8 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
{noformat}
Comments:
By: Matt Jordan (mjordan) 2014-05-19 08:16:50.221-0500

Please attach a full backtrace to this issue that has been generated from an instance of Asterisk with symbols. Attaching only a small portion is not sufficient for a developer to diagnose the problem.

Information on getting a backtrace can be found here: https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

By: Peter Katzmann (pk16208) 2014-05-19 09:59:14.485-0500

As requested: Asterisk built without optimization, and a backtrace generated as described.

By: Peter Katzmann (pk16208) 2014-05-20 02:29:48.873-0500

New log files uploaded.

By: Rusty Newton (rnewton) 2014-05-21 18:03:20.315-0500

Can you provide specific dialplan and configuration with test steps to reproduce the issue, just so we are all clear?  That is, relevant extensions.conf, sip.conf, etc with instructions on how to reproduce.

By: Peter Katzmann (pk16208) 2014-05-22 09:08:47.819-0500

Hmm,
it is not easy to strip it down.
Even if I only disable the AGI calls in the current dialplan, no SIGSEGV occurs.

So the problem is a bit more complex to hunt down than I thought.

Peter


By: Peter Katzmann (pk16208) 2014-05-27 06:47:53.625-0500

When I build and use Asterisk 1.8 instead of Asterisk 11, I get no alignment trap messages at all.
But when I switch to an Asterisk 11 build, the kernel immediately emits alignment trap messages:

<4>Alignment trap: asterisk (817) PC=0xb53cb614 Instr=0xe1c120f0 Address=0xb6510ac4 FSR 0x801
<4>Alignment trap: asterisk (817) PC=0xb53cb614 Instr=0xe1c120f0 Address=0xb651931c FSR 0x801
<4>Alignment trap: asterisk (841) PC=0xb6aa5c14 Instr=0xe1c621d0 Address=0x00662644 FSR 0x001
<4>Alignment trap: asterisk (841) PC=0xb6aa5c18 Instr=0xe1c600d8 Address=0x0066263c FSR 0x001
<4>Alignment trap: asterisk (817) PC=0xb53cb614 Instr=0xe1c120f0 Address=0xb6504bcc FSR 0x801

The build system and libraries are completely identical; only a different Asterisk version is selected.
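
The Address= values in the trap messages can be checked directly for their alignment. The following is only a sketch: the helper misalignment and the array trap_addrs are names invented here, and the addresses are copied from the log lines above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Offset of addr past the previous `boundary`-byte boundary (0 = aligned).
 * `boundary` must be a power of two. Hypothetical helper for illustration. */
static unsigned misalignment(uint32_t addr, uint32_t boundary)
{
    return (unsigned)(addr & (boundary - 1u));
}

/* Address= values copied from the alignment trap messages. */
static const uint32_t trap_addrs[] = {
    0xb6510ac4u, 0xb651931cu, 0x00662644u, 0x0066263cu, 0xb6504bccu
};

/* Print each address with its offset from 4- and 8-byte boundaries. */
void report_trap_addrs(void)
{
    size_t i;

    for (i = 0; i < sizeof(trap_addrs) / sizeof(trap_addrs[0]); i++) {
        printf("0x%08lx  mod 4 = %u  mod 8 = %u\n",
               (unsigned long)trap_addrs[i],
               misalignment(trap_addrs[i], 4),
               misalignment(trap_addrs[i], 8));
    }
}
```

All five addresses are multiples of 4 but sit 4 bytes past an 8-byte boundary. That would be consistent with accesses that need 8-byte alignment hitting data that is only 4-byte aligned, though the log alone does not prove which data structure is involved.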

By: Walter Doekes (wdoekes) 2014-05-27 07:36:17.723-0500

Might be the same issue as:
ASTERISK-21665

In that report, 1.8.x works fine, but 11.x behaves oddly:
{quote}
The problem – which I cannot explain at all – is that malloc(3) starts returning 4-byte aligned addresses after a while. And that obviously causes trouble.
{quote}

Assuming it is the 4-byte alignment it has trouble with.
{quote}
> > > What is "The Kirkwood"?  What size processor is this?
> >
> > It's an ARM926-like CPU from Marvell.
>
> 32 or 64bit?

All ARMs are 32-bit, but some have instructions for loading or
storing 2 x 32 bits at a time, but also require the corresponding
memory address to be 64-bit aligned.
{quote}
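
To illustrate the point in the quote: code that reads or writes a 64-bit value through a pointer that is only 4-byte aligned can trap on such CPUs. A common generic workaround (not the fix proposed for Asterisk, just a sketch with hypothetical helper names) is to check the alignment, or to go through memcpy so the compiler does not assume an aligned 64-bit access.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if p is aligned for an 8-byte access. Hypothetical helper. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Read a 64-bit value from any address. memcpy avoids dereferencing a
 * possibly misaligned uint64_t pointer directly. */
static uint64_t load_u64(const void *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a 64-bit value to any address, again via memcpy. */
static void store_u64(void *p, uint64_t v)
{
    memcpy(p, &v, sizeof v);
}
```

For example, with a buffer of uint64_t values, the address 4 bytes into the buffer fails is_aligned8 but can still be written with store_u64 and read back with load_u64. Whether the compiler turns the memcpy into byte-wise or word-wise accesses depends on the target and flags.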

By: Peter Katzmann (pk16208) 2014-05-27 07:54:11.055-0500

Probably both bugs have the same root cause.
At a quick look, the patches touch an area where Asterisk produces its first misalignment during start-up.

It is in ast_taskprocessor_get, during mutex_init and when accessing the task name.