Summary: ASTERISK-20128: Virtualized asterisk.org 1.8.14.0 no longer runs in a KVM virtualized environment. Compiles without error, but fails with Illegal instruction on launch Regression since 1.8.13.0 Last good 1.8.12.2
Reporter: mike (linux.ninja1)
Labels: fax
Date Opened: 2012-07-15 03:31:32
Date Closed:
Priority: Major
Regression?:
Status: Open/New
Components: Core/BuildSystem
Versions: 1.8.13.0 1.8.14.0 13.18.4
Frequency of Occurrence: Constant
Related Issues:
is caused by ASTERISK-19462 asterisk Illegal Instruction (core dumped)
is related to ASTERISK-22931 Impossible to execute Asterisk because if illegal instruction
is related to ASTERISK-21967 CFLAG Improvement to prevent compiler error in Virtual Machine environments
Environment: verified on Centos 6.2 and 6.3 - 64 bit running in KVM virtualized instances with the following /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 2
model name : QEMU Virtual CPU version 0.15.1
stepping : 3
cpu MHz : 1497.505
cache size : 512 KB
fpu : yes
fpu_exception : yes
cpuid level : 4
wp : yes
flags : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm up unfair_spinlock pni cx16 popcnt hypervisor lahf_lm svm abm sse4a
bogomips : 2995.01
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
Attachments:
( 0) cat-proc-cpuinfo-with-different-settings.txt
( 1) console-logfile.txt
( 2) cpu-flags-defaultKVM-forcedKVM-baremetal.txt
( 3) KVM-Virt-Manager-defaults.png
( 4) KVM-Virt-Manager-forced-CPU.png
Description:We are running asterisk 1.8.X instances in virtual machines.
The asterisk virtual machines are running Centos 6.2 and 6.3.
Kernel of virtual machines is :  2.6.32-279.1.1.el6.x86_64 #1 SMP Tue Jul 10 13:47:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
KVM Hypervisor / host is a Fedora 16 with the latest kernel :  3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
Hardware is HP ProLiant N40L  MicroServer

Any asterisk version higher than 1.8.12.2 fails with an "illegal instruction" error on launch of asterisk.
Comments:By: Paul Belanger (pabelanger) 2012-07-15 12:09:22.554-0500

Thank you for taking the time to report this bug and helping to make Asterisk better. Unfortunately, we cannot work on this bug because your description did not include enough information. You may find it helpful to read the Asterisk Issue Guidelines http://www.asterisk.org/developers/bug-guidelines. We would be grateful if you would then provide a more complete description of the problem. At a minimum, we need:

1. the specific steps or actions you took that caused you to encounter the problem,
2. the behavior you expected, and
3. the behavior you actually encountered (in as much detail as possible).

This likely includes output from the console with debug level logging, a SIP trace (if this is SIP related), and configuration information such as dialplan (e.g. extensions.conf) and channel configuration (e.g. sip.conf). Thanks!



By: mike (linux.ninja1) 2012-07-15 13:25:52.705-0500

steps to reproduce:

started with a working Centos 6.3 64 bit with asterisk
"Connected to Asterisk 1.8.12.2 currently running on XXXXX (pid = 1189)"

download 1.8.14,
tar -zxf
./configure
make
make menuselect ( default choices)
make
make install
asterisk -rvv
---> last command generates the "Illegal instruction"

I will attach the console logfile, but basically the download and compile work without errors.
Could the compile job be targeting the wrong CPU?
The server works again when I do a compile / install with the 1.8.12 codebase.





By: mike (linux.ninja1) 2012-07-15 13:34:30.280-0500

console log file.
basically a download, untar, configure, make, make menuselect and then finally
asterisk -rvvv to generate the error.

The asterisk service of course no longer runs after a reboot.

file does seem to report that /usr/sbin/asterisk is a 64 bit application.

file /usr/sbin/asterisk
/usr/sbin/asterisk: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped



By: Matt Jordan (mjordan) 2012-07-16 08:19:50.541-0500

This was changed in the following revision:

* Makefile.rules, makeopts.in, codecs/lpc10/Makefile, Makefile,
         build_tools/cflags.xml, build_tools/menuselect-deps.in,
         codecs/gsm/src/k6opt.s, configure, codecs/gsm/Makefile,
         configure.ac: Simplify build system architecture optimization
         This change to the build system rips out any usage of PROC along
         with architecture-specific optimizations in favor of using
         -march=native where it is supported. This fixes broken builds on
         64bit Intel systems and results in better optimized code on
         systems running GCC 4.2+. Review:
         https://reviewboard.asterisk.org/r/1852/ (closes issue
         ASTERISK-19462)

You may need to disable the BUILD_NATIVE menuselect flag and - possibly - specify the architecture you wish to compile for using the CFLAGS environment variable.

By: Kinsey Moore (kmoore) 2012-07-16 08:43:07.802-0500

What version of gcc is in use on this system?

By: mike (linux.ninja1) 2012-07-16 10:10:03.385-0500

gcc version

gcc --version
gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.



By: mike (linux.ninja1) 2012-07-16 18:37:23.256-0500

it's something with the configure script related to the CPU type.

The default KVM emulated CPU is processor
vendor_id : AuthenticAMD
cpu family : 6
model : 2
model name : QEMU Virtual CPU version 0.15.1
which generates the error,

forcing the CPU architecture to be the same as the KVM host
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : AMD Phenom(tm) 9550 Quad-Core Processor

is a workaround. With this option set (Virt-Manager, Processor, Configuration, Model -> Copy host CPU configuration, which makes the emulated CPU Opteron_G3),
the illegal instruction is not generated.

Of course this workaround will block live "vmotion style" migrations, locking the virtual server to a certain CPU type.



By: mike (linux.ninja1) 2012-07-16 18:38:45.268-0500

details of the CPU KVM presents to the virtual servers.

By: mike (linux.ninja1) 2012-07-16 18:40:44.250-0500

default KVM processor selection

By: mike (linux.ninja1) 2012-07-16 18:41:57.778-0500

KVM virt Manager with forced CPU,
making the CPU presented to the virtual server running asterisk the same as the underlying real hardware.

Good for performance, bad for live migrations of virtual servers

By: Kinsey Moore (kmoore) 2012-07-17 09:26:24.133-0500

This is almost certainly a bug in either the presentation of instruction set capabilities to Linux or in gcc's detection of said capabilities.  Can you get more information on the instruction that is causing the problem and the instruction sets offered by the QEMU virtual CPU?

By: mike (linux.ninja1) 2012-07-17 10:46:09.655-0500


CPU flags with forced CPU
fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb lm up rep_good extd_apicid unfair_spinlock pni cx16 popcnt hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw npt


CPU flags with "default KVM CPU" type:
fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm up unfair_spinlock pni cx16 popcnt hypervisor lahf_lm svm abm sse4a


The "default KVM CPU" flags are a subset of the flags of the forced CPU type.
Every flag of the "default KVM CPU" type is present in the forced CPU type, which does not generate the illegal instruction on startup of asterisk.

Not sure, as a non-developer, what I can do to find the exact illegal instruction at startup.
I might be able to set up a dev system that is ssh accessible over the internet.
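
The subset relationship described above can be checked mechanically. The following is a minimal sketch and not part of the original report: the function name `missing_flags` is made up for illustration, and the two flag strings are copied from the lists in this comment.

```shell
# Print every flag in the first list that is missing from the second list.
# Both arguments are space-separated flag strings as shown in /proc/cpuinfo.
missing_flags() {
    for flag in $1; do
        case " $2 " in
            *" $flag "*) ;;                 # flag is present in both lists
            *) printf '%s\n' "$flag" ;;     # flag is only in the first list
        esac
    done
}

# Flag lists copied from this comment.
default_kvm="fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm up unfair_spinlock pni cx16 popcnt hypervisor lahf_lm svm abm sse4a"
forced_cpu="fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb lm up rep_good extd_apicid unfair_spinlock pni cx16 popcnt hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw npt"

echo "In default KVM CPU but not in forced CPU:"
missing_flags "$default_kvm" "$forced_cpu"
echo "In forced CPU but not in default KVM CPU:"
missing_flags "$forced_cpu" "$default_kvm"
```

With the lists from this comment, the first call prints nothing, which matches the observation that every default KVM flag is also present on the forced CPU.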


By: Kinsey Moore (kmoore) 2012-07-20 13:22:10.355-0500

Having a system I can get into would work.  If not, you can run Asterisk in GDB until it crashes and then disassemble the code and find the address where it threw the SIGILL.  What are the CPU flags for the non-virtualized install on that machine?  Do they differ from the forced CPU option?  You can send the connection info to kmoore@digium.com if you go that route.
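
The GDB approach suggested here could look roughly like the sketch below. It assumes gdb is installed and that the binary lives at /usr/sbin/asterisk as reported earlier in this issue; the function name `show_sigill` is hypothetical.

```shell
# Run a binary under gdb in batch mode and report where it stopped.
# Assumption: the program dies with SIGILL shortly after launch, so gdb
# stops there and the remaining -ex commands inspect that location.
show_sigill() {
    binary=${1:-/usr/sbin/asterisk}
    # x/i  $pc  : disassemble the instruction at the faulting address
    # x/8xb $pc : dump the raw bytes in case gdb cannot decode them
    # bt        : show which function the instruction belongs to
    gdb -batch \
        -ex run \
        -ex 'x/i $pc' \
        -ex 'x/8xb $pc' \
        -ex bt \
        --args "$binary" -cvvv
}
```

Calling `show_sigill` with no argument uses the default path; the raw byte dump is useful when the disassembler does not recognise the instruction.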

By: mike (linux.ninja1) 2012-07-20 16:13:55.571-0500



The CPU flags of the native machine, the bare metal thing running the hypervisor and the cpu specs are below:

fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate npt lbrv svm_lock nrip_save

So they are different from both the "default KVM CPU" type and the "forced CPU", something I would somewhat expect.

An SSH key request was sent by private mail to kmoore@



By: mike (linux.ninja1) 2012-07-20 16:16:56.917-0500


CPU flags compilation of

default KVM
forced KVM
native hypervisor

By: Kinsey Moore (kmoore) 2012-07-23 10:21:34.628-0500

SSH key provided via email.

By: Kinsey Moore (kmoore) 2012-07-24 14:32:49.409-0500

From the disassembly I'm looking at, the (first) invalid instruction is "8F EA F8 10" which GDB promptly refused to translate into assembly.
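
One way to read those four bytes, offered as an assumption rather than a confirmed diagnosis: on AMD processors a leading 0x8F followed by a byte whose low five bits are 8 or higher introduces an XOP-encoded instruction, an extension that older disassemblers may not know. The sketch below splits the bytes into the XOP prefix fields. `decode_xop` is a name invented for this example.

```shell
# Decode the XOP prefix fields of an instruction given as four hex bytes.
# Assumed layout (AMD XOP encoding):
#   byte 0: 0x8F
#   byte 1: ~R ~X ~B mmmmm   (mmmmm = opcode map, 8 or higher for XOP)
#   byte 2: W ~vvvv L pp
#   byte 3: opcode
decode_xop() {
    b0=$((0x$1)); b1=$((0x$2)); b2=$((0x$3)); op=$((0x$4))
    map=$((b1 & 0x1F))
    if [ "$b0" -ne 143 ] || [ "$map" -lt 8 ]; then   # 143 = 0x8F
        echo "not an XOP prefix"
        return 1
    fi
    printf 'map=0x%02X W=%d L=%d pp=%d opcode=0x%02X\n' \
        "$map" $(( (b2 >> 7) & 1 )) $(( (b2 >> 2) & 1 )) $(( b2 & 3 )) "$op"
}

decode_xop 8F EA F8 10
```

For the bytes quoted above this reports opcode map 0x0A and opcode 0x10. None of the CPU flag lists posted in this issue include `xop`, `tbm` or `lwp`, which would be consistent with the compiler having targeted a different CPU than the one the guest exposes.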

By: Brian Raynor (hraynor) 2012-07-26 10:32:03.009-0500

I was having the same problems mentioned above, trying to install PBX In A Flash 2.0.6.2 (32 bit) with Asterisk 1.8.13.0.  PIAF compiles Asterisk as part of the install.  

Running with KVM under ProxMox 2.1 on an AMD Athlon II X3 host (giving 1 vCPU/core to PIAF/Asterisk, 1 GB RAM), I constantly got the illegal instruction errors when trying to start Asterisk after the compile.  

Following the suggestion by Matt Jordan above, I used make menuselect during the PIAF install (prior to Asterisk compile of course) to turn OFF the BUILD_NATIVE flag.  I did not need to mess with CFLAGS at all. BUILD_NATIVE flag turned off was the ONLY change I made.

With this, Asterisk starts successfully with NO issues.  Prior to doing this - I tried just about everything with no success.

Just wanted to confirm that this appears to have fixed things for me.  Haven't put any stress yet on Asterisk (will be doing so over the next few days to make sure things are stable before putting this system into production), but previously it wouldn't even start so seems like this has resolved my issues.


By: Kinsey Moore (kmoore) 2012-07-26 10:47:00.057-0500

The problem here is that GCC seems to be handling that particular flag incorrectly in this case and I have yet to figure out why.  What version of GCC are you using?  The version linux ninja1 is having issues with is 4.4.6 on CentOS 6.3.  Disabling BUILD_NATIVE disables all optimizations which could be terrible for a production environment.

By: Kinsey Moore (kmoore) 2012-07-26 12:03:23.281-0500

Configuring Asterisk with CFLAGS="-march=k8 -msse4a" should cause asterisk to build correctly and be decently optimized for your system.
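
Put together, the workaround discussed so far might look like the sketch below. It assumes you are in the unpacked Asterisk source directory and that the tree's menuselect binary accepts `--disable` on the command line, which has not been verified for every 1.8 release; `build_without_native` is a hypothetical helper name. The default flags are the ones suggested in this comment for the AMD guest and should be replaced for other CPUs.

```shell
# Configure and build Asterisk with explicit architecture flags instead of
# -march=native. Run from the top of the Asterisk source tree.
build_without_native() {
    # Default taken from this comment; pass your own flags as the first argument.
    cflags=${1:-"-march=k8 -msse4a"}

    ./configure CFLAGS="$cflags" &&
    # Generate the default module selection without the interactive menu.
    make menuselect.makeopts &&
    # Assumption: this menuselect build supports --disable on the command line.
    menuselect/menuselect --disable BUILD_NATIVE menuselect.makeopts &&
    make &&
    make install
}
```

According to later comments in this issue, setting CFLAGS at configure time may already turn BUILD_NATIVE off, in which case the explicit menuselect step is only a safeguard.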

By: Brian Raynor (hraynor) 2012-07-26 12:23:19.496-0500

GCC version (from running gcc --version) is 4.4.6 20120305 (Red Hat 4.4.6-4) and is running on CentOS 6.2 (Final) kernel 2.6.32-220.17.1.el6.i686 - 32 Bit if that matters.

Understand that turning off the BUILD_NATIVE can slow things down.  For my purposes though I'd imagine it won't affect much as for this particular install we're just looking at handling 3 phones plus 1 fax (pass through) and initially only a single PSTN trunk (through an ObiHai GW).  Might have two SIP trunks to other locations, though suspect that this won't be used much.

So only see around 3 max calls through the system.  And thought is that once this is "fixed" in Asterisk, I can always download the newest and recompile.  

Will see what my testing bears out though with BUILD_NATIVE disabled.  


By: Brian Raynor (hraynor) 2012-07-26 12:24:15.625-0500

May try later today with CFLAGS.  Thanks for the tip!

By: Kinsey Moore (kmoore) 2012-07-26 13:37:09.304-0500

The only real "fix" for this will be updating to a newer or properly patched version of gcc.  Until I have time to do more testing, I probably won't be able to tell you which version.

By: Brian Raynor (hraynor) 2012-07-26 14:47:10.228-0500

Thanks!  BTW, is there any way to change CFLAGS directly from menuselect?  (My guess is no.)  There is no provision to break out of the PIAF install to add it to the environment without breaking said install.  Of course, I could always build with BUILD_NATIVE off, let PIAF finish, and recompile Asterisk with BUILD_NATIVE on and CFLAGS set.

Oh, and BTW - (and I'm sure this is probably general knowledge) - Asterisk builds and runs just fine from VMWare Workstation (Windows 7) when running on an Intel i7-2760QM host without needing to mess with any compiler options from default.


By: Andrew Latham (lathama) 2012-08-20 18:26:12.576-0500

Kinsey is this still an active topic or was it resolved?

By: David Woolley (davidw) 2012-08-21 05:21:36.142-0500

This forum thread looks like it may be the same issue:

http://forums.digium.com/viewtopic.php?f=1&t=83743

By: Andrew Latham (lathama) 2012-08-21 08:08:30.977-0500

Ok, so in summary this is an issue of an older/outdated version of GCC.  This was mentioned a few months ago when discussing long-term support.

By: Kinsey Moore (kmoore) 2012-08-21 08:39:39.570-0500

Unfortunately, I wouldn't even say that this particular version of GCC is outdated since it's in recent releases of CentOS 6.  The only resolution thus far is to disable BUILD_NATIVE in menuselect or set CFLAGS during configure (which disables BUILD_NATIVE).  I still haven't had time to determine what the exact issue here is, but it seems it isn't restricted to AMD hardware based on the thread linked above.

By: mike (linux.ninja1) 2012-08-21 16:46:20.103-0500

To me this seems more like a regression in the source code, which all of a sudden stopped working in a KVM virtualization environment. Or can somebody help build a case for a gcc issue?

I concede that running asterisk in a virtualized environment is new, but with all the cloudiness, with a KVM hypervisor in every linux kernel, and with the recent release of oVirt 3.1, I can only see more people trying it.

Even Cisco is going overboard with virtualizing all their CUCM callmanagers, callcenter and voicemail servers.
Cisco PBXs on native hardware are going the way of the dinosaurs ...

I can bring the testserver back online around Friday if further testing is needed.



By: Matt Jordan (mjordan) 2012-08-21 17:09:37.452-0500

This is a situation in which there is no good solution.

ASTERISK-19462 pointed out some of the flaws around Asterisk's previous attempts to handle the architecture flags inside its own build system.  The more we attempted to accommodate these settings, the more we would find ourselves in new corner cases where things would break.

The solution that ended up in Asterisk 1.8.14.0 and later chose the safest route for most people, not all.  You're unfortunately one of the 'not all'.  As has been pointed out, you can set the CFLAGS during configure to the appropriate compiler flags for your system in order to have Asterisk build properly.

I'm fine having this documented, but I don't see us putting the previous mess of architecture dependency checking back in.

By: Ward Mundy (wardmundy) 2012-09-30 14:56:45.003-0500

Hi Matt,
The problem is the new design breaks generic (but optimized) compiles in every virtual machine environment without significant tweaking. At least in the PBX in a Flash world, that would rule out using Asterisk VMs for many folks. Was this really a problem with the old design? We haven't had a single reported problem with Asterisk compiles in the last several years, and we compile Asterisk on the fly with every new install. Just my $.02.
--wm

By: Matt Jordan (mjordan) 2012-10-01 08:49:00.517-0500

Hey Ward -

The issue that ended up causing this one (ASTERISK-19462) was another in a long line of issues where Asterisk 'guessed' the architecture incorrectly.  The previous approach, where Asterisk would attempt to infer things - and sometimes get them wrong - created a headache on the maintenance front, as we'd often have to try to figure out if the compilation problem was related to Asterisk's inference problem, or if it was something more deep rooted.

This feels like one of those situations where no matter what we pick, someone is going to be profoundly unhappy.  I'd prefer to pick the solution that solves the problem for the majority of people, but to be frank, I'm not sure which one that would be.  If there was a solution that made the default work on all distros, *and* solved compilation problems on the majority of VMs, *and* allowed passed in flags to allow compilation on that subset of VMs and/or other environments that are just flat out outside of the 'mainstream', I'd be thrilled.  But I'm not sure what that would be, and it feels as if the previous incarnation of 'guessing' wasn't hitting it.

Do you know what VM environments PBX in a flash is typically deployed in?  That would at least start to help us create a 'candidate pool' of things we can target.

Matt

By: Ward Mundy (wardmundy) 2012-10-01 11:42:55.949-0500

Matt, I would say the Top 3 VM platforms for us are probably VMware, Xen, and Proxmox. But people use all sorts of different hardware with a variety of processors from Intel, AMD, et al. Would it make more sense to peel off the old detection code and make it a separate (optional and unsupported-by-Digium) loadable module? Perhaps then some group other than Digium could pick up the maintenance ball going forward without losing all of the good work you guys already have put into this?

By: smast123 (smast) 2012-10-11 05:20:36.551-0500

Hey, is this 'illegal instruction' bug fixed?
I am also having the same issue:
when I run /usr/sbin/asterisk -cvv
it shows illegal instruction

By: Trent Creekmore (tcreek) 2013-01-09 17:06:26.590-0600

There has been no update on this severe bug. Will we be stuck using up to 1.8.12.2 for years to come in all virtual environments?

By: Matt Jordan (mjordan) 2013-01-09 17:54:38.158-0600

{quote}
There has been no update on this severe bug. Will we be stuck using up to 1.8.12.2 for years to come in all virtual environments?
{quote}

No. You simply have to deselect {{BUILD_NATIVE}} in menuselect and - possibly (although not always) - pass your CPU architecture to make.

By: Trent Creekmore (tcreek) 2013-01-09 17:58:09.438-0600

I have done that and am getting unpredictable behavior: for example, CLI commands not responding, the CLI indicating there is no such command, and asterisk refusing to start or stop.

By: Matt Jordan (mjordan) 2013-01-09 18:03:35.994-0600

That isn't behavior any one else has reported. So far, people who have run into this issue have been able to build successfully by specifying the correct CPU architecture in {{CFLAGS}} and not building native. This, in effect, is what the patch reverted - except that we no longer attempt to guess the values for you (as we would usually guess wrong).

It sounds like not all modules were compiled in the correct fashion, so that some modules were not loaded.

Either way, as I said previously on this issue, no one has identified a good solution to being able to build Asterisk reliably for virtualized CPUs. If someone would like to propose a good patch to be able to accomplish this, that would be much appreciated.

By: Trent Creekmore (tcreek) 2013-01-09 20:50:08.706-0600

I am curious how many applications, such as webmin, can tell you the proper CPU, while the Asterisk makefile fails to.

By: Tzafrir Cohen (tzafrir) 2013-04-23 07:38:49.364-0500

Extra data point (a message from asterisk-users): the problem still exists with Ubuntu 12.04, which should probably have gcc 4:4.6.3-1ubuntu5.

By: PowerPBX (PowerPBX) 2013-06-16 15:54:37.851-0500

This is what worked for us on CentOS 6 KVM using Asterisk 11.4.0 which also apparently has this problem.  Your mileage may vary.

./configure CFLAGS=-mtune=generic

If you want to be more precise get your cputype:
cat /proc/cpuinfo

Find it in gcc cpu options.
http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html

Then use:
./configure CFLAGS=-march=mycputype

My guess is that Asterisk is using
-march=native
which is sometimes getting it wrong for VMs.
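
To see what `-march=native` actually resolves to on a given guest, the compiler driver can be asked to print the command line it passes on to `cc1`. A small sketch, with `native_flags` as a made-up helper name:

```shell
# Read gcc's verbose output on stdin and print the machine options
# (-march=..., -mtune=..., -m<feature>) found on the cc1 command line.
native_flags() {
    grep 'cc1' | tr ' ' '\n' | grep '^-m'
}

# Typical use on the machine that will run Asterisk:
#   gcc -march=native -E -v - </dev/null 2>&1 | native_flags
```

Comparing that output with the flags line in /proc/cpuinfo shows whether gcc is enabling an instruction set that the virtual CPU does not advertise.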




By: PowerPBX (PowerPBX) 2013-06-16 20:19:32.759-0500

./configure CFLAGS=-mtune=native

Works for me too and is probably the best one overall. It looks like that was added to newer versions of gcc. It shows up in the v4.3.4 documentation.
http://gcc.gnu.org/onlinedocs/gcc-4.3.4/gcc/i386-and-x86_002d64-Options.html

By: PowerPBX (PowerPBX) 2013-06-24 15:26:32.102-0500

I just had this happen again with Asterisk 11.4.0 on brand new Intel Xeon hardware using KVM with CentOS 6.  I think perhaps the default compile flag is set too aggressively in recent versions of Asterisk.  Please consider changing this default.  I think using -mtune=generic or -mtune=native would be a better choice.  -mtune=native corrected the issue for me again.

By: Tzafrir Cohen (tzafrir) 2013-06-26 13:15:47.576-0500

Does anybody have performance numbers regarding -march=native vs. -mtune=native on various environments?

By: PowerPBX (PowerPBX) 2013-06-27 13:14:08.830-0500

I added an improvement request to change this default.
ASTERISK-21967

By: Sebastian Gutierrez (sum) 2013-08-14 09:10:20.007-0500

Just to inform: with the latest version of Asterisk 10 on an Amazon virtual machine running Ubuntu 12.04, same problem.

By: Private Name (falves11) 2013-12-02 04:38:24.305-0600

I am affected by this bug.
Searching, I found the issue, and it is described here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52411
"the asterisk makefile detecting and using incorrect compilation flags

ifeq ($(OSARCH),linux-gnu)
 ifeq ($(PROC),x86_64)
   # You must have GCC 3.4 to use k8, otherwise use athlon
#    PROC=k8
PROC=nocona
   #PROC=athlon
 endif
"

By: Sean Bright (seanbright) 2021-08-10 13:38:00.285-0500

Is anyone able to reproduce this on a supported version of Asterisk (16+ at the time of this message)?