[Home]

Summary:ASTERISK-27531: Compiler optimizations can break module load sequence.
Reporter:abelbeck (abelbeck)Labels:patch
Date Opened:2017-12-23 16:45:58.000-0600Date Closed:2017-12-27 19:36:51.000-0600
Priority:MajorRegression?Yes
Status:Closed/CompleteComponents:Core/General
Versions:13.19.0 15.2.0 Frequency of
Occurrence
Constant
Related
Issues:
Environment:AstLinux 1.3.x, Linux 3.16.51, cross-compiledAttachments:( 0) asterisk-13-attr-opt-O0-loader.patch
( 1) asterisk-13-loader.patch
( 2) asterisk-13-x86_64-config.log
( 3) ASTERISK-27531.patch
( 4) without-usleep-debug_log_27531.txt
( 5) with-usleep-debug_log_27531.txt
Description:No modules load, all module logs of the form ...
{noformat}
loader.c:583 load_dlopen: Module 'res_musiconhold.so' did not register itself during load
{noformat}
Comments:By: Asterisk Team (asteriskteam) 2017-12-23 16:45:59.241-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: abelbeck (abelbeck) 2017-12-23 16:51:00.791-0600

Tested patch that fixes the issue.

Some small amount of time is needed after calling dlopen() before resource_being_loaded is guaranteed to change.

The value "1" for usleep is adequate for a tested 2 GHz 4-core box.

By: Corey Farrell (coreyfarrell) 2017-12-23 17:22:17.100-0600

This is odd since ast_module_register should run from the same thread as dlopen, so it should be impossible for dlopen to return before ast_module_register.  We will need debug logs to troubleshoot the cause of your issue.  In addition:
* Please run {{core show settings}} from Asterisk CLI and tell us what your "Build Options:" say.
* Please attach the config.log produced when you ran ./configure on the Asterisk source.

----

We require additional debug to continue with triage of your issue. Please follow the instructions on the wiki \[1\] for how to collect debugging information from Asterisk. For expediency, where possible, attach the debug with a '.txt' file extension so that the debug will be usable for further analysis.

Thanks!

\[1\] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

By: abelbeck (abelbeck) 2017-12-23 17:48:05.842-0600

...run {{core show settings}} from Asterisk CLI and tell us what your "Build Options:" say.
{noformat}
PBX Core settings
-----------------
 Version:                     13.19.0-rc1
 Build Options:               OPTIONAL_API
{noformat}

After my current test build is finished, I'll post the config.log .

By: abelbeck (abelbeck) 2017-12-23 19:29:51.665-0600

Attachment: Build system generated config.log as asterisk-13-x86_64-config.log

By: Corey Farrell (coreyfarrell) 2017-12-23 19:35:57.540-0600

Please collect a debug log.

By: abelbeck (abelbeck) 2017-12-23 19:40:02.854-0600

This issue was discovered on a x86_64 system.

For an additional data point, I built an image using an i586 toolchain for a 1-core 500 Mhz AMD Geode LX (Soekris net5501).  The exact same results ... without the {{usleep(1)}} no modules loaded, but with the {{usleep(1)}} the modules loaded just fine.

Corey, if ast_module_register is indeed running in the same thread as dlopen, could this be a compiler optimization that is causing the problem ?

By: Corey Farrell (coreyfarrell) 2017-12-23 19:56:17.130-0600

Could be compiler optimizations, you can see if DONT_OPTIMIZE (without the usleep) works around it but that will have a negative impact on performance.

*We still need debug logs.*  I won't be responding further until they're posted.

By: abelbeck (abelbeck) 2017-12-23 20:22:25.874-0600

Attachment: without-usleep-debug_log_27531.txt (fails)
Attachment: with-usleep-debug_log_27531.txt (succeeds)

By: Corey Farrell (coreyfarrell) 2017-12-23 21:53:51.620-0600

I've acknowledged this issue, I still need more help to determine the best solution.  I don't think usleep is the solution.  I suspect usleep is just causing the compiler to disable an incorrect optimization.  A couple more things:

* Does setting {{DONT_OPTIMIZE}} work around the issue?
* What do I need to setup a build VM?  Can this issue be reproduced without cross-compiling?

By: abelbeck (abelbeck) 2017-12-23 22:55:31.210-0600

Attachment: New patch to disable optimizations for function load_dlopen()

Tested and it solves this issue.

By: Corey Farrell (coreyfarrell) 2017-12-23 22:56:32.395-0600

[~abelbeck]: Could you try [^ASTERISK-27531.patch]?  Please use your original build settings and do not add the usleep.

By: abelbeck (abelbeck) 2017-12-23 22:59:03.440-0600

I tested by building with {{--enable DONT_OPTIMIZE}} and the problem goes away, but not a solution.

Alternatively, I tested with the attached {{asterisk-13-attr-opt-O0-loader.patch}} patch, which solves the problem.

By: abelbeck (abelbeck) 2017-12-23 23:02:09.556-0600

I first tried a similar patch as [^ASTERISK-27531.patch], but the compiler complained {{volatile}} does not work with pointers, but I will try your exact patch to be sure, stay tuned ...

By: abelbeck (abelbeck) 2017-12-23 23:16:21.230-0600

Corey, your [^ASTERISK-27531.patch] also solves the problem.  (My test had the {{volatile}} left of the {{*}} and did not work)

Your patch looks like a solution to me.

Thanks !


By: Corey Farrell (coreyfarrell) 2017-12-23 23:19:47.172-0600

I'll get this submitted for review, thanks for testing.

By: Friendly Automation (friendly-automation) 2017-12-27 19:36:52.348-0600

Change 7733 merged by Jenkins2:
loader: Add volatile to resource_being_loaded.

[https://gerrit.asterisk.org/7733|https://gerrit.asterisk.org/7733]

By: Friendly Automation (friendly-automation) 2017-12-27 19:44:33.797-0600

Change 7734 merged by Jenkins2:
loader: Add volatile to resource_being_loaded.

[https://gerrit.asterisk.org/7734|https://gerrit.asterisk.org/7734]

By: Friendly Automation (friendly-automation) 2017-12-27 19:46:50.601-0600

Change 7735 merged by Jenkins2:
loader: Add volatile to resource_being_loaded.

[https://gerrit.asterisk.org/7735|https://gerrit.asterisk.org/7735]

By: Friendly Automation (friendly-automation) 2017-12-27 21:09:41.416-0600

Change 7736 merged by Jenkins2:
loader: Add volatile to resource_being_loaded.

[https://gerrit.asterisk.org/7736|https://gerrit.asterisk.org/7736]

By: Friendly Automation (friendly-automation) 2017-12-27 23:01:00.438-0600

Change 7737 merged by Richard Mudgett:
loader: Add volatile to resource_being_loaded.

[https://gerrit.asterisk.org/7737|https://gerrit.asterisk.org/7737]