[Home]

Summary:ASTERISK-28776: Non async-signal-safe syscalls used after fork before exec
Reporter:nappsoft (nappsoft)Labels:patch
Date Opened:2020-03-10 08:04:42Date Closed:2020-05-18 23:53:07
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Applications/app_system
Versions:16.8.0 Frequency of
Occurrence
Frequent
Related
Issues:
Environment:musl libc based linuxAttachments:( 0) asterisk_asyncsafe2.patch
Description:When using System or TrySystem in a dialplan a Channel sometimes hangs forever and a forked process can be seen on the system which can only be killed with kill -9

While I could already observe this in the past very very very rarely it is quite easy to reproduce with musl 1.2.0 and the next-gen malloc implementation.

When describing this behavior on the musl mailinglist the musl devs pointed out, that this is expected behavior as obviously asterisk uses some non async-signal-safe syscalls after fork() before exec().

While there seem to be more tolerant libc implementations it doesn't change the fact that asterisk doesn't behave Posix conform:

Rich Felker cited the following:

See https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html

   - A process shall be created with a single thread. If a
     multi-threaded process calls fork(), the new process shall
     contain a replica of the calling thread and its entire address
     space, possibly including the states of mutexes and other
     resources. Consequently, to avoid errors, the child process may
     only execute async-signal-safe operations until such time as one
     of the exec functions is called.

Indeed I was able to work around the issue by slightly changing the asterisk sources. While the attached patch fixes the issue and we can happily work with it (the mentioned issues with the patch do not matter at all in our environment), it is far from being production ready. It is only meant to give you a hint for which parts of the code should be revised!

The problems with the attached patch:

- it is slower (the closefrom() implementation is by far faster than the dump loop I've used to replace it => with max open files set to 22000 I meassured 7ms to complete the fd closing compared to 70ns with the closefrom() implementation)
- it is less secure (as the cap_set_proc code has been removed after fork)
- some logging needed to be removed (as I assumed that ast_log and ast_verb involve async-signal-unsafe operations)
Comments:By: Asterisk Team (asteriskteam) 2020-03-10 08:04:43.813-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: nappsoft (nappsoft) 2020-03-11 03:46:19.927-0500

Improved patch

By: nappsoft (nappsoft) 2020-03-11 03:51:02.316-0500

I've just uploaded an improved patch which works 10 times faster (with 22000 fds) and which doesn't remove the cap calls (it uses them in a safe way instead, making sure that no memory allocation needs to be done in the child process).

However: this is still not a final solution but simply sort of a POC (as the modifications should better be done in ast_close_fds_above_n directly instead of replacing this function at the calling place in the code).

By: nappsoft (nappsoft) 2020-03-11 07:15:17.131-0500

Uploaded a third version of the patch which (at least on linux) might be good to go...

=> the closefrom() implementation has been changed instead of doing the operation in place, this way other asterisk apps profit as well from the changes and not only the System/TrySystem dialplan application
=> the cap_set_proc logic has been changed in the second place where it is used in a child proc as well

By: Joshua C. Colp (jcolp) 2020-03-11 08:33:54.703-0500

Do you plan on putting this up for review?

By: Friendly Automation (friendly-automation) 2020-05-08 09:26:23.588-0500

Change 14391 merged by Friendly Automation:
app.c: make sure that no non-async-signal-safe syscalls are used after fork before exec

[https://gerrit.asterisk.org/c/asterisk/+/14391|https://gerrit.asterisk.org/c/asterisk/+/14391]

By: Friendly Automation (friendly-automation) 2020-05-08 09:35:40.563-0500

Change 14392 merged by Friendly Automation:
app.c: make sure that no non-async-signal-safe syscalls are used after fork before exec

[https://gerrit.asterisk.org/c/asterisk/+/14392|https://gerrit.asterisk.org/c/asterisk/+/14392]

By: Friendly Automation (friendly-automation) 2020-05-08 13:45:15.994-0500

Change 14209 merged by George Joseph:
app.c: make sure that no non-async-signal-safe syscalls are used after fork before exec

[https://gerrit.asterisk.org/c/asterisk/+/14209|https://gerrit.asterisk.org/c/asterisk/+/14209]

By: Friendly Automation (friendly-automation) 2020-05-08 13:45:40.586-0500

Change 14393 merged by George Joseph:
app.c: make sure that no non-async-signal-safe syscalls are used after fork before exec

[https://gerrit.asterisk.org/c/asterisk/+/14393|https://gerrit.asterisk.org/c/asterisk/+/14393]