[Home]

Summary:ASTERISK-24833: [patch] audit of startup order reveals logger concerns
Reporter:Corey Farrell (coreyfarrell)Labels:
Date Opened:2015-02-26 11:03:25.000-0600Date Closed:2016-09-23 14:23:10
Priority:MinorRegression?
Status:Closed/CompleteComponents:Core/Logging
Versions:SVN 11.16.0 13.2.0 Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) ast_log-early-init.patch
( 1) callid-int-r2.patch
( 2) callid-int-r3.patch
Description:I've been auditing startup order, and found some things about the logger that concern me.

ast_log / ast_verbose do not protect against recursion:
* __ast_str_helper can call ast_verbose during an allocation failure.
* ao2 functions (for callid).  If an existing callid is unref'ed too many times, the object would be freed with threadstorage still pointing to it.  This could cause an infinite loop where ast_log tries to retrieve a callid, causing ao2 to report an error with ast_log.

The probability of these recursive loops is very low, but worth considering.  A thread-local variable could allow ast_log/ast_verbose to ignore recursive calls and calls from the logger thread.

In theory trunk could stop using ast_str and ao2.  Switching from ast_str would be a pain but straight forward and self contained.  Removal of ao2 from the logging process would involve API/ABI changes.  callid doesn't actually need to be an AO2 object.  It could just be an int.  This would require no ref-counting as the memory wouldn't be allocated.  Thread storage would get an allocation to stop a copy of the int, but that allocation would take care of itself.


Initialization order:
Everything uses the logger.  Unfortunately the 'exec_after_rotate' option creates a dummy channel to do variable substitution.  This pushes logging pretty far down the initialization list.  Many start-up messages are lost.  If the logger was initialized earlier it would require a way to block rotation until after it's safe to create a dummy channel for variable substitution.

I'm not sure if it's reasonable to avoid using channels (trunk only).  This would involve having exec_after_rotation specifically replace $\{filename}.  We would also need to document that functions, expressions, and other variables are no longer supported.  Would be nice if the new procedure produced an error if unsupported expressions are detected.
Comments:By: Matt Jordan (mjordan) 2015-02-26 12:09:50.713-0600

{quote}
The probability of these recursive loops is very low, but worth considering. A thread-local variable could allow ast_log/ast_verbose to ignore recursive calls and calls from the logger thread.
{quote}

I would say that any improvement that gets made should be subject to performance issues as well. Checking a threadlocal variable on every logging statement could get expensive.

{quote}
In theory trunk could stop using ast_str and ao2. Switching from ast_str would be a pain but straight forward and self contained. Removal of ao2 from the logging process would involve API/ABI changes. callid doesn't actually need to be an AO2 object. It could just be an int. This would require no ref-counting as the memory wouldn't be allocated. Thread storage would get an allocation to stop a copy of the int, but that allocation would take care of itself.
{quote}

I wouldn't mind seeing call-id become an atomic integer. It'd probably be a lot faster than the current implementation.

{quote}
Initialization order:
Everything uses the logger. Unfortunately the 'exec_after_rotate' option creates a dummy channel to do variable substitution. This pushes logging pretty far down the initialization list. Many start-up messages are lost. If the logger was initialized earlier it would require a way to block rotation until after it's safe to create a dummy channel for variable substitution.
{quote}

There are some *extremely* difficult subtle traps with changing the initialization of the logger. Consider that Asterisk may attempt to use the logger before it has forked itself, at which point any thread primitive that has been initialized is invalid in the forked process - and is liable to cause extremely difficult to debug problems. I would highly suggest not mucking around with logger initialization unless you have a very definitive plan in mind :-)

{quote}
I'm not sure if it's reasonable to avoid using channels (trunk only). This would involve having exec_after_rotation specifically replace ${filename}. We would also need to document that functions, expressions, and other variables are no longer supported. Would be nice if the new procedure produced an error if unsupported expressions are detected.
{quote}

I'm not sure there's a way to implement the {{exec_after_rotate}} functionality (with variable substitution, which appears to be a lot of the usefulness of it) without using a dummy channel. Consider that without variable substitution, an external script could do the work just as well. Removing it feels like a backwards break with existing systems that we probably shouldn't do unless we are sure that functionality isn't needed.

By: Corey Farrell (coreyfarrell) 2015-02-26 13:55:10.871-0600

Though I've flagged this as an issue for 11 and 13, most of I've proposed is inappropriate for already released branches.  Most of this discussion is ideas of what can be done in trunk.  I'm OK with saying "it is what it is" about 11 and 13.

{quote}
I would say that any improvement that gets made should be subject to performance issues as well. Checking a threadlocal variable on every logging statement could get expensive.
{quote}
This leads me to believe that replacing code that uses 'struct ast_str' with standard C string routines and removing ao2 from callid's would be the better fix.

{quote}
There are some extremely difficult subtle traps with changing the initialization of the logger. Consider that Asterisk may attempt to use the logger before it has forked itself, at which point any thread primitive that has been initialized is invalid in the forked process - and is liable to cause extremely difficult to debug problems. I would highly suggest not mucking around with logger initialization unless you have a very definitive plan in mind
{quote}
I was not suggesting that logger init should happen before fork, but it should happen as soon as possible after fork.  There are some other inits that have to happen first (in case we have a DEBUG compile option on).  The idea is to minimize the stuff we do before initializing the logger, that way we minimize the number of messages that are lost.

{quote}
I'm not sure there's a way to implement the exec_after_rotate functionality (with variable substitution, which appears to be a lot of the usefulness of it) without using a dummy channel. Consider that without variable substitution, an external script could do the work just as well. Removing it feels like a backwards break with existing systems that we probably shouldn't do unless we are sure that functionality isn't needed.
{quote}
It might be best to defer {{rotate_file}} and logger reloads until after fully booted.  If it's called too early save to a list of files to rotate after fully booted.  Rotation would be done after all modules are loaded, since for example {{pbx_config}} could initialize a global variable for use by {{exec_after_rotate}}.  It could be appropriate to look at this for 11 and 13.

By: Corey Farrell (coreyfarrell) 2015-02-26 19:43:37.215-0600

Attached is the first draft of callid to int conversion.
* I cannot build chan_dahdi on my dev system, so it might not even compile.
* I built what I can, did searches for {{struct ast_callid}}, {{ast_callid_ref}} and {{ast_callid_unref}}.
* I ran a single test with {{channel originate Local/s@demo application Echo}}, logging looked correct.

For now I just want to post this so no one else spends time doing the same work.  Needs further testing before it can hit reviewboard.

By: Rusty Newton (rnewton) 2015-03-03 18:38:41.532-0600

I'm going to open this, even if it is exploratory or experimental work. You can change the title if needed to reflect that better.

By: Corey Farrell (coreyfarrell) 2015-03-09 12:46:05.870-0500

First patch missed chan_sip dialplan function built by TEST_FRAMEWORK.

By: Corey Farrell (coreyfarrell) 2015-03-09 13:25:20.380-0500

r3 patch includes fixes to chan_dahdi in a couple places where callid's were initialized to NULL instead of 0.

By: Corey Farrell (coreyfarrell) 2015-03-12 20:15:28.036-0500

Committed callid conversion to trunk.

By: Corey Farrell (coreyfarrell) 2015-03-15 21:29:34.133-0500

Further work on logger recursion avoidance to be posted against ASTERISK-24155.  The remaining issue brought up for this ticket is startup order.

I'm working on a patch that will add an 'int early' parameter to init_logger.  Early init would abort if exec_on_rotate is set.  This way the logger can start first thing on many systems and all testsuite runs.  The one concern is situations where logger.conf is changed to include exec_on_rotate after early logger init.  CLI commands can be delayed so they can't be used to reload, but SIGXFSZ can happen at any time.  It may be possible to defer responding to SIGXFSZ until the final init of logger.  If the SIGXFSZ was associated with a log file write error, that log channel would remain disabled at least until final init.

By: Corey Farrell (coreyfarrell) 2015-06-02 20:35:55.431-0500

This is what I've come up with for a patch against master to initialize logging as early as it can.  Early initialization is canceled (default console logging enabled) if exec_after_rotate is used.  A new option post_rotate_exec is created which should use bash shell variable replacement for $\{filename\}.

[~mjordan]: I have not yet has a chance to test this well (been busy with another change), but I wanted to post it here see if you think it's a reasonable plan.  Should we go with a better variable name for post_rotate_exec?  Maybe AST_LOG_FILENAME or something?