Summary: ASTERISK-28470: Mutex deadlock in audio_audiohook_write_list
Reporter: Andre Heber (a_heber)
Labels:
Date Opened: 2019-07-03 07:43:26
Date Closed: 2019-07-08 13:15:09
Priority: Major
Regression?:
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions: 16.3.0
Frequency of Occurrence: One Time
Related Issues:
Environment: CentOS 6.10
Attachments:
( 0) info_threads.png
( 1) mutex_1.png
( 2) mutex_2.png
( 3) stack_trace_1.png
( 4) stack_trace_2.png
Description:
The function "audio_audiohook_write_list" in main/audiohook.c contains the following code:
{code:java}
ast_audiohook_lock(audiohook);
if (audiohook->status != AST_AUDIOHOOK_STATUS_RUNNING) {
    AST_LIST_REMOVE_CURRENT(list);
    removed = 1;
    ast_audiohook_update_status(audiohook, AST_AUDIOHOOK_STATUS_DONE);
    ast_audiohook_unlock(audiohook);
{code}

But "ast_audiohook_update_status" also locks "audiohook". When the branch "if (audiohook->status != AST_AUDIOHOOK_STATUS_RUNNING)" is taken, the thread therefore tries to acquire a lock it already holds, which results in a frozen thread.

This pattern occurs three times in "audio_audiohook_write_list" and once in "dtmf_audiohook_write_list".
Comments:
By: Asterisk Team (asteriskteam) 2019-07-03 07:43:27.650-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Kevin Harwell (kharwell) 2019-07-03 10:41:10.496-0500

There shouldn't be a problem here. From what I can tell the lock is reentrant, meaning the same thread can acquire the lock multiple times.
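To make the reentrancy point concrete, here is a minimal sketch using plain POSIX threads, not Asterisk's own lock wrappers. The helper name `relock_result` is hypothetical. It locks a mutex twice from the same thread and reports what the second attempt returned. An error-checking mutex stands in for the non-recursive case, because relocking a `PTHREAD_MUTEX_NORMAL` mutex would simply block forever instead of returning.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical helper: lock a freshly initialized mutex of the given type
 * twice from the calling thread and return the result of the second lock.
 * Do not pass PTHREAD_MUTEX_NORMAL: the second lock would never return.
 * Returns -1 if the mutex could not be set up.
 */
int relock_result(int type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    int second;

    if (pthread_mutexattr_init(&attr) != 0) {
        return -1;
    }
    if (pthread_mutexattr_settype(&attr, type) != 0 ||
        pthread_mutex_init(&lock, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&lock);          /* first acquisition always succeeds */
    second = pthread_mutex_lock(&lock); /* same thread tries again */

    if (second == 0) {
        pthread_mutex_unlock(&lock);    /* undo the nested acquisition */
    }
    pthread_mutex_unlock(&lock);
    pthread_mutex_destroy(&lock);
    return second;
}
```

Under these assumptions, `relock_result(PTHREAD_MUTEX_RECURSIVE)` returns 0, while `relock_result(PTHREAD_MUTEX_ERRORCHECK)` returns `EDEADLK`. A lock-then-call-a-function-that-locks pattern is therefore only safe when the mutex was created as recursive.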

How did you become aware that you have a deadlock? Please provide a backtrace from the system when the problem manifests. You can use the ast_coredumper script to get a running backtrace (note, however, that getting a running backtrace can interrupt calls). More info here:

https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

By: Andre Heber (a_heber) 2019-07-04 05:15:06.831-0500

Our problem:
Asterisk sometimes did not accept new calls (INVITEs). Through logging, we saw that the do_monitor thread of chan_sip was blocked.
The endless for-loop in do_monitor should complete an iteration every second, so we made the loop record a timestamp at the end of each iteration. The timestamp should therefore never be older than about one second.
We also created a parallel thread that checks the timestamp every second and sends SIGABRT to the do_monitor thread if it has hung for 3.9 seconds or longer.
As a result, Asterisk crashes and we get a core dump.
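The watchdog described above could look roughly like the following sketch. It is not the reporter's actual patch. All names (`heartbeat`, `heartbeat_is_stale`, `watchdog_main`, `struct watchdog_args`) are hypothetical, and it assumes C11 atomics and a monotonic clock are available.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <time.h>

/* Time of the last completed loop iteration, in milliseconds. */
static _Atomic long long last_beat_ms;

/* Current monotonic time in milliseconds. */
long long monotonic_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Called by the monitored loop at the end of every iteration. */
void heartbeat(void)
{
    atomic_store(&last_beat_ms, monotonic_ms());
}

/* Returns 1 if the last heartbeat is at least limit_ms old, else 0. */
int heartbeat_is_stale(long long last_ms, long long now_ms, long long limit_ms)
{
    return (now_ms - last_ms) >= limit_ms;
}

struct watchdog_args {
    pthread_t monitored;  /* thread expected to call heartbeat() */
    long long limit_ms;   /* e.g. 3900 for the 3.9 s limit in the report */
};

/* Thread entry point: check once per second, abort on a stale heartbeat. */
void *watchdog_main(void *arg)
{
    const struct watchdog_args *w = arg;
    const struct timespec one_second = { 1, 0 };

    for (;;) {
        nanosleep(&one_second, NULL);
        if (heartbeat_is_stale(atomic_load(&last_beat_ms), monotonic_ms(),
                               w->limit_ms)) {
            /* With the default SIGABRT disposition this terminates the
             * whole process, normally producing a core dump. */
            pthread_kill(w->monitored, SIGABRT);
            return NULL;
        }
    }
}
```

In this sketch the monitored thread would call `heartbeat()` once before the watchdog is started with `pthread_create`, and again at the end of each loop iteration. Sending the signal to the stuck thread means the resulting core dump shows exactly where that thread was blocked.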

Here is my analysis with gdb:
[stack_trace_1]
[mutex_1]

The owner field of the mutex shows which thread locked it (thread LWP 39260).

[info_threads]
[stack_trace_2]
[mutex_2]

The __kind of the audiohook->lock mutex is 0, which means it is non-recursive, so the thread deadlocks on itself.

However, we use an old Asterisk (version 13.1.2), and an update is planned. I saw the same code in audio_audiohook_write_list in the current version, so I decided to create an issue.
But as you stated, this should be a recursive mutex, created in ast_audiohook_init through ast_mutex_init.

So, maybe you know whether this is fixed in the current Asterisk version 16.3.0, or have an idea what is going on.

Otherwise, I think we can close this issue.


By: Kevin Harwell (kharwell) 2019-07-08 13:15:09.548-0500

Yes, it's quite possible a newer version could resolve your issue. I'm closing this issue for now, but if you experience the same problem on the latest version feel free to respond and re-open this issue.

Also, it sounds like you are running a modified (custom patched) version of Asterisk. If this is the case you'll also have to isolate the problem without any custom patches applied or we will be unable to look into it.