[Home]

Summary:ASTERISK-25837: file: Blocking when using FIFO
Reporter:Jean Aunis - Prescom (PrescomJA)Labels:
Date Opened:2016-03-09 11:19:42.000-0600Date Closed:
Priority:MajorRegression?
Status:Open/NewComponents:Applications/app_mixmonitor
Versions:13.7.2 Frequency of
Occurrence
One Time
Related
Issues:
Environment:OS : RedHat 7.2 Kernel: 3.10.0 Asterisk: 13.7.2Attachments:( 0) backtrace-threads.txt
( 1) backtrace-threads-2016-03-16.txt
( 2) core-show-locks-2016-03-16.txt
( 3) core-show-taskprocessors.txt
( 4) core-show-taskprocessors-2016-03-16.txt
( 5) core-show-threads.txt
( 6) core-show-threads-2016-03-16.txt
( 7) debug_log_25837.txt
( 8) full-20160310
Description:We encountered an Asterisk freeze recently. The machine was not especially loaded, less than 10 calls where going on.
It seems that two channels where stuck in the application StopMixMonitor.
Please find attach the results of "core show threads", "core show taskprocessors" and a backtrace.

h3. Reproduction

{code:title=extensions.conf|borderStyle=solid}
[default]
exten => 601,1,System(mkfifo /var/spool/asterisk/monitor/${UNIQUEID}-in.alaw)
same  => n,System(mkfifo /var/spool/asterisk/monitor/${UNIQUEID}-out.alaw)
same  => n,MixMonitor(/dev/null,bt(${UNIQUEID}-out.alaw)r(${UNIQUEID}-in.alaw))
same  => n,Set(DB(test/monitoredChannel)=${CHANNEL})
same  => n,Dial(SIP/601)

exten => 602,1,System(mkfifo /var/spool/asterisk/monitor/${UNIQUEID}-in.alaw)
same  => n,System(mkfifo /var/spool/asterisk/monitor/${UNIQUEID}-out.alaw)
same  => n,MixMonitor(/dev/null,bt(${UNIQUEID}-out.alaw)r(${UNIQUEID}-in.alaw))
same  => n,Dial(SIP/602,,F)
same  => n,StopMixMonitor()
same  => n,AGI(stopMixMonitorOnOtherChannel.php,${DB(test/monitoredChannel)})
{code}

The AGI script stopMixMonitorOnOtherChannel.php connects to the AMI and calls the command StopMixMonitor on the provided channel.

Then, here are the actions to perform to freeze :
- 600 dials 601
- 601 unhooks the call
- on a seconds line, 601 dials 602
- 602 unhooks the call
- 601 hangups the call with 602

Here asterisk freezes.
Comments:By: Asterisk Team (asteriskteam) 2016-03-09 11:19:43.410-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Joshua C. Colp (jcolp) 2016-03-10 07:15:52.568-0600

You have a lot of threads blocked on trying to create the recording file on the filesystem. What is your underlying filesystem and is it having any problems?

By: Jean Aunis - Prescom (PrescomJA) 2016-03-10 07:32:10.878-0600

The filesystem if of type XFS, we did not encounter any problem with it.
But I forgot to point out that we are not recording the calls to regular files : MixMonitor writes to a Unix FIFO (previously created with mkfifo), and the FIFO is immediately read by another channel with Playback.

By: Rusty Newton (rnewton) 2016-03-13 14:24:38.722-0500

[~PrescomJA] did you have any logs running when the issue occurred? Can you attach them if so?

Otherwise if the issue happens again soon can you turn up logging according to this guide so that we'll have more information:

https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

By: Jean Aunis - Prescom (PrescomJA) 2016-03-14 02:59:44.376-0500

Please find attached the full log. Unfortunately debug was not enabled.
The freeze occurred at 15:18:24.

By: Jean Aunis - Prescom (PrescomJA) 2016-03-16 04:27:58.517-0500

We just reproduced the issue. This time we activated debug logs and recompiled asterisk with debug flags. Please find attached :
- the debug log (the freeze happens at 09:47:54)
- the output of "core show locks"
- the output of "core show threads"
- the output of "core show taskprocessors"
- a backtrace


By: Rusty Newton (rnewton) 2016-03-22 08:39:14.626-0500

bq. We just reproduced the issue.

Did the issue simply happen again? Or did you find a way to reliably reproduce it?

If the case falls into the latter , would you share instructions on how to reproduce? Thanks!



By: Jean Aunis - Prescom (PrescomJA) 2016-03-22 12:04:51.085-0500

Yes, we reproduce it systematically. Here is a piece of dialplan to reproduce it :

{code:title=extensions.conf|borderStyle=solid}
[default]
exten => 601,1,System(mkfifo /var/spool/asterisk/monitor/${UNIQUEID}-in.alaw)
same  => n,System(mkfifo /var/spool/asterisk/monitor/${UNIQUEID}-out.alaw)
same  => n,MixMonitor(/dev/null,bt(${UNIQUEID}-out.alaw)r(${UNIQUEID}-in.alaw))
same  => n,Set(DB(test/monitoredChannel)=${CHANNEL})
same  => n,Dial(SIP/601)

exten => 602,1,System(mkfifo /var/spool/asterisk/monitor/${UNIQUEID}-in.alaw)
same  => n,System(mkfifo /var/spool/asterisk/monitor/${UNIQUEID}-out.alaw)
same  => n,MixMonitor(/dev/null,bt(${UNIQUEID}-out.alaw)r(${UNIQUEID}-in.alaw))
same  => n,Dial(SIP/602,,F)
same  => n,StopMixMonitor()
same  => n,AGI(stopMixMonitorOnOtherChannel.php,${DB(test/monitoredChannel)})
{code}

The AGI script stopMixMonitorOnOtherChannel.php connects to the AMI and calls the command StopMixMonitor on the provided channel.

Then, here are the actions to perform to freeze :
- 600 dials 601
- 601 unhooks the call
- on a seconds line, 601 dials 602
- 602 unhooks the call
- 601 hangups the call with 602

Here asterisk freezes.

By: Sean Bright (seanbright) 2017-06-27 10:30:09.938-0500

[~PrescomJA], is there a specific reason you are passing {{/dev/null}} instead of just passing the empty string, i.e.

{noformat}
same  => n,MixMonitor(,bt(${UNIQUEID}-out.alaw)r(${UNIQUEID}-in.alaw))
{noformat}

By: Jean Aunis - Prescom (PrescomJA) 2017-06-28 01:40:22.288-0500

[~seanbright] No, I just didn't think to this possibility.