[Home]

Summary:ASTERISK-19497: ConfBridge recording does not work reliably
Reporter:Ben Klang (bklang)Labels:
Date Opened:2012-03-07 12:05:23.000-0600Date Closed:2012-05-16 14:30:12
Priority:MajorRegression?Yes
Status:Closed/CompleteComponents:Applications/app_confbridge
Versions:SVN 10.2.0 Frequency of
Occurrence
Constant
Related
Issues:
is related toASTERISK-19426 Mixmonitor does not create file and record anything
is related toASTERISK-19727 MixMonitor does not work on local channels
is related toASTERISK-19471 ConfBridge does not record anything
Environment:OpenSolaris 2009.06 i386Attachments:( 0) 32-bit-time-overflow-10-2012-04-26.diff
( 1) ASTERISK-19497-2012-04-26.diff
( 2) audiohook_debug.diff
( 3) confbridge-testing-1331152688.wav
( 4) extensions.conf
( 5) patched-tincan-debug.log
( 6) solaris-confbridge-record-debug.log
( 7) tincan-debug.log
Description:When activating ConfBridge recording on Solaris I consistently end up with files that are only 44 bytes long.  This appears to be only the file header and not any audio contents.  I tested the same version of Asterisk on OSX and I was able to get a good ConfBridge recording.

Since ConfBridge appears to use MixMonitor I tried doing a MixMonitor directly on a channel that was later joined to a conference.  This resulted in a good recording on Solaris.
Comments:By: Matt Jordan (mjordan) 2012-03-07 13:18:42.144-0600

We require a complete debug log to help triage the issue. This document will provide instructions on how to collect debugging logs from an Asterisk machine for the purpose of helping bug marshals troubleshoot an issue: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information



By: Ben Klang (bklang) 2012-03-07 14:42:33.454-0600

Example broken recording file

By: Ben Klang (bklang) 2012-03-07 14:42:48.707-0600

Example extensions.conf used in the test

By: Ben Klang (bklang) 2012-03-07 14:43:00.411-0600

debug logging

By: Ben Klang (bklang) 2012-03-07 14:43:29.699-0600

Requested logging attached.

By: Ben Klang (bklang) 2012-03-07 14:45:01.352-0600

For the latest test I tried on a newer version of Solaris, just to be safe.  It exhibited the exact same behavior.  The logs are for SVN-branch-10-r358531M running on OpenIndiana oi_151a2 (the latest release).

By: Jonathan White (londonnet) 2012-04-09 20:00:59.689-0500

I have the exact same issue on CentOS 5.8 Just ggetting a 44k file everytime. Codec related? I'm only running G.711A

By: Ben Klang (bklang) 2012-04-12 13:05:42.047-0500

I am now seeing this issue on Linux as well.  It seems intermittent though.  I had several recordings work fine and then I had many in a row today that were exactly 44 bytes.

Linux tincan.mojolingo.com 3.2.0-0.bpo.1-amd64 #1 SMP Sat Feb 11 08:41:32 UTC 2012 x86_64 GNU/Linux
Debian Squeeze
Asterisk SVN-branch-10-r358644

I realize this is slightly old (the build is dated the same time I originally opened this ticket), so I will update to the latest 10 branch and try again.

By: Ben Klang (bklang) 2012-04-12 13:20:23.777-0500

Confirmed on SVN-branch-10-r361981

Here's an example with AGI debugging turned on.  Note the size of the file at the end of the paste:

AGI Debugging Enabled
[Apr 12 14:18:05]   == Using SIP RTP CoS mark 5
[Apr 12 14:18:05]     -- Executing [678235####@speakwrite:1] Goto("SIP/grant-00000000", "hsw-interviews,s,1") in new stack
[Apr 12 14:18:05]     -- Goto (hsw-interviews,s,1)
[Apr 12 14:18:05]     -- Executing [s@hsw-interviews:1] AGI("SIP/grant-00000000", "agi://localhost:1055/hsw") in new stack
[Apr 12 14:18:05] AGI Tx >> agi_network: yes
[Apr 12 14:18:05] AGI Tx >> agi_network_script: hsw
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_request: agi://localhost:1055/hsw
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_channel: SIP/grant-00000000
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_language: en
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_type: SIP
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_uniqueid: 1334254685.0
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_version: SVN-branch-10-r361981
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_callerid: 4044754841
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_calleridname: Ben Desk
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_callingpres: 0
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_callingani2: 0
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_callington: 0
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_callingtns: 0
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_dnid: 678235####
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_rdnis: unknown
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_context: hsw-interviews
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_extension: s
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_priority: 1
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_enhanced: 0.0
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_accountcode:
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> agi_threadid: -179913872
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >>
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Rx << ANSWER
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> 200 result=0
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Rx << GET VARIABLE "UNIQUEID"
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> 200 result=1 (1334254685.0)
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Rx << SET VARIABLE "CONFBRIDGE(bridge,record_file)" "/tmp/hsw-1334254685.0.wav"
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> 200 result=1
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Rx << SET VARIABLE "CONFBRIDGE(bridge,record_conference)" "yes"
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Tx >> 200 result=1
[Apr 12 14:18:05] <SIP/grant-00000000>AGI Rx << EXEC ConfBridge "hsw"
[Apr 12 14:18:05]     -- AGI Script Executing Application: (ConfBridge) Options: (hsw)
[Apr 12 14:18:05]     -- <SIP/grant-00000000> Playing 'conf-onlyperson.gsm' (language 'en')
[Apr 12 14:18:09]   == Begin MixMonitor Recording ConfBridgeRecorder/conf-hsw-uid-597186215
[Apr 12 14:18:09]     -- <SIP/grant-00000000> Playing 'confbridge-join.gsm' (language 'en')
[Apr 12 14:18:09]     -- <Bridge/0xf57eb100-input> Playing 'confbridge-join.gsm' (language 'en')
[Apr 12 14:18:16]     -- <Bridge/0xf57eb100-input> Playing 'confbridge-leave.gsm' (language 'en')
[Apr 12 14:18:17]   == MixMonitor close filestream (mixed)
[Apr 12 14:18:17]   == End MixMonitor Recording ConfBridgeRecorder/conf-hsw-uid-597186215
[Apr 12 14:18:17] <SIP/grant-00000000>AGI Tx >> 200 result=0
[Apr 12 14:18:17] <SIP/grant-00000000>AGI Tx >> HANGUP
[Apr 12 14:18:18] <SIP/grant-00000000>AGI Rx << HANGUP
[Apr 12 14:18:18] <SIP/grant-00000000>AGI Tx >> 511 Command Not Permitted on a dead channel
[Apr 12 14:18:18]     -- <SIP/grant-00000000>AGI Script agi://localhost:1055/hsw completed, returning 4
[Apr 12 14:18:18] <SIP/grant-00000000>AGI Tx >> HANGUP
[Apr 12 14:18:18] ERROR[4204]: utils.c:1170 ast_carefulwrite: write() returned error: Broken pipe
[Apr 12 14:18:18]   == Spawn extension (hsw-interviews, s, 1) exited non-zero on 'SIP/grant-00000000'
tincan*CLI> !ls -lh /tmp/hsw-1334254685.0.wav
-rw-r--r-- 1 speakwrite speakwrite 44 Apr 12 14:18 /tmp/hsw-1334254685.0.wav

By: Ben Klang (bklang) 2012-04-13 08:31:11.481-0500

A few additional observations:

* I tried changing the recording format to sln, just to rule out any issues with the .wav format specifically.  This results in 0-byte files (I suppose since there is no .sln header).

* I also tried adding a MixMonitor on the channel directly before it joins the conference.  This resulted in a good recording on each channel that joins the conference, but still an empty file for the conference recording itself.

What bothers me most is that this system has worked several times in the past.  Now, after probably 40 tries in two days, I have not made a single good recording using the ConfBridge recording option.

By: Matt Jordan (mjordan) 2012-04-23 17:25:44.453-0500

Ben:

I've been looking into this issue and haven't been able to reproduce it as of yet, but have a few suspicions as to what's going on.

I've attached a patch against the current revision of Asterisk 10 (r363310) that adds a bunch more logging into the audiohooks. If you can run the same test with this patch applied and attach the log here, I'd appreciate it.

Thanks

Matt


By: Ben Klang (bklang) 2012-04-23 21:53:57.542-0500

I did an svn up (now running 363212) and applied the suggested patch.  The problem persists.  I will attach the requested debug logging.

By: Ben Klang (bklang) 2012-04-23 21:54:16.931-0500

Debug log with patch

By: Ben Klang (bklang) 2012-04-24 07:10:27.054-0500

I should probably explain the debug log in case it is not clear.  There are two MixMonitor recordings going on:

1) The first recording is labeled with "-backup" and is MixMonitor executed directly on a channel.  I started doing this as a backup when ConfBridge recording was failing me so I'd at least have one recording.  This is, and seems to have always been, working.

2) The second recording is the one from ConfBridge itself that is triggered by channel variables.  This recording is now requested in slin format instead of .wav, but the result is the same as the original bug report: an empty file (0 bytes in the case of slin).

By: Matt Jordan (mjordan) 2012-04-26 09:24:37.323-0500

Ben:

I have a few suspicions as to what is going on here, and I've attached a patch (ASTERISK-19497-2012-04-26) that should work around the problem.  I don't think its a long term solution, but it should confirm if what I think the problem is actually is the problem.  It also tweaks up the debugging a bit around where I think the problem is occurring.  

If you can test again with this patch, that'd be great.  (If you can remove the other MixMonitors on the joining channels that would clean up some of the logging, but if not, that's okay too - I've put a few extra statements in there to clarify just which audiohook is dumping what debug statement)

As an aside, it appears as if this issue was found on a 32-bit machine.  Have you experienced this problem on any 64-bit machines?

Thanks

Matt

By: Ben Klang (bklang) 2012-04-26 14:58:27.596-0500

In some very initial testing this does indeed seem to fix the issue.  However I'm not quite ready to declare victory since this was working at least once in the past.  I will continue sending test calls through and, if it is still working in a few days, call this one fixed.

Thanks for the patch Matt!  Do you still want my debug logs?

By: Matt Jordan (mjordan) 2012-04-26 16:42:13.240-0500

Yes, I could use the DEBUG logs.  The patch was a brute force fix for what I figured had to be the problem, but I'd like to put together a more elegant solution (and one that stands a prayer of getting through code review :-)) - the DEBUG logs should confirm it for me.

Thanks for testing!

By: Ben Klang (bklang) 2012-04-26 21:48:56.739-0500

Logs from the patched Asterisk build.  This seems to result in working recordings.

By: Matt Jordan (mjordan) 2012-04-27 10:26:59.278-0500

Ben:

I've attached what I believe is the correct fix for this problem.

The long story:

An overflow can occur on some 32-bit machines (ones where a long is a 32-bit quantity, and not a 64-bit quantity) when calculating the difference in milliseconds between two timeval differences.  If one of those timeval instances is never set (and is 0 or some other very small value), the calculation will overflow, resulting in a negative number being returned.

With respect to audiohooks - and in particular, the audiohook used by MixMonitor (which is used by ConfBridge under the hood) - read/write 'factories' are used to collect audio samples from the channel - one for each direction of the audio on the channel.  Since MixMonitor needs audio from both channels, the audiohook attempts to gather samples from both factories.  If one of the factories provides samples, but the other one doesn't, it checks to see if the factory that didn't provide samples didn't provide the samples we wanted because we simply checked it a little bit before it would have the samples we requested.

The problem is, if one of the factories never provides us samples (because audio is only flowing in one direction - which would be the case in a conference, where all the audio is going out through the Bridge channel), the time calculation overflows and we constantly think the factory is just a little slow.

The patch here fixes the overflow, which makes the original calculations in audiohook.c work.  I had hacked around the calculations to make the audio actually get recorded, but your last DEBUG log confirmed my suspicions:

{noformat}
[Apr 27 14:21:44] DEBUG[24327] audiohook.c: [0xb7100570] Size: 8; Samples: 160; Sample threshold: 40; Current time: 1335550904.637836; write time: 1335550904.637806; write diff: 0; write compare: 1;
read time: 0.0; read diff: -183924419; read compare: 1
{noformat}

That read diff being a negative number is no bueno.

I managed to get my hands on a 32-bit machine (I seem to have a preponderance of 64-bit machines around here) and tested out the patch locally.  I was able to reproduce the behavior you were seeing, and the patch I've attached (32-bit-time-overflow-10-2012-04-26.diff) fixed it for me.

If you'd like to test/run with this to confirm (one last time :-)) that this does indeed fix the behavior, that'd be great.

Thanks!

Matt