ASTERISK-27816: func_talkdetect's logic is completely broken

[Home]

Summary: ASTERISK-27816: func_talkdetect's logic is completely broken

Reporter: Moritz Fain (maurice2k) Labels:

Date Opened: 2018-04-19 10:27:29 Date Closed: 2021-10-19 10:22:01

Priority: Major Regression? No

Status: Closed/Complete Components: Functions/func_talkdetect

Versions: 15.3.0 Frequency of
Occurrence Constant

Related
Issues:
is duplicated by ASTERISK-28462 func_talkdetect: TALK_DETECT firing immediately even if phone microphone is muted

Environment: Attachments:

Description: Logic in func_talkdetect.c is completely wrong regarding the thresholds. It is mixing millisecond thresholds (the one's that are passed to TALK_DETECT via set) with the DSP's avg. amplitude thresholds.

Comments: By: Asterisk Team (asteriskteam) 2018-04-19 10:27:30.017-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].
By: Moritz Fain (maurice2k) 2018-04-19 10:59:39.034-0500

I've made a patch: https://gist.github.com/maurice2k/5b25c1dbf528a398736d08e52989cab9
By: Richard Mudgett (rmudgett) 2018-04-19 11:07:20.919-0500

This looks like another place where the threshold *documentation* needs fixing. I updated the documentation for app_confbridge for the same thing in this review (changes were first released in 13.20.0 and 15.3.0):
https://gerrit.asterisk.org/#/c/8115/
By: Moritz Fain (maurice2k) 2018-04-19 11:18:14.372-0500

The input paramters for TALK_DETECT are called dsp_(talking|silence)_threshold which is actually wrong as they have nothing to do with the DSP. These are millisecond thresholds and they are used as millisecond thresholds in func_talkdetect (so the documentation is rather correct) but they're sALSO also used as input for DSP (which is wrong).

By: Moritz Fain (maurice2k) 2018-04-19 11:58:04.202-0500

Ok, the documentation is misleading as well.

The documentation is generated from the c file? I could update that part as well if the patch is making it into Asterisk.
By: Joshua C. Colp (jcolp) 2018-04-19 12:09:46.853-0500

The documentation is kept in the .c file as XML. We can only accept contributions which are attached here or posted on Gerrit, with a signed license agreement.
By: Moritz Fain (maurice2k) 2018-04-19 13:49:44.208-0500

Unfortunately the gerrit login isn't working. I tried resetting the password, but no login possible. Server openid.asterisk.org always says "Invalid Authentication". Username is "maurice2k".
By: Joshua C. Colp (jcolp) 2018-04-19 14:08:10.056-0500

Gerrit can only be accessed once the license agreement has been accepted.
By: Kevin Harwell (kharwell) 2018-04-19 14:38:11.162-0500

[~maurice2k], if you haven't already started the process, this should get you started:

https://wiki.asterisk.org/wiki/display/AST/Digium+License+Agreement
By: Moritz Fain (maurice2k) 2018-04-19 14:47:52.694-0500

I see.

Regarding the TALK_DETECT function I could add a 3rd parameter to control the DSP silence threshold (in a backward compatible way).

1st parameter: talking_time_threshold (milliseconds)
2nd parameter: silence_time_threshold (milliseconds)
3nd parameter: dsp_silence_threshold (avg. magnitude)

What do you think? Currently the patch uses the default dsp_threshold from the config.

By: Richard Mudgett (rmudgett) 2018-04-19 16:22:03.500-0500

A third parameter is not needed. The problem with dsp_talking_threshold is only a documentation problem as the talking threshold was never time related. The code does not use the dsp_talking_threshold value for timing and only passes it to the DSP code. The DSP code uses it for the signal magnitude threshold between silence and talking. The documentation in func_talkdetect.c for dsp_silence_threshold and dsp_talking_threshold should be updated to correspond with dsp_silence_threshold and dsp_talking_threshold in app_confbridge (https://gerrit.asterisk.org/#/c/8115/). The confbridge parameters used similar language to TALK_DETECT before being corrected to what they really meant.

There is also a cut and paste error in talk_detect_fn_write() where it uses the dsp_silence_threshold variable when it should use dsp_talking_threshold.
By: Moritz Fain (maurice2k) 2018-04-20 03:14:18.200-0500

I disagree with "only a documentation problem".

I would say func_talkdetect is broken by design: it *always* triggers a talking started event, regardless of DSP silence or noise/talking because it doesn't use DSP's information whether there has been noise/talking at all. Right from the start {{total_silence}} is 0 and {{td_params->talking}} is also 0 which makes {{update_taking}} to be set to 1 which in turn triggers the event.

{code}
if (total_silence < td_params->dsp_silence_threshold) {
if (!td_params->talking) {
update_talking = 1;
td_params->talking_start = ast_tvnow();
}
td_params->talking = 1;
} else {
...
}
{code}

I think the idea to have both, a talking threshold and a silence threshold *in milliseconds* (as the documentation states and what maybe was the author's initial intention) is quite useful because otherwise TALK_DETECT would always detect a short crackling or a cough as "talking". Of course you can play around with the DSP threshold but then real talking might not be detected. So I would rather change the code to what the documentation says (plus fix the documention).

That way you could say "talking" is DSP detected noise with a length of at least talking_threshold milliseconds and we fell "silent" as soon as silent_threshold milliseconds have passed without the DSP detected noise.

From the argument description of the current documentation:
{code}
dsp_silence_threshold - The time in milliseconds before which a user is considered silent.
dsp_talking_threshold - The time in milliseconds after which a user is considered talking.
{code}
By: Richard Mudgett (rmudgett) 2018-04-20 19:28:35.358-0500

{quote}
commit 53968c00b34a913e0467200b95add483f79dd3b6
Author: Matthew Jordan <mjordan@digium.com>
AuthorDate: Fri May 30 12:42:57 2014 +0000
Commit: Matthew Jordan <mjordan@digium.com>
CommitDate: Fri May 30 12:42:57 2014 +0000

TALK_DETECT: A channel function that raises events when talking is detected

This patch adds a new channel function TALK_DETECT that, when set on a
channel, causes events indicating the start/stop of talking on a channel to be
emitted to both AMI and ARI clients.

The function allows setting both the silence threshold (the length of silence
after which we decide no one is talking) as well as the talking threshold (the
amount of energy that counts as talking). Parameters can be updated on a channel
after talk detection has been enabled, and talk detection can be removed at
any time.

The events raised by the function use a nomenclature similar to existing AMI/ARI
events.
For AMI: ChannelTalkingStart/ChannelTalkingStop
For ARI: ChannelTalkingStarted/ChannelTalkingFinished

Review: https://reviewboard.asterisk.org/r/3563/

#ASTERISK-23786 #close
Reported by: Matt Jordan
........

Merged revisions 414934 from http://svn.asterisk.org/svn/asterisk/branches/12

git-svn-id: https://origsvn.digium.com/svn/asterisk/trunk@414935 65c4cc65-6c06-0410-ace0-fbb531ad65f3
{quote}

The initial commit of the function shows the intent of TALK_DETECT to be just like the other talk detect options and behavior in the system. There is a talking level threshold to specify the energy level boundary between silence and talking. There is a silence time threshold to determine when to declare talking ended. The user documentation is just wrong. It does not match the commit message description which is what the code does.

The code does use information from the DSP to look for edges where talking/noise starts and stops. The total_silence value comes from the DSP code's accumulated silence history. What you point out is a problem in the code when determining the *initial* talking state.

What you are proposing to do is alter the definition of the talk detection event slightly to add the talking time threshold option. I was initially concerned that changing the definition would cause inconsistencies in behavior between other modules generating talk detection events. After some research I see that no other code generates the *same* talking events as generated by the TALK_DETECT function. e.g., ConfBridge generates its own version of the talking events with just the two detection options.

To remain backward compatible with existing implementations the new option needs to be last. Thus the {{Set(TALK_DETECT(set)=a,b,c)}} parameters should be:
# silence_time_threshold (in milliseconds, default 2500 ms)
# dsp_silence_threshold (average magnitude, default from dsp.conf silencethreshold option)
# talking_time_threshold (in milliseconds, default 0 ms)

These changes need to be broken into two patches. The first patch needs to fix the documentation and the two bugs identified: a) The cut and paste error in talk_detect_fn_write() where dsp_silence_threshold is used when it should be dsp_talking_threshold. b) The initial talking state. The second patch to alter the talking event definition to add the talking time threshold option.

The first patch can go into Asterisk v13, v15, and master as it only fixes the documentation and bugs. The second patch can only go into master unless testsuite tests are created to test the new feature.

By: Moritz Fain (maurice2k) 2018-04-24 06:44:23.497-0500

Thanks [~rmudgett] for the explanation!

I just found another solution using the {{duration}} field in the talking finished event.
That way I could implement the talking threshold client side, which would reduce the patch and keep it 100% compatible.

The patch now fixes two things:
- only emit a talking started event if there was talking/noise (currently *always* emits an event right after the channel is being answered)
- fixes negative durations for very small durations caused by timing issues by setting everything <0 to zero.
By: Moritz Fain (maurice2k) 2018-04-24 06:47:24.413-0500

Ok, will sign your documents once you say that the patch is alright:
https://gist.github.com/maurice2k/cc60f8960f51ca6f43b2a109e079ce28

By: Richard Mudgett (rmudgett) 2018-04-24 11:01:51.578-0500

Yes, that patch looks fine and will fix the identified bugs. The documentation still needs to be corrected to state what the code actually does. (https://gerrit.asterisk.org/#/c/8115/ provides example documentation)
By: Friendly Automation (friendly-automation) 2021-10-19 10:22:02.926-0500

Change 16580 merged by Friendly Automation:
func_talkdetect.c: Fix logical errors in silence detection.

[https://gerrit.asterisk.org/c/asterisk/+/16580|https://gerrit.asterisk.org/c/asterisk/+/16580]
By: Friendly Automation (friendly-automation) 2021-10-19 10:22:36.846-0500

Change 16615 merged by Friendly Automation:
func_talkdetect.c: Fix logical errors in silence detection.

[https://gerrit.asterisk.org/c/asterisk/+/16615|https://gerrit.asterisk.org/c/asterisk/+/16615]
By: Friendly Automation (friendly-automation) 2021-10-19 10:24:55.695-0500

Change 16617 merged by Friendly Automation:
func_talkdetect.c: Fix logical errors in silence detection.

[https://gerrit.asterisk.org/c/asterisk/+/16617|https://gerrit.asterisk.org/c/asterisk/+/16617]
By: Friendly Automation (friendly-automation) 2021-10-19 10:25:02.492-0500

Change 16616 merged by Friendly Automation:
func_talkdetect.c: Fix logical errors in silence detection.

[https://gerrit.asterisk.org/c/asterisk/+/16616|https://gerrit.asterisk.org/c/asterisk/+/16616]