[Home]

Summary:ASTERISK-05815: [patch] RFC-2833 DTMF support is broken in Asterisk 1.2.x
Reporter:Michael Platov (platovm)Labels:
Date Opened:2005-12-11 07:29:52.000-0600Date Closed:2006-09-27 16:31:43
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Core/RTP
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 20060118__bug5970.diff.txt
( 1) 20060119__bug5970__50ms.diff.txt
( 2) 20060128__bug5970.diff.txt
( 3) ast_wave.mp3
( 4) Bad_Events_Sequence_Causes_Duplicate_Digits.gif
( 5) broadvox-dtmf-5970.cap
( 6) dtmf-bvox-borked_rev0.txt
( 7) Duplicate_Digits.txt
( 8) Event_Between_End_Events_Causes_Duplicate_Digit.cap
( 9) p2p_wave.mp3
(10) p2pgw.dump
(11) rtp_changes.patch
(12) trace.txt
Description:I have two SIP Gateways working with Asterisk server. In sip.conf both configured as  dtmfmode=rfc2833. When I making call from one gateway to another and press digits to generate dtmf tones I'm getting 'clicking' sounds - when I press keypad button on phone I hear cliks on the other side (meanwhile phone continually generates tone), when I release button on phone I getting another click from other side. If I press button shortly I can get 3 different results:
1. digit was not recognized at other side at all
2. digit was recognezed at oter side
3. digit was recognized at other side as 2 digits

In peer-2-peer mode (without asterisk) gateways works fine, so I think that this is Asterisk bug. Moreover, I looked into asterisk sources (rtp.c) and I think the problem is in ast_rtp_senddigit(). Here for every digit we are trying to send 6 RTP packet - first 3 for starting digit and last 3 for ending digit. RFC2833 defines two scenarios of transmitting DTMF tones, it seems that asterisk use first one, but at the moment it still do this wrong way. The point is, that in first 3 packets it send 0 for event duration (RFC says that it must be timestamp here), and for last 3 packet it send event duration 800 (according to RFC here we should have real timestamp when button was released)

In RFC-compliant scenario when we press button and hold it we should get RTP packets with constantly increasing sequence number time stamps and event duration fields at least. When keypad button is released we should send RTPs with increasing sequence number, and timestamp, but event duration should be same for several packets.

Comments:By: damin (damin) 2005-12-11 09:54:04.000-0600

To help use debug this problem, could you please upload a SIP trace from the console as well as a packet capture of the SIP session using tcpdump or ethereal? This will help us directly see what is going accross the wire.

By: Tilghman Lesher (tilghman) 2005-12-11 10:57:53.000-0600

"RFC says that it must be timestamp here".

No, the RFC says it must be in timestamp UNITS:

          Duration of this digit, in timestamp units. Thus, the
          event began at the instant identified by the RTP timestamp
          and has so far lasted as long as indicated by this parameter.
          The event may or may not have ended.

Hence, when the DTMF is initially pressed, the DTMF has just begun and therefore has a duration of 0.

By: Michael Platov (platovm) 2005-12-12 06:24:31.000-0600

" the DTMF has just begun and therefore has a duration of 0"

Yes, but at same time we already have voice RTP stream with it's own timestamp. I think that "instant identified by the RTP timestamp" means that first value in duration should be timestamp from voice RTP stream. Moreover, my voip Gateways works exactly in this way (you can see it in the tcpdumps attached).

The sound files with 'problem clicks' attached. When pressing button I'm hearing first click and when releasing it out - another click. (see files attached)

I think that problem is in the way asterisk create RFC2833 tones. Maybe gateway's RTP playout buffer can't normally play DTMF sound with *bad* duration and timestamps from Asterisk?

By: Tilghman Lesher (tilghman) 2005-12-12 09:13:51.000-0600

Please generate a SIP trace, as requested, and upload it as a text file.  Please do NOT upload .tar.gz files, as they have to be downloaded and unpacked to be read.

By: Michael Platov (platovm) 2005-12-12 09:37:47.000-0600

ast_wave.mp3
p2p_wave.mp3

p2pgw.dump - tcpdump of call beetwen 2 voip gateways (normal dtmf sounds of any duration are in file p2p_wave.mp3)

ast.dump - tcpdump of call between same gateways, but RTP goes through asterisk (problem 'clicks' can be seen in file p2p_wave.mp3)


172.22.1.18 - asterisk server
172.22.1.28 - FXS gateway
172.22.1.36 - FXO gateway

By: Tilghman Lesher (tilghman) 2005-12-12 09:55:24.000-0600

Please read the documentation on generating a SIP backtrace and upload your backtrace when complete.

By: Kevin P. Fleming (kpfleming) 2005-12-13 10:40:32.000-0600

This is completely wrong analysis. Corydon is right; 'duration' means _duration_, it has absolutely nothing to do with timestamps or marking when the event started. If your gateways are generating them that way or expecting them, they are broken.

Asterisk does not support variable length DTMF events over RTP at all, so the remainder of you what you are asking for will not happen.

If you can provide an RTP capture showing the behavior of your gateways _and_ document how it is RFC compliant, we'll consider it. Otherwise, we can't accomodate a broken implementation. Asterisk's RFC2833 implementation has been thoroughly analyzed and tested against hundreds of different endpoints without experiencing this issue.

By: Leif Madsen (lmadsen) 2006-01-12 22:23:07.000-0600

I have been experiencing some problems with DTMF in RFC2833 mode. I'm hoping that I can help provide some more information about the problem. Over the last couple of days I have been working with Robin Repas from Broadvox.

My behaviour is that I have been getting duplicate DTMF at the far end of the call. His email regarding our testing is as follows.

"Leif,

I've attached an Ethereal capture of our earlier test call.  It contains both the signaling and the media stream.  During the call you pressed digits 1 through 9 in sequence.  I heard in my earpiece double and triple bouncing on some of the digits.  The capture reveals why this is happening.

The observed behavior is that the Asterisk unit sends six RTP Event packets for each digit press.  The first three packets are typical "start of event" packets.  The first of these three has the SDP marker set to True (per RFC 2833).  The other two have this marker set to False which is typical.  All three of these "start" packets have the event duration set to zero.  This is also typical.  Basically, these three packets serve as a preamble telling the terminating gateway that a DTMF event is coming.  See frames 3633-3635 of the attached capture for an example.

The last three packets of the six are all identical "end of event" packets.  Each has the end-of-event bit set which is a different flag than the marker bit used in the first start packet.  Each of these "end" packets also provide the total duration of the tone event.  The Asterisk behavior observed is the total event duration is always 800 clocks ticks.  Clock ticks are in reference to an 8kHz clock.  So, 8 clock ticks represent 1 millisecond.  And, 800 clock ticks  would indicate a 100 ms tone duration.  See frames 3636-3638 for examples.

The above behavior seems to indicate that when an Asterisk detects a digit press it calls a routine that always generates the above six packets (three start, three end, volume always 10, and tone duration always 100 ms).  Other originating devices (i.e. Cisco) will actually sample the digit press for volume and duration and send event "update" packets (my terminology) between the start and end packets.  These update packets increase the event duration throughout the length of the digit press.  The samples are usually sent at an interval equivalent to the ptime of the codec (i.e. 20 ms for G.711).  However, there's nothing inherently wrong with the 6-packet method used by Asterisk.  And, I believe it would be considered to conform to RFC 2833.  There is however another problem with the way Asterisk delivers these six packets that's at the heart of the double digit problem.

Let's look at two digit press examples from the capture.  One that was interpreted correctly (digit press of a "1") and another that experienced a triple bounce (digit press of a "2").

Frames 3633 through 3638 represent one digit press event of the number "1".   This digit played out correctly.  Note the sequence numbers of these three frames show they arrived in sequence.

 Frame 3633 - seq 56430 - start marker true  - end of event false - event duration 0
 Frame 3634 - seq 56431 - start marker false - end of event false - event duration 0
 Frame 3635 - seq 56432 - start marker false - end of event false - event duration 0
 Frame 3636 - seq 56433 - start marker false - end of event true  - event duration 800
 Frame 3637 - seq 56433 - start marker false - end of event true  - event duration 800
 Frame 3638 - seq 56433 - start marker false - end of event true  - event duration 800

Now look at frames 3741 through 3746 for a single press of the number "2".  Note how the frames arrive out of sequence.

 Frame 3741 - seq 56485 - start marker true  - end of event false - event duration 0
 Frame 3742 - seq 56488 - start marker false - end of event true  - event duration 800
 Frame 3743 - seq 56488 - start marker false - end of event true  - event duration 800
 Frame 3744 - seq 56486 - start marker false - end of event false - event duration 0
 Frame 3745 - seq 56488 - start marker false - end of event true  - event duration 800
 Frame 3746 - seq 56487 - start marker false - end of event false - event duration 0

The above plays out as a triple press.  The first three frames are played out as one tone.  The fourth and fifth frames play out a second tone.  And the sixth frame starts a third tone.  Packets arriving out of sequence like this are the cause of the double and triple digits heard by IVRs on the terminating side.

Examination of the time delta between each of the above frames indicates the Asterisk is sending the 6 packets that represent an RTP event as a single burst.  The time differential between the packets is extremely small.  It is often sub-10 micro-seconds (that's micro, not milli).  Given the nature of latency on the public Internet it's easy to see why packets sent so close together may arrive out of sequence at the other end.

I suggest that you find the Asterisk routine responsible for generating the 6-packet RTP event and insert some "sleep" time between each of the six packets to space them out better.  Or perhaps, insert just one sleep cycle between the three "start" packets and the three "end" packets.  I think the latter is a better suggestion.  I believe a 20 ms sleep cycle (the typical ptime for G.711 and G.729) between the three start packets and the three end packets should be enough to compensate for IP jitter in transport.  However, you can experiment with different values to see what makes DTMF transport most reliable."

I've attached his Ethereal capture. I basically call him, we chat for a few moments, then I start to dial 1 2 3 4 5 6 7 8 9. As per his explanation you can probably see which digits duplicated and which did not.

Feedback appreciated. I should also mention that I've been getting this behaviour with multiple providers. Only Level 3 seems to be able to handle it correctly.



By: Tilghman Lesher (tilghman) 2006-01-13 00:04:30.000-0600

This patch should be close to correct.

By: Leif Madsen (lmadsen) 2006-01-13 01:14:52.000-0600

Unfortunately the patch segfaulted my system once I entered a digit. I've attached bt-output.txt and bt-full-output.txt.

By: Matthew Simpson (matthewsimpson) 2006-01-14 20:19:02.000-0600

Blitz/Robin are correct about the DTMF problem.  I see this same issue between our Lucent Plexus softswitch and Asterisk. Double, triple DTMF at times.



By: Tilghman Lesher (tilghman) 2006-01-16 16:20:45.000-0600

Okay, let's try that patch again.  This one worked for me, applies against 1.2.

By: damin (damin) 2006-01-17 10:03:18.000-0600

I attempted to use the latest patch here using both g729 and g711, and both resulted in a segfault. I have attached a backtrace from the coredump.

By: damin (damin) 2006-01-18 12:44:03.000-0600

I have tested this patch on a newly installed, Centos 4.2 box running SVN-branch-1.2-r8173M. The only patch that I've included is Corydon's (hence the "M" in the version).

As soon as I press the first DTMF key, Asterisk cores. See the attached dtmf-5970-backtrace-2.txt file.

[root@gw1 asterisk]# asterisk -rvc
Asterisk SVN-branch-1.2-r8173M, Copyright (C) 1999 - 2006 Digium, Inc. and others.
Created by Mark Spencer <markster@digium.com>
Asterisk comes with ABSOLUTELY NO WARRANTY; type 'show warranty' for details.
This is free software, with components licensed under the GNU General Public
License version 2 and other licenses; you are welcome to redistribute it under
certain conditions. Type 'show license' for details.
=========================================================================
Connected to Asterisk SVN-branch-1.2-r8173M currently running on gw1 (pid = 711)
Verbosity is at least 3
   -- Accepting AUTHENTICATED call from 207.166.192.188:
      > requested format = g729,
      > requested prefs = (),
      > actual format = g729,
      > host prefs = (),
      > priority = mine
   -- Executing Dial("IAX2/207.166.192.188:4569-6", "SIP/broadvox-noct-test/12166192000") in new stack
   -- Called broadvox-noct-test/12166192000
   -- SIP/broadvox-noct-test-908a is making progress passing it to IAX2/207.166.192.188:4569-6
   -- SIP/broadvox-noct-test-908a answered IAX2/207.166.192.188:4569-6
gw1*CLI> *** glibc detected *** double free or corruption (!prev): 0x08bcaea8 ***

Disconnected from Asterisk server
Executing last minute cleanups
Asterisk cleanly ending (0).

By: Tilghman Lesher (tilghman) 2006-01-18 17:40:35.000-0600

Third try.  I didn't realize the scheduler retried when you returned non-zero.

By: Leif Madsen (lmadsen) 2006-01-19 16:25:58.000-0600

I've been able to duplicate this problem locally in the lab. Using NISTnet application on the "router" it successfully creates enough jitter that packets arrive to Asterisk out of order. This can clearly be seen with pinging (ICMP). The problem is seen on the far Asterisk box when sending duplicate DTMF in this topology:

X-Lite -1-> Asterisk -2-> Router (NISTnet creating jitter) -3-> Asterisk -4-> TDM400P (FXS) -5-> phone

Jitter is created in leg 3, but not in legs 1 or 2. Duplicate and Triplicate DTMF can be seen on the Asterisk console (DTMF logging) and audibly on the phone.

After patching, the same thing happens. I'm going to try the patch from Delvar first, then I'll come back here and attach a packet trace for your perusal (assuming the Delvar patch also doesn't work).

By: damin (damin) 2006-01-19 21:18:56.000-0600

I am happy to report that Corydon's 3rd generation patch does not segfault Asterisk. I was able to make several test calls using Broadvox this afternoon. DTMF seems to work reliably -IF- I space out the keypresses enough. However, if I blast through a series of keys in rapid succession some IVR systems did not report the correct number and indentification of digits.

I haven't been able to verify w/ Broadvox wether the patch actually has the intended effect yet, but cursory investigation seems to point to yes.

By: Tilghman Lesher (tilghman) 2006-01-19 22:34:05.000-0600

Well, we could certainly space out the DTMF further apart than 20ms, if you think that might help.  I went for 20ms, since that was the recommendation in a prior bugnote.



By: Tilghman Lesher (tilghman) 2006-01-19 22:39:24.000-0600

Let's try that patch with a 50ms delay; see if that has the desired effect.

By: damin (damin) 2006-01-20 09:34:08.000-0600

I haven't looked at the patch, but if it is not too much trouble, can you tell me where I can modify the timing? I'll then setup and test w/ Broadvox to see how things go as we vary the timing.

By: Tilghman Lesher (tilghman) 2006-01-20 10:13:13.000-0600

Line 1199 or thereabouts, in the call to ast_sched_add, the second parameter is the number of milliseconds to delay.

By: damin (damin) 2006-01-20 10:48:20.000-0600

I did some packet capture's w/ Broadvox today. Basically, this patch creates very very very wacky behavior, and is not useable. It doesn't crash things, but it doesn't actually work correctly either. Attached, please find an ethereal capture from BroadVox. As I understand it, this patch creates even more fluctuations in the DTMF event stream.

By: damin (damin) 2006-01-20 11:13:06.000-0600

As an update of information to help decode the RTP capture. My keypress sequence was "# 2 0 3" followed by "# 2 0 3". As you can see from the capture, it appears that a significantly large number of events were streamed out to the other side starting in Packet ASTERISK-1081.

By: damin (damin) 2006-01-27 01:35:09.000-0600

Attached, please find a patch that is wrong. It is SOOOO wrong. But it works. Basically, we noticed that when we machine gun the DTMF events all in the same microsecond, the gateway on the other side, no matter what the duration, would clip when the dtmf end of event packet was received. We tried all sorts of crazy things w/ the code and finally settled on sending the duration in the second and third start events. Then, we DO NOT set the End Event Marker in the last three events. This works. Obviously, Broadvox is messed up. Obviously, they need to update / fix / replace their Cisco IOS versions on their AS-5300s.

With this patch, I have 100% DTMF detection on everything that I have tried it with. This includes, PlaingVoip, BroadVox and Asterlink.

One could argue that it is the remote gateway's responsibility (and I would agree) to interpret the DTMF events properly no matter how rapidly they are received, but what do I know? All I know is that this solved my problem.

Comments? Corydon's patch appears to not really fix the issue, but appears to be a reasonable approach to working around the problem, except that DTMF events would STILL be clipped at 50 Milliseconds, as soon as the scheduled end of event is received at the remote end.

Here is how I propose the patch be re-archicted...

0 MS - Start Event (Duration == 0)
0 MS - Start Event (Duration == 0)
0 MS - Start Event (Duration == 0)
50 MS  - Update Event (Duration == 400)
100 MS - End Event (Duration == 800)
100 MS - End Event (Duration == 800)
100 MS - End Event (Duration == 800)

This mimics the behavior that we observed from inbound DTMF events coming FROM a Cisco AS5300 and correlates w/ our drunken understanding of RFC 2833.

By: damin (damin) 2006-01-27 02:17:23.000-0600

From RFC-2833;

"3.6 Sending Event Packets

  An audio source SHOULD start transmitting event packets as soon as it
  recognizes an event and every 50 ms thereafter or the packet interval
  for the audio codec used for this session, if known. (The sender does
  not need to maintain precise time intervals between event packets in
  order to maintain precise inter-event times, since the timing
  information is contained in the timestamp.)

     Q.24 [5], Table A-1, indicates that all administrations surveyed
     use a minimum signal duration of 40 ms, with signaling velocity
     (tone and pause) of no less than 93 ms.

  If an event continues for more than one period, the source generating
  the events should send a new event packet with the RTP timestamp
  value corresponding to the beginning of the event and the duration of
  the event increased correspondingly. (The RTP sequence number is
  incremented by one for each packet.) If there has been no new event
  in the last interval, the event SHOULD be retransmitted three times
  or until the next event is recognized. This ensures that the duration
  of the event can be recognized correctly even if the last packet for
  an event is lost.

     DTMF digits and events are sent incrementally to avoid having the
     receiver wait for the completion of the event.  Since some tones
     are two seconds long, this would incur a substantial delay. The
     transmitter does not know if event length is important and thus
     needs to transmit immediately and incrementally. If the receiver
     application does not care about event length, the incremental
     transmission mechanism avoids delay. Some applications, such as
     gateways into the PSTN, care about both delays and event duration."

Here is how I think we can make Asterisk be closer to the RFC, given the set 100 Ms DTMF length;

1. Send three start events
2. At 50 Ms, send an event w/ a duration of 800 clock ticks
3. At 100 Ms, send an event w/ a duration of 800 clock ticks
4. Send three Stop Events w/ a duration of 800 clock ticks

By: damin (damin) 2006-01-28 19:59:57.000-0600

While I was sitting in the Airport in San Francisco, I called Broadvox back to ask them for an update. I was able to have a conversation with Robin, which I believe provides us with two options to move forward.

I expressed that I felt that we were RFC2833 compliant per my reading of the document and that I believed it was the responsibility of the Gateway manufacturer to correct their implementation. Robin's response was that Cisco, as well as several other vendors have the same issue because the RFC2833 stacks are implemented per the RFC to listen for duration events while the DTMF tone is being generated. In theory, if a stack is listening for duration events, it should never receive an end of event packet w/ the end of event bit set until after it has already played the DTMF for a period of time. He mentioned that Lucent, Cisco, Quintum as well most other vendors he has worked with react in the same way, and that they have seen this quite a bit with Asterisk.

I explained that Asterisk does not currently support DTMF timing events and that all events are hardcoded as 800 clock ticks (100 ms) and because of the nature of the RTP stack it is a bit more challenging to implement true DTMF keypress monitoring. I reiterated that I believed it was the vendor's responsibility to correct their stack again and requested that we work together to open a ticket w/ Cisco to see if they might modify their stack behavior.

At this point, Robin stated that Gary Tabachnick, his boss, had ordered him to cease working on the issue, as they see this as an Asterisk issue, and not a vendor issue. He apologized and explained that they do not maintain support contracts w/ Cisco on their gear and as a result he could not help us any more. He suggested that he believed we, as a community have two options;

1. We could all beat on our Vendors to try and get them to modify their RFC2833 stacks to operate correctly.

or

2. We could just patch our code to send the duration in the start bits and then send the end of event markers 100ms later which would solve many of the issues of Double / Triple and/or Clipped DTMF events with a MUCH WIDER scope of vendor implementations.

While I do not like the fact that Broadvox isn't interested in helping fix this "the right way" by working w/ Cisco and their other vendors, I understand their position and agree with Robin's suggestion that it is probably easier for Asterisk to implement it's DTMF relay in a slightly modified way that emulates "more correctly" the behavior described in the RFC. I.E. to space DTMF events out over a period of time, rather than blasting them all off in the same millisecond.

I would like to see comments on this.

By: Jason Parker (jparker) 2006-01-28 20:23:43.000-0600

Well, in all reality, I think it needs to be both 1) and 2).  Regardless of what we do with 1), will it actually WORK with Cisco when we're done?  Will it work with other (perhaps broken, but not so bad) RTP stacks?

By: Leif Madsen (lmadsen) 2006-01-28 21:14:13.000-0600

Well, the real problem I guess is that RFCs are ultimately interpreted as the implementors see it should be done, and if a number of other vendors have already in one way, it seems to make sense that Asterisk is more readily capable of changing to follow "them" -- isn't that kind of the whole point of this? :)

You could either modify the functionality of dtmfmode=rfc2833 to work with them, or add another option such as dtmfmode=vendors2833.

By: Jason Parker (jparker) 2006-01-28 22:37:28.000-0600

The below text is per Kevin (slightly modified) in #asterisk-dev.

on the output side, rtp.c will need ast_rtp_send_digit_begin and ast_rtp_send_digit_end

ast_rtp_send_digit_begin will send _only_ the start event frame (the 'mark bit' frame), and remember what digit is being sent and at what timestamp it started

ast_rtp_send_digit_end would send the three 'end of event' frames for the digit that was last started

for extra credit, a timer could generate 'continuation' frames every 50ms

on the input side, when rtp.c receives the 'mark bit' frame, it would queue an AST_FRAME_DTMF_BEGIN onto the channel (and nothing else)

when it receives the 'end of event' frame, it would queue AST_FRAME_DTMF_END and AST_FRAME_DTMF (like it does now)

when those are done, chan_sip can add those callbacks to its channel_tech structure

we need to find a way to make do_senddigit in channel.c work for channel drivers that implement only begin/end, so applications (and old channels) that send only a single DTMF frame will still work once chan_sip uses BEGIN/END instead

By: Amy (amyl) 2006-02-17 10:37:47.000-0600

There's another aspect to this issue that doesn't seem to be covered here. In the testing that I have done, it seems that Asterisk does not increment the timestamp between the last audio frame and the first RFC2833 frame, yet the (real) time between the two can be hundreds of milliseconds when converting between types of DTMF. In what appears to be a related bug, Asterisk increments the timestamp by 120ms between digits generated with SendDTMF regardless of the timeout_ms parameter. This affects DTMF generated programmaticly and when converting one type of DTMF to another. For an example, in one test case I have, the RFC2833 type packets sent from asterisk when it converts from INFO messages arrive after a 300-400ms gap in the RTP stream, with the same timestamp as the last audio packet. Likely, the delay is caused by a gap between the RTP and the INFO message on the SIP Phone leg of the call, which should result in a corresponding gap in RTP, but the result is broken for the reason stated above.

Note that RFC2833 clearly states that the timestamp must reflect the correct playout time for the digit, relative to the timebase established at initiation of the stream. By the time the event has arrived at the receiver, the playout pointer has already passed the timestamp in the event packet. Depending on exactly how far past it is, it might be completely discarded, and that would be a correct and standard implementation.

Timestamp: The RTP timestamp reflects the measurement point for
          the current packet. The event duration described in Section
          3.5 extends forwards from that time.

Receiver implementations MAY use different algorithms to
          create tones, including the two described here. In the first,
          the receiver simply places a tone of the given duration in
          the audio playout buffer at the location indicated by the
          timestamp.

For an example of the issue as exposed by the internal programmatic generator, look at the output from the following command sequence:
exten => 1234,1,Wait,1;
exten => 1234,2,Answer;
exten => 1234,3,Playback(transfer);
exten => 1234,4,SendDTMF(123456|500);
exten => 1234,6,Hangup();

In one run of that, this is what got:
At 15.922s I got the last 20ms audio frame with a timestamp of 17920.

At 15.942s I got the first of 6 RFC2833 style messages all with a timestamp of 17920 regarding DTMF digit 1. Correct behavior according to RFC2833 would be to mix the DTMF with that last frame of audio and play it out starting at timestamp 17920. Based on the end packets, the length is 800 ts units (100ms) so the tone should stop at 18720 (15.022s).

At 16.456s I got the first of 6 RFC2833 style messages all with a timestamp of 18880 regarding DTMF digit 2. Correct behavior according to RFC2833 would be to  play it out starting at timestamp 18880 (16.042s). Based on the end packets, the length is 800 ts units (100ms) so the tone should stop at 19680 (16.142s). Problem is, since this arrived at 16.456s, I'd need a jitter buffer of 414ms to play the whole of this tone. This only gets worse as the tones progress.

It seems to me that Asterisk needs to keep track of the offset between the system clock and the timebase of a given call, in order to correctly determine the event timing for generated events. This bug creates real problems in any circumstance where Asterisk does translation between DTMF types or generates DTMF. As stated above, in the translation case, the results can be far worse than the above test case.

This is not just nitpicking about standards here. This causes real failure to interoperate with the Class 5 softswitch we use (Metaswitch VP3510). It may be the cause of failure to interoperate with other strictly implemented VoIP equipment, which is why you're seeing this problem with Cisco, with Class 5 switches, etc. It seems that getting clean DTMF at the receiver is coming at the grace of the adaptive jitterbuffer, and not due to correct input. Most importantly to us, the problems with conversion need to be resolved, as it is crippling to those customers of ours who are using Asterisk with our switch for PSTN termination. I'm taking this position not only on my own analysis but on the result from Metaswitch's engineering team, who are experts on the standards.

Here's what they had to say about correct behavior for agents implementing RFC2833:
1. All the packets that make up the digit should have a timestamp that indicates when the digit starts. This timestamp should be against the same timebase as the media packets. The first packet must be sent in a timely manner - i.e. it is no good telling MetaSwitch to start playing a digit a significant time in the past or the future. Best to send the packet at the next point a normal packet would have been sent (i.e. on the packetization interval) after the digit is detected/requested
2. The first packet should not be repeated.
3. Best to send a digit packet at subsequent packetization intervals. Timestamp should remain the same, but duration and sequence number should increase. Duration should increase by the packetization interval for each packet.
4. The final packet should be sent 3 times with "end" set. No need to space the repeats out. Duration should not increase in the repeats.

I have clarified with them that #3 is indeed recommended, not required and that aspect of the Asterisk implementation is acceptable and will play out properly on our switch. Testing confirms this.

As far as the issues with packets arriving out of order causing playout stuttering, that's a definite (but common) fault with the playout buffering on the equipment who is receiving the RTP stream being unable to resequence packets within the limits of the jitter buffer. It is certainly aggravated by sending the end of packet message very closely after the start message, but it is legal, and a properly operating device should handle it correctly.

By: Amy (amyl) 2006-02-17 10:43:18.000-0600

Sorry, I forgot to mention, my testing was done with 1.2.3, production release, no patches.

By: alric (alric) 2006-03-14 15:53:28.000-0600

I'm running into this problem as well, like amyl said, with a Metaswitch.  Anything that can be done to help along this bug, from a tester standpoint or something?

By: Serge Vecher (serge-v) 2006-03-14 16:22:09.000-0600

Alric, a fix to a related bug ASTERISK-6491 has been committed to 1.2 (will be in 1.2.6 when that's released) and trunk (http://lists.digium.com/pipermail/svn-commits/2006-March/011321.html). Please test with the latest version of either and report if that fix works or not for you. Thanks for testing!

By: Amy (amyl) 2006-03-14 16:47:23.000-0600

Vechers,

While I haven't tested this patch, my read of it and my examination of the source suggest that the problem starts much deeper in the ast_bridge code, since the problem seems to be that the timstamp passed to the digit generation code you referenced isn't correct.

After looking through the source somewhat, it looks like this issue is born out of some rather deeply seated design decsisions about timing and buffering. What I'm getting at there is that Asterisk doesn't have an RTP buffer for transcoded calls, instead relying on transcoding the packets recieved as they arrive. While this may be a win performance wise, it limits your ability to handle situations where the packet timing, sequence, or timestamps should be different or run at different rates between the call legs. Without looking at the exact circuimstances tickling this bug, I'd venture to guess that if there is a fix done for this in the current framework, it will be a very ugly fix upon several other oddly patched together bits of code.

On the other hand, perhaps there's just a bug in the transcoding code, I haven't dug in deep enough to tell. In any case, the problem with the SendDTMF is pretty straightforward. The code added for the delay between digits doesn't add the appropriate time to the timestamp. Instead a standard increment is used. Note that that whole routine is a bit odd as it is, and probably should be rewritten a bit to clean it up.

-Amy

By: Arsen Chaloyan (arsen) 2006-03-15 01:18:16.000-0600

This issue should be spitted into 2 parts

- first issue (platovm  12-11-05 )
Sip Gateways, which is used with asterisk, is definitely not compliant to rfc2833.
It's too ugly to increase both timestamp and event duration on dtmfs (p2pgw.dump).
The event duration extends forwards from the time given by the RTP timestamp.
I'll suggest to report the bug to Sip Gateways vendor.
Any way, I guess that patch on ASTERISK-6491 can help your gateways to detect dtmfs correctly.

- second issue (blitzrage 01-12-06 22:23)
This issue is related to ASTERISK-5868, and has no relationship with ASTERISK-6491 at all.

Patch on ASTERISK-6491 improves the way how asterisk sends dtmfs (ast_rtp_senddigit in rtp.c), but not how it actually detects (process_rfc2833 in rtp.c).
Please see ASTERISK-5868 for more info.

Sorry guys, I cann't comment on all the notes added, hope my thoughts is clear enough.

By: Arsen Chaloyan (arsen) 2006-03-15 08:22:19.000-0600

damin:
Seems it's you upload  broadvox-dtmf-5970.cap.
I've just attached the patch to ASTERISK-5868, which is addressed to this stream as well.
Unfortunately I can't test patch against this stream, due to g729 codec license issue.
My asterisk rejects to establish call with g729.
Can you give any feedback or attach the similar capture with g711 codec.

By: alric (alric) 2006-03-15 09:52:07.000-0600

Checking out the newest 1.2 branch code from SVN didn't solve my problems either :(  I'm going to do some more testing and add some more cases and see what happens.

By: Arsen Chaloyan (arsen) 2006-03-16 00:26:08.000-0600

Some additional notes regarding comments made by 'amyl 02-17-06 10:37' and patch on ASTERISK-6491

I fully agree with mentioned comments and just copy & past recommendation to clarify against patch on ASTERISK-6491

> 1. All the packets that make up the digit should have a timestamp that indicates when the digit > starts. This timestamp should be against the same timebase as the media packets. The first >packet must be sent in a timely manner - i.e. it is no good telling MetaSwitch to start playing a >digit a significant time in the past or the future. Best to send the packet at the next point a normal >packet would have been sent (i.e. on the packetization interval) after the digit is >detected/requested
patch on 6667 contains no changes concerning timing intervals in which dtmf packets are sent

>2. The first packet should not be repeated.
fixed

>3. Best to send a digit packet at subsequent packetization intervals. Timestamp should remain >the same, but duration and sequence number should increase. Duration should increase by the >packetization interval for each packet.
duration value is fixed, timestamp was already there, all digits are sent at once (no changes here, but should be fixed in the further patches)

>4. The final packet should be sent 3 times with "end" set. No need to space the repeats out. >Duration should not increase in the repeats.
exactly, with another minor fix,
sequence number must be increased even on repeated packets.


Just analyzing source code of astreisk regarding dtmf issues, I see that it needs to be completely revised, but it will take too much effort and time from me.
Patch on ASTERISK-6491 is just maybe small, but robust enough step to compliance with rfc2833.

By: myj (myj) 2006-04-22 23:55:13

I did some testing regarding passing along the 2833 events.

The setup is as follows:

Cisco ATA-188 -> Asterisk -> metaswitch

I held a digit for 5 seconds.

The Cisco sends 3 "Event On" packets, starting with a Timestamp increased 160 from the previous packet. Then it sends additional packets with increasing Duration up until the event end.

Asterisk->metaswitch stops sending RTP stream the moment it receives the RFC2833 event start, instead of forwarding such event onward. Asterisk then waits until it sees the "Event End" and at that moment fires 6 packets (3 on ,3 off with 800 duration), all of them with the SAME timestamp as last RTP packet. However, this timestamp is now 5 seconds in the past and a duplicate....

The following voice RTP packet timestamp jumps ahead by 5 seconds after that.

No wonder metaswitch gets confused....

How hard would it be for Asterisk to pass along the RFC2833 events as they come in, instead of interpreting/translating them in this broken way ?


Paul



By: damin (damin) 2006-05-23 05:29:35

Paul,
   It is going to be rather difficult to get Asterisk 1.2 to handle RFC 2833 relay events gracefully. The new VLDTMF code that Kevin has been working on is going to require some intense testing, but that SHOULD solve the problem. The new code will allow for a "start_digit" and "end_digit" routine that will receive and immediately send/forward a DTMF keypress, followed by continuation frames and wrapping up with an end of event marker. This will be a lot more interoperable w/ other DTMF implementations out there. This code is almost ready to be dropped into trunk, and will definitely make it in before 1.4 goes beta. We'll need testers to run it through it's paces!!!!!

By: Max Glucksmann (mglucksmann) 2006-08-18 22:16:18

Hello friends,

I have some clear captures showing the duplicate digits interpreted becuase of actual Digit Events received out of sequence between the End Events for the same digit pressed.

As a newbie, I do my best to learn and contribute, hoping that everyday I'll do better.

I have attached our captures and traces. I hope they'll help.
1) Bad Events Sequence Causes Duplicate Digits.gif
2) Event Between End Events Causes Duplicate Digit.cap
3) Duplicate Digits.txt

At our side we had been providing reliable calling card services until our SIP DID Provider started sending DTMF events in the wrong sequence. It's been the worst thing (seeing our users struggle to dial correctly)... It has really slowed down our launch.

With best regards,
Max
Asterisk SVN-branch-1.2-r39379



By: damin (damin) 2006-08-18 23:48:09

mglucksmann: As luck would have it, I was informed on IRC tonight that the last couple of obstacles in the new VLDTMF code have been cleaned up and we are now at the stage where we can begin testing and getting feedback on the new implementation. We would love to see if this solves your problem. The following instructions for checking out the new VLDTMF code was sent to me by Russell Bryant a few minutes ago

From russell@digium.com Sat Aug 19 00:46:36 2006
Date: Sat, 19 Aug 2006 00:34:03 -0400
From: Russell Bryant <russell@digium.com>
To: Greg Boehnlein <damin@nacs.net>
Cc: jcolp@digium.com
Subject: Testing VLDTMF

$ svn co http://svn.digium.com/svn/zaptel/team/group/vldtmf zaptel-vldtmf
$ svn co http://svn.digium.com/svn/asterisk/team/group/vldtmf asterisk-vldtmf
$ cd zaptel-vldtmf ; ./configure ; make ; sudo make install
$ cd ../asterisk-vldtmf ; ./configure ; make ; sudo make install

That should be all there is to it.  If you have any more questions,
please let us know!

By: Max Glucksmann (mglucksmann) 2006-08-19 02:09:57

Damin, Qwell,

I appreciate very much your advice, but I got a couple of issues after installing the following:

First, rm -rf /usr/lib/asterisk/modules/*

then

cd /usr/src/libpri-trunk
./configure; make clean; make install

cd /usr/src/zaptel-vldtmf
./configure; make clean; make install

cd /usr/src/asterisk-vldtmf
./configure; make clean; make install

CD /usr/src/asterisk-addons-trunk
./configure; make clean; make install

Then I had to disable the g729 and g723 Intel Codecs as they didn't work.

When finally Asterisk started, no accountcodes where read from the Realtime DB and no CDR was being stored either in the DB...

I'm going back to the Branch 1.2 version I had where everyhting worked and will test DTMF there again.

I'm really worried since now I have to face various vendors that already were reselling our calling cards.

Please keep present the comments posted on issue:
http://bugs.digium.com/view.php?id=7758

Thank you very much again.

Hoping to hear from you soon and with best regards,
Max



By: Serge Vecher (serge-v) 2006-08-21 08:37:00

mglucksmann: does the small patch you have posted in bug ASTERISK-7554, note 0050519 fix out-of-order DTMF issues in 1.2.x branch? If so, could you please make a proper patch and attach it to this bugnote? See this page please http://www.asterisk.org/developers/Patch_Howto. Also, please confirm your disclaimer status. Thanks.

By: Max Glucksmann (mglucksmann) 2006-08-25 01:21:09

Vechers: Disclaimer faxed and patch uploaded...

I'd like to note that the patch is already found in the work from Russell in the vldtmf svn, as Damin commented, from where I extracted it.

Once more thank you and I believe that most of the luck I've had is to have found such a great program and community.

With best regards,
Max

By: Serge Vecher (serge-v) 2006-08-25 08:36:41

mglucksmann: thanks. The reason I've asked for the patch was because the vldtmf branch will be merged into trunk, not 1.2. If this small enough change actually fixes some dtmf issues, I think it's worth considering for inclusion into 1.2 branch too.

Could some people monitoring this bug and having issues with DTMF apply mglucksmann's patch and report on the results, please?

By: Joshua C. Colp (jcolp) 2006-09-27 16:31:28

Okay folks here is the scoop - DTMF in 1.4 is vastly different to what it is in 1.2, and it can't be backported without A LOT of trouble... so I'm closing this bug. Please do give 1.4 a try as a beta, or wait for a release and open a new bug if you have issues with the implementation there. That is all and thanks all for holding on!