[Home]

Summary:ASTERISK-26544: res_rtp_asterisk: Delay in DTLS handshake causes audio setup delay
Reporter:Marcelo Gornstein (marcelog)Labels:
Date Opened:2016-11-01 11:27:04Date Closed:2017-01-09 14:10:31.000-0600
Priority:MajorRegression?
Status:Closed/CompleteComponents:Resources/res_rtp_asterisk
Versions:13.6.0 13.9.0 13.12.0 13.11.2 13.12.1 14.0.0 14.1.1 Frequency of
Occurrence
Constant
Related
Issues:
is duplicated byASTERISK-28018 IP Fragmentation happening instead of DTLS fragmentation on handshake server hello certificate
Environment:Amazon Linux: 4.4.23-31.54.amzn1.x86_64 #1 SMP x86_64 GNU/Linux OpenSSL: 1.0.2j LibSRTP: 1.5.4 SIPml5: e3152e1edf116b651de379b3cc971bf699787c26 (Fri Mar 4 09:47:48 2016 +0100) Chrome: 54.0.2840.71 (64-bit) FireFox: 49.0.2 Opera: 41.0 Online JSSip Demo at: https://tryit.jssip.net/ Amazon EC2 instanceAttachments:( 0) 54752_jira_asterisk_26544_v13_test.pcap
( 1) 54755_jira_asterisk_26544_v13_test_force_dtlsv1.pcap
( 2) cert.txt
( 3) dtls-handshake-audio-delay-asterisk-14.1.1.zip
( 4) dtls-handshake-audio-delay-asterisk-14.1.1-2.zip
( 5) dtls-handshake-audio-delay-asterisk-14.1.1-3.zip
( 6) freeswitch-bad.pcap
( 7) freeswitch-capture-call-without-delay.zip
( 8) freeswitch-ok.pcap
( 9) fs-dtls.crt
(10) fs-dtls.key
(11) jira_asterisk_26544_v14_force_dtlsv1.patch
(12) jira_asterisk_26544_v14_test.patch
(13) res_rtp_asterisk.c.rej
(14) screencapture-sslanalyzer-comodoca-1482194006200.png
(15) screencapture-ssllabs-ssltest-analyze-html-1480426789448.png
(16) sip_trace.txt
Description:Hello,

It seems that there is a delay in the audio setup when using WebRTC with latest Asterisk versions and latest browser versions (described in the Environment section).

Sometimes there is no delay, but most of the time the delay goes between 1 second to a couple of minutes.

This seems to be related to a delay in the DTLS connection handshake between Asterisk and the browser (although this is just a guess after trying to isolate the issue).

sip.conf
{code}
[100]
nat=force_rport,comedia
host=dynamic
type=friend
secret=secret
disallow=all
allow=g722
icesupport=yes
transport=wss
dtlsenable=yes
dtlsverify=no
dtlscertfile=/cert.crt
dtlsprivatekey=/cert.key
dtlssetup=actpass
videosupport=no
encryption=yes
avpf=yes
force_avp=yes
directmedia=no
canreinvite=no
context=wrtc
{code}

extensions.conf
{code}
[wrtc]
exten => _X.,1,Answer
same => n,Playback(tt-monkeys&tt-monkeys&tt-monkeys&tt-monkeys&tt-monkeys&tt-monkeys&tt-monkeys&tt-monkeys&tt-monkeys&tt-monkeys&tt-monkeys&tt-monkeys&tt-monkeys&tt-monkeys&tt-monkeys&tt-monkeys&tt-monkeys)
same => n,Hangup
{code}

rtp.conf
{code}
[general]
rtpstart=6000
rtpend=65535
icesupport=true
[ice_host_candidates]
x.x.x.x => y.y.y.y ; x.x.x.x is the internal IP, y.y.y.y is the external IP
{code}

[Edit by Rusty - removed excessive debug in description field as per issue tracker guidelines. Moving to attachment.]

FireFox shows this in its logs:
{code}
276467712[7f8929a4aec0]: [|WebrtcAudioSessionConduit] AudioConduit.cpp:612: GetAudioFrame
276467712[7f8929a4aec0]: [|WebrtcAudioSessionConduit] AudioConduit.cpp:716: GetAudioFrame GetAudioFrame:Got samples: length 320
{code}

But no audio is played until some (random) number of seconds pass, and this logs shows up:
{code}
261443584[7f89265334a0]: Flow[568ce410feebc53d:0,rtp(none)]; Layer[dtls]: PacketReceived(2001)
261443584[7f89265334a0]: Flow[568ce410feebc53d:0,rtp(none)]; Layer[dtls]: Checking digest, algorithm=sha-256
261443584[7f89265334a0]: Flow[568ce410feebc53d:0,rtp(none)]; Layer[ice]:  SendPacket(75) succeeded
261443584[7f89265334a0]: Flow[568ce410feebc53d:0,rtp(none)]; Layer[dtls]: ****** SSL handshake completed ******
261443584[7f89265334a0]: Flow[568ce410feebc53d:0,rtp(none)]; Layer[dtls]: ALPN not negotiated, selecting default
261443584[7f89265334a0]: /builds/slave/m-rel-m64-00000000000000000000/build/src/media/mtransport/transportlayerdtls.cpp:865: Flow[568ce410feebc53d:0,rtp(none)];
{code}

EDIT: I forgot to mention that looking at the output from chrome://webrtc-internals, I noticed that during the delay the browser is stalled at ICEConnectionStateChecking. As soon as the audio is connected, it goes to ICEConnectionStateConnected.

Without changing any browser or network settings whatsoever either in the Asterisk Box or the browser's box, sometimes it just works. But most of the time the delay is present.

With FreeSWITCH (cec0cb39830546a3a1c1df7ad7a05b05f14b8975 - Fri Oct 28 15:38:25 2016 -0500) works perfectly, every single time.

Any help is greatly appreciated!

Thank you.

Best regards,
Comments:By: Asterisk Team (asteriskteam) 2016-11-01 11:27:05.607-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Rusty Newton (rnewton) 2016-11-01 13:43:29.888-0500

Moved excessive SIP trace from description field to attachment: sip_trace.txt

Please don't paste large amounts of debug into the *description* field. The description field is meant to hold a *description* of the issue.

JIRA provides the ability to attach files to the issue. Please attach debug with a txt extension.  

If debug is inline inside the description it results in massively long E-mail notifications which are not fun.

Here is the guidelines: https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines


By: Marcelo Gornstein (marcelog) 2016-11-01 14:24:02.580-0500

Got it, thank you Rusty.

By: Marcelo Gornstein (marcelog) 2016-11-08 04:57:28.615-0600

Hi guys!

Any thoughts on this one?

Thank you.

By: Joshua C. Colp (jcolp) 2016-11-08 05:03:15.970-0600

Does this happen with chan_pjsip? It has different timing conditions of when it starts the ICE negotiation process over chan_sip. If it works on chan_pjsip, then it is likely a problem in chan_sip.

As well in the future asking for thoughts and such doesn't really help - any comments/questions/etc will be posted on here and if this issue is triaged and accepted there is no timeframe on when it will be fixed.

By: Marcelo Gornstein (marcelog) 2016-11-08 05:10:39.747-0600

Thank you Joshua.

Yes, chan_pjsip is also affected by this issue, and it's an absolute deal breaker when using Asterisk with WebRTC.

By: Joshua C. Colp (jcolp) 2016-11-08 14:25:40.786-0600

Can you attach a wireshark capture with the DTLS negotiation present as well as a corresponding Asterisk debug log so the timing can be seen?

By: Marcelo Gornstein (marcelog) 2016-11-15 07:28:14.692-0600

I'm attaching a ZIP file with 5 calls:

* 1st: No audio at all
* 2nd: Normal audio
* 3rd: No audio at all
* 4th: Normal audio
* 5th: Audio after a couple of seconds of delay

The ZIP includes
* Asterisk console log with debug for sip, rtp, rtcp, and stun
* PCAP at the Asterisk side
* PCAP at the browser's side (FireFox)
* FireFox debug log

Thank you. Let me know if this helps.

Best regards.



By: Joshua C. Colp (jcolp) 2016-11-16 19:18:39.718-0600

Your provided logs do not contain debug level information.

Can you ensure that debug is set to go to a file in logger.conf and also that "core set debug 9" has been done?

By: Marcelo Gornstein (marcelog) 2016-11-16 19:48:32.302-0600

I'm attaching a call with a couple of seconds of delay. The following commands have been run in the console:

*CLI> sip set debug on
*CLI> rtp set debug on
*CLI> rtcp set debug on
*CLI> stun set debug on
*CLI> core set debug 9
Core debug was OFF and is now 9.

Thank you.

By: Marcelo Gornstein (marcelog) 2016-11-16 20:01:23.634-0600

Sorry, this one contains the debug output in console and the asterisk messages log file. A (this time) small delay is present in the beginning of the call.

By: Joshua C. Colp (jcolp) 2016-11-29 07:12:08.361-0600

[~marcelog] How did you generate the certificate for Asterisk and what are its attributes?

By: Marcelo Gornstein (marcelog) 2016-11-29 07:44:05.126-0600

It's the original, generated by Comodo.

I'm attaching the output for the command: openssl x509 -in /webrtc/certs/STAR_vip2phone_net.crt -text

and also a capture of the analysis by ssl-labs.

Why should the certificate should be an issue if sometimes it just works right away, sometimes it has a delay, and sometimes it will just not work at all?

Shouldn't it be a consistent behavior?


By: Joshua C. Colp (jcolp) 2016-11-29 07:47:03.543-0600

The wireshark capture shows fragmentation and issues around the certificate itself, which eventually resolve themselves after a period of time. It may be attributes of the certificate itself that we aren't handling well or properties of it causing the resulting DTLS traffic to be different than expected or seen with other certificates. To give the best chance of fixing this I wanted details on it, as there is a LOT involved and so having more data is great.

By: Marcelo Gornstein (marcelog) 2016-11-29 08:04:52.087-0600

Gotcha. Thank you very much for your help.

By: Richard Mudgett (rmudgett) 2016-12-09 17:05:25.539-0600

Asterisk is sending the DTLS handshake frames in fragments.  On the corresponding browser side pcap not all fragments are being received and thus the handshake gets stalled waiting on frame retransmissions.  It looks like there may be some device like a NAT router not passing all of the fragments in the path.

By: Richard Mudgett (rmudgett) 2016-12-09 17:08:56.346-0600

Could you provide a pcap of a call using FresSwitch to see what is happening with their DTLS handshake.

By: Marcelo Gornstein (marcelog) 2016-12-09 17:11:57.620-0600

Is there any way for you guys to not block the work on this by waiting for my capture?

By: Richard Mudgett (rmudgett) 2016-12-09 17:19:53.132-0600

The DTLS handshake exchange is handled by OpenSSL and fragmenting is done by the IP stack.  All Asterisk does for the DTLS handshake is set it up and tell OpenSSL to start it.  I'll look to see if there is  some option available in OpenSSL that may help.

By: Marcelo Gornstein (marcelog) 2016-12-10 07:21:21.439-0600

I'm attaching both PCAPs (browser side and freeswitch side) of 1 call without the issue.

As stated before, the issue does not manifest itself when using freeswitch, using same computer, same browser, same ISP, same router, and NAT configurations in both the VoIP box and browser's.

EDIT: ...and of course, using the same SSL certificate.

Thank you!

By: Richard Mudgett (rmudgett) 2016-12-14 18:45:31.153-0600

The freeswitch pcap shows that they are using DTLSv1.0 for the key exchange which sends smaller packets.  Asterisk is using DTLSv1.2 for the key exchange which is sending larger packets that get fragmented by the IP stack on the Asterisk server.  These fragments seem to be mishandled on the way to the far end.  Only when they successfully get passed to the far end does the audio connection get established.

[^jira_asterisk_26544_v14_test.patch] - Test patch to determine if setting the MTU will make any difference in how OpenSSL sends the key exchange packets.

Please test and get a pcap on the Asterisk side as you have a setup with a large certificate.  Thanks.

Apply the patch in the root source directory of Asterisk, recompile and install.
patch -p1 -i jira_asterisk_26544_v14_test.patch

By: Marcelo Gornstein (marcelog) 2016-12-14 18:57:42.051-0600

Thank you Richard.

Question: Could you guys test this on your own and achieve a full solution on your side first, so we can reduce the time needed for this going back and forth?

Thanks!

By: Marcelo Gornstein (marcelog) 2016-12-14 19:10:23.557-0600

Hi Richard,

Asterisk 14.1.1 failed with

# patch -p1 -i 54752_jira_asterisk_26544_v14_test.patch
patching file res/res_rtp_asterisk.c
Hunk #1 succeeded at 1331 with fuzz 1 (offset -5 lines).
Hunk #2 FAILED at 1485.
Hunk #3 FAILED at 1501.
Hunk #4 FAILED at 1546.
3 out of 4 hunks FAILED -- saving rejects to file res/res_rtp_asterisk.c.rej


By: Marcelo Gornstein (marcelog) 2016-12-14 19:30:32.293-0600

Same result for Asterisk 14.2.1. The patch is rejected.



By: Marcelo Gornstein (marcelog) 2016-12-14 20:14:30.563-0600

Hello Richard,

Sorry about that, it was my bad, I've copy&pasted the patch and something must have been wrong. I've downloaded the file with wget and the patch was applied correctly to a v13.

I'm attaching the result. It seems the same issue is still present.

Is there any way to force the use of dtls 1.0 so we can leverage the configuration with freeswitch and see if that helps?

Thanks for your help!

Best regards,

By: Richard Mudgett (rmudgett) 2016-12-15 17:51:35.788-0600

There is no config option to force DTLSv1.0 usage.  The code is setup to use any that is available to the OpenSSL installed.

[^jira_asterisk_26544_v14_force_dtlsv1.patch] - Force the use of DTLSv1.0 instead of any DTLS version.

Please see if this works and attach a pcap.

By: Marcelo Gornstein (marcelog) 2016-12-15 19:22:58.200-0600

Hello Richard,

The issue is still present, I'm attaching a capture of a call without audio.

The patch was applied to a v13 Asterisk.

Thanks for your help so far!

Best,

By: Richard Mudgett (rmudgett) 2016-12-19 17:48:00.858-0600

The pcap you sent for the Freeswitch DTLS negotiation is not using the same cert as with the Asterisk pcaps.  The cert has "FreeSWITCH" strings in it and is a much smaller cert than the ones used in the Asterisk pcaps.  The Asterisk pcaps have certs with strings like "Greater Manchester" and "vip2phone.net" in them.

The last patch to force DTLSv1.0 did work as intended because the pcap showed us using DTLSv1.0 where we would have used DTLSv1.2.  The DTLSv1.0 vs DTLSv1.2 difference between Freeswitch and Asterisk therefor isn't significant.  Looking at the Freeswitch code, I don't see any significant differences in how they setup for DTLS.  Of more significance is that they are using the same OpenSSL BIO (BIO_s_mem).  This BIO does not support pre-fragmenting the DTLS packets and they pass these packets to the IP stack in a similar manner.  Therefore, you will have the same issue with Freeswitch when actually using the larger cert files.

Restating from earlier, the IP stack on the sending machine has to fragment the large cert exchange packets.  This is seen in the pcaps on the Asterisk side.  Not all fragments make it through to the other side.  When they do make it you get connected.  This is seen in the pcaps on the browser side.

By: Marcelo Gornstein (marcelog) 2016-12-19 18:17:40.682-0600

Hello Richard,

That's odd. Here's the config file for Asterisk:
{code}
dtlscertfile=/webrtc/certs/STAR_vip2phone_net.crt
dtlsprivatekey=/webrtc/certs/vip2phone.net.key
{code}

And this is the one for FreeSWITCH:
{code}
     <param name="tls-cert-dir" value="/webrtc/certs"/>
{code}

In the directory /webrtc/certs I have wss.pem, which is formed by

{code}
cat STAR_vip2phone_net.crt vip2phone.net.key > wss.pem
{code}

And that's the file being used by FreeSWITCH. I don't have a certificate generated specifically for this, and I haven't included the string FreeSWITCH on it either.

Any other thoughts to track this down?

Thanks!

By: Marcelo Gornstein (marcelog) 2016-12-19 18:34:43.410-0600

I'm attaching a screenshot of what is reported by [https://sslanalyzer.comodoca.com/?url=https%3A%2F%2Fmg-webrtc.vip2phone.net%3A7443] when querying the websocket port of FreeSWITCH

By: Marcelo Gornstein (marcelog) 2016-12-19 19:06:41.728-0600

Hi Richard,

Could you compare these two now please?

I've replaced the file etc/freeswitch/tls/dtls-srtp.pem with the wss.pem file.

Thank you!

By: Marcelo Gornstein (marcelog) 2016-12-19 19:46:47.563-0600

Hello Richard,

So you were absolutely right. The location of the cert used for the DTLS setup wasn't well documented, so I completely missed that file in FS.

I grabbed the original etc/freeswitch/tls/dtls-srtp.pem and split it into a .crt and .key file, setup  the asterisk sip.conf file with these files and it seems to work perfectly.

Also, when using FreeSWITCH and overwriting their original dtls-srtp.pem file with wss.pem I can reproduce this problem too.

I'd like my team to make the final tests with this FS certificate with some customers before closing this one, and also would like to know your thoughts on where's the bug here, because with the original Comodo cert this issue can be reproduced consistently in different networks, in different countries, different ISPs, etc.

Although it does seem that both Asterisk and FreeSWITCH will behave the same now.

Thank you very much for your help!

By: Richard Mudgett (rmudgett) 2016-12-20 12:47:04.025-0600

Ideally the network would be able to handle the fragmented IP packets and there wouldn't be any issue.  Since fixing the internet isn't practical, the best place to fix the issue would be a more complete OpenSSL BIO_s_mem that adds some of the MTU handling of BIO_s_datagram so the outgoing memory buffer can be made a fixed size to simulate MTU.  That way the OpenSSL code that creates the UDP DTLS payload will automatically create the needed DTLS packet series.  Asterisk and Freeswitch would just need to set a MTU and pass the buffer on to the socket shared by the encrypted RTP stream.  The last place to fix the issue is for Asterisk and Freeswitch to take the large buffer from BIO_s_mem and split the payload into the needed DTLS packet series.  The down side of the last approach is that Asterisk and Freeswitch would now need to know how the DTLS packets are formatted and how to split them when they are too large.

By: Marcelo Gornstein (marcelog) 2017-01-09 14:10:18.707-0600

Thank you guys, we are now running an environment where the SIP signalling (http.conf) is using our original certificate, while at the same time using the FreeSWITCH certificate for DTLS. Not ideal, but it works and perhaps this can be used in other scenarios where the same problem is present as a workaround. I'm not sure if this should be an issue to open to the OpenSSL guys or others.

In any case, thanks for your awesome help on this one!

Best regards