[Home]

Summary:ASTERISK-23213: SIP over WS: Audio problems when upgrading to 11.8 from 11.7 with endpoints behind NAT
Reporter:Andrea Suisani (s1ckpig)Labels:
Date Opened:2014-01-29 10:16:26.000-0600Date Closed:2014-02-28 10:35:33.000-0600
Priority:CriticalRegression?Yes
Status:Closed/CompleteComponents:Resources/res_rtp_asterisk
Versions:11.8.0 Frequency of
Occurrence
Constant
Related
Issues:
is related toASTERISK-23026 [patch]Asterisk should send STUN messages using role 'ICE-CONTROLLING' when being offerer of SDP
is related toASTERISK-22911 [patch]Asterisk fails to resume WebRTC call from hold
is related toASTERISK-22938 Asterisk Crashes with assert fail for 'ype <= PJ_ICE_CAND_TYPE_RELAYED'
Environment:Ubuntu 12.04 LTS 64 bit. JsSIP WebRTCAttachments:( 0) asterisk11-118_broke_ice.patch
( 1) extensions.conf
( 2) full_log_not_works.txt
( 3) full_log_works.txt
( 4) http.conf
( 5) ice_base_addr.patch
( 6) ice_candidates_excessive_clearing_with_debug.patch
( 7) ice_candidates_excessive_clearing.patch
( 8) issue_23213_BAD_asterisk_11_8_with_rtp_debug.txt
( 9) issue_23213_GOOD_astersisk_11_7_with_rtp_debug.txt
(10) messages_log_not_works.txt
(11) messages_log_works.txt
(12) pjsip.conf
(13) rtp.conf
(14) rtp.conf
(15) sip_peer.txt
(16) sip.conf
(17) sip.conf
(18) users.conf
Description:Hi all,

We're testing the 11.8.0-rc1 with the aim to deploy it on our production servers. During this test we've found out that for some strange reason the audio of our phone calls stop working. Neither the callee nor the caller here anything.

I want briefly describe our setup. We're using a web soft-phone developed using JsSIP and WebRTC. Such application is connected to asterisk using websocket, than asterisk routes the call to the endpoint using a VoIP SIP provider.

Both the astersik server and the webapp are on the same LAN, and both are connected to the internet through NAT.

Everything works as expected as long as we use Asterisk <= 11.7.0, as soon as we move to 11.8.0-rc1 the audio simply goes away.

I can reproduce the problem at will.

I've looked at the rtp debug output both for a working session (11.7)  and a not working session (11.8). In the former case we have something like:

{noformat}
Got  RTP packet from    80.xxx.xxx.xxx:10244 (type 18, seq 042964, ts 084000, len 000020)
Sent RTP packet to      192.168.1.50:44635 (via ICE) (type 00, seq 049633, ts 094560, len 4294967284)
Got  RTP packet from    80.xxx.xxx.xxx:10244 (type 18, seq 042965, ts 084160, len 000020)
Sent RTP packet to      192.168.1.50:44635 (via ICE) (type 00, seq 049634, ts 094720, len 4294967284)
Got  RTP packet from    80.xxx.xxx.xxx:10244 (type 18, seq 042966, ts 084320, len 000020)
Sent RTP packet to      192.168.1.50:44635 (via ICE) (type 00, seq 049635, ts 094880, len 4294967284)
{noformat}

whereas in the latter we got:

{noformat}
Got  RTP packet from    80.xxx.xxx.xxx:15456 (type 18, seq 062132, ts 065600, len 000020)
Sent RTP packet to      78.zzz.zzz.zzz:43994 (type 00, seq 024953, ts 073760, len 000170)
Got  RTP packet from    80.xxx.xxx.xxx:15456 (type 18, seq 062133, ts 065760, len 000020)
Sent RTP packet to      78.zzz.zzz.zzz:43994 (type 00, seq 024954, ts 073920, len 000170)
Got  RTP packet from    80.xxx.xxx.xxx:15456 (type 18, seq 062134, ts 065920, len 000020)
{noformat}

as you might see when the audio works properly, the rtp connection is established between the IP address of our SIP VoIP provider (80.xxx.xxx.xxx) and IP address of the dev box 192.168.1.50 where the webapp is running (via ICE).

On the other case using asterisk 11.8  the audio does not flow properly because one of the end-point is the box associated with public IP address configured  on the router that is NATting all the outgoing traffic of the development LAN (78.zzz.zzz.zzz), instead of being the IP of the dev box from which we issued the call first place.


edit 1: I forgot to mention that I've used the same exact configuration both for asterisk 11.7 and 11.8-RC1
Comments:By: Matt Jordan (mjordan) 2014-01-29 10:38:06.857-0600

We require a complete debug log to help triage the issue. This document will provide instructions on how to collect debugging logs from an Asterisk machine for the purpose of helping bug marshals troubleshoot an issue: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

Please attach your sip.conf, rtp.conf, as well as the full DEBUG log requested in the link.

By: Andrea Suisani (s1ckpig) 2014-01-29 10:52:13.153-0600

attached sip.conf, rtp.conf and debug log session for 11.8-RC1

By: Andrea Suisani (s1ckpig) 2014-01-29 11:12:19.412-0600

i forgot to mention that we're using ARA to store user account definition. I'm out office now  tomorrow I'l post the typical user settings we use

By: Matt Jordan (mjordan) 2014-01-29 11:22:59.813-0600

I'm a little confused here. In your issue description, you state that the RTP debug showing media going to the private IP address works, i.e., that's the desired flow of media. Then, in the log you posted, you show an INVITE request where Asterisk places the private IP address in the SDP of the INVITE request. Furthermore, you have NAT disabled for the peer, so that would be expected:

{noformat}
[Jan 29 17:48:30] DEBUG[21167][C-00000000] chan_sip.c: NAT detected for 192.0.2.108 / 192.168.1.50
[Jan 29 17:48:30] DEBUG[21167][C-00000000] chan_sip.c: Setting NAT on RTP to Off
{noformat}

Since there's no RTP debug in the log, I'm not sure where Asterisk is actually sending the media. Can you provide a log file with:
# RTP debug enabled, as well as SIP debug
# What the desired flow of media is, and where you feel the media is incorrect
# A dump the of the database rows for the peer in question

By: Andrea Suisani (s1ckpig) 2014-01-29 13:46:49.317-0600

Hi Matt,

I was in hurry and I've only posted the log for the problematic case, i.e. using asterisk 11.8-RC1. Tomorrow morning (Central EU time), as soon as I'll get to work I'll post the debug log also for the working setup (i.e. using asterisk 11.7,
with rtp debug enabled).  

In any case let me summirise my case here:

- asterisk 11.7: every works as expected, RTP flow is setted up correctly. One
                 end-point is our voip provider host, the other my dev box (via ICE).

 

- 11.8-RC1: call is placed correctly, but the mdeia flow is established between
            the voip provider host and the public IP we use on our router, hence
            callee and caller don't hear anything.

Tomorrow I'll provide the thing you ask in point 1 and 3. In regards of desired media
flow, this is what it works and what I think is correct:


{noformat}
Got  RTP packet from    80.xxx.xxx.xxx:10244 (type 18, seq 042964, ts 084000, len 000020)
Sent RTP packet to      192.168.1.50:44635 (via ICE) (type 00, seq 049633, ts 094560, len 4294967284)
{noformat}

80.xxx.xxx.xxx is the IP address of our voip provider and 192.168.1.50 is my dev box IP.

Whene we use asterisk 11.8 the media flow is between 80.xxx.xxx.xxx and 78.zzz.zzz.zzz
(the public IP setted on our WAN interface) and the result is a "mute call".

Another worth thing to note is that in the no working case there's no mention of ICE
whatsoever.


By: Andrea Suisani (s1ckpig) 2014-01-30 02:54:50.044-0600

requested debug log files

By: Andrea Suisani (s1ckpig) 2014-01-30 02:55:26.527-0600

sip peer definition

By: Andrea Suisani (s1ckpig) 2014-01-30 02:56:08.499-0600

I've just provided the requested log files

By: Matt Jordan (mjordan) 2014-01-30 15:28:34.542-0600

Please hit "Send Back" when you've entered feedback. That way the issue will show up again for Bug marshals. Thanks!

By: Andrea Suisani (s1ckpig) 2014-01-31 02:17:00.442-0600

Sorry Matt, I'm quite new to jira and didn't know I've to do something like this to trig the notification.

A part from that if you need any other info just let me know.

By: Lott Caskey (lottc) 2014-01-31 12:27:51.897-0600

I can confirm this is happening in 11.8.0-rc1, though I am using SIPml5. Reverting back to 11.7 resolved the issue.

By: Andrea Suisani (s1ckpig) 2014-02-05 02:01:26.317-0600

Matt is there anything else I could to help to solve the issue?  

By: Lott Caskey (lottc) 2014-02-07 05:05:40.323-0600

Well, I have found a solution that works for my sipML5 stack...It should work for your too.


So, in 11.8.0-rc1, this block was added, which is what is seemingly causing the audio to not connect with an established line.  My above patch will comment out the block and allow your web clients to work again!



"asterisk-11.8.0-rc1/res/res_rtp_asterisk.c"  Starting at line 560.

/* even though create check list failed don't stop ice as
  it might still work */
  ast_debug(1, "Failed to create ICE session check list\n");
/* however we do need to reset remote candidates since
  this function may be re-entered */
  ao2_ref(rtp->remote_candidates, -1);
  rtp->remote_candidates = NULL;
  rtp->ice->rcand_cnt = rtp->ice->clist.count = 0;

By: Lott Caskey (lottc) 2014-02-07 05:11:57.215-0600

This only comments out the recently added lines wanting to clear the remote candidates.  Once cleared, neither sipml5 or jssip can receive audio.

By: David Brillert (aragon) 2014-02-07 08:44:38.230-0600

So your patch reverts this code
http://svnview.digium.com/svn/asterisk?view=revision&revision=405234

Revision 405234 - Directory Listing
Modified Thu Jan 9 16:49:55 2014 UTC (4 weeks ago) by kharwell

res_rtp_asterisk: Fails to resume WebRTC call from hold

In ast_rtp_ice_start if the ice session create check list failed, start check
was never initiated and ice_started was never set to true.  Upon re-entering
the function (for instance, [un]hold) it would try to create the check list
again with duplicate remote candidates.

Fixed so that if the create check list fails the necessary data structures
are properly re-initialized for any subsequent retries.

Note, it was decided to not stop ice support (by calling ast_rtp_ice_stop) on a
check list failure because it possible things might still work.  However, a
debug message was added to help with any future troubleshooting.

(closes issue ASTERISK-22911)
Reported by: Vytis Valentinavičius
Patches:
    works_on_my_machine.patch uploaded by xytis (license 6558)


By: Andrea Suisani (s1ckpig) 2014-02-07 08:54:46.291-0600

It's not a complete reverse, cause it keeps the "return;" statement on line 557 from vytis patch.

By: David Brillert (aragon) 2014-02-07 08:58:20.170-0600

That's Ok, I'm just trying to help the devs track down the regression.
Good work!

By: Lott Caskey (lottc) 2014-02-07 09:40:05.689-0600

The whole previous patch could be reverted.

For now I left the return in the previous block and the debug message, which was ultimately how I discovered exactly where the problem was.


Andrea, did this fix also work for you on jssip?

By: Andrea Suisani (s1ckpig) 2014-02-07 10:48:26.824-0600

It did, I've just tried and I can confirm that now the audio works properly.

By: Martin Harčár (martinhari) 2014-02-10 08:41:50.802-0600

I am using trunk version of asterisk revision 397563 via sipML5. Video works fine, but audio is not working at all. I did a try with your patch, but still is not working. I also try to install asterisk version 11.8 but seems that patch to get video works via webrtc client is not included in this release. Is there a way to get audio and video works when it is used with sipML5 or jssip.net
Thanks in advance.

By: Martin Harčár (martinhari) 2014-02-14 04:02:09.932-0600

Hi gyus,

I am using asterisk 12.0.0 to connect two webrtc client via jssip project. Sometimes it is working fine audio and video call but often it is working and just audio,video in one way or only video is received on both side. Could you look on logs if you find something wrong in my configuration or ...
For working session is attached these logs full_log_works.txt and messages_log_works.txt.
In case where it was not working is attached these logs full_log_not_works.txt and messages_log_not_works.txt.
In both cases the configuration files is same. Attached is configuration files with my changies.

Thanks in advance.  

By: Andrea Suisani (s1ckpig) 2014-02-14 04:46:22.437-0600

@Martin if you're using chrome just look at chrome://webrtc-internals/ during a phone call, scroll down to the bottom and look for a section that looks like:

{noformat}
Statistics ssrc_3256662663
cname:8tJDbKycMOQirlGL
msid:8wHT1kyxQh1LADX1cMMf8kmTK5UEh8qj8NQU 98761d00-ab84-4885-abf7-d7609e492885
mslabel:8wHT1kyxQh1LADX1cMMf8kmTK5UEh8qj8NQU
label:98761d00-ab84-4885-abf7-d7609e492885
timestamp Fri Feb 14 2014 11:40:48 GMT+0100 (CET)
ssrc 3256662663
googTrackId 98761d00-ab84-4885-abf7-d7609e492885
transportId Channel-audio-1
audioInputLevel 34
bytesSent 108043
packetsSent 1492
...
{noformat}

if audioInputLevel is setted to -1 that means that chrome is at fault because for some strange reason is not able to deal with your microphone correctly. If that happens caller is able to hear the callee voice coming in, but callee won't be able hear the caller voice.

By: Martin Harčár (martinhari) 2014-02-14 05:56:54.353-0600

@Andrea
I just check it and in case that audio was not working InputAudioLevel is set to 0. Do you want some logs from chrome?

By: Andrea Suisani (s1ckpig) 2014-02-14 06:02:57.404-0600

Yes just attach the relevant section of  chrome://webrtc-internals/ here.

Which version of chrome are you using and which version of JsSIP?

By: Matt Jordan (mjordan) 2014-02-14 06:19:25.711-0600

Just so you all are aware, this issue looks to have been caused by the ICE changes in 11.8.0-rc1. That is not related to any issue may be seeing in 12.0.0, as those ICE changes are not in that release.

Please don't muddy this issue with other, unrelated problems. As the issue tracker is not a support forum, you may be better served by taking this discussion to the Asterisk forums or to the asterisk-users mailing list.

By: Andrea Suisani (s1ckpig) 2014-02-14 06:30:52.391-0600

sure you're right, sorry for derailing the thread issue.

By: Martin Harčár (martinhari) 2014-02-14 07:17:35.386-0600

ok, sorry for that...
Andrea where I can post this issue?

By: Andrea Suisani (s1ckpig) 2014-02-14 07:35:51.639-0600

[one last comment on that topic,  I promise ]

I'd say you can report or to webrtc (https://code.google.com/p/webrtc/) or to the jssip guys. also you can send me the chrome log to suisani at opinioni dot net.

By: Jonathan Rose (jrose) 2014-02-18 11:27:27.616-0600

Hey everyone, I've created a patch (ice_candidates_excessive_clearing.patch) which I think undoes the more aggressive ICE candidate clearing from the ASTERISK-22911 patch while still addressing the holding problem that the patch was trying to fix. To put it bluntly though, I'm not experienced at dealing with NATs and this was really my first exposure to programming anything involved with PJNATH.

Please try this patch out and tell me if it fixes the regression. Also if the patch causes any problems, let me know if you would.

By: Andrea Suisani (s1ckpig) 2014-02-19 02:45:53.321-0600

the patch apply cleanly.
I've tested it and the audio is flowing as expected.

good work guys! really appreciated.

By: Andrea Suisani (s1ckpig) 2014-02-19 02:46:31.403-0600

the latest patch seems to fix the problem on my test env. as soon as I've a chance I will test it on production.

By: Lott Caskey (lottc) 2014-02-21 05:46:25.992-0600

The ice_candidates_excessive_clearing.patch patch also works on my system, using SIPml5...with one exception. If the patch in ASTERISK-23026 is applied, a segfault occurs...it seems to run fine without it anyway.


I have 4 suggestions, that should make it easier to catch these issues in the future:

1. Add a debug into the PJ_SUCCESS condition:
{noformat}
   ...
   if (res == PJ_SUCCESS) {
       ast_debug(1, "ICE session check list success.\n");
   ...
{noformat}


2. Add a debug into the PJ_ETOOMANY condition:
{noformat}
   ...
   } else if (res == PJ_ETOOMANY) {
       ast_debug(1, "ICE session check list has more ice candidates than it can handle.\n");
   ...
{noformat}


3. Add an else to print debug on unhandled returns:
{noformat}
   ...
   } else {
       ast_debug(1, "ICE session check list failure. (%d)\n",res);
   }
   ...
{noformat}


4. Change the NOTICE to debug, as not to spam:
{noformat}
   ast_debug(1, "Learning mode set by on_ice_complete.\n");
{noformat}



See attached file: ice_candidates_excessive_clearing_with_debug.patch

By: Andrea Suisani (s1ckpig) 2014-02-24 09:47:19.862-0600

speaking about segfault I just want to add that asterisk 11 branch is still suffering of ASTERISK-21696 (and its various incarnations: ASTERISK-20762, ASTERISK-22889)

By: Jonathan Rose (jrose) 2014-02-24 11:09:04.888-0600

Oh, sorry, that log message about the learning mode was just leftover diagnostic stuff and will be removed.  I'll go ahead and take your advice on adding the debug messages for the check list building result though.

By: Jonathan Rose (jrose) 2014-02-27 10:42:55.577-0600

Hey, I have some news. Josh Colp created a patch that should fix this regression in a much more complete fashion.  Basically, it turned out we weren't populating the local SRFLX candidates with the appropriate base address and this was causing checklist pruning to fail which led to broken ICE sessions from the get-go when STUN is in use. I've posted the patch to the following reviewboard page:
https://reviewboard.asterisk.org/r/3256/

Have a look, and if you have comments, please don't be shy. Additional testing is appreciated.

I'll also be posting the patch to this issue as ice_base_addr.patch

By: JoshE (n8ideas) 2014-02-27 17:31:23.422-0600

I backported Mr Colp's fix, combined with ASTERISK-21383 and this seems to have resolved the audio issues with hold/resume ICE candidate handling.

By: Andrea Suisani (s1ckpig) 2014-02-28 03:18:15.835-0600

@Jonathan Rose.

I've tested the new ice_base_addr patch. I've applied it to a fresh 11.8-rc1.tar.gz source tree. I can confirm that it worked. The audio flows correctly in both ways.
Hence it seems to work without the need of ice_candidates_excessive_clearing_with_debug.patch to be applied.

edit:


I've just noticed that the patch attached to this thread differs from the one applied to 11.8.0-rc2 tags. The one applied to the svn tags has this change:

{noformat}  
- if (!ast_stun_request(rtp->s, &stunaddr, NULL, &answer)) {
+ if (!ast_stun_request(component == AST_RTP_ICE_COMPONENT_RTCP ? rtp->rtcp->s : rtp->s, &stunaddr, NULL, &answer)) {
+ pj_sockaddr base;
{noformat}

whereas the one I've tested does not change the if condition:

{noformat}
if (!ast_stun_request(rtp->s, &stunaddr, NULL, &answer)) {
+ pj_sockaddr base;
{noformat}

is it normal?


edit2:

I've also tried the latter version of the patch (http://goo.gl/d9h3kO) and it worked.






By: Jonathan Rose (jrose) 2014-02-28 10:35:18.300-0600

Yes, this is normal.  That change was added after I posted the patch here. It's still an ongoing work in progress and I still have more ICE stuff in the pipe, so more changes are expected in the near future. Hopefully for the better, but you might want to track ASTERISK-22911 until I get that to closure as well.

Since everyone seems to be reporting that this issue is fixed though, I'll go ahead and close it.  Thanks for the feedback everyone.