
Summary: ASTERISK-22071: chan_sip doesn't respect Via ..completely
Reporter: Alex Zarubin (az_tth)
Labels:
Date Opened: 2013-07-12 14:21:40
Date Closed: 2013-08-20 21:14:01
Priority: Major
Regression?: Yes
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions: 1.8.23.0 11.4.0
Frequency of Occurrence: Constant
Related Issues:
is caused by ASTERISK-20904 RFC1918 NAT Issue On Prune
is duplicated by ASTERISK-22314 Failure in canceling a call, sending OK to wrong port
Environment: CentOS 6.4
Attachments:
( 0) asterisk-11.{2.1,4.0}-channels-chan_sip.c.diff
( 1) ASTERISK-22071.patch
( 2) asterisk-22071-store-recvd-address.diff
( 3) AsteriskJira22071.txt
( 4) issue_22071_0715_11.5.0-rc2
( 5) issue_22071_gen
( 6) issue_22071_log11.2.1
( 7) issue_22071_sipsettings
( 8) sip.conf
( 9) sipsettings
Description: The exact same SIP call works in 11.2.1 and fails in 11.4.0. The outbound call is established normally via the proxy. The BYE comes from the proxy with <proxy ip/port> in the first Via header. 11.4 adds received=<carrier ip/port> and sends the 200 OK there. The BYE is then repeated several times.
Will try to attach sip.conf and trace once this jira issue is created. Thank you.
Comments:

By: Michael L. Young (elguero) 2013-07-13 22:07:47.513-0500

I notice that you have nat=no in the [general] context but see this in your log:

{quote}
<--- Transmitting (NAT) to 5.6.7.8:5060 --->
{quote}

Can you try putting nat=no in the peer context if nat is supposed to be turned off?

Also, canreinvite has been deprecated since 1.6.2.x, if memory serves me correctly.  You should be using directmedia instead.

One other thing, can you try 11.5.0-rc1?  There were some bug fixes made in regards to the nat settings not being applied properly.

If these suggestions do not resolve this, please attach a full debug log to this issue to help triage it: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

Thanks

By: Alex Zarubin (az_tth) 2013-07-14 18:18:15.596-0500

Hi Michael,

1. Putting nat=no in the peer context doesn't help. It does change the behavior, though: the BYE is the same as before, but the 200 OK to the BYE is transmitted with 'no NAT' and goes to <carrier ip address>:<proxy port>:
<--- Transmitting (no NAT) to 5.6.7.8:5062 --->
SIP/2.0 200 OK
Via: SIP/2.0/UDP 1.2.3.4:5062;branch=z9hG4bK-whatever-cs7OSENFJJ.0;received=5.6.7.8
Via: SIP/2.0/UDP 5.6.7.8:5060;branch=z9hG4bK-whatever-KAIyFLLQ9Z.0
Via: SIP/2.0/UDP 5.6.7.100:5060;branch=z9hG4bK-whatever-carrier-3682011348752653

2. The latest release, 11.5.0-rc2, has the same issue. Logs are consistent with 11.4.0, whether we have nat=no globally (in the general context only) or add nat=no to the peer context.

3. We'll change canreinvite=no to directmedia=no (chan_sip.c still processes canreinvite=no, though).

Thank you.

By: Alex Zarubin (az_tth) 2013-07-14 19:21:11.492-0500

Attaching the 11.5.0-rc2 log for the case where nat=no is set in the general context but not for the peer.

By: Michael L. Young (elguero) 2013-07-14 20:05:56.474-0500

Alex,

Can you post the output of "sip show settings"?

Thanks

By: Alex Zarubin (az_tth) 2013-07-15 01:25:28.053-0500

Attaching output of "sip show settings".

By: Michael L. Young (elguero) 2013-07-15 09:25:47.579-0500

Alex,

Can you double-check your sip.conf settings with regard to nat in the [general] context? Perhaps you can attach a sanitized version of your sip.conf (I'm not sure whether the one in the first attachment was the entire file). Right now, I'm not sure what could be triggering the following:

{quote}
Record off feature:     automon
Force rport:            Auto (No)
DTMF:                   rfc2833
{quote}

If you have nat=no in the [general] context, Force rport should be set to plain "No", not Auto (No).  Auto is the default setting when nothing is specified in the [general] section.
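To make the "No" versus "Auto (No)" distinction concrete, here is a minimal C sketch of how a global force-rport setting could be modelled. It assumes, per the CHANGES note quoted later in this thread, that auto_force_rport is the default, so a nat= line that is missing or not recognised leaves the default in place. The names rport_mode, apply_nat_option(), describe() and default_setting() are hypothetical and are not chan_sip's actual code; comma-separated combinations and the comedia options are deliberately not handled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the global force-rport setting (not chan_sip's). */
enum rport_mode { RPORT_NO, RPORT_YES, RPORT_AUTO };

struct rport_setting {
    enum rport_mode mode;
    int auto_detected; /* only meaningful in RPORT_AUTO: NAT seen so far */
};

/* Assumed default when [general] says nothing about nat. */
static struct rport_setting default_setting(void)
{
    struct rport_setting s = { RPORT_AUTO, 0 };
    return s;
}

/* Apply a [general] nat= value. NULL stands for "no usable nat= line"
 * (missing, or a misspelled key); unknown values also keep the default. */
static void apply_nat_option(struct rport_setting *s, const char *value)
{
    assert(s != NULL);
    if (value == NULL) {
        return;
    }
    if (strcmp(value, "no") == 0) {
        s->mode = RPORT_NO;
    } else if (strcmp(value, "force_rport") == 0) {
        s->mode = RPORT_YES;
    } else if (strcmp(value, "auto_force_rport") == 0) {
        s->mode = RPORT_AUTO;
    }
}

/* Render the setting roughly the way "sip show settings" labels it. */
static const char *describe(const struct rport_setting *s)
{
    assert(s != NULL);
    switch (s->mode) {
    case RPORT_NO:
        return "No";
    case RPORT_YES:
        return "Yes";
    default:
        return s->auto_detected ? "Auto (Yes)" : "Auto (No)";
    }
}
```

In this model, a sip.conf whose nat=no line is never recognised stays at "Auto (No)", while a correctly parsed nat=no shows plain "No".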

To double-check this, I set nat=no on a machine running the latest code, and it appears to be functioning properly.

Please check that setting and report back.  Based on the debug trace of the call in the file [^issue_22071_gen], nat is being turned on because Force rport is set to Auto or at least Asterisk thinks that is the setting you want for some reason.

_edit_: Just thought of something.  Can you check "sip show settings" before making a call to the outbound proxy, after freshly reloading sip.conf or starting Asterisk for the first time?  Then check "sip show settings" after making the call?  I want to make sure that something is not changing the default setting when making an outbound call through the proxy.

Thanks

By: Alex Zarubin (az_tth) 2013-07-15 22:41:03.579-0500

Michael,
1. There was a typo in the nat=no line of my sip.conf for release 11.5.0-rc2, so please disregard yesterday's report for 11.5.0-rc2. My original report (11.2.1 working, 11.4.0 not working) is still valid; both had nat=no in the general context.
2. I'm submitting a stripped version of sip.conf and the "sip show settings" output for 11.5.0-rc2. There was no difference in the "sip show settings" output right after an Asterisk restart (i.e. before making a call) and after the call.
3. 11.5.0-rc2 has the same issue as 11.4 (putting nat=no into the peer context didn't make any difference).
I'm submitting today's full logs for 11.5.0-rc2 and for 11.2.1. The only difference in BYE processing I noticed is an extra check_for_nat line in the 11.5.0-rc2 log:

[Jul 15 22:19:01] DEBUG[17930][C-00000000]: netsock2.c:138 ast_sockaddr_split_hostport: Splitting '1.2.3.4:5062' into...
[Jul 15 22:19:01] DEBUG[17930][C-00000000]: netsock2.c:192 ast_sockaddr_split_hostport: ...host '1.2.3.4' and port '5062'.
[Jul 15 22:19:01] DEBUG[17930][C-00000000]: chan_sip.c:17941 check_for_nat: NAT detected for 1.2.3.4:5062 / 5.6.7.8:5060
Sending to 1.2.3.4:5062 (no NAT)

Thank you.
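The extra check_for_nat line compares the top Via (1.2.3.4:5062) with 5.6.7.8:5060, the address Asterisk believes the BYE came from. The general rule for tagging a Via is RFC 3261 section 18.2.1: if the host in the sent-by differs from the packet's source address, the server appends a received= parameter. The C sketch below models only that rule, with a hypothetical endpoint type and IP literals only (no domain names); it is not chan_sip's check_for_nat().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified endpoint: dotted-quad host plus port (hypothetical type). */
struct endpoint {
    const char *host;
    int port;
};

/* RFC 3261 section 18.2.1: a received= parameter is needed when the
 * sent-by host differs from the packet source address. Only the host is
 * compared; domain names in sent-by are not handled in this sketch. */
static int needs_received(const struct endpoint *sent_by,
                          const struct endpoint *source)
{
    assert(sent_by != NULL && source != NULL);
    return strcmp(sent_by->host, source->host) != 0;
}

/* Write the ";received=..." suffix for the top Via, or an empty string. */
static void received_param(const struct endpoint *sent_by,
                           const struct endpoint *source,
                           char *buf, size_t len)
{
    assert(buf != NULL);
    if (needs_received(sent_by, source)) {
        snprintf(buf, len, ";received=%s", source->host);
    } else if (len > 0) {
        buf[0] = '\0';
    }
}
```

Fed the Via 1.2.3.4:5062 and a source of 5.6.7.8, this produces ";received=5.6.7.8", the same parameter that appears in the 200 OK quoted earlier in this thread; with the BYE's real source (1.2.3.4) no parameter is added.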



By: Alex Zarubin (az_tth) 2013-07-23 18:09:02.345-0500

Hello,
Any update? I'd need to make a decision on whether to roll back to 11.2.1 or wait for a patch.
Thank you.

By: Mark Michelson (mmichelson) 2013-07-29 16:08:28.671-0500

So, this issue is interesting.

Follow along with me using {{issue_22071_0715_11.5.0-rc2}}. We'll skip ahead to the 200 OK that Asterisk receives in response to the INVITE. The 200 OK has two Record-Route headers. The top one specifies address 5.6.7.8:5060. Asterisk constructs a route set that specifies that messages in this dialog should be sent to 5.6.7.8:5060. Presumably, the ACK that Asterisk sends should go to this address. It is not sent there, which seems wrong to me, but that is a separate issue.

Later, a BYE arrives from 1.2.3.4:5062. The top-most Via header has 1.2.3.4:5062, so we should send our response to that address. In fact, several lines lower you'll see a line that says "Sending to 1.2.3.4:5062". It seems like things should go correctly.
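For reference, the rule for replying over UDP is RFC 3261 section 18.2.2, with RFC 3581 adding rport: send to the received= address if the top Via carries one, otherwise to the sent-by host, using the sent-by port (default 5060) unless an rport value is present. A minimal C sketch of that selection, assuming a hypothetical pre-parsed via struct and ignoring maddr and multicast:

```c
#include <assert.h>
#include <stddef.h>

/* Parsed top Via header (hypothetical, simplified). */
struct via {
    const char *sent_by_host;
    int sent_by_port;     /* 0 when the Via omits the port */
    const char *received; /* NULL when there is no received= parameter */
    int rport;            /* RFC 3581 rport value, 0 when absent */
};

struct dest {
    const char *host;
    int port;
};

/* Response destination for an unreliable transport per RFC 3261
 * section 18.2.2 (maddr and multicast ignored) plus RFC 3581 rport. */
static struct dest response_dest(const struct via *v)
{
    struct dest d;

    assert(v != NULL);
    d.host = v->received ? v->received : v->sent_by_host;
    d.port = v->sent_by_port ? v->sent_by_port : 5060;
    if (v->rport) {
        d.port = v->rport; /* reply to the port the request came from */
    }
    return d;
}
```

For the BYE's Via (1.2.3.4:5062, no received=) this returns 1.2.3.4:5062. If the same Via is tagged with received=5.6.7.8, the result is 5.6.7.8:5062, the destination the 200 OK actually went to in the trace above.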

In the code, however, the issue arises as part of the {{respprep()}} function. At the very end of it, you'll find the following lines:

{code}
    /* default to routing the response to the address where the request
     * came from.  Since we don't have a transport layer, we do this here.
     * The process_via() function will update the port to either the port
     * specified in the via header or the default port later on (per RFC
     * 3261 section 18.2.2).
     */
    p->sa = p->recv;

    if (process_via(p, req)) {
        ast_log(LOG_WARNING, "error processing via header, will send response to originating address\n");
    }
{code}

The first line below the comment sets the send address to the receive address, overriding what we had correctly determined to be the proper send address. As the comment states, process_via() then updates the port based on what is in the Via header. However, for some reason, process_via() does not try to set the send address to the address in the Via header. Thus Asterisk ends up sending the 200 OK to the oddball address 5.6.7.8:5062: the address from the route set combined with the port from the Via header of the BYE request.
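The sequence Mark describes can be reduced to a few lines. The C sketch below uses hypothetical addr/dialog types (not the real sip_pvt) to model the tail of respprep(): the destination starts as a copy of recv, and the Via processing replaces only the port. The use_via_host flag models the alternative of also taking the host from the Via, roughly what Mark proposes; his actual patch is not reproduced here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the sip_pvt address fields. */
struct addr {
    char host[16];
    int port;
};

struct dialog {
    struct addr recv; /* where Asterisk thinks the request came from */
    struct addr sa;   /* where the response will be sent */
};

/* Models the tail of respprep(): start from recv, then apply the Via.
 * use_via_host == 0: only the port comes from the Via (behaviour above).
 * use_via_host == 1: the Via host is used as well. */
static void route_response(struct dialog *p, const char *via_host,
                           int via_port, int use_via_host)
{
    assert(p != NULL && via_host != NULL);
    p->sa = p->recv;
    p->sa.port = via_port ? via_port : 5060;
    if (use_via_host) {
        strncpy(p->sa.host, via_host, sizeof(p->sa.host) - 1);
        p->sa.host[sizeof(p->sa.host) - 1] = '\0';
    }
}
```

With a stale recv of 5.6.7.8:5060 and a Via of 1.2.3.4:5062, the port-only path yields 5.6.7.8:5062, while taking the host from the Via yields 1.2.3.4:5062. Correcting recv itself would also fix the port-only path.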

Worst of all, this behavior is *completely unaffected* by NAT settings.

I can provide a fix for this, and it will likely solve your problem. The catch is that changing Asterisk to respect the Via header could break interoperability with specific clients. We'll see.

By: Mark Michelson (mmichelson) 2013-07-29 16:18:22.097-0500

I've added ASTERISK-22071.patch. See if this works for you.

By: Alex Zarubin (az_tth) 2013-07-29 18:03:40.551-0500

Thank you, Mark, will try your patch tonight.
The root cause must be elsewhere, though, since as far as I can tell process_via() and respprep() are identical in 11.2.1 and 11.4.0, yet 11.2.1 'respects Via' and 11.4.0 doesn't.

By: Walter Doekes (wdoekes) 2013-07-30 04:32:18.158-0500

For the record, I looked at the changes between 11.2.1 and 11.4.0, and I suppose this change is the culprit:

{noformat}
@@ -28153,7 +28258,10 @@ static int handle_request_do(struct sip_
       owner_chan_ref = sip_pvt_lock_full(p);

       copy_socket_data(&p->socket, &req->socket);
-       ast_sockaddr_copy(&p->recv, addr);
+
+       if (ast_sockaddr_isnull(&p->recv)) { /* This may already be set before getting here */
+               ast_sockaddr_copy(&p->recv, addr);
+       }

       /* if we have an owner, then this request has been authenticated */
       if (p->owner) {
{noformat}

As changed in:
{noformat}
------------------------------------------------------------------------
r382322 | elguero | 2013-03-01 05:28:22 +0100 (Fri, 01 Mar 2013) | 34 lines

Fix / Clean Up Some Items To Handle The New auto_* NAT Options

The original report had to do with a realtime peer behind NAT being pruned and
the peer's private address being used instead of its external address.  Upon
debugging, it was discovered that this was being caused by the addition of
the auto_force_rport and auto_comedia settings.

This patch does the following:

* Adds a missing note to the CHANGES file indicating that the default global nat
 setting is auto_force_rport

* Constify the 'req' parameter for check_via()

* Add calls to check_via() in a couple of places in order for the auto_*
 settings to do their job in attempting to determine if NAT is involved

* Set the flags SIP_NAT_FORCE_RPORT and SIP_PAGE2_SYMMETRICRTP if the auto_*
 settings are in use where it was needed

* Moves the copying of peer flags up in build_peer() to before they are used;
 this fixes the realtime prune issue

* Update the contrib/realtime schemas to allow the nat column to handle the
 different nat setting combinations we have

This patch received a review and "Ship It!" on the issue itself.

(closes issue ASTERISK-20904)
Reported by: JoshE
Tested by: JoshE, Michael L. Young
Patches:
 asterisk-20904-nat-auto-and-rt-peersv2.diff Michael L. Young (license 5026)

------------------------------------------------------------------------
{noformat}

I'm not sure that change is correct.

In handle_request_do():

{{p = find_call(req, addr, req->method);}}

Here we get back an existing dialog that already has ->recv set, and we decide to keep the old recv instead of the address this request actually arrived from. That seems wrong.
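Walter's point can be shown with a minimal C sketch. The dialog type and both helpers are hypothetical simplifications: store_recv_guarded() mirrors the 11.4 check added in r382322 (copy only if recv is still null), and store_recv_always() mirrors the 11.2.1 line it replaced. The earlier source address of 5.6.7.8:5060 is an assumption taken from the check_for_nat log line quoted above.

```c
#include <assert.h>
#include <string.h>

struct addr {
    char host[16];
    int port;
};

/* Hypothetical dialog that persists across requests, like one returned
 * by find_call() for an in-dialog BYE. */
struct dialog {
    struct addr recv;
    int recv_set; /* stands in for !ast_sockaddr_isnull(&p->recv) */
};

/* 11.4-style: keep whatever recv was stored by an earlier message. */
static void store_recv_guarded(struct dialog *p, const struct addr *src)
{
    assert(p != NULL && src != NULL);
    if (!p->recv_set) {
        p->recv = *src;
        p->recv_set = 1;
    }
}

/* 11.2.1-style: record the source of every incoming request. */
static void store_recv_always(struct dialog *p, const struct addr *src)
{
    assert(p != NULL && src != NULL);
    p->recv = *src;
    p->recv_set = 1;
}
```

When the dialog was first touched by a message from 5.6.7.8:5060, the guarded version still holds that address after the BYE arrives from 1.2.3.4:5062, while the unconditional copy holds the BYE's real source.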

By: Alex Zarubin (az_tth) 2013-08-01 11:57:56.470-0500

Mark's patch with the process_via() change has been applied to 11.4.0 and fixed the issue, i.e. the 200 OK to the BYE is now transmitted properly. That was a quick fix, though; what about the root cause, which seems more in line with Walter's comments? Thank you.

By: Michael L. Young (elguero) 2013-08-01 12:39:27.462-0500

Alex,

Sorry that I haven't been able to chime in. Mark tracked down the same place I had been looking before I was pulled away to work on other projects.

I agree with Walter; more than likely, that is the culprit. Looking at it again, I see that my thinking at the time was flawed. In fact, I recall briefly looking at this line before uploading the patch to the issue Walter pointed out. It was one of those lines added while debugging and tracking things down, and in the end I left it in, thinking it was an optimization. Reverting that one change to find out whether it is the cause does no harm.

Since Mark says his patch might break interoperability with some clients, can you please test reverting the change Walter pointed out, using the patch I'm attaching below? Once you report back, Mark can decide which fix should go into the code. I have a feeling that reverting that change will be the safest way to go.

If we find this is the culprit, this fix will need to be applied to 1.8 as well.

[^asterisk-22071-store-recvd-address.diff]

By: Alex Zarubin (az_tth) 2013-08-02 13:37:27.353-0500

Hello. I have applied Michael's latest patch (reverting to the 11.2.1 logic) to the original 11.4.0 code (with Mark's patch removed), and our issue is fixed: received= is set as it should be, and the 200 OK to the BYE is transmitted to the proxy. I think it's safe to apply this patch to our 11.4.0-based production code (which I'm going to do today), and I also hope it will be fixed in 11.x releases going forward. Thank you so much.

By: Walter Doekes (wdoekes) 2013-08-19 04:16:07.709-0500

Bump. Reviewboard or commit time?

By: Michael L. Young (elguero) 2013-08-19 12:37:41.678-0500

A duplicate report (ASTERISK-22314) was created.  The reporter on that issue has tested the patch [^asterisk-22071-store-recvd-address.diff] and says it resolved his issue.

Here are his comments:

{quote}
Karsten Wemheuer added a comment - 19/Aug/13 1:13 PM
I've checked the patch from issue ASTERISK-22071 as Michael told. It solves my problem. The sip communications seems ok (as it was in 1.8.22).
{quote}

I would say that we should commit that patch but since Mark is assigned to this issue, it would be nice to get his thoughts.

By: Mark Michelson (mmichelson) 2013-08-20 15:27:36.979-0500

I believe the patch that Michael uploaded is a better approach than mine. Michael, feel free to commit when you want. As you noted previously, you'll need the same change for 1.8 in addition to 11 (and trunk of course).