ASTERISK-23020: PJSip - Multihomed machine returning wrong IP address


Summary: ASTERISK-23020: PJSip - Multihomed machine returning wrong IP address

Reporter: xrobau (xrobau) Labels:

Date Opened: 2013-12-17 15:49:30.000-0600 Date Closed: 2014-03-11 11:07:57

Priority: Major Regression? Yes

Status: Closed/Complete Components: Channels/chan_pjsip

Versions: 12.0.0-beta2 Frequency of Occurrence: Constant

Related Issues:

Environment: Attachments: ( 0) sip_trace.txt

Description: Multihomed machine has two interfaces:

eth0.10 = 192.168.15.5/24 (default gateway via this int)
eth0.20 = 192.168.5.247/24

Phone endpoint is connected to eth0.20, with the IP Address 192.168.5.248

Connecting works fine, but the SDP in the 200 OK response returned by the server carries the wrong IP address:

{noformat}
v=0
o=- 7132004 7132006 IN IP4 localhost.localdomain
s=Asterisk
c=IN IP4 192.168.15.5
t=0 0
m=audio 17940 RTP/AVP 8 101
c=IN IP4 192.168.15.5
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=ptime:20
a=maxptime:150
a=sendrecv
{noformat}

This causes the phone to send media to 192.168.15.5, which is not reachable from the phone's network.

A larger tcpdump capture is attached.
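
The symptom above can be checked mechanically. The following is a minimal sketch, not part of Asterisk: the helper names `sdp_connection_addresses` and `unreachable_media_addresses` are hypothetical, and it assumes that any connection address outside the phone's /24 is unreachable for the phone, which holds for the setup described here but not in general.

```python
import ipaddress
from typing import List

def sdp_connection_addresses(sdp: str) -> List[str]:
    """Return the address from every 'c=IN IP4 <addr>' line in an SDP body."""
    addresses = []
    for line in sdp.splitlines():
        parts = line.strip().split()
        # A connection line looks like: c=IN IP4 192.168.15.5
        if len(parts) == 3 and parts[0] == "c=IN" and parts[1] == "IP4":
            addresses.append(parts[2])
    return addresses

def unreachable_media_addresses(sdp: str, phone_network: str) -> List[str]:
    """List SDP connection addresses that fall outside the phone's subnet."""
    network = ipaddress.ip_network(phone_network)
    return [a for a in sdp_connection_addresses(sdp)
            if ipaddress.ip_address(a) not in network]

# SDP lines taken from the 200 OK quoted in the description.
SDP = """v=0
o=- 7132004 7132006 IN IP4 localhost.localdomain
s=Asterisk
c=IN IP4 192.168.15.5
t=0 0
m=audio 17940 RTP/AVP 8 101
c=IN IP4 192.168.15.5
"""

print(unreachable_media_addresses(SDP, "192.168.5.0/24"))
```

Both the session-level and the media-level `c=` lines are reported, since both advertise 192.168.15.5 to a phone that lives in 192.168.5.0/24.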

Comments: By: xrobau (xrobau) 2013-12-17 15:51:30.157-0600

[Edit by Rusty - Removed inline debug and attaching to issue, we generally want this attached to the issues as .txt, thanks!]

By: xrobau (xrobau) 2013-12-17 15:56:28.866-0600

[transport-default]
protocol=udp
bind=0.0.0.0:5060

[300]
type=endpoint
aors=300
auth=300-auth
allow=alaw
context=from-internal
callerid=device <300>
dtmf_mode=rfc4733
mailboxes=300@device
transport=transport-default

[300]
type=aor
max_contacts=1

[300-auth]
type=auth
auth_type=userpass
password=62b5942b6aabb2aa53df28074c1b834f
username=300

By: xrobau (xrobau) 2013-12-17 16:00:48.359-0600

Note the machine is listening on the correct port and interface (0.0.0.0)

[root@localhost admin]# netstat -nap | grep asterisk
tcp 0 0 0.0.0.0:5038 0.0.0.0:* LISTEN 27967/asterisk
udp 0 0 0.0.0.0:17940 0.0.0.0:* 27967/asterisk
udp 0 0 0.0.0.0:17941 0.0.0.0:* 27967/asterisk
udp 0 0 0.0.0.0:5060 0.0.0.0:* 27967/asterisk
udp 0 0 0.0.0.0:4569 0.0.0.0:* 27967/asterisk

By: Rusty Newton (rnewton) 2013-12-17 17:10:19.183-0600

Thanks for the report! Be sure to attach configs and debug to the issue as .txt files when possible rather than putting them in comments. That makes it easier to find them and link to others.

By: xrobau (xrobau) 2013-12-17 17:17:25.395-0600

Even when EXPLICITLY specifying the IP address in the transport, it's still returning incorrect information.

[root@localhost a12]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 90:fb:a6:27:ac:be brd ff:ff:ff:ff:ff:ff
inet6 fe80::92fb:a6ff:fe27:acbe/64 scope link
valid_lft forever preferred_lft forever
7: eth0.100@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 90:fb:a6:27:ac:be brd ff:ff:ff:ff:ff:ff
inet 192.168.15.5/24 brd 192.168.15.255 scope global eth0.100
inet6 fe80::92fb:a6ff:fe27:acbe/64 scope link
valid_lft forever preferred_lft forever
8: eth0.30@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 90:fb:a6:27:ac:be brd ff:ff:ff:ff:ff:ff
inet 192.168.5.247/24 brd 192.168.5.255 scope global eth0.30
inet6 fe80::92fb:a6ff:fe27:acbe/64 scope link
valid_lft forever preferred_lft forever

This is a Cisco SPA 504g:

[root@localhost a12]# ping 192.168.5.248
PING 192.168.5.248 (192.168.5.248) 56(84) bytes of data.
64 bytes from 192.168.5.248: icmp_seq=1 ttl=64 time=0.450 ms
64 bytes from 192.168.5.248: icmp_seq=2 ttl=64 time=0.468 ms
^C
--- 192.168.5.248 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1316ms
rtt min/avg/max/mdev = 0.450/0.459/0.468/0.009 ms

/etc/asterisk/pjsip.conf

[transport-default]
protocol=udp
bind=192.168.5.247:5060
type=transport

[300]
type=endpoint
aors=300
auth=300-auth
allow=g722,speex16,slin16,ulaw,alaw
context=from-internal
callerid=device <300>
dtmf_mode=rfc4733
mailboxes=300@device
transport=transport-default

[300-auth]
type=auth
auth_type=userpass
password=62b5942b6aabb2aa53df28074c1b834f
username=300

[300]
type=aor
max_contacts=1

Note that this machine now has NO default gateway:

[root@localhost a12]# netstat -nr
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
192.168.5.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0.30
192.168.15.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0.100
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0.100
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0.30
[root@localhost a12]#

pjsip understands the bind line, as it's listening on the correct IP:
[root@localhost a12]# netstat -nap | grep 5060
udp 0 0 192.168.5.247:5060 0.0.0.0:* 28225/asterisk
[root@localhost a12]#

However, TCPdump shows this:

09:15:20.129048 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 884)
192.168.5.247.sip > 192.168.5.248.5062: SIP, length: 856
SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.5.248:5062;rport;received=192.168.5.248;branch=z9hG4bK-936f477e
Call-ID: dfbc9fc-bd72cfb6@192.168.5.248
From: "A12" <sip:300@192.168.5.247>;tag=810bf544406b5d0eo2
To: <sip:*43@192.168.5.247>;tag=7ba26edc-68c2-42d5-a657-f842768515f4
CSeq: 102 INVITE
Contact: <sip:192.168.5.247:5060>
Allow: OPTIONS, SUBSCRIBE, NOTIFY, PUBLISH, INVITE, ACK, BYE, CANCEL, UPDATE, PRACK, REFER, REGISTER, MESSAGE
Supported: 100rel, timer, replaces, norefersub
Content-Type: application/sdp
Content-Length: 311

v=0
o=- 7683843 7683845 IN IP4 localhost.localdomain
s=Asterisk
c=IN IP4 192.168.15.5
t=0 0
m=audio 16354 RTP/AVP 9 0 8 101
c=IN IP4 192.168.15.5
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=ptime:20
a=maxptime:150
a=sendrecv

Note that it's STILL returning the wrong IP address in the SDP (192.168.15.5), even though the Contact header is correct. Asterisk has been restarted after the default route was removed, too.

By: Anthony Messina (amessina) 2013-12-21 02:04:17.838-0600

I can confirm this issue with Asterisk-12.0.0, with the exception that my "internal" NIC is 10.x.x.x and my "external" NIC has a publicly available IPv4 address. In my case the publicly available NIC is set as the "default" route. I believe this likely has something to do with [http://trac.pjsip.org/repos/wiki/FAQ#multihomed]. In short, Asterisk seems to be choosing the "default" route, rather than the one that fits an AOR created by REGISTER.

By: Zane Conkle (zconkle) 2013-12-23 15:19:56.485-0600

Are you sure that the Asterisk box is not responding via the second interface? I have had this problem in the past. Take a look at this: http://kindlund.wordpress.com/2007/11/19/configuring-multiple-default-routes-in-linux/

By: Joshua C. Colp (jcolp) 2014-01-04 18:38:06.248-0600

This should just be made a general multi-homing issue for pjsip. There are two things to handle:

1. If we bind to any (0.0.0.0) we should adjust signaling accordingly depending on which interface the message ends up going out on.
2. If multiple transports are configured and none is explicitly locked we should determine which interface the message should go out on and send it out accordingly/change the message.

Both of the above can use the pj_getipinterface function to get the information.
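
The idea of asking which local address a message to a given destination would leave from can be illustrated outside pjlib. The following is a minimal sketch in Python, not the `pj_getipinterface` implementation: the helper name `local_address_for` is hypothetical, and it relies on the common technique of connecting an unsent UDP socket and reading back the address the kernel selected.

```python
import socket

def local_address_for(destination: str, port: int = 5060) -> str:
    """Ask the kernel which local IPv4 address it would use to reach destination."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        # connect() on a UDP socket sends no packets; it only makes the
        # kernel pick a route and a source address for this peer.
        probe.connect((destination, port))
        return probe.getsockname()[0]

# Loopback is used here so the example runs anywhere.
print(local_address_for("127.0.0.1"))
```

Given the routing table posted earlier in this issue, calling this with 192.168.5.248 would be expected to yield 192.168.5.247, which is the address the SDP should have carried.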

By: Kinsey Moore (kmoore) 2014-03-10 15:44:17.552-0500

Joshua's res_pjsip_multihomed patch on reviewboard fixes all problems I was having related to incorrect IPs going out on the wire.

By: Matt Jordan (mjordan) 2014-03-10 20:11:35.623-0500

I'd say given the difficulty in creating an automated test for this, the solution is going to be to just commit this with the knowledge that:
* Josh tested it
* Kinsey tested it

If anyone else would like to give the multihomed module a whirl and verify that it also fixes the issues they're seeing, you can download the patch from here:

https://reviewboard.asterisk.org/r/3102/

By: Private Name (falves11) 2015-02-20 21:46:32.616-0600

This issue is not fixed. I have a slightly more subtle scenario. The box has two IPs in the same subnet and a single gateway. A call comes in on interface eth1, but the response leaves the box from interface eth0, and the call fails. The transport is 0.0.0.0:5060. I ended up having to split the machine in two and use only one IP per box, but it should work fine without this. Any ideas on how to handle this? The box is already configured against ARP flux:
net.ipv4.conf.all.arp_filter=1
net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.all.arp_announce=2
Beyond this I use policy-based routing, using the iproute package.
There must be a way to tell PJSIP to remember which IP address the call arrived on, and send the response back out the same interface.
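
One generic way to get "reply from the address the request arrived on" is to bind one socket per local address instead of a single wildcard socket, and answer on whichever socket received the request. The following is a minimal sketch in Python, not PJSIP code: the helper names are hypothetical, and loopback stands in for the real interface addresses. It pins only the source address of the reply; which interface the packet actually leaves through still depends on the kernel's routing rules, for example source-based policy routing.

```python
import select
import socket
from typing import Dict, List

def open_per_address_sockets(addresses: List[str], port: int = 0) -> Dict[str, socket.socket]:
    """Bind one UDP socket to each local address instead of one wildcard socket."""
    sockets = {}
    for address in addresses:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind((address, port))
        sockets[address] = sock
    return sockets

def answer_one_request(sockets: Dict[str, socket.socket], timeout: float = 2.0) -> str:
    """Answer one datagram from the socket it arrived on; return that local address."""
    readable, _, _ = select.select(list(sockets.values()), [], [], timeout)
    if not readable:
        raise TimeoutError("no request arrived")
    sock = readable[0]
    data, peer = sock.recvfrom(4096)
    # Replying on the same socket keeps the bound address as the source.
    sock.sendto(b"reply to " + data, peer)
    return sock.getsockname()[0]

# Loopback demonstration; port 0 lets the kernel choose a free port.
servers = open_per_address_sockets(["127.0.0.1"])
server_port = servers["127.0.0.1"].getsockname()[1]
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)
client.sendto(b"ping", ("127.0.0.1", server_port))
arrived_on = answer_one_request(servers)
reply, reply_source = client.recvfrom(4096)
print(arrived_on, reply_source[0], reply)
client.close()
for server in servers.values():
    server.close()
```

In pjsip.conf terms, the rough analogue would be one transport section per local address rather than a single 0.0.0.0 bind, though whether that resolves the same-subnet case described above is not confirmed in this issue.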