Summary: | ASTERISK-23020: PJSip - Multihomed machine returning wrong IP address | ||
Reporter: | xrobau (xrobau) | Labels: | |
Date Opened: | 2013-12-17 15:49:30.000-0600 | Date Closed: | 2014-03-11 11:07:57 |
Priority: | Major | Regression? | Yes |
Status: | Closed/Complete | Components: | Channels/chan_pjsip |
Versions: | 12.0.0-beta2 | Frequency of Occurrence | Constant |
Related Issues: | |||
Environment: | Attachments: | ( 0) sip_trace.txt | |
Description: | Multihomed machine has two interfaces:
eth0.10 = 192.168.15.5/24 (default gateway via this int)
eth0.20 = 192.168.5.247/24

Phone endpoint is connected to eth0.20, with the IP Address 192.168.5.248

Connecting works fine, but in the OK packet returned from the server, the wrong IP address is handed back:

{noformat}
v=0
o=- 7132004 7132006 IN IP4 localhost.localdomain
s=Asterisk
c=IN IP4 192.168.15.5
t=0 0
m=audio 17940 RTP/AVP 8 101
c=IN IP4 192.168.15.5
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=ptime:20
a=maxptime:150
a=sendrecv
{noformat}

This causes the phone to try to connect to the wrong IP address which is not reachable from the other network.

Larger TCPdump attached | ||
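The symptom in the description (SDP connection lines advertising an address on the other network) can be checked mechanically. The following is a minimal sketch, not part of Asterisk or PJSIP: the function name `unreachable_sdp_addresses` and the idea of comparing against the phone's subnet are assumptions made for illustration. It scans an SDP body for IPv4 `c=` lines and reports any address outside the network the phone sits on.

```python
import ipaddress
from typing import List

def unreachable_sdp_addresses(sdp: str, peer_network: str) -> List[str]:
    """Return IPv4 addresses from c= lines in `sdp` that lie outside `peer_network`.

    Hypothetical helper for illustration only; it is not an Asterisk or PJSIP API.
    """
    network = ipaddress.ip_network(peer_network)
    flagged: List[str] = []
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        # SDP connection line: c=<nettype> <addrtype> <connection-address>
        if not line.startswith("c="):
            continue
        fields = line[2:].split()
        if len(fields) != 3 or fields[0] != "IN" or fields[1] != "IP4":
            continue
        address_text = fields[2].split("/")[0]  # drop an optional TTL suffix
        try:
            address = ipaddress.ip_address(address_text)
        except ValueError:
            continue  # connection address is a hostname, not a literal IP
        if address not in network and address_text not in flagged:
            flagged.append(address_text)
    return flagged

# SDP body copied from the issue description.
SDP_FROM_REPORT = """v=0
o=- 7132004 7132006 IN IP4 localhost.localdomain
s=Asterisk
c=IN IP4 192.168.15.5
t=0 0
m=audio 17940 RTP/AVP 8 101
c=IN IP4 192.168.15.5
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=ptime:20
a=maxptime:150
a=sendrecv
"""

if __name__ == "__main__":
    # The phone is on 192.168.5.0/24, so 192.168.15.5 is flagged.
    print(unreachable_sdp_addresses(SDP_FROM_REPORT, "192.168.5.0/24"))
```

Run against the SDP above with the phone's network `192.168.5.0/24`, the sketch flags `192.168.15.5`, which is the address the phone cannot reach.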
Comments: |

By: xrobau (xrobau) 2013-12-17 15:51:30.157-0600

[Edit by Rusty - Removed inline debug and attaching to issue, we generally want this attached to the issues as .txt, thanks!]

By: xrobau (xrobau) 2013-12-17 15:56:28.866-0600

[transport-default]
protocol=udp
bind=0.0.0.0:5060

[300]
type=endpoint
aors=300
auth=300-auth
allow=alaw
context=from-internal
callerid=device <300>
dtmf_mode=rfc4733
mailboxes=300@device
transport=transport-default

[300]
type=aor
max_contacts=1

[300-auth]
type=auth
auth_type=userpass
password=62b5942b6aabb2aa53df28074c1b834f
username=300

By: xrobau (xrobau) 2013-12-17 16:00:48.359-0600

Note the machine is listening on the correct port and interface (0.0.0.0)

[root@localhost admin]# netstat -nap | grep asterisk
tcp 0 0 0.0.0.0:5038 0.0.0.0:* LISTEN 27967/asterisk
udp 0 0 0.0.0.0:17940 0.0.0.0:* 27967/asterisk
udp 0 0 0.0.0.0:17941 0.0.0.0:* 27967/asterisk
udp 0 0 0.0.0.0:5060 0.0.0.0:* 27967/asterisk
udp 0 0 0.0.0.0:4569 0.0.0.0:* 27967/asterisk

By: Rusty Newton (rnewton) 2013-12-17 17:10:19.183-0600

Thanks for the report! Be sure to attach configs and debug to the issue as .txt files when possible rather than putting them in comments. Makes it easier to find them and link to others.

By: xrobau (xrobau) 2013-12-17 17:17:25.395-0600

Even when EXPLICITLY specifying the IP address in the transport, it's still returning incorrect information.

[root@localhost a12]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 90:fb:a6:27:ac:be brd ff:ff:ff:ff:ff:ff
    inet6 fe80::92fb:a6ff:fe27:acbe/64 scope link
       valid_lft forever preferred_lft forever
7: eth0.100@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 90:fb:a6:27:ac:be brd ff:ff:ff:ff:ff:ff
    inet 192.168.15.5/24 brd 192.168.15.255 scope global eth0.100
    inet6 fe80::92fb:a6ff:fe27:acbe/64 scope link
       valid_lft forever preferred_lft forever
8: eth0.30@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 90:fb:a6:27:ac:be brd ff:ff:ff:ff:ff:ff
    inet 192.168.5.247/24 brd 192.168.5.255 scope global eth0.30
    inet6 fe80::92fb:a6ff:fe27:acbe/64 scope link
       valid_lft forever preferred_lft forever

This is a Cisco SPA 504g:

[root@localhost a12]# ping 192.168.5.248
PING 192.168.5.248 (192.168.5.248) 56(84) bytes of data.
64 bytes from 192.168.5.248: icmp_seq=1 ttl=64 time=0.450 ms
64 bytes from 192.168.5.248: icmp_seq=2 ttl=64 time=0.468 ms
^C
--- 192.168.5.248 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1316ms
rtt min/avg/max/mdev = 0.450/0.459/0.468/0.009 ms

/etc/asterisk/pjsip.conf

[transport-default]
protocol=udp
bind=192.168.5.247:5060
type=transport

[300]
type=endpoint
aors=300
auth=300-auth
allow=g722,speex16,slin16,ulaw,alaw
context=from-internal
callerid=device <300>
dtmf_mode=rfc4733
mailboxes=300@device
transport=transport-default

[300-auth]
type=auth
auth_type=userpass
password=62b5942b6aabb2aa53df28074c1b834f
username=300

[300]
type=aor
max_contacts=1

Note that this machine now has NO default gateway:

[root@localhost a12]# netstat -nr
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
192.168.5.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0.30
192.168.15.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0.100
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0.100
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0.30
[root@localhost a12]#

pjsip understands the bind line, as it's listening on the correct IP:

[root@localhost a12]# netstat -nap | grep 5060
udp 0 0 192.168.5.247:5060 0.0.0.0:* 28225/asterisk
[root@localhost a12]#

However, TCPdump shows this:

09:15:20.129048 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 884)
    192.168.5.247.sip > 192.168.5.248.5062: SIP, length: 856
SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.5.248:5062;rport;received=192.168.5.248;branch=z9hG4bK-936f477e
Call-ID: dfbc9fc-bd72cfb6@192.168.5.248
From: "A12" <sip:300@192.168.5.247>;tag=810bf544406b5d0eo2
To: <sip:*43@192.168.5.247>;tag=7ba26edc-68c2-42d5-a657-f842768515f4
CSeq: 102 INVITE
Contact: <sip:192.168.5.247:5060>
Allow: OPTIONS, SUBSCRIBE, NOTIFY, PUBLISH, INVITE, ACK, BYE, CANCEL, UPDATE, PRACK, REFER, REGISTER, MESSAGE
Supported: 100rel, timer, replaces, norefersub
Content-Type: application/sdp
Content-Length: 311

v=0
o=- 7683843 7683845 IN IP4 localhost.localdomain
s=Asterisk
c=IN IP4 192.168.15.5
t=0 0
m=audio 16354 RTP/AVP 9 0 8 101
c=IN IP4 192.168.15.5
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=ptime:20
a=maxptime:150
a=sendrecv

Note that it's STILL returning the wrong IP address. Asterisk has been restarted after the default route was removed, too.

By: Anthony Messina (amessina) 2013-12-21 02:04:17.838-0600

I can confirm this issue with Asterisk-12.0.0 (with the exception that my "internal" nic is 10.x.x.x and my "external" nic has a publicly available IPv4 address). In my case the publicly available nic is set as the "default" route. I believe this likely has something to do with [http://trac.pjsip.org/repos/wiki/FAQ#multihomed]. In short, Asterisk seems to be choosing the "default" route, rather than the one that fits an AOR created by REGISTER.

By: Zane Conkle (zconkle) 2013-12-23 15:19:56.485-0600

Are you sure that the Asterisk box is not responding via the second interface? I have had this problem in the past. Take a look at this: http://kindlund.wordpress.com/2007/11/19/configuring-multiple-default-routes-in-linux/

By: Joshua C. Colp (jcolp) 2014-01-04 18:38:06.248-0600

This should just be made a general multi-homing issue for pjsip. There are two things in play:

1. If we bind to any (0.0.0.0) we should adjust signaling accordingly depending on which interface the message ends up going out on.
2. If multiple transports are configured and none is explicitly locked we should determine which interface the message should go out on and send it out accordingly/change the message.

Both of the above can use the pj_getipinterface function to get the information.

By: Kinsey Moore (kmoore) 2014-03-10 15:44:17.552-0500

Joshua's res_pjsip_multihomed patch on reviewboard fixes all problems I was having related to incorrect IPs going out on the wire.
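
The interface lookup that Joshua C. Colp's comment describes (work out which local address a message to a given peer would leave from, then put that address in the signaling) can be illustrated outside Asterisk. The sketch below is an assumption-laden illustration, not the res_pjsip_multihomed code: `local_address_for` is a hypothetical name, and the technique shown is the general "connect a UDP socket and read back its local name" trick, which is assumed here to be comparable to what `pj_getipinterface` reports.

```python
import socket

def local_address_for(destination: str, port: int = 5060) -> str:
    """Return the local IPv4 address the kernel would use to reach `destination`.

    Hypothetical helper for illustration; not an Asterisk or PJSIP function.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        # connect() on a UDP socket sends no packet. It only asks the kernel
        # to pick a route, which also fixes the socket's source address.
        probe.connect((destination, port))
        return probe.getsockname()[0]

if __name__ == "__main__":
    # Loopback destinations are reached from the loopback address.
    print(local_address_for("127.0.0.1"))
```

Given the routing table posted earlier in this thread, a lookup for the phone at 192.168.5.248 should resolve to 192.168.5.247 rather than 192.168.15.5, which is the address the SDP ought to carry for that peer.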

By: Matt Jordan (mjordan) 2014-03-10 20:11:35.623-0500

I'd say given the difficulty in creating an automated test for this, the solution is going to be to just commit this with the knowledge that:

* Josh tested it
* Kinsey tested it

If anyone else would like to give the multihomed module a whirl and verify that it also fixes the issues they're seeing, you can download the patch from here: https://reviewboard.asterisk.org/r/3102/

By: Private Name (falves11) 2015-02-20 21:46:32.616-0600

This issue is not fixed. I have a scenario a little more subtle. The box has two IPs in the same subnet, single gateway. A call comes in on interface eth1, but the response leaves the box from interface eth0, and the call fails. The transport is 0.0.0.0:5060. I ended up having to split the machine in two and using only one IP per box. But it should work fine without this. Is there any idea how to manage the issue? The box is correctly configured against arp flux already:

net.ipv4.conf.all.arp_filter=1
net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.all.arp_announce=2

Beyond this I use policy-based routing, using the package iproute. There must be a way to tell PJSIP to remember what IP address was used when the call arrived, and send the response back out the same interface. |