
Summary: ASTERISK-18827: iax2 peer/trunk unreachable
Reporter: andrea lanza (lanzaandrea)
Labels:
Date Opened: 2011-11-07 02:14:31.000-0600
Date Closed: 2013-04-12 16:53:01
Priority: Major
Regression? No
Status: Closed/Complete
Components: Channels/chan_iax2
Versions: 1.8.7.0
Frequency of Occurrence: Constant
Related Issues:
Environment: openSUSE 11.4 virtual machine on ESXi 4.1
Attachments: (0) review2427.diff.txt
Description: After upgrading the VM to openSUSE 11.4 and Asterisk 1.8.x (tested 1.8.5 and 1.8.7.1), the IAX2 peer/trunk from the virtual machine to the physical one becomes unreachable; there is no problem in the other direction (physical to virtual). Rolling back to Asterisk 1.6.20, the link is OK again.
Debugging with tcpdump on the virtual machine shows it sending POKE messages to the physical peer; the peer answers with a PONG, which is received by the VM, yet the VM doesn't send an ACK, probably meaning the PONG is not "understood" by chan_iax2. Changing the port of the return link, or changing the IP address, solves the problem (i.e. the two servers have two network cards on two different networks; the physical-to-virtual link uses the first network and the virtual-to-physical link uses the other).
Setting trunk=yes or trunk=no makes no difference.
No problems seem to exist on pairs of physical machines (trunks between physical machines) running the same OS/Asterisk versions as the faulty ones.
Reproduced in 3 different setups on various ESXi versions and on ESX 3.5.

Maybe a timing issue?
Comments:
By: andrea lanza (lanzaandrea) 2011-11-07 02:50:40.130-0600

I forgot to add the iax2 debug:

{code}
Tx-Frame Retry[000] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: POKE
  Timestamp: 00006ms  SCall: 01756  DCall: 00000 [xx.yy.zzz.ww:4569]

Tx-Frame Retry[001] -- OSeqno: 000 ISeqno: 000 Type: IAX     Subclass: POKE
  Timestamp: 00006ms  SCall: 01756  DCall: 00000 [xx.yy.zzz.ww:4569]
{code}


This debug shows that, even though tcpdump shows the PONG being received, iax2 nevertheless seems not to process it: the POKE keeps being retransmitted.


By: andrea lanza (lanzaandrea) 2011-11-07 04:07:00.743-0600

Further investigation:
- Start Asterisk on the virtual machine with qualify=yes in the peer definition: the IAX2 peer is UNKNOWN and after a few seconds becomes UNREACHABLE.
- Stop Asterisk (core stop now).
- Start Asterisk on the virtual machine with qualify=no in the peer definition: the peer is UNMONITORED.
- Exit the console (do NOT stop Asterisk) and change the peer definition to qualify=yes.
- Reconnect to the Asterisk console (asterisk -r).
- iax2 reload ==> the peer is now REACHABLE.
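The status transitions above can be modelled with a small C function. This is a hypothetical, simplified sketch, not chan_iax2's code: it assumes the peer keeps the last POKE/PONG round trip in `lastms` (0 = no answer handled yet, -1 = POKE timed out) and the qualify threshold in `maxms` (0 = qualify=no); `classify_peer` and `format_peer_status` are illustrative names. Under this model, a PONG that arrives on the wire but is never handled leaves the peer at UNKNOWN until the POKE times out, and then UNREACHABLE.

```c
#include <stdio.h>

/* Hypothetical, simplified qualify model (illustrative names, not chan_iax2's).
 * lastms: last POKE->PONG round trip in ms; 0 = no answer handled yet,
 *         -1 = POKE timed out without a handled PONG.
 * maxms:  qualify threshold in ms; 0 = qualify=no. */
enum peer_state { PEER_UNMONITORED, PEER_UNKNOWN, PEER_UNREACHABLE,
                  PEER_LAGGED, PEER_OK };

static enum peer_state classify_peer(int lastms, int maxms)
{
    if (maxms == 0)
        return PEER_UNMONITORED;   /* qualify=no: no POKEs are sent */
    if (lastms < 0)
        return PEER_UNREACHABLE;   /* POKE timed out */
    if (lastms == 0)
        return PEER_UNKNOWN;       /* still waiting for the first PONG */
    if (lastms > maxms)
        return PEER_LAGGED;        /* answered, but slower than allowed */
    return PEER_OK;
}

/* Render the state as a status string, e.g. "OK (1 ms)". */
static void format_peer_status(int lastms, int maxms, char *buf, size_t len)
{
    switch (classify_peer(lastms, maxms)) {
    case PEER_UNMONITORED: snprintf(buf, len, "UNMONITORED"); break;
    case PEER_UNKNOWN:     snprintf(buf, len, "UNKNOWN"); break;
    case PEER_UNREACHABLE: snprintf(buf, len, "UNREACHABLE"); break;
    case PEER_LAGGED:      snprintf(buf, len, "LAGGED (%d ms)", lastms); break;
    case PEER_OK:          snprintf(buf, len, "OK (%d ms)", lastms); break;
    }
}
```

The 2000 ms threshold used when trying it out is only an example value for qualify=yes.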

So my conclusions are:

1) Starting an Asterisk 1.8.x server on a virtual machine with an IAX peer that has qualify=yes results in the UNREACHABLE state.

2) Starting an Asterisk 1.8.x server on a virtual machine with an IAX peer that has qualify=no, then changing it to qualify=yes after startup and reloading the IAX2 configuration, results in the REACHABLE state.

3) There is no issue on the same virtual box running Asterisk 1.6.20.

4) On the physical counterpart (the IAX peer), with the same OS and Asterisk version, there are no issues at all.

Can anybody confirm this behaviour?






By: andrea lanza (lanzaandrea) 2011-11-07 09:32:10.681-0600

Just for further debugging/investigation, I tried editing chan_iax2.c.

Commenting out the "return 1;" line in the following block SOLVES the problem:

{code}
if (cur) {
    /* we found another thread processing a full frame for this call,
       so queue it up for processing later. */
    defer_full_frame(thread, cur);
    AST_LIST_UNLOCK(&active_list);
    thread->iostate = IAX_IOSTATE_IDLE;
    signal_condition(&thread->lock, &thread->cond);
    // return 1;   <-- commented out by me for this test
} else {
...
{code}


However, I don't know exactly what the channel driver is doing here or what problems this change may cause: maybe it breaks everything!

The commented-out line is line 9571 in Asterisk 1.8.7.1:

ASTERISK_FILE_VERSION(__FILE__, "$Revision: 331248 $")

I hope the developers can investigate this issue.

******************************

So the problem seems to be this:

chan_iax2 doesn't process the received frame (the PONG answering its POKE) because it believes another thread is already processing a frame for that call.
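To illustrate the rule in the quoted block, here is a hypothetical, single-threaded C model. It is not chan_iax2's code; `struct worker`, `dispatch_frame` and `drain_deferred` are invented names. A frame for a call that another worker is already processing is appended to that worker's deferred list, and the receiving worker returns 1 without handling it. The frame is only handled if the owning worker later drains its list; if that never happens, a PONG routed this way is effectively lost.

```c
#include <stddef.h>

#define MAX_DEFERRED 8

/* Hypothetical worker state (illustrative, not chan_iax2's struct). */
struct worker {
    int busy_call;              /* call number being processed, -1 if idle */
    int deferred[MAX_DEFERRED]; /* frame ids queued for this worker */
    size_t ndeferred;
};

/* Dispatch frame `frame_id` for call `callno`, received by `self`.
 * If another worker already owns the call, queue the frame on that
 * worker and return 1 (self handles nothing); otherwise self takes the
 * call and returns 0. In this model a full deferred list drops the frame. */
static int dispatch_frame(struct worker *pool, size_t nworkers,
                          struct worker *self, int callno, int frame_id)
{
    for (size_t i = 0; i < nworkers; i++) {
        struct worker *cur = &pool[i];
        if (cur != self && cur->busy_call == callno) {
            if (cur->ndeferred < MAX_DEFERRED)
                cur->deferred[cur->ndeferred++] = frame_id;
            return 1;
        }
    }
    self->busy_call = callno;
    return 0;
}

/* Run by the owning worker after its current frame: hand back up to `cap`
 * deferred frame ids in `handled`, then go idle. Returns how many. */
static size_t drain_deferred(struct worker *w, int *handled, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < w->ndeferred && n < cap; i++)
        handled[n++] = w->deferred[i];
    w->ndeferred = 0;   /* frames beyond cap are dropped in this model */
    w->busy_call = -1;  /* worker goes idle */
    return n;
}
```

Commenting out `return 1;` as above roughly corresponds to the receiving worker handling the frame itself anyway, which hides the missing drain rather than fixing it.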


By: Josep Casals C (joscas) 2011-11-17 09:21:07.004-0600

I have the same issue with an Ubuntu 10.04.3 virtual machine running Asterisk 1.8.7.0.
The exact same configuration on an Ubuntu 8.04.4 virtual machine running Asterisk 1.8.4.1 works fine.
The problem is exactly as described. Physical can see virtual but virtual cannot see physical.

By: Victoriano Giralt (victoriano) 2011-12-06 13:52:40.215-0600

I can confirm that both the problem and the (qualify=no/qualify=yes) workaround manifest themselves on asterisk 1.8.7.1 running on CentOS 5.8 on x86_64 physical hardware, no virtualisation whatsoever. BTW, my deepest gratitude to the poster of the workaround.

By: Imre Gergely (cemc) 2012-08-25 11:06:06.660-0500

One more confirmation of this issue, and of the working workaround, on the FreePBX distro:

- Asterisk 10.7.0
- VM is a 64bit KVM, and the FreePBX distro is CentOS 6.2
- host is Ubuntu 10.04 64bit

The IAX2 peer is a physical CentOS 5.6 32bit with Asterisk 1.6.2.2.

By: amcrory (amcrory) 2012-12-16 19:06:57.978-0600

Though I am having trouble with UNREACHABLE errors on a Proxmox-virtualized host, the above patch completely breaks iaxmodems for me. This error was so severe that I never bothered to test trunking. The patch was applied to asterisk-1.8.19.0, compiled on CentOS 6.x.

By: amcrory (amcrory) 2012-12-18 13:58:19.381-0600

FYI - I was able to bring my trunks back up by reconfiguring them with host=dynamic and adding a register string (username:password@<remote_ip>). Not really a fix, but it would be nice to know whether it works for you.

By: Alec Davis (alecdavis) 2013-04-02 22:45:02.596-0500

I'm seeing something similar, but both are physical boxes, no VMs.
Both are on the same physical LAN and switch.
Both are running SVN-branch-11-r376441M.

However, they are machines of different speeds:
testbox is a P4 3.0GHz
production box is an Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz

The symptom is that the slower testbox always reports the production box as UNREACHABLE.
The other direction is fine.

Applying iax_18827.diff.txt to Asterisk 11 works around the issue, though I always get the debug warning every qualify period.

Below, the response time is quick (1 ms), but the slow machine hasn't finished the previous POKE?

{code}
astrid-test*CLI> iax2 show peers
Name/Username    Host                 Mask             Port          Status      Description
astrid2/astrid2  192.168.5.40    (S)  255.255.255.255  4569 (T)      OK (1 ms)
1 iax2 peers [1 online, 0 offline, 0 unmonitored]
astrid-test*CLI>
{code}



By: Alec Davis (alecdavis) 2013-04-03 04:53:39.319-0500

Results with iax_18827.diff.txt on a Raspberry Pi:

{code}
SVN-branch-1.8-r375994M

raspberrypi*CLI> iax2 show peers
Name/Username    Host                 Mask             Port          Status
aldpabx/aldpabx  192.168.xxx.254 (S)  255.255.255.255  4569 (T)      UNREACHABLE
1 iax2 peers [0 online, 1 offline, 0 unmonitored]

after patching and compiling on raspberry pi.

raspberrypi*CLI> module unload chan_iax2.so
Unloaded chan_iax2.so
raspberrypi*CLI> module load chan_iax2.so
Loaded chan_iax2.so
raspberrypi*CLI> iax2 show peers
Name/Username    Host                 Mask             Port          Status
aldpabx/aldpabx  192.168.xxx.254 (S)  255.255.255.255  4569 (T)      OK (1000 ms)
1 iax2 peers [1 online, 0 offline, 0 unmonitored]
raspberrypi*CLI> iax2 show peers
Name/Username    Host                 Mask             Port          Status
aldpabx/aldpabx  192.168.xxx.254 (S)  255.255.255.255  4569 (T)      OK (1 ms)
1 iax2 peers [1 online, 0 offline, 0 unmonitored]
raspberrypi*CLI>

{code}

By: Alec Davis (alecdavis) 2013-04-03 18:19:16.707-0500

Uploaded iax_18827.diff2.txt.
The issue is that the deferred frames are not being processed.

This patch adds debug messages that show a frame being deferred to the other thread, and the other thread processing the deferred frame.

{code}
[Apr  4 12:08:34] WARNING[380]: chan_iax2.c:9743 socket_read: ALEC[3] already being processed on thread [1]
[Apr  4 12:08:34] WARNING[380]: chan_iax2.c:9685 defer_full_frame: ALEC[3] append deferred frame on tail of thread [1]
[Apr  4 12:08:34] WARNING[370]: chan_iax2.c:9642 handle_deferred_full_frames: ALEC[1] handled a deferred frame

{code}

But I'm not sure whether the following, which lets processing happen on the other thread, is legal:

{code}
                       cur->iostate = IAX_IOSTATE_READY;
                       signal_condition(&cur->lock, &cur->cond);
{code}
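On the legality question: one common POSIX threads pattern for handing work to another thread is to change that thread's state and signal it while holding that thread's own lock, with the woken thread re-checking the state in a loop around `pthread_cond_wait()`. The sketch below shows the pattern with hypothetical names (`struct helper`, `helper_init`, `wake_helper`, `helper_run_once`); it is not chan_iax2's code, and whether chan_iax2 needs exactly this is an assumption.

```c
#include <pthread.h>

/* Hypothetical helper-thread state (illustrative, not chan_iax2's struct). */
enum io_state { IO_IDLE, IO_READY };

struct helper {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    enum io_state iostate;
    int frames_handled;
};

static void helper_init(struct helper *h)
{
    pthread_mutex_init(&h->lock, NULL);
    pthread_cond_init(&h->cond, NULL);
    h->iostate = IO_IDLE;
    h->frames_handled = 0;
}

/* Waker side: change the other helper's state and signal it while holding
 * *that helper's* lock, so the update cannot race with its wait. */
static void wake_helper(struct helper *h)
{
    pthread_mutex_lock(&h->lock);
    h->iostate = IO_READY;
    pthread_cond_signal(&h->cond);
    pthread_mutex_unlock(&h->lock);
}

/* Helper side: re-check the state in a loop; this also covers a wakeup
 * sent before the helper started waiting, and spurious wakeups. */
static void *helper_run_once(void *arg)
{
    struct helper *h = arg;

    pthread_mutex_lock(&h->lock);
    while (h->iostate != IO_READY)
        pthread_cond_wait(&h->cond, &h->lock);
    h->frames_handled++;   /* stand-in for handling deferred frames */
    h->iostate = IO_IDLE;
    pthread_mutex_unlock(&h->lock);
    return NULL;
}
```

Because the helper tests `iostate` under the lock before waiting, it does not matter whether `wake_helper()` runs before or after the helper thread reaches `pthread_cond_wait()`.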

By: Alec Davis (alecdavis) 2013-04-05 02:46:10.688-0500

Added reviewboard link https://reviewboard.asterisk.org/r/2427/

The review fixes startup so that network messages cannot come in before all IAX helper threads are ready.

Each helper thread must already be sleeping before any messages arrive, since it expects to be woken up by the network thread.
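A minimal sketch of such a startup handshake, using plain POSIX threads and hypothetical names (`struct pool_start`, `helper_main`, `wait_for_helpers`, `pool_shutdown`); it is not the committed patch. Each helper increments a ready counter and enters `pthread_cond_wait()` without releasing the lock in between, so once the starting thread has re-acquired the lock and seen every helper counted, every helper is actually asleep and a wakeup sent afterwards cannot be missed.

```c
#include <pthread.h>

/* Hypothetical startup handshake (illustrative names, not chan_iax2's):
 * the network/reader thread should only start once every helper thread
 * is asleep, waiting to be woken for work. */
struct pool_start {
    pthread_mutex_t lock;
    pthread_cond_t all_ready;  /* signalled by each helper as it becomes ready */
    pthread_cond_t work;       /* helpers sleep here until woken */
    int nready;                /* helpers that have reached their idle wait */
    int shutdown;
};

static void pool_start_init(struct pool_start *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->all_ready, NULL);
    pthread_cond_init(&p->work, NULL);
    p->nready = 0;
    p->shutdown = 0;
}

static void *helper_main(void *arg)
{
    struct pool_start *p = arg;

    pthread_mutex_lock(&p->lock);
    p->nready++;
    pthread_cond_signal(&p->all_ready);
    /* The lock is only released inside pthread_cond_wait(), so a thread
     * that acquires it afterwards knows this helper is really waiting.
     * Real frame handling is omitted; the helper just waits for shutdown. */
    while (!p->shutdown)
        pthread_cond_wait(&p->work, &p->lock);
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/* Call before starting the network thread: block until n helpers wait. */
static void wait_for_helpers(struct pool_start *p, int n)
{
    pthread_mutex_lock(&p->lock);
    while (p->nready < n)
        pthread_cond_wait(&p->all_ready, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

static void pool_shutdown(struct pool_start *p)
{
    pthread_mutex_lock(&p->lock);
    p->shutdown = 1;
    pthread_cond_broadcast(&p->work);
    pthread_mutex_unlock(&p->lock);
}
```

Only one thread waits on `all_ready`, so `pthread_cond_signal()` is sufficient there; shutdown uses a broadcast because all helpers wait on `work`.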




By: Alec Davis (alecdavis) 2013-04-12 03:21:25.915-0500

Attached review2427.diff.txt as committed to 1.8, 11 and trunk.

By: Matt Jordan (mjordan) 2013-04-12 16:46:52.957-0500

Alec: should this issue be closed now after your commits?

By: Alec Davis (alecdavis) 2013-04-12 16:53:01.795-0500

Fixed in 1.8, 11 and trunk
reviewboard 2426 and reviewboard 2427

By: Heiko Wundram (modelnine) 2013-04-15 02:57:08.761-0500

Should applying the patches iax_18827.diff.txt and review2427.diff.txt fix these issues? From what I can tell, iax_18827.diff2.txt is only for debugging, and applying it stand-alone at least causes Asterisk to segfault on a call.

By: Alec Davis (alecdavis) 2013-04-15 03:10:30.944-0500

@Heiko, correct: they were for debugging and testing. Sorry, I should have deleted them.

The diff2 version probably failed because I didn't hold the lock of the other thread.

Only the remaining patch, review2427.diff.txt, should be applied.