Summary: ASTERISK-22352: [patch] IAX2 custom qualify timer is not taken into account
Reporter: Frederic Van Espen (frederic.ve)
Labels:
Date Opened: 2013-08-21 05:59:56
Date Closed: 2015-04-10 07:41:50
Priority: Minor
Regression?:
Status: Closed/Complete
Components: Channels/chan_iax2
Versions: 1.8.23.0
Frequency of Occurrence: Constant
Related Issues: is related to ASTERISK-24894 [patch] iax2_poke_noanswer expiration timer too short
Environment:
Attachments: (0) iax_qualify.patch
(1) iax_qualifyv2.patch
Description: When I try to use a qualify timer value for an IAX peer other than the default 2000 ms, this value is not taken into account.

I can reproduce this with this configuration:
[remote-host]
type=friend
host=172.16.6.45
username=remote-host
secret=test
notransfer=yes
qualify=16000
qualifyfreqnotok=30000

disallow=all
allow=alaw
allow=ulaw
allow=ilbc

auth=md5
encryption=no

On the remote host I configure a delay of 6000 ms:
~# tc qdisc add dev eth0 root netem delay 6000ms

The result in the logs:
[Aug 21 10:54:58] NOTICE[13318] chan_iax2.c: Peer 'remote-host' is now REACHABLE! Time: 6001
[Aug 21 10:56:02] NOTICE[13323] chan_iax2.c: Peer 'remote-host' is now UNREACHABLE! Time: 6001
[Aug 21 10:56:38] NOTICE[13319] chan_iax2.c: Peer 'remote-host' is now REACHABLE! Time: 6001
[Aug 21 10:57:42] NOTICE[13324] chan_iax2.c: Peer 'remote-host' is now UNREACHABLE! Time: 6001
[Aug 21 10:58:18] NOTICE[13318] chan_iax2.c: Peer 'remote-host' is now REACHABLE! Time: 5999
[Aug 21 10:59:22] NOTICE[13325] chan_iax2.c: Peer 'remote-host' is now UNREACHABLE! Time: 5999
[Aug 21 10:59:58] NOTICE[13319] chan_iax2.c: Peer 'remote-host' is now REACHABLE! Time: 6001
[Aug 21 11:01:02] NOTICE[13324] chan_iax2.c: Peer 'remote-host' is now UNREACHABLE! Time: 6001
[Aug 21 11:01:38] NOTICE[13320] chan_iax2.c: Peer 'remote-host' is now REACHABLE! Time: 6001
[Aug 21 11:02:42] NOTICE[13325] chan_iax2.c: Peer 'remote-host' is now UNREACHABLE! Time: 6001
[Aug 21 11:03:18] NOTICE[13319] chan_iax2.c: Peer 'remote-host' is now REACHABLE! Time: 6001

I believe this is due to a mixup in the chan_iax2.c code. When the peer is REACHABLE, the response appears to be expected within the default timer value * 2. When it is UNREACHABLE, it uses the peer->pokefreqnotok value. IMO this timer value should be the same whatever state the peer is in, and the timer value should be peer->maxms.

I did not test on Asterisk 10 or 11, but by looking at the code it appears to be present in those releases as well.

Comments:

By: Frederic Van Espen (frederic.ve) 2013-08-21 06:14:00.615-0500

Proposed patch that fixes the issue for me.

By: Michael L. Young (elguero) 2013-08-21 13:26:54.448-0500

Frederic,

I don't think the qualify setting is used as a timer. It is used to determine when to consider a peer lagged or unreachable. The default (qualify=yes) is when it takes more than 2 seconds to get a response back from the peer. In your case, you are setting it to consider anything over 16 seconds as being lagged or unreachable.

The attached patch, which uses peer->maxms, is incorrect.


By: Frederic Van Espen (frederic.ve) 2013-08-22 02:41:06.456-0500

Hello Michael,

Thanks for your feedback and pointing out my error. I did some more tests and I understand the code a bit better now.

That piece of the code determines how long we wait for a PONG response to a POKE request, and multiplying DEFAULT_MAXMS by 2 allows us to differentiate between LAGGED and UNREACHABLE for the default value.

However, I still don't understand why the wait time for a response should be any different when the peer is UNREACHABLE or REACHABLE. On top of that, this means that you cannot use any qualify value of over 4000 ms, because the peer will always be marked unreachable after 4000 ms. I know these values are quite extreme, but we're currently dealing with network conditions where for a very short time a delay of more than 4000 ms may happen. If I configure a value higher than 4000 ms, the peer will enter a "flapping" state between REACHABLE and UNREACHABLE.

So, currently when the peer is:
- REACHABLE: we wait a hardcoded value of 4000 ms for a response;
- UNREACHABLE: we wait peer->pokefreqnotok (default 10000 ms) for a response.

I modified the patch iax_qualifyv2.patch to always wait "peer->maxms * 2".
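
To make the difference concrete, here is a minimal sketch in C of how a single poke to a currently reachable peer would be classified under the hardcoded wait and under a wait of "peer->maxms * 2". It is a simplified model, not the chan_iax2.c logic: the enum and function names are hypothetical, and any smoothing of ping times is ignored.

```c
#include <assert.h>

#define DEFAULT_MAXMS 2000  /* default qualify value mentioned in this issue */

enum poke_result { POKE_REACHABLE, POKE_LAGGED, POKE_UNREACHABLE };

/* Hypothetical helper: outcome of one poke to a currently reachable peer. */
static enum poke_result classify(int rtt_ms, int maxms, int wait_ms)
{
    assert(rtt_ms >= 0 && maxms > 0 && wait_ms > 0);

    if (rtt_ms >= wait_ms)
        return POKE_UNREACHABLE;  /* no PONG before the wait expired */
    return rtt_ms > maxms ? POKE_LAGGED : POKE_REACHABLE;
}

/* Wait as described for the current code: hardcoded DEFAULT_MAXMS * 2. */
static enum poke_result classify_current(int rtt_ms, int maxms)
{
    return classify(rtt_ms, maxms, DEFAULT_MAXMS * 2);
}

/* Wait as proposed in iax_qualifyv2.patch: the peer's own maxms * 2. */
static enum poke_result classify_v2(int rtt_ms, int maxms)
{
    return classify(rtt_ms, maxms, maxms * 2);
}
```

With qualify=16000 and a 6000 ms round trip, `classify_current(6000, 16000)` yields `POKE_UNREACHABLE` because the 4000 ms wait expires first, while `classify_v2(6000, 16000)` yields `POKE_REACHABLE`. In this model the factor of 2 still leaves a LAGGED band between maxms and maxms * 2.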

By: Matt Jordan (mjordan) 2013-09-12 19:16:43.686-0500

This patch removes the option {{qualifyfreqnotok}}. As a start, that makes it not acceptable for any release branch.

From {{iax.conf}}:

{noformat}
;qualifyfreqok = 60000 ; how frequently to ping the peer when
; everything seems to be ok, in milliseconds
;qualifyfreqnotok = 10000 ; how frequently to ping the peer when it's
; either LAGGED or UNAVAILABLE, in milliseconds
{noformat}

Regardless of whether or not it is the behavior you *want*, the behavior that {{chan_iax}} has is to use those two values for its periodic qualifying of IAX peers.

The bug here isn't that we aren't using {{peer->maxms * 2}} - which is not the defined behavior - it is that we are not using the peer's {{qualifyfreqok}} value.

I'd be fine with a patch that changes the scheduling to {{qualifyfreqok}} - anything else would need the following:
* Another configuration option that instructs Asterisk to ignore {{qualifyfreqok}} and {{qualifyfreqnotok}} and instead use the last known qualify time, if available, as the time to reschedule the peer poke
* Such a patch would have to be written against trunk. It would be an improvement for {{chan_iax2}}, and as such is not a suitable candidate for inclusion in the existing release branches.


By: Matt Jordan (mjordan) 2013-09-12 19:18:15.295-0500

If you want to write a patch for trunk that meets the above criteria, let me know and we'll keep this issue open for said patch.

Otherwise, we can keep this open with a note that amends the bug to state that the issue is not honoring the {{qualifyfreqok}} setting.

By: Frederic Van Espen (frederic.ve) 2013-09-13 03:36:54.287-0500

Hello Matt,

Thanks for looking into this again.

I don't agree that this patch removes the option qualifyfreqnotok. The poke frequency is scheduled in a different part of the code: lines 10960 to 10965 and line 12053 of current asterisk 1.8.23.1 release. In those places there is legitimate use of the qualifyfreqnotok value (pokefreqnotok in the code).

My patch actually just removes a piece of the code that should IMO not be there. In that part of the code we are actually scheduling a call to a function that should be called when the peer does not respond to the poke within the given qualify time (qualify=<maxms>). It makes no sense to use qualifyfreqnotok (pokefreqnotok) in this part of the code. I'll try to explain with extreme values. Say you have a peer with these settings:
qualify=2000
qualifyfreqnotok=1

If at some point the peer becomes unreachable, we have to receive a PONG within 1 ms of a POKE for the peer to become reachable again, because the __iax2_poke_noanswer function would be called before the PONG can be received. This function destroys the callno. This doesn't make any sense. I still believe there is a mixup between qualify=<maxms> and qualifyfreqnotok=<pokefreqnotok> in this part of the code.

Let me know if you don't understand my point and I'd be happy to explain further.

By: Frederic Van Espen (frederic.ve) 2014-02-11 14:54:44.443-0600

Hi,

I did not realise that I'm the assignee for this issue. What should I do to move forward with this?

As I said before, I believe there's a misunderstanding going on. The bug is that we are not waiting for at least {{maxms}} for a response to a poke when the peer became unreachable before, but instead we are waiting for {{pokefreqnotok}}. According to the documentation, {{pokefreqnotok}} is the interval in which we will poke the peer when it is LAGGED or UNREACHABLE, but in the code the value is used to schedule the destruction of the IAX poke dialog.

As such, my patch is not an improvement to chan_iax2, but a bug fix which IMO would be a candidate for inclusion in existing release branches. The patch schedules the destruction of the IAX poke dialog to {{maxms}} * 2.

BTW, we've been using this patch on 50ish Asterisk 1.8.13.0 installations without any issues since the bug report was created.

By: Y Ateya (yateya) 2015-04-05 08:39:50.882-0500

This issue is solved (with a better solution) in ASTERISK-24894 (https://reviewboard.asterisk.org/r/4536/).