[Home]

Summary:ASTERISK-21193: IAX/2 fails to destroy channel on max retransmits exceeded
Reporter:Jaco Kroon (jkroon)Labels:
Date Opened:2013-02-28 14:32:32.000-0600Date Closed:2013-04-10 15:00:39
Priority:MajorRegression?
Status:Closed/CompleteComponents:Channels/chan_iax2
Versions:11.2.1 Frequency of
Occurrence
Occasional
Related
Issues:
Environment:Attachments:
Description:Hi,

Every now and again an IAX/2 channel will get locked up, with the following being sent to the CLI and logs repeatedly:
{noformat}
[Feb 28 21:38:47] WARNING[23040]: chan_iax2.c:3552 __attempt_transmit: Max retries exceeded to host 105.236.192.85 on IAX2/arctica-3307 (type = 6, subclass = 11, ts=17269334, seqno=249)
[Feb 28 21:38:57] WARNING[23045]: chan_iax2.c:3552 __attempt_transmit: Max retries exceeded to host 105.236.192.85 on IAX2/arctica-3307 (type = 6, subclass = 11, ts=17279334, seqno=250)
[Feb 28 21:39:00] WARNING[23047]: chan_iax2.c:3552 __attempt_transmit: Max retries exceeded to host 105.236.192.85 on IAX2/arctica-3307 (type = 6, subclass = 2, ts=17282693, seqno=251)
[Feb 28 21:39:07] WARNING[23039]: chan_iax2.c:3552 __attempt_transmit: Max retries exceeded to host 105.236.192.85 on IAX2/arctica-3307 (type = 6, subclass = 11, ts=17289333, seqno=252)
{noformat}

Currently running iax2 show peer arctica shows the following:

{noformat}
morpheus*CLI> iax2 show peer arctica


 * Name       : arctica
 Description  :
 Secret       : <Set>
 Context      : uls-from-trunk
 Parking lot  :
 Mailbox      :
 Dynamic      : Yes
 Callnum limit: 1024
 Calltoken req: Yes
 Trunk        : Yes
 Encryption   : No
 Callerid     : "" <>
 Expire       : 3212988
 ACL          : No
 Addr->IP     : 105.236.192.85 Port 4569
 Defaddr->IP  : 0.0.0.0 Port 4569
 Username     : ulsvoip
 Codecs       : (gsm|g729)
 Codec Order  : (g729|gsm)
 Status       : OK (9 ms)
 Qualify      : every 60000ms when OK, every 10000ms when UNREACHABLE (sample smoothing Off)
{noformat}

The specific channel from core show channels concise:

{noformat}
IAX2/arctica-3307!uls-from-trunk!0761916763!9!Ring!Dial!Local/0799048876@uls-place-call/n!0873514879!arctica!!2!17643!(None)!morpheus-1362063026.53899
{noformat}

Some logs from the channel creation (came in from remote peer, I did filter for the string arctica and then worked from there - there is a LOT of information in the logs around this, total log is about 270M atm):

{noformat}
[Feb 28 16:50:26] VERBOSE[2711][C-0000317a] pbx.c:     -- Executing [0761916763@uls-from-trunk:1] NoCDR("IAX2/arctica-3307", "") in new stack
[Feb 28 16:50:26] VERBOSE[2711][C-0000317a] pbx.c:     -- Executing [0761916763@uls-from-trunk:2] Set("IAX2/arctica-3307", "_BLINDTRANSFER=") in new stack
[Feb 28 16:50:26] VERBOSE[2711][C-0000317a] pbx.c:     -- Executing [0761916763@uls-from-trunk:3] GotoIf("IAX2/arctica-3307", "0?godial") in new stack
[Feb 28 16:50:26] VERBOSE[2711][C-0000317a] pbx.c:     -- Executing [0761916763@uls-from-trunk:4] Set("IAX2/arctica-3307", "_reqip=105.236.192.85") in new stack
[Feb 28 16:50:26] VERBOSE[2711][C-0000317a] pbx.c:     -- Executing [0761916763@uls-from-trunk:5] Set("IAX2/arctica-3307", "_callref=") in new stack
[Feb 28 16:50:26] VERBOSE[2711][C-0000317a] pbx.c:     -- Executing [0761916763@uls-from-trunk:6] GotoIf("IAX2/arctica-3307", "1?godial") in new stack
[Feb 28 16:50:26] VERBOSE[2711][C-0000317a] pbx.c:     -- Executing [0761916763@uls-from-trunk:9] Dial("IAX2/arctica-3307", "Local/0761916763@uls-place-call/n") in new stack
[Feb 28 16:50:26] VERBOSE[2712][C-0000317a] pbx.c:     -- Executing [0761916763@uls-place-call:1] NoOp("Local/0761916763@uls-place-call-0000377b;2", "Trunk call request from "" <0873514879> to 0761916763 (RDNIS=, BLINDTRANSFER=, realuser=arctica)") in new stack
[Feb 28 16:50:26] VERBOSE[2712][C-0000317a] pbx.c:     -- Executing [0761916763@uls-place-call:2] GotoIf("Local/0761916763@uls-place-call-0000377b;2", "1?loaduser") in new stack
{noformat}

... more dialplan ... lot's of it ... and off to just before the max retransmits ...

{noformat}
[Feb 28 16:50:26] VERBOSE[2712][C-0000317a] pbx.c:     -- Executing [0761916763@uls-callprovider:19] Set("Local/0761916763@uls-place-call-0000377b;2", "c=1") in new stack
[Feb 28 16:50:26] VERBOSE[2712][C-0000317a] pbx.c:     -- Executing [0761916763@uls-callprovider:20] Dial("Local/0761916763@uls-place-call-0000377b;2", "IAX2/uls-node-r2d2/prov*is*0761916763,,") in new stack
[Feb 28 16:50:26] VERBOSE[2712][C-0000317a] app_dial.c:     -- IAX2/uls-node-r2d2-17667 is making progress passing it to Local/0761916763@uls-place-call-0000377b;2
[Feb 28 16:50:26] VERBOSE[2711][C-0000317a] app_dial.c:     -- Local/0761916763@uls-place-call-0000377b;1 is making progress passing it to IAX2/arctica-3307
[Feb 28 16:50:31] VERBOSE[2723] chan_iax2.c:     -- Seeding 'arctica' at 105.236.192.85:4569 for 60
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Registered extension context 'dial-trunk-arctica'; registrar: pbx_config
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Added extension '_X!' priority 1 to dial-trunk-arctica
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Added extension '_X!' priority 2 to dial-trunk-arctica
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Registered extension context 'speeddial-tr-arctica'; registrar: pbx_config
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Including context 'speeddial-cl-ULS-A13' in context 'speeddial-tr-arctica'
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Registered extension context 'chanvars-arctica'; registrar: pbx_config
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Added extension 's' priority 1 to chanvars-arctica
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Added extension 's' priority 2 to chanvars-arctica
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Added extension 's' priority 3 to chanvars-arctica
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Added extension 's' priority 4 to chanvars-arctica
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Added extension 's' priority 5 to chanvars-arctica
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Added extension 's' priority 6 to chanvars-arctica
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Added extension 's' priority 7 to chanvars-arctica
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Added extension 's' priority 8 to chanvars-arctica
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Registered extension context 'cli-arctica-client'; registrar: pbx_config
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Including context 'cli-arctica-global' in context 'cli-arctica-client'
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Registered extension context 'cli-arctica-global'; registrar: pbx_config
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Added extension '0873534134' priority 1 to cli-arctica-global
[Feb 28 16:50:31] VERBOSE[2723] pbx.c:     -- Added extension 'arctica' priority 1 to uls-trunks
[Feb 28 16:50:33] WARNING[23039] channel.c: Exceptionally long voice queue length queuing to IAX2/arctica-3307
[Feb 28 16:50:33] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1
[Feb 28 16:50:34] WARNING[23037] channel.c: Exceptionally long voice queue length queuing to IAX2/arctica-3307
[Feb 28 16:50:34] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1
[Feb 28 16:50:35] WARNING[23039] channel.c: Exceptionally long voice queue length queuing to IAX2/arctica-3307
[Feb 28 16:50:35] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1
[Feb 28 16:50:36] WARNING[23041] channel.c: Exceptionally long voice queue length queuing to IAX2/arctica-3307
[Feb 28 16:50:36] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1
[Feb 28 16:50:36] VERBOSE[2712][C-0000317a] app_dial.c:     -- IAX2/uls-node-r2d2-17667 answered Local/0761916763@uls-place-call-0000377b;2
[Feb 28 16:50:38] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1
[Feb 28 16:50:39] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1
[Feb 28 16:50:40] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1
[Feb 28 16:50:42] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1
[Feb 28 16:50:43] WARNING[2712][C-0000317a] channel.c: Exceptionally long voice queue length queuing to Local/0761916763@uls-place-call-0000377b;1
[Feb 28 16:50:44] NOTICE[2712][C-0000317a] src/cdr_uls.C: cdr_uls: cdr entry: clid=0873514879,src=0873514879,dst=0761916763,dcontext=uls-callprovider,channel=Local/0761916763@uls-place-call-0000377b;2,dstchannel=IAX2/uls-node-r2d2-17667,lastapp=Dial,lastdata=IAX2/uls-node-r2d2/prov*is*0761916763,,,start=2013-02-28 16:50:26,answer=2013-02-28 16:50:36,end=2013-02-28 16:50:44,duration=18,billsec=8,disposition=ANSWERED,amaflags=DOCUMENTATION,accountcode=arctica,uniqueid=morpheus-1362063026.53901,userfield=0873534134,dialed=0761916763,ulstype=normal,peerip=105.236.192.85,destnetwork=Vodacom (Pty) Ltd,calldesc=Vodacom (Pty) Ltd,circuit=quality,provider=is,cli=yes,dialed=0761916763,ulstype=normal,peerip=105.236.192.85
{noformat}

... This indicates that the call got hung up ...

{noformat}
[Feb 28 16:50:44] VERBOSE[2712][C-0000317a] pbx.c:   == Spawn extension (uls-callprovider, 0761916763, 20) exited non-zero on 'Local/0761916763@uls-place-call-0000377b;2'
[Feb 28 16:51:18] WARNING[23044] chan_iax2.c: Max retries exceeded to host 105.236.192.85 on IAX2/arctica-3307 (type = 6, subclass = 11, ts=20015, seqno=6)
[Feb 28 16:51:19] WARNING[23045] chan_iax2.c: Max retries exceeded to host 105.236.192.85 on IAX2/arctica-3307 (type = 6, subclass = 2, ts=21017, seqno=7)
{noformat}

and from here on this is all I see.

Looking at the code for some reason the channel never gets destroyed.  At the location where the message is output (channels/chan_iax2.c, lines 3550-3569) it would seem that as long as iaxs[callno]=>owner is not NULL that the channel will never get destroyed and we will infinitely keep retransmitting a HANGUP request - which is likely to be ignored by the remote end due to the callid not existing.

Looking at frame.h it would seem that type=6 => AST_FRAME_IAX, and subclass=2 and 11 represents AST_CONTROL_RING and AST_CONTROL_OPTION respectively.

The logs on the remote server (not under my direct control) shows the following for the same call (time is offset by about an hour):

{noformat}
[2013-02-28 15:58:20] VERBOSE[23968] pbx.c:     -- Executing [s@macro-dialout-trunk:20] Dial("SIP/100-000003ea", "IAX2/ulsvoip/0761916763,300,") in new stack
[2013-02-28 15:58:20] VERBOSE[23968] app_dial.c:     -- Called ulsvoip/0761916763
[2013-02-28 15:58:20] VERBOSE[23969] app_mixmonitor.c:   == Begin MixMonitor Recording SIP/100-000003ea
[2013-02-28 15:58:20] VERBOSE[2996] chan_iax2.c:     -- Call accepted by 196.37.63.68 (format g729)
[2013-02-28 15:58:20] VERBOSE[2996] chan_iax2.c:     -- Format for call is g729
[2013-02-28 15:58:20] VERBOSE[23968] app_dial.c:     -- IAX2/ulsvoip-16511 is making progress passing it to SIP/100-000003ea
[2013-02-28 15:58:31] VERBOSE[23968] chan_iax2.c:     -- Hungup 'IAX2/ulsvoip-16511'
[2013-02-28 15:58:31] VERBOSE[23968] app_macro.c:   == Spawn extension (macro-dialout-trunk, s, 20) exited non-zero on 'SIP/100-000003ea' in macro 'dialout-trunk'
[2013-02-28 15:58:31] VERBOSE[23968] pbx.c:   == Spawn extension (from-internal, 0761916763, 5) exited non-zero on 'SIP/100-000003ea'
[2013-02-28 15:58:31] VERBOSE[23968] pbx.c:     -- Executing [h@from-internal:1] Hangup("SIP/100-000003ea", "") in new stack
[2013-02-28 15:58:31] VERBOSE[23968] pbx.c:   == Spawn extension (from-internal, h, 1) exited non-zero on 'SIP/100-000003ea'
{noformat}

So I'm not sure what to make of this ...
Comments:By: Rusty Newton (rnewton) 2013-03-06 15:50:23.834-0600

Sounds like a bug, but still need further information.

1.Is it only calls going through the "arctica" peer that get locked up?

1A. Does this peer handle only inbound, or both inbound and outbound calls? If so, do only one or the other lock up? That could lead us to a dialplan path that affects the issue.

2.It may not be possible considering you probably don't know which channel will lock up, but I should ask anyway - can you provide a PCAP of a complete call where its locked up, plus a comparison with a working call?

3.Can you provide the same log info, but along with the DEBUG message type?

4. Can you provide the same log information, but for a working call and attach to the issue?

5. can you provide the iax.conf configuration for the "arctica" peer? along with the [general] config?



By: Rusty Newton (rnewton) 2013-03-25 13:04:30.913-0500

Jaco, going to let this one sit for another two weeks so you have time to gather the debug if possible.

By: Jaco Kroon (jkroon) 2013-04-03 07:46:10.665-0500

Hi,

1.  No, other IAX/2 peers also does this from time to time.
1A. Bi-directional call directions, usually only one side.
2.  The most logical place for me to make this capture would require around 30-40GB of storage for pcap purposes per day.
3.  Busy looking into this now.  Need this for a SIP lockup in 11.3.0 too.  Odly enough, this bug hasn't triggered in a whilst, but I can leave my switches running with debug messages on (only around 2GB of logs currently per day, so this will hopefully not push it over the limit).
4.  99% of the calls work, if I get the data for "3" I'll provide the full logs for a period around the error period.
5.  See below:

{noformat}
[general]
bindport=4569
iaxcompat=no
delayreject=yes
language=en
tos=ef
autokill=5000

iaxthreadcount=10
iaxmaxthreadcount=500

;jitterbuffer=yes
;forcejitterbuffer=yes
;maxjitterbuffer=1000
;maxjitterinterps=10
;resyncthreshold=1000
codecpriority=host

disallow=all
allow=g729
;allow=speex
allow=gsm

trunkfreq=10
trunktimestamps=yes
trunkmtu=1240

calltokenoptional=0.0.0.0/0.0.0.0
maxcallnumbers=256

; You may specify a global default AMA flag for iaxtel calls.  It must be
; one of 'default', 'omit', 'billing', or 'documentation'.  These flags
; are used in the generation of call detail records.
;
amaflags=billing
authdebug=no

transfer=no
regcontext=uls-trunks

[arctica]
type=friend
context=uls-from-trunk
secret=no_your_not_going_to_get_that_lucky
username=ulsvoip
host=dynamic
qualify=yes
accountcode=arctica
trunk=yes
requirecalltoken=auto
maxcallnumbers=1024
transfer=no
setvar=_realuser=arctica
setvar=_codecannounce=1
{noformat}

The config the client is using is a standard freepbx install in this case with the following (not my preferred setup, and we've seen this to other peers before too):

{noformat}
[general]
disallow=all
allow=ulaw
allow=alaw
allow=gsm
mailboxdetail=yes
tos=ef
trunkfreq=10
trunktimestamps=yes

register=arctica:no_your_not_going_to_get_that_lucky@voip.uls.co.za

[ulsvoip]
disallow=all
username=arctica
type=friend
secret=no_your_not_going_to_get_that_lucky
qualify=yes ; possibly make this a value like 200
host=voip.uls.co.za
trunk=yes
allow=g729
context=from-trunk
{noformat}

I must add that we've had to add a few other patches to IAX/2 to fix other issues (patches was all uploaded to issues.asterisk.org) this seems to have at the very least reduced and we haven't seen a case where it keeps escalating indefinitely again.

By: Rusty Newton (rnewton) 2013-04-09 19:03:16.063-0500

Okay - we'll need the additional DEBUG info that corresponds with PCAPS for a call in question to really see what is happening here unless we have a way to reproduce. I'll put this in Waiting on Feedback for now.

By: Jaco Kroon (jkroon) 2013-04-10 08:51:48.583-0500

Rusty,

I'm not seeing this happen any more.  And I hate that.  I really do not like it if problems just go away.  Don't get me wrong, I'm thankful that the problem is gone, but it means there is an underlying problem sitting in limbo somewhere just waiting to trigger and bite.

For now I cannot see how we can possibly troubleshoot this, so let's close.  If I get more information I'll log a new issue.

By: Rusty Newton (rnewton) 2013-04-10 15:00:20.789-0500

Roger that. I hear you. That's crazy frustrating. Closing this for now.

By: Bobby Hakimi (bobbymc) 2015-10-27 02:06:28.053-0500

this happens to me every day, i cant reproduce it on demand but it happens at random times during the day, its a production system we have and im at a loss, i dont even mind paying someone to help us fix this. can anyone please help or direct me to a place we can pay to fix this? i am getting very desperate and we are losing clients due to this