[Home]

Summary:ASTERISK-23460: ooh323 channel stuck if call is placed directly and gatekeeper is not available
Reporter:Dmitry Melekhov (slesru)Labels:
Date Opened:2014-03-11 03:43:23Date Closed:2014-03-28 13:03:36
Priority:MajorRegression?
Status:Closed/CompleteComponents:Addons/chan_ooh323
Versions:11.7.0 Frequency of
Occurrence
Related
Issues:
Environment:Centos 6/x86-64Attachments:( 0) ASTERISK-23460.patch
( 1) ASTERISK-23460-1.patch
( 2) ASTERISK-23460-2.patch
( 3) ASTERISK-23460-stacklookup.patch
( 4) DebugLog
( 5) h323_log
( 6) h323_log
Description:Hello!

This issue is related to ASTERISK-23033 , but is worse.

Just had power outage, and I turned on asterisk server before server with gnugk.
I got stuck channel with direct, i.e. to peer , not registered in gatekeeper call:

OOH323/neftisa-4     6973@h323:1          Down    (None)                        
DAHDI/i1/6401-14     116973@dahdi:9       Ring    Dial(OOH323/6973@neftisa)  

I was not able to place call , only solution is to restart asterisk...

Looks like the same problem was solved for calls to peers registered in gatekeeper, but not for direct calls.

Thank you!
Comments:By: Rusty Newton (rnewton) 2014-03-11 18:38:01.552-0500

Can you reproduce this reliably by making the gatekeeper unavailable and placing a call directly to it? Or did it only happen during your power outage the one time?

By: Dmitry Melekhov (slesru) 2014-03-11 23:02:38.152-0500

Hello!

It is not easy to reproduce this , because of ASTERISK-23033, which makes impossible to place call directly if gatekeeper is not available :-(

btw, this is what I have in log:

12:02:35:138  Error:Transport failure while reading Q931 message (incoming, ooh323c_1)
12:10:26:512  Error:Transport failure while reading Q931 message (incoming, ooh323c_3)
12:21:18:032  Error:Gatekeeper not responding to ARQ
12:21:18:032  Error: Gatekeeper error. Either Gk not responding or Gk sending invalid messages
12:21:18:032  Error: Gatekeeper error detected. Closing GkClient as Gk mode is UseSpecifcGatekeeper


By: Alexander Anikin (may213) 2014-03-13 12:10:15.907-0500

Hello,

Looks like to i known where is problem,
will try to make patch few later.

By: Alexander Anikin (may213) 2014-03-13 17:12:11.793-0500

Dmitry,

please test attached patch when you do it. I'm not sure that is solve and
I could not reproduce problem at the moment but will do more testing.


By: Dmitry Melekhov (slesru) 2014-03-13 22:47:41.463-0500

Hello!

Thank you!
I'll ,hopefully, do test in monday, just because I can do such tests only at low load time....


By: Dmitry Melekhov (slesru) 2014-03-16 22:37:39.832-0500

Hello!

With this patch there is no complete stack lockup, but call still is in stale state.

How I reproduced this:

I blocked gatekeeper ip by iptables, then placed calls. first call was dropped by asterisk after about 30 seconds ( I guess tcp timeout), second call stalled, this is what I got after some time:

asterisk*CLI> core show channels
Channel              Location             State   Application(Data)            
DAHDI/i1/6401-1      116973@dahdi:9       Ring    Dial(OOH323/6973@neftisa)    
...
OOH323/neftisa-0     6973@h323:1          Down    (None)

then I removed iptables filter and  copied h323 log, I'll attach it.
And even after I hanged up from pri side call still "exists"
asterisk*CLI> core show channels
Channel              Location             State   Application(Data)            
DAHDI/i1/6401-1      116973@dahdi:9       Ring    Dial(OOH323/6973@neftisa)    
OOH323/neftisa-0     6973@h323:1          Down    (None)  

but other h323 calls can be placed, this is good, but, I think it's better to fix these stalled calls.

Thank you!


By: Dmitry Melekhov (slesru) 2014-03-16 22:38:34.869-0500

here is stalled call log

By: Alexander Anikin (may213) 2014-03-17 03:11:34.060-0500

Hello Dmitry,

Thank you, will analyze log.

By: Dmitry Melekhov (slesru) 2014-03-17 03:30:25.817-0500

Thank you!

Log contains only last call, if I remember correctly, not two as I wrote.
I turned tracelevel=6 and restarted asterisk after first call...


By: Alexander Anikin (may213) 2014-03-17 08:23:21.191-0500

Dmitry,

the problem is same as ASTERISK-21960
i.e. asterisk channel lockup if there is no channel create in ooh323 stack.
Please try attached patch ASTERISK-23460-stacklookup.patch, it change infinite waiting to time limited.
But it's not final patch here, i will do more correct way with call back from ooh323 stack to asterisk channel
if there is error on channel creation in the stack.



By: Dmitry Melekhov (slesru) 2014-03-17 22:25:58.042-0500

Hello!

You are right about problem cause- this patch makes problem disappear- channels always hangs up after some time.
Waiting for final patch and ready to test :-)

Thank you!

By: Alexander Anikin (may213) 2014-03-26 13:27:44.301-0500

Hello,

Dmitry please try attached ASTERISK-23460-1.patch.
It send signal on locked condition variable from call clearing callback that cause immediately if call can't go without GK.
Patch is for clean 11 branch without previous patches from here.

By: Dmitry Melekhov (slesru) 2014-03-26 22:37:30.129-0500

Hello!

Thank you very much, I hope to test at tomorrow early morning, or, unfortunately, I guess , more realistic, at Monday morning...


By: Dmitry Melekhov (slesru) 2014-03-26 23:47:09.478-0500

Oops!
I forgot I have server which is still not in production :-)
So I tested on it, I can use it only from sip, but there is no difference- I still can reproduce problem on
11.8.1  with only ASTERISK-23460-1.patch  applied.

I blocked gatekeeper :
iptables -A INPUT -s 192.168.22.254 -j DROP

then restarted asterisk and placed call:

  -- Executing [6401@sipphones:5] Dial("SIP/6052-00000000", "OOH323/6401@cisco

ast-blk*CLI> core show channels
Channel              Location             State   Application(Data)            
SIP/6052-00000000    6401@sipphones:5     Ring    Dial(OOH323/6401@cisco)      
OOH323/cisco-0       6401@h323:1          Down    (None)                        
2 active channels
1 active call
1 call processed

and it stucks forever.

I'll attach h323 log with debug level =6 .

Thank you!

By: Dmitry Melekhov (slesru) 2014-03-26 23:49:59.756-0500

sip to h323 stuck call

By: Alexander Anikin (may213) 2014-03-27 02:56:58.144-0500

It's interesting that there is no any call logs in h323_log only gk ras packets.
Dmitry, could you attach asterisk console log for the call with ooh323 set debug enabled?


By: Dmitry Melekhov (slesru) 2014-03-27 03:37:49.988-0500

Hello!

Here is log including console log...


By: Dmitry Melekhov (slesru) 2014-03-27 03:40:30.737-0500

As I said, I'm doing call directly to peer...

btw, if I remove iptables filter and do call it looks in console as:

---   find_peer "cisco"
comparing with "192.168.22.253"
found matching peer
+++   find_peer "cisco"
---   ooh323_new - cisco
+++   h323_new
---   onNewCallCreated 7f4d0c000d18: ooh323c_o_1
....


By: Alexander Anikin (may213) 2014-03-27 06:56:20.960-0500

Dmitry,

please try attached ASTERISK-23460-2.patch. As previous it must be applied to clean asterisk.
I forget remove limitation on ooh323 stack command processing in there is no registered gk.

By: Dmitry Melekhov (slesru) 2014-03-27 07:39:57.447-0500

Thank you!

Now all looks good, I get hangup if asterisk was registered in gatekeeper

   -- Called OOH323/6401@cisco
Restart stopped gatekeeper client
 == Everyone is busy/congested at this time (1:0/0/1)
   -- Executing [6401@sipphones:6] Hangup("SIP/6052-00000012", "") in new stack
 == Spawn extension (sipphones, 6401, 6) exited non-zero on 'SIP/6052-00000012'

and immediately if not:

  -- Called OOH323/6401@cisco
 == Everyone is busy/congested at this time (1:0/0/1)
   -- Executing [6401@sipphones:6] Hangup("SIP/6052-00000013", "") in new stack
 == Spawn extension (sipphones, 6401, 6) exited non-zero on 'SIP/6052-00000013'


I'll apply patch to production system tomorrow morning and test :-)

Thank you!


By: Dmitry Melekhov (slesru) 2014-03-27 22:21:49.646-0500

Good morning! :-)

Just tested on production- can't reproduce problem with
ASTERISK-23460-2.patch

So, problem is fixed.
Thank you!

And it still will be good to call peers directly when gatekeeper is not available ( ASTERISK-23033 ) , but, anyway, system is in working state in such situation and this is excellent :-)


By: Alexander Anikin (may213) 2014-03-28 08:00:34.108-0500

Dmitry,

thank you for testing, fine result. I will close this issue and go to switch to 23033 ;)


By: Alexander Anikin (may213) 2014-03-28 13:03:36.609-0500

close as resolved