Summary: ASTERISK-19308: problem with transit calls ooh323-dahdi(pri)-panasonic 500
Reporter: Dmitry Melekhov (slesru)
Labels:
Date Opened: 2012-02-08 01:49:30.000-0600
Date Closed: 2012-08-10 11:49:07
Priority: Critical
Regression?
Status: Closed/Complete
Components: Addons/chan_ooh323
Versions: 10.2.0
Frequency of Occurrence: Frequent
Related Issues:
Environment: Centos 6 x86
Attachments: ( 0) ASTERISK-19308.patch
( 1) h323_log.7169
Description: Here is a call from ooh323 to dahdi as I see it in the console (there were several other calls; I removed the info about them):


   -- Executing [7015@h323:1] Set("OOH323/10.3.1.3-83", "FAXOPT(t38gateway)=yes") in new stack
   -- Executing [7015@h323:2] Dial("OOH323/10.3.1.3-83", "DAHDI/g1/7015") in new stack
   -- Requested transfer capability: 0x00 - SPEECH
   -- Called DAHDI/g1/7015

   -- DAHDI/i1/7015-54 is making progress passing it to OOH323/10.3.1.3-83

   -- DAHDI/i1/7015-54 is busy
   -- Hungup 'DAHDI/i1/7015-54'
 == Everyone is busy/congested at this time (1:1/0/0)
   -- Auto fallthrough, channel 'OOH323/10.3.1.3-83' status is 'BUSY'

Now this channel stays busy on the ooh323 side:

ast-nsk*CLI> core show channels
Channel              Location             State   Application(Data)            
OOH323/10.3.1.3-83   7015@h323:3          Busy    (None)                        
1 active channel
1 active call

and there is a problem with new calls:

   -- DAHDI/i1/7024-5a is making progress passing it to OOH323/192.168.22.253-89
   -- DAHDI/i1/7024-5a is proceeding passing it to OOH323/192.168.22.253-89
   -- DAHDI/i1/7024-5a is ringing
[Feb  8 11:42:43] WARNING[23285]: app_dial.c:1379 wait_for_answer: Unable to write frametype: 2
[Feb  8 11:42:43] WARNING[23285]: app_dial.c:1379 wait_for_answer: Unable to write frametype: 2
[Feb  8 11:42:43] WARNING[23285]: app_dial.c:1379 wait_for_answer: Unable to write frametype: 2
[Feb  8 11:42:43] WARNING[23285]: app_dial.c:1379 wait_for_answer: Unable to write frametype: 2
   -- Hungup 'DAHDI/i1/7024-5a'
 == Spawn extension (h323, 7024, 2) exited non-zero on 'OOH323/192.168.22.253-89'

and the system load is very high:
load average: 5.63, 5.32, 3.33
asterisk eats CPU:
 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
23055 asterisk -11   0  156m 126m   9m S 94.7  8.4  14:07.49 asterisk

It looks like there is no such problem when there are no more than 2 concurrent calls, but it appears when the load increases to 5-6. We also have no such problem with a different but similar configuration, where the transit is SIP-dahdi-TS-004 (PRI switch).

What can be wrong? What can I do to solve this problem?

Thank you!
Comments:
By: Dmitry Melekhov (slesru) 2012-02-08 01:51:42.575-0600

By the way, it looks like the problem appears when the other side is an FXS gateway connected to a PBX, so maybe it does not hang up for some time. But, IMHO, the hangup should be triggered by the dahdi side...


By: Dmitry Melekhov (slesru) 2012-02-08 03:16:54.797-0600

btw, I forgot to set hangup in the dialplan :-)

exten => _7XXX,1,SET(FAXOPT(t38gateway)=yes)
exten => _7XXX,n,Dial(DAHDI/g1/${EXTEN})
exten => _7XXX,n,Hangup

Maybe this will help.
I have added it now, but will be able to test only tomorrow...


By: Alexander Anikin (may213) 2012-02-08 06:11:17.287-0600

Dmitry,

I'm not sure that this is an OOH323 issue.
Could you attach an h323_log here with tracelevel=6?

By: Dmitry Melekhov (slesru) 2012-02-09 01:14:47.893-0600

Hello!

I'm not sure this problem is ooh323 related either, simply because we have only digital lines in our SIP environment, so this problem does not appear there at all.
After I added the hangup to the dialplan, the problem disappeared, because there is no longer an ooh323 channel in the busy state that is not connected to any other channel.
I'm quite happy with how it works now (i.e. when ooh323 is hung up right after the dial), so there is really no need to fix anything.
If you are interested, I can try to reproduce this problem with debug info; otherwise a good idea is to close this issue as not a bug, just expected behavior.

Thank you!


By: Dmitry Melekhov (slesru) 2012-02-09 01:15:20.625-0600

forgot to set done, sorry :-)


By: Alexander Anikin (may213) 2012-02-11 06:23:20.442-0600

Dmitry,
I think the trouble is in the Busy application that loads the CPU, not in the hangup with busy code.
And I think we need to understand why the caller side doesn't hang up this call either.
We can continue to work on this issue, but I need the h323_log, as I said before.


By: Dmitry Melekhov (slesru) 2012-02-12 22:10:50.641-0600

Thank you!

I'll try to reproduce the problem and upload logs; it is no easy task, though, because we already use this server in production ;-)

And there is an easy explanation for why the other side doesn't hang up.
It is an FXS gateway connected to a PBX, so the gateway-PBX connection can sometimes be hung up only by the busy detector, which takes quite a long time...



By: Alexander Anikin (may213) 2012-02-13 06:30:44.989-0600

Dmitry, I'll wait for your logs, then reopen the issue and continue to work on it.
Understood about busy detection on the FXS gateway.


By: Dmitry Melekhov (slesru) 2012-02-15 22:25:52.959-0600

2 calls. first OK, second to busy.

By: Dmitry Melekhov (slesru) 2012-02-15 22:27:44.356-0600

Hello!

Finally (sorry it took so long), here is the log.
There are 2 calls in the log.
The first one is to a fax, so we busy it; the second one is to a busy extension.
In the latter case the OOH323 channel is stuck in the busy state.

Thank you!


By: Dmitry Melekhov (slesru) 2012-04-09 03:28:28.867-0500

btw, if the user on the PRI side is busy I always get
[Apr  9 12:26:42] ERROR[6572]: chan_ooh323.c:1520 ooh323_set_write_format: No owner found
Is this OK? Can this cause a problem?


By: Alexander Anikin (may213) 2012-04-09 03:48:10.356-0500

No, the problem isn't here; this is just a message that we have an H.323 connection but the asterisk channel is already closed.


By: Dmitry Melekhov (slesru) 2012-04-09 04:02:33.734-0500

Well, if there is no hangup, will the h323 connection still exist? Maybe long enough to eat CPU?


By: Alexander Anikin (may213) 2012-04-09 04:28:57.676-0500

It should not; the ooh323 thread is poll() based and must not cause 100% CPU (it may be due to some bug, but I don't know of one now).

The missing hangup isn't a problem either: the message about no owner is displayed when we receive some signalling from the opposite side that requires a change of media format (alerting or progress), but the asterisk channel is already closed.
In this case we have received the hangup from the asterisk core, but it has not been performed yet and will be performed
in the next loop of the ooh323 thread.
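
To illustrate why a poll() based loop should sit idle rather than spin, here is a minimal sketch. It is not taken from chan_ooh323; the function names `wait_for_signalling` and `demo_idle_then_ready` are hypothetical, and a plain pipe stands in for the signalling socket.

```c
#include <poll.h>
#include <unistd.h>

/* Block until fd is readable or timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * While blocked in poll() the thread uses no CPU. */
int wait_for_signalling(int fd, int timeout_ms)
{
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Hypothetical demo: a pipe plays the role of the signalling socket.
 * Returns 0 if the idle descriptor timed out and the written one
 * was reported readable, 1 if not, -1 on setup failure. */
int demo_idle_then_ready(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    int idle = wait_for_signalling(fds[0], 20);   /* nothing written yet */

    char byte = 'x';
    if (write(fds[1], &byte, 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    int ready = wait_for_signalling(fds[0], 20);  /* one byte pending */

    close(fds[0]);
    close(fds[1]);
    return (idle == 0 && ready == 1) ? 0 : 1;
}
```

A loop built on such a call only wakes up when there is data or when the timeout expires, so high CPU usage would point to something else, such as a descriptor that is permanently reported ready.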



By: Dmitry Melekhov (slesru) 2012-04-09 04:44:33.187-0500

I see. Thank you for clarification :-)


By: Alexander Anikin (may213) 2012-08-10 11:27:13.130-0500

The issue really is in ooh323.

By: Alexander Anikin (may213) 2012-08-10 11:29:01.570-0500

The problem is in the indicate routine, which marks the call as already gone, so the hangup isn't sent, which is wrong.

By: Alexander Anikin (may213) 2012-08-10 11:31:06.804-0500

Patch is here. It removes the alreadygone flag on indicate busy/congestion from the asterisk core.
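
The effect can be pictured with a small toy model. This is only a sketch and not the actual chan_ooh323 code or the attached patch: all names (`model_call`, `model_indicate`, `model_hangup`, `busy_call_releases_peer`) are hypothetical, and it assumes that "removing the flag" means the indicate step no longer marks the call as already gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real chan_ooh323 types or flags. */
enum indication { IND_RINGING, IND_BUSY, IND_CONGESTION };

struct model_call {
    bool already_gone;  /* "nothing more needs to be sent to the peer" */
    bool release_sent;  /* was a hangup ever signalled to the H.323 peer? */
};

/* Indicate a condition toward the H.323 side.
 * mark_gone_on_busy = true models the behaviour described as wrong,
 * false models the behaviour after the fix (assumption, see lead-in). */
void model_indicate(struct model_call *c, enum indication ind,
                    bool mark_gone_on_busy)
{
    assert(c != NULL);
    if ((ind == IND_BUSY || ind == IND_CONGESTION) && mark_gone_on_busy)
        c->already_gone = true;
}

/* Hang up: the release is signalled only if the call is not
 * already marked as gone. */
void model_hangup(struct model_call *c)
{
    assert(c != NULL);
    if (!c->already_gone) {
        c->release_sent = true;
        c->already_gone = true;
    }
}

/* Run "indicate busy, then hang up" and report whether the peer
 * was ever told that the call is over. */
bool busy_call_releases_peer(bool mark_gone_on_busy)
{
    struct model_call c = { false, false };
    model_indicate(&c, IND_BUSY, mark_gone_on_busy);
    model_hangup(&c);
    return c.release_sent;
}
```

In this model `busy_call_releases_peer(true)` returns false (the hangup is skipped and the remote side is left waiting), while `busy_call_releases_peer(false)` returns true.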

By: Alexander Anikin (may213) 2012-08-10 11:43:33.650-0500

Dmitry,

The patch solved the trouble on my system; please try it on yours.
Please reopen the issue if the trouble is still there.

By: Dmitry Melekhov (slesru) 2012-08-12 22:29:59.080-0500

Thank you!

I don't want to test the patch on the production system, because I'm going on vacation in two weeks, but we are going to install a new server soon (in our communications department's terms this can mean several months ;-( , and I can't test without a PBX connection :-) ). I'll test on it when it is ready. Sorry for the delay...


By: Dmitry Melekhov (slesru) 2012-09-12 22:54:01.399-0500

Hello!

Just tested :-)
It fixes the issue in my case too.

Thank you!