[Home]

Summary:ASTERISK-15529: [patch] PRI locks randomly, hangup cause 102, "recovery on timer expiry".
Reporter:Alec Davis (alecdavis)Labels:
Date Opened:2010-01-27 04:21:07.000-0600Date Closed:2010-06-10 10:52:55
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Channels/chan_dahdi
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 28-01-10-missed-setup.txt
( 1) full-debug-29-01-10.txt
( 2) iframe-restart_t203.diff.txt
( 3) pri_locked.txt
( 4) pri-debug-29-01-10.txt
( 5) t203-iframe-restart.diff2.txt
Description:Every week once or twice our PRI locks up. The easiest way I've found to get it started again is to issue at the CLI 'dahdi restart'

When it is locked up, inbound pri calls are not accepted.


****** ADDITIONAL INFORMATION ******

Identical issue sent to asterisk-users list.<pre>
http://lists.digium.com/pipermail/asterisk-users/2009-August/236460.html</pre>
Comments:By: Alec Davis (alecdavis) 2010-01-27 04:23:36.000-0600

Hardware:
Wildcard TE220 (4th Gen) (VPMOCT064)

libpri version: SVN-branch-1.4-r1374M
DAHDI Version: SVN-branch-2.2-r7684 Echo Canceller: MG2
Asterisk SVN-branch-1.6.1-r241456M

uploaded pri_locked.txt
This is one of our staff trying to call into our IVR.

Just after that call I issued 'dahdi restart' from CLI, and all calls start flowing again.

It seems like, if a CAUSE 102 is received on the ISDN, we should soft restart the channel it was received on, or something to that effect. As I understand it, existing calls would still be working, but in this capture, none were going.



By: Alec Davis (alecdavis) 2010-01-27 17:41:25.000-0600

uploaded full debug trace 28-01-10-missed-setup.txt
libpri upgraded this morning to: libpri version: SVN-branch-1.4-r1439

Below while this happended I was capturing the Jtec dchannel with Jtec tools.

0248 E1M-B    100128 11:03:14.72 Q.931 SETUP                   C/Ref 6092  Org
0251 E1M-B    100128 11:03:23.79 Q.931 SETUP                   C/Ref 6092  Org
0253 E1M-B    100128 11:03:29.00 Q.931 RELease                 C/Ref 6092  Org
0255 E1M-B    100128 11:03:34.00 Q.931 RELease                 C/Ref 6092  Org

0259 E1M-B    100128 11:03:47.29 Q.931 SETUP                   C/Ref 6093  Org
0261 PBX-A    100128 11:03:47.29 Q.931 CALL PROCeeding         C/Ref 6093 Dest
0262 PBX-A    100128 11:03:47.35 Q.931 ALERTing                C/Ref 6093 Dest
0265 PBX-A    100128 11:03:50.42 Q.931 CONNect                 C/Ref 6093 Dest
0267 E1M-B    100128 11:03:50.53 Q.931 CONNect ACKnowledge     C/Ref 6093  Org

Although there are a few seconds difference between systems the RELease 13 seconds before the next Successful setup corresponds with the astersik debug log.

console output: tail end of previous call, failed call, then beginning of next successful call:
  -- Native bridging DAHDI/29-1 and DAHDI/3-1
   -- Channel 0/29, span 1 got hangup request, cause 16
   -- Executing [h@ivr2-bdt-day:1] Verbose("DAHDI/29-1", "0,'' IVR MAIN-DAY Hungup") in new stack
'' IVR MAIN-DAY Hungup
   -- Executing [h@ivr2-bdt-day:2] Verbose("DAHDI/29-1", "0,Time=Thu Jan 28 11:02:23 2010") in new stack
Time=Thu Jan 28 11:02:23 2010
   -- Hungup 'DAHDI/3-1'
 == Spawn extension (ivr2-bdt-day, 2, 3) exited non-zero on 'DAHDI/29-1'
   -- Hungup 'DAHDI/29-1'
[Jan 28 11:03:28] WARNING[18583]: chan_dahdi.c:10661 pri_fixup_principle: Call specified, but not found?
[Jan 28 11:03:28] WARNING[18583]: chan_dahdi.c:11803 pri_dchannel: Hangup on bad channel 0/29 on span 1
[Jan 28 11:03:33] WARNING[18583]: chan_dahdi.c:10661 pri_fixup_principle: Call specified, but not found?
[Jan 28 11:03:33] WARNING[18583]: chan_dahdi.c:11803 pri_dchannel: Hangup on bad channel 0/29 on span 1
   -- Executing [9100@incoming:1] Goto("DAHDI/29-1", "ivr2-bdt-main,s,1") in new stack
   -- Goto (ivr2-bdt-main,s,1)
   -- Executing [s@ivr2-bdt-main:1] Verbose("DAHDI/29-1", "0,Time=Thu Jan 28 11:03:46 2010") in new stack
Time=Thu Jan 28 11:03:46 2010
   -- Executing [s@ivr2-bdt-main:2] Verbose("DAHDI/29-1", "0,'' IVR Main called '9100' IAXVAR='' ") in new stack
'' IVR Main called '9100' IAXVAR=''
   -- Executing [s@ivr2-bdt-main:3] Ringing("DAHDI/29-1", "") in new stack
   -- Accepting call from '' to '9100' on channel 0/29, span 1

<b>Asterisk didn't see the SETUP messages, but saw the RELEASE messages????</b>



By: Alec Davis (alecdavis) 2010-01-28 15:27:05.000-0600

uploaded pri-debug-29-01-10.txt and full-debug-29-01-20.txt

<b>libpri does see the incoming SETUP messages</b>, but doesn't act on it?

By: Alec Davis (alecdavis) 2010-01-29 02:39:13.000-0600

uploaded iframe-restart_t203.diff.txt
Note has debug print lines that need to be removed.

following q.931 <pre>
5.10.3 Connection verification procedures
5.10.3.1 Start timer T203
The timer T203 is started:
– when the multiple-frame-established state is entered; and
– in the multiple-frame-established state whenever timer T200 is stopped. (See Note in
5.10.3.2.)
Upon receiving an I or supervisory frame, timer T203 will be restarted if timer T200 is not to be
started.</pre>


console output:
-- Starting T203 timer
-- Starting T200 timer
-- Stopping T200 timer
-- Starting T203 timer
<b>-- Restarting T203 timer</b>
< Protocol Discriminator: Q.931 (8)  len=34
< Call Ref: len= 2 (reference 25945/0x6559) (Originator)
< Message Type: SETUP (5)
< [04 03 80 90 a3]
< Bearer Capability (len= 5) [ Ext: 1  Q.931 Std: 0  Info transfer capability: Speech (0)
<                              Ext: 1  Trans mode/rate: 64kbps, circuit-mode (16)
<                                User information layer 1: A-Law (35)
< [18 03 a9 83 81]
< Channel ID (len= 5) [ Ext: 1  IntID: Implicit  Other(PRI)  Spare: 0  Exclusive  Dchan: 0
<                        ChanSel: As indicated in following octets
<                       Ext: 1  Coding: 0  Number Specified  Channel Type: 3
<                       Ext: 1  Channel: 1 Type: CPE]
< [1e 02 a1 90]
< Progress Indicator (len= 4) [ Ext: 1  Coding: Non-ITU international standard (1)  0: 0  Location: Private network serving the local user (1)
<                               Ext: 1  Progress Description: Interworking with a public network. (16) ]
< [6c 02 00 80]
< Calling Number (len= 4) [ Ext: 0  TON: Unknown Number Type (0)  NPI: Unknown Number Plan (0)
<                           Presentation: Presentation permitted, user number not screened (0)  '' ]
< [70 05 80 38 35 30 39]
< Called Number (len= 7) [ Ext: 1  TON: Unknown Number Type (0)  NPI: Unknown Number Plan (0)  '8509' ]
< [9d]
< Non-Locking Shift (len=01): Requested codeset 5
< [32 01 80]
< Calling Party Category (len= 3) [ Ext: 1  Cat: Unknown Source (0) ]
-- Making new call for cr 25945
Received message for call 0xb6406c78 on 0xb6c10468 TEI/SAPI 0/0, call->pri is 0xb6c10468 TEI/SAPI 0/0
-- Processing Q.931 Call Setup
-- Processing IE 4 (cs0, Bearer Capability)
-- Processing IE 24 (cs0, Channel Identification)
-- Processing IE 30 (cs0, Progress Indicator)
-- Processing IE 108 (cs0, Calling Party Number)
-- Processing IE 112 (cs0, Called Party Number)
-- Processing IE 50 (cs5, Calling Party Category)
q931.c:6611 post_handle_q931_message: Call 25945 enters state 6 (Call Present).  Hold state: Idle
ALEC Fill Ring Event
<b>-- T200 requested to stop when not started
T203 requested to start without stopping first
-- Starting T203 timer</b>



By: Alec Davis (alecdavis) 2010-01-29 03:17:01.000-0600

uploaded t203-iframe-restart.diff2.txt

worked through differences from release 1291 to 1406
found typo in q921.c:iframe_rx

wasn't rescheduling T203 when I frame received.

edit: I still think the T203 reschedule is too late, in iframe_rx(), it should be before 'res = q931_receive' to close that window where it can fire.



By: Digium Subversion (svnbot) 2010-01-29 11:27:45.000-0600

Repository: libpri
Revision: 1445

U   branches/1.4/q921.c

------------------------------------------------------------------------
r1445 | mattf | 2010-01-29 11:27:44 -0600 (Fri, 29 Jan 2010) | 1 line

Fix bug in which an event was lost if an I-frame was received during a timer recovery state (related to ASTERISK-15529)
------------------------------------------------------------------------

http://svn.digium.com/view/libpri?view=rev&revision=1445

By: Alec Davis (alecdavis) 2010-03-02 01:11:46.000-0600

closing as fixed by libpri Revision 1445