Summary: | ASTERISK-15529: [patch] PRI locks randomly, hangup cause 102, "recovery on timer expiry". | ||
Reporter: | Alec Davis (alecdavis) | Labels: | |
Date Opened: | 2010-01-27 04:21:07.000-0600 | Date Closed: | 2010-06-10 10:52:55 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Channels/chan_dahdi |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) 28-01-10-missed-setup.txt ( 1) full-debug-29-01-10.txt ( 2) iframe-restart_t203.diff.txt ( 3) pri_locked.txt ( 4) pri-debug-29-01-10.txt ( 5) t203-iframe-restart.diff2.txt | |
Description: | Every week once or twice our PRI locks up. The easiest way I've found to get it started again is to issue at the CLI 'dahdi restart' When it is locked up, inbound pri calls are not accepted. ****** ADDITIONAL INFORMATION ****** Identical issue sent to asterisk-users list.<pre> http://lists.digium.com/pipermail/asterisk-users/2009-August/236460.html</pre> | ||
Comments: | By: Alec Davis (alecdavis) 2010-01-27 04:23:36.000-0600 Hardware: Wildcard TE220 (4th Gen) (VPMOCT064) libpri version: SVN-branch-1.4-r1374M DAHDI Version: SVN-branch-2.2-r7684 Echo Canceller: MG2 Asterisk SVN-branch-1.6.1-r241456M uploaded pri_locked.txt This is one of our staff trying to call into our IVR. Just after that call I issued 'dahdi restart' from CLI, and all calls start flowing again. It seems like, if a CAUSE 102 is received on the ISDN, we should soft restart the channel it was received on, or something to that effect. As I understand it, existing calls would still be working, but in this capture, none were going. By: Alec Davis (alecdavis) 2010-01-27 17:41:25.000-0600 uploaded full debug trace 28-01-10-missed-setup.txt libpri upgraded this morning to: libpri version: SVN-branch-1.4-r1439 Below while this happended I was capturing the Jtec dchannel with Jtec tools. 0248 E1M-B 100128 11:03:14.72 Q.931 SETUP C/Ref 6092 Org 0251 E1M-B 100128 11:03:23.79 Q.931 SETUP C/Ref 6092 Org 0253 E1M-B 100128 11:03:29.00 Q.931 RELease C/Ref 6092 Org 0255 E1M-B 100128 11:03:34.00 Q.931 RELease C/Ref 6092 Org 0259 E1M-B 100128 11:03:47.29 Q.931 SETUP C/Ref 6093 Org 0261 PBX-A 100128 11:03:47.29 Q.931 CALL PROCeeding C/Ref 6093 Dest 0262 PBX-A 100128 11:03:47.35 Q.931 ALERTing C/Ref 6093 Dest 0265 PBX-A 100128 11:03:50.42 Q.931 CONNect C/Ref 6093 Dest 0267 E1M-B 100128 11:03:50.53 Q.931 CONNect ACKnowledge C/Ref 6093 Org Although there are a few seconds difference between systems the RELease 13 seconds before the next Successful setup corresponds with the astersik debug log. console output: tail end of previous call, failed call, then beginning of next successful call: -- Native bridging DAHDI/29-1 and DAHDI/3-1 -- Channel 0/29, span 1 got hangup request, cause 16 -- Executing [h@ivr2-bdt-day:1] Verbose("DAHDI/29-1", "0,'' IVR MAIN-DAY Hungup") in new stack '' IVR MAIN-DAY Hungup -- Executing [h@ivr2-bdt-day:2] Verbose("DAHDI/29-1", "0,Time=Thu Jan 28 11:02:23 2010") in new stack Time=Thu Jan 28 11:02:23 2010 -- Hungup 'DAHDI/3-1' == Spawn extension (ivr2-bdt-day, 2, 3) exited non-zero on 'DAHDI/29-1' -- Hungup 'DAHDI/29-1' [Jan 28 11:03:28] WARNING[18583]: chan_dahdi.c:10661 pri_fixup_principle: Call specified, but not found? [Jan 28 11:03:28] WARNING[18583]: chan_dahdi.c:11803 pri_dchannel: Hangup on bad channel 0/29 on span 1 [Jan 28 11:03:33] WARNING[18583]: chan_dahdi.c:10661 pri_fixup_principle: Call specified, but not found? [Jan 28 11:03:33] WARNING[18583]: chan_dahdi.c:11803 pri_dchannel: Hangup on bad channel 0/29 on span 1 -- Executing [9100@incoming:1] Goto("DAHDI/29-1", "ivr2-bdt-main,s,1") in new stack -- Goto (ivr2-bdt-main,s,1) -- Executing [s@ivr2-bdt-main:1] Verbose("DAHDI/29-1", "0,Time=Thu Jan 28 11:03:46 2010") in new stack Time=Thu Jan 28 11:03:46 2010 -- Executing [s@ivr2-bdt-main:2] Verbose("DAHDI/29-1", "0,'' IVR Main called '9100' IAXVAR='' ") in new stack '' IVR Main called '9100' IAXVAR='' -- Executing [s@ivr2-bdt-main:3] Ringing("DAHDI/29-1", "") in new stack -- Accepting call from '' to '9100' on channel 0/29, span 1 <b>Asterisk didn't see the SETUP messages, but saw the RELEASE messages????</b> By: Alec Davis (alecdavis) 2010-01-28 15:27:05.000-0600 uploaded pri-debug-29-01-10.txt and full-debug-29-01-20.txt <b>libpri does see the incoming SETUP messages</b>, but doesn't act on it? By: Alec Davis (alecdavis) 2010-01-29 02:39:13.000-0600 uploaded iframe-restart_t203.diff.txt Note has debug print lines that need to be removed. following q.931 <pre> 5.10.3 Connection verification procedures 5.10.3.1 Start timer T203 The timer T203 is started: – when the multiple-frame-established state is entered; and – in the multiple-frame-established state whenever timer T200 is stopped. (See Note in 5.10.3.2.) Upon receiving an I or supervisory frame, timer T203 will be restarted if timer T200 is not to be started.</pre> console output: -- Starting T203 timer -- Starting T200 timer -- Stopping T200 timer -- Starting T203 timer <b>-- Restarting T203 timer</b> < Protocol Discriminator: Q.931 (8) len=34 < Call Ref: len= 2 (reference 25945/0x6559) (Originator) < Message Type: SETUP (5) < [04 03 80 90 a3] < Bearer Capability (len= 5) [ Ext: 1 Q.931 Std: 0 Info transfer capability: Speech (0) < Ext: 1 Trans mode/rate: 64kbps, circuit-mode (16) < User information layer 1: A-Law (35) < [18 03 a9 83 81] < Channel ID (len= 5) [ Ext: 1 IntID: Implicit Other(PRI) Spare: 0 Exclusive Dchan: 0 < ChanSel: As indicated in following octets < Ext: 1 Coding: 0 Number Specified Channel Type: 3 < Ext: 1 Channel: 1 Type: CPE] < [1e 02 a1 90] < Progress Indicator (len= 4) [ Ext: 1 Coding: Non-ITU international standard (1) 0: 0 Location: Private network serving the local user (1) < Ext: 1 Progress Description: Interworking with a public network. (16) ] < [6c 02 00 80] < Calling Number (len= 4) [ Ext: 0 TON: Unknown Number Type (0) NPI: Unknown Number Plan (0) < Presentation: Presentation permitted, user number not screened (0) '' ] < [70 05 80 38 35 30 39] < Called Number (len= 7) [ Ext: 1 TON: Unknown Number Type (0) NPI: Unknown Number Plan (0) '8509' ] < [9d] < Non-Locking Shift (len=01): Requested codeset 5 < [32 01 80] < Calling Party Category (len= 3) [ Ext: 1 Cat: Unknown Source (0) ] -- Making new call for cr 25945 Received message for call 0xb6406c78 on 0xb6c10468 TEI/SAPI 0/0, call->pri is 0xb6c10468 TEI/SAPI 0/0 -- Processing Q.931 Call Setup -- Processing IE 4 (cs0, Bearer Capability) -- Processing IE 24 (cs0, Channel Identification) -- Processing IE 30 (cs0, Progress Indicator) -- Processing IE 108 (cs0, Calling Party Number) -- Processing IE 112 (cs0, Called Party Number) -- Processing IE 50 (cs5, Calling Party Category) q931.c:6611 post_handle_q931_message: Call 25945 enters state 6 (Call Present). Hold state: Idle ALEC Fill Ring Event <b>-- T200 requested to stop when not started T203 requested to start without stopping first -- Starting T203 timer</b> By: Alec Davis (alecdavis) 2010-01-29 03:17:01.000-0600 uploaded t203-iframe-restart.diff2.txt worked through differences from release 1291 to 1406 found typo in q921.c:iframe_rx wasn't rescheduling T203 when I frame received. edit: I still think the T203 reschedule is too late, in iframe_rx(), it should be before 'res = q931_receive' to close that window where it can fire. By: Digium Subversion (svnbot) 2010-01-29 11:27:45.000-0600 Repository: libpri Revision: 1445 U branches/1.4/q921.c ------------------------------------------------------------------------ r1445 | mattf | 2010-01-29 11:27:44 -0600 (Fri, 29 Jan 2010) | 1 line Fix bug in which an event was lost if an I-frame was received during a timer recovery state (related to ASTERISK-15529) ------------------------------------------------------------------------ http://svn.digium.com/view/libpri?view=rev&revision=1445 By: Alec Davis (alecdavis) 2010-03-02 01:11:46.000-0600 closing as fixed by libpri Revision 1445 |