Summary: ASTERISK-21099: Reload makes dahdi not work
Reporter: Jorge Bastos (jasb)
Labels:
Date Opened: 2013-02-13 16:28:36.000-0600
Date Closed: 2013-04-17 17:59:43
Priority: Major
Regression?:
Status: Closed/Complete
Components: Channels/chan_dahdi
Versions: 11.2.1
Frequency of Occurrence:
Related Issues:
Environment: Linux, x86, asterisk 11.2.1
Attachments:
( 0) backtrace.txt
( 1) backtrace2.txt
( 2) core_show_locks.txt
( 3) core_show_locks2.txt
( 4) dahdi_debug.txt
( 5) pri_debug.txt
Description: Since I started using 11.2.1, I've noticed a problem with DAHDI: sometimes I am unable to make calls or receive them from the outside.

All I had to do was run dahdi restart.
After some time I discovered that this happens after a core reload. The steps to reproduce it on my installation:

- core reload
- "I can't make or receive calls from the outside"
- dahdi restart
- everything's normal.
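
The steps above can be sketched as a small shell helper so the sequence can be repeated consistently while collecting logs. This is only a sketch: the function names `reload_and_check` and `recover` and the `ASTERISK_BIN` variable are hypothetical, it assumes the `asterisk` binary is on the PATH, and `pri show spans` is used here only as one convenient way to look at span status after the reload.

```shell
#!/bin/sh
# Hypothetical helper: replay the reload sequence from this report.
# ASTERISK_BIN is an assumption; override it if asterisk lives elsewhere.
ASTERISK_BIN="${ASTERISK_BIN:-asterisk}"

reload_and_check() {
    # Step 1: trigger the reload that precedes the failure.
    "$ASTERISK_BIN" -rx "core reload"
    # Step 2: print PRI span status so a span that stays down is visible.
    "$ASTERISK_BIN" -rx "pri show spans"
}

recover() {
    # Step 3: the workaround described in this report.
    "$ASTERISK_BIN" -rx "dahdi restart"
}
```

Calling `reload_and_check`, placing a test call, and then calling `recover` follows the same order as the list above.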

This behavior didn't happen when I was using 10.12.0.
Dahdi version (2.6.1) and libpri (1.4.14) are the same.

I start receiving this message when I cannot make/receive calls:

[2013-02-13 21:30:04] ERROR[26528]: chan_dahdi.c:14449 dahdi_pri_error: PRI Span: 2 Unable to receive TEI from network in state 3(Establish awaiting TEI)!

As said, after the dahdi restart, everything went back to normal.
What can I do to debug/help on this problem?
Comments:By: Jorge Bastos (jasb) 2013-02-13 16:54:32.601-0600

Hi, please check the info captured with pri debug set on the spans.

By: Jorge Bastos (jasb) 2013-02-14 15:26:42.933-0600

Hi,

I'm now very confused about where the problem is.
I went back to asterisk 2.10 and got the same problem: after a reload, I cannot make/receive calls.

What can I do to solve this?

By: Richard Mudgett (rmudgett) 2013-02-14 15:49:09.829-0600

chan_dahdi has never handled reloads very well.  It would be nice to know between which Asterisk versions this problem occurred.

By: Jorge Bastos (jasb) 2013-02-14 16:59:37.430-0600

Hi Richard,

Tested 10.12.1, and 11.2.1, and this symptom exists.
What do you need me to do to help on this? I can make this problem happen anytime I need.

By: Richard Mudgett (rmudgett) 2013-02-14 17:28:19.052-0600

Was there a revision that worked?  I want to find out if this is a regression or is something that has always been there.

By: Jorge Bastos (jasb) 2013-02-15 06:03:35.988-0600

Hi Richard again,

Well, I was doing a test for you, and went back to 10.10.0, and the problem doesn't happen.
I tried this version to test, 'cause I have some servers using that version, and people never complained.

Does this help?

By: Richard Mudgett (rmudgett) 2013-02-15 13:14:26.799-0600

There were only a few changes to chan_dahdi/sig_pri between v10.10.0 and v10.12.1 and none of them should have had this effect.  The timer anti-pattern fix and device state cache security fix are possibilities but I would not think so.

This may be a deadlock situation that is forcibly cleared by the CLI "dahdi restart" command.  Please collect a core show locks output.

By: Richard Mudgett (rmudgett) 2013-02-15 13:14:53.180-0600

Debugging deadlocks: Please select DEBUG_THREADS and DONT_OPTIMIZE in the Compiler Flags section of menuselect. Recompile and install Asterisk (i.e. make install).  This will then give you the console command "core show locks." When the symptoms of the deadlock present themselves again, please provide output of the deadlock via:

# asterisk -rx "core show locks" | tee /tmp/core-show-locks.txt
# gdb -se "asterisk" <pid of asterisk> | tee /tmp/backtrace.txt
gdb> bt
gdb> bt full
gdb> thread apply all bt

Then attach the core-show-locks.txt and backtrace.txt files to this issue. Thanks!



By: Jorge Bastos (jasb) 2013-02-15 15:33:53.770-0600

Hi,

I can't send you the files because something failed here, but in any case there's no lock when this problem occurs. Please see:

root@asterisk:~# asterisk -rx "core show locks" | tee /tmp/core-show-locks.txt
Asterisk 10.10.0, Copyright (C) 1999 - 2012 Digium, Inc. and others.
Created by Mark Spencer <markster@digium.com>
Asterisk comes with ABSOLUTELY NO WARRANTY; type 'core show warranty' for details.
This is free software, with components licensed under the GNU General Public
License version 2 and other licenses; you are welcome to redistribute it under
certain conditions. Type 'core show license' for details.
=========================================================================
 == Parsing '/etc/asterisk/asterisk.conf':   == Found

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=======================================================================

Executing last minute cleanups
root@asterisk:~# gdb -se "asterisk" `pidof asterisk` | tee /tmp/backtrace.tx
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
12095: No such file or directory.
(gdb) bt
No stack.

(gdb) bt full
No stack.

(gdb) thread apply all bt
(gdb)
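
In the gdb session above, `No stack.` means gdb had no process to inspect, so the captured file contains no backtrace. A minimal sketch of a sanity check to run before attaching such a file to an issue follows. The function name `backtrace_looks_usable` is hypothetical, and the check relies only on the `No stack.` message shown above and on the `Thread N (...)` headers that `thread apply all bt` normally prints for an attached process.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the file looks like a real backtrace.
backtrace_looks_usable() {
    file="$1"
    # An empty or missing file is never usable.
    [ -s "$file" ] || return 1
    # "No stack." means gdb was not attached to a running process.
    if grep -q 'No stack\.' "$file"; then
        return 1
    fi
    # Expect at least one per-thread header from "thread apply all bt".
    grep -q 'Thread [0-9]' "$file"
}
```

For example, `backtrace_looks_usable /tmp/backtrace.txt || echo "backtrace is empty, re-run gdb"` would flag a capture like the one above.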


By: Richard Mudgett (rmudgett) 2013-02-15 16:55:45.071-0600

That was done on a v10.10.0 installation which you said did not have the problem.

By: Jorge Bastos (jasb) 2013-02-16 05:06:45.628-0600

Hi,

You're right, I forgot I had downgraded.
I went back to 11.2.1 and got the same results; I mean, no deadlock when I can't make calls.
The output is identical to the one above.

Any idea?

By: Rusty Newton (rnewton) 2013-02-18 17:44:28.202-0600

Do the "dahdi_pri_error:" messages start immediately after the 'core reload' ?

Can you please attach your chan_dahdi.conf (sanitized if necessary)?

Can you provide a VERBOSE and DEBUG log of the actual 'core reload' and a few minutes after?
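
One way to approach the first question is to pull the timestamps of the `dahdi_pri_error` lines out of the log and compare them with the time the reload was issued. A minimal sketch, assuming the log lines carry the `[YYYY-MM-DD HH:MM:SS]` prefix seen in the description; the function name `pri_error_times` is hypothetical, and the log path in the usage note is an assumption about the local logger configuration.

```shell
#!/bin/sh
# Hypothetical helper: print the timestamp of every dahdi_pri_error line.
pri_error_times() {
    logfile="$1"
    # Keep only PRI error lines, then strip everything except the
    # text between the leading square brackets.
    grep 'dahdi_pri_error' "$logfile" | sed -n 's/^\[\([^]]*\)].*/\1/p'
}
```

Usage might look like `pri_error_times /var/log/asterisk/full`, with the path adjusted to wherever the VERBOSE and DEBUG output is written.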



By: Rusty Newton (rnewton) 2013-03-25 12:59:37.350-0500

Jorge, is the issue still occurring? Can you respond to the above request for further information?

By: Jorge Bastos (jasb) 2013-03-26 15:30:58.462-0500

Hi Guys,

Sorry for the delay, I haven't had time for more tests. Allow me to re-test this on the weekend, even with dahdi 2.6.2 (I hope it's compatible with kernel 3.8.4).

By: Jorge Bastos (jasb) 2013-03-30 15:39:18.998-0500

Hi,

It seems that this problem is caused by the FreePBX DAHDI config module.
When I have more info I'll post.

By: Rusty Newton (rnewton) 2013-04-02 17:23:25.117-0500

If the issue is with the FreePBX module itself then you may want to look for assistance with it on the FreePBX forums where you'll find other users of it.

If you do find a bug with the latest DAHDI configuration module for FreePBX - you'll need to report that bug on the [FreePBX bug tracker|http://www.freepbx.org/trac/simpleticket].

What version of FreePBX are you using?

Let us know what you find out.

By: Max E. Reyes Vera J. (navaismo) 2013-04-09 18:28:33.595-0500

Hi, I have a similar issue using vanilla Asterisk:

* Asterisk version 11.3.0
* Dahdi Version 2.6.2
* CentOS with kernel 2.6.32-358.2.1.el6.x86_64

The backtrace was obtained with this command (not sure if it's OK):

{code}
gdb -ex "bt full" -ex "thread apply all bt" --batch -p 28057 | tee backtrace.txt
{code}

By: Richard Mudgett (rmudgett) 2013-04-09 18:45:01.153-0500

@Max
The backtrace is showing the dahdi_restart() waiting on a pthread_join() and the pri_dchannel() thread is waiting in write() to kick start the link.

Do you have layer2_persistence enabled?

By: Max E. Reyes Vera J. (navaismo) 2013-04-09 19:08:31.739-0500

@Richard Mudgett

{quote}Do you have layer2_persistence enabled?{quote}
To be honest with you, I don't know what that option is or where to set it.

By: Max E. Reyes Vera J. (navaismo) 2013-04-09 19:35:10.960-0500

OK, thanks to "elguero" from IRC, I now know that it's a chan_dahdi config option. I have never used it before, so I'll test it and come back with results.

And the answer to the original question is no, I'm not using that option. So I guess it is using the default: leave_down.
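
Whether the option is set explicitly can be checked directly in the configuration file. A minimal sketch, assuming the file is at /etc/asterisk/chan_dahdi.conf; the function name `show_layer2_persistence` is hypothetical, and files pulled in with #include are not searched.

```shell
#!/bin/sh
# Hypothetical helper: report any explicit layer2_persistence setting.
show_layer2_persistence() {
    conf="${1:-/etc/asterisk/chan_dahdi.conf}"
    # Match uncommented lines only; ';' starts a comment in Asterisk configs.
    if ! grep -E '^[[:space:]]*layer2_persistence[[:space:]]*=' "$conf"; then
        echo "layer2_persistence not set in $conf"
    fi
}
```

If nothing is printed except the "not set" message, the value in effect is whatever chan_dahdi and libpri fall back to, which this sketch does not determine.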

By: Max E. Reyes Vera J. (navaismo) 2013-04-10 10:52:46.986-0500

After more tests: when I run asterisk -vvvvvvdddcg and then pri set debug on span1, I see these errors: http://pastebin.com/rXdcmXpg

Then I ran core show locks again ---> http://pastebin.com/Z888Ysg9
And finally I got the backtrace ---> http://pastebin.com/TDANwgdP

I'm starting to think this may be a card issue.

By: Rusty Newton (rnewton) 2013-04-10 15:08:19.791-0500

Please attach the debug output to this issue as separate files, using "More Actions > Attach Files", so that we still have access if the pastebin instances expire.

By: Rusty Newton (rnewton) 2013-04-10 16:38:13.049-0500

@Max, what is the make, model and firmware version of your card? Have you already worked with the vendor's technical support department?

By: Max E. Reyes Vera J. (navaismo) 2013-04-11 17:08:03.920-0500

@Rusty

Added the files here.

{quote}what is the make, model and firmware version of your card?{quote}

I have tried with a first-generation Digium 4-port E1 card (driver: wct4xxp) and also with an OpenVox card, with the same result. The server is an HP ML110 G2.

The provider claims the E1 must work with ccs, hdb3, crc4 and euroisdn, but no luck. I have tried without crc4, and changing to ami and national, with the same result.

It's very weird: neither card shows the alarm when I disconnect the E1 until I run dahdi_cfg.


By: Rusty Newton (rnewton) 2013-04-17 17:55:44.745-0500

@Max, please file a new issue with all the information you have provided so far in the description, and attach any debug to the new issue (as separate files). You'll also want to attach your /etc/dahdi/system.conf and /etc/asterisk/chan_dahdi.conf (plus any #included files from chan_dahdi.conf).

I'm not sure that your issue is exactly the same as the one here.
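
The list of files requested above can be assembled with a short helper. This is a sketch under assumptions: the function name `list_dahdi_configs` is hypothetical, only uncommented `#include` lines in the given file are considered, and included names are printed as written (they are typically relative to /etc/asterisk) without being resolved or followed recursively.

```shell
#!/bin/sh
# Hypothetical helper: list the config files worth attaching to an issue.
list_dahdi_configs() {
    conf="${1:-/etc/asterisk/chan_dahdi.conf}"
    echo "/etc/dahdi/system.conf"
    echo "$conf"
    # Print the argument of each uncommented #include line, without quotes.
    awk '/^[ \t]*#include/ { gsub(/"/, "", $2); print $2 }' "$conf"
}
```

Each printed file should still be sanitized by hand before it is attached.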

By: Rusty Newton (rnewton) 2013-04-17 17:58:24.435-0500

{quote}
It seems that this problem is caused by the FreePBX DAHDI config module.
When I have more info I'll post.
{quote}
@Jorge, regarding your last post. I'm going to close this, as this is not the issue tracker that you'll want to discuss a problem with the FreePBX DAHDI config module on.

See my comment at "02/Apr/13 5:23 PM"