Summary: ASTERISK-28605: chan_dahdi: Deadlock in Hangup Scenarios with concurrent command pri show span X
Reporter: Dirk Wendland (kesselklopfer79)
Labels:
Date Opened: 2019-10-31 04:17:49
Date Closed: 2020-01-08 08:58:05.000-0600
Priority: Major
Regression?
Status: Closed/Complete
Components: PBX/General
Versions: 16.6.0, 16.6.1
Frequency of Occurrence: Constant
Related Issues: is related to ASTERISK-28525 chan_dahdi: set CHANNEL(hangupsource) when a PRI channel hangs up
Environment: Asterisk 16.6, CentOS 6
Attachments:
( 0) 13527-PS1-bt.log
( 1) 13527-PS1-deadlock.odg
( 2) 13527-PS1-test.txt
( 3) core-asterisk-running-2019-11-08T11-08-15+0100-brief.txt
( 4) core-asterisk-running-2019-11-08T11-08-15+0100-full.txt
( 5) core-asterisk-running-2019-11-08T11-08-15+0100-locks.txt
( 6) core-asterisk-running-2019-11-08T11-08-15+0100-thread1.txt
( 7) currentLocks.txt
Description: We have a scenario with a high call volume (load tests).
Scenario:
4 S2M ports => 60 channels go up in the same second, and 60 channels hang up in the same second.
Every few seconds we check the ISDN lines with the command:
pri show span X
After a few iterations Asterisk stops working.
On the console we can run that command only once; after that the command line interface hangs.

localhost*CLI> pri show span 4
Primary D-channel: 109
Status: Up, Active
The next line (Switchtype) is not printed.
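
The periodic check described above can be sketched as a small watchdog that runs the CLI command with a timeout and flags both a hung CLI and output that stops before the Switchtype line. This is only an illustration, not the reporter's actual tooling: the function names, the timeout value, and the assumption that a healthy response contains the text "Switchtype" are hypothetical. `asterisk -rx` is the usual way to run a single CLI command against a running Asterisk.

```python
import subprocess


def build_command(span, asterisk_bin="asterisk"):
    """Return the argv for a one-shot remote CLI query of one PRI span."""
    return [asterisk_bin, "-rx", f"pri show span {span}"]


def check_span(span, timeout_s=5.0, run=subprocess.run):
    """Classify one poll as 'ok', 'truncated' or 'hang'.

    'hang' means the CLI did not return within timeout_s seconds.
    'truncated' means it returned but the Switchtype line is missing
    (assumption: a healthy response always contains "Switchtype").
    """
    try:
        result = run(
            build_command(span),
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "hang"
    return "ok" if "Switchtype" in result.stdout else "truncated"
```

Calling `check_span(4)` every few seconds from a loop or a cron job would reproduce the polling side of the scenario; the `run` parameter exists only so the check can be exercised without a live Asterisk.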
-----------------------------
We found the change that introduced this issue:
ASTERISK-28525
If we revert that commit, everything works fine.

Greetings
Dirk
Comments:

By: Asterisk Team (asteriskteam) 2019-10-31 04:17:52.743-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Joshua C. Colp (jcolp) 2019-10-31 04:22:01.252-0500

We suspect that you have a deadlock occurring within Asterisk. Please follow the instructions on the wiki [1] for obtaining debug relevant to a deadlock. Once you have that information, attach it to the issue. Be sure the instructions are followed exactly as the debug may otherwise not be useful.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace#GettingaBacktrace-GettingInformationForADeadlock



By: Dirk Wendland (kesselklopfer79) 2019-10-31 04:36:16.458-0500

Hi Joshua

I will try to get that by the end of next week, around 8-11 Nov.

Have a nice weekend.
Greetings
Dirk

By: Dirk Wendland (kesselklopfer79) 2019-11-08 04:20:28.170-0600

Hi Joshua

I cannot upload the file. It is 30 MB in tgz format.

{NOFORMAT}
[root@localhost tmp]# asterisk -rvvv
Asterisk 16.6.1 (debug build 16.6.1), Copyright (C) 1999 - 2018, Digium, Inc. and others.
Created by Mark Spencer <markster@digium.com>
Asterisk comes with ABSOLUTELY NO WARRANTY; type 'core show warranty' for details.
This is free software, with components licensed under the GNU General Public
License version 2 and other licenses; you are welcome to redistribute it under
certain conditions. Type 'core show license' for details.
=========================================================================
Connected to Asterisk 16.6.1 (debug build 16.6.1) currently running on localhost (pid = 6771)
localhost*CLI>
localhost*CLI> core show locks

=======================================================================
=== 16.6.1 (debug build 16.6.1)
=== Currently Held Locks
=======================================================================
===
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 0x7fbe7f73d700 LWP:6779 (worker_start         started at [ 1111] threadpool.c worker_thread_start())
=== ---> Waiting for Lock #0 (threadpool.c): MUTEX 364 threadpool_execute pool 0x21b6eb0 (1)
asterisk __ast_pthread_mutex_lock()
asterisk __ao2_lock()
asterisk <unknown>()
asterisk <unknown>()
asterisk ast_taskprocessor_push()
asterisk ast_threadpool_push()
asterisk <unknown>()
asterisk <unknown>()
=== -------------------------------------------------------------------
===
=======================================================================

   -- Remote UNIX connection
   -- Remote UNIX connection disconnected
localhost*CLI>
{NOFORMAT}

By: Dirk Wendland (kesselklopfer79) 2019-11-08 04:22:13.021-0600

The core dump of the running Asterisk is 1.5 GB, so that file will be missing.

By: George Joseph (gjoseph) 2019-12-18 08:37:33.149-0600

We are reverting the change that may have caused this issue; the revert will take effect in Asterisk 13.30, 16.7 and 17.1. In the meantime, we're trying to determine exactly what the issue was and hope to have a permanent fix in the following releases.


By: George Joseph (gjoseph) 2019-12-19 09:43:48.585-0600

Unfortunately, the backtrace files you attached don't include debugging symbols. How did you install Asterisk? From source? If so, can you rebuild with the DONT_OPTIMIZE flag set and the BETTER_BACKTRACES flag unset, make sure the binaries aren't stripped, and then re-create the issue (not on a production system, of course)?

If you can do that, run
{noformat}
ast_coredumper --tarball-coredumps --running --no-default-search
{noformat}

The resulting tarball will help us narrow down the original issue.  You can then host the tarball on Google Drive, Dropbox, etc and give us the link.


By: Dirk Wendland (kesselklopfer79) 2019-12-20 03:03:44.186-0600

Hi George,

Can you please try this link: https://files.starface.de/index.php/s/xpwfMsLDTaQnF9j
That is the whole archive as a tgz, including the dump.
It comes from our test system only, not from a production system.
Asterisk should be compiled correctly; if not, the next run will take a while because of the holidays etc.

Could you please check that file and give me a short OK or not OK for that bigger file, so that I can try to get a better one if needed :).

Greetings
Dirk


By: George Joseph (gjoseph) 2019-12-20 08:15:29.296-0600

It doesn't have the binaries in it but it does have the functions and line numbers so that will work.  I was finally able to reproduce the issue yesterday so I can use your new backtraces to validate that it's the same issue.

Happy holidays!



By: Frederic LE FOLL (flefoll) 2020-01-07 02:19:40.002-0600

Test for https://gerrit.asterisk.org/c/asterisk/+/13527 Patch Set 1

By: Frederic LE FOLL (flefoll) 2020-01-07 02:19:45.486-0600

Test with Change Set https://gerrit.asterisk.org/c/asterisk/+/13527, Patch Set 1:

Conditions:
- TE820 with span 4 (euroisdn pri_net) connected to span 8 (euroisdn pri_cpe) with a crossover cable
- call generation with call files towards span 4, approx 10 calls/s
- call answer on span 8 with fast hangup (50ms)
- 'pri show span 4' every 1 s
See attached file 13527-PS1-test.txt for test conditions.

On one of our test servers, a deadlock occurs after a variable time (generally around 10 minutes). On another one (different hardware), no deadlock occurs.
Of course, the crossover cable creates a very artificial coincidence between outgoing and incoming calls.
See attached files 13527-PS1-bt.log for a backtrace of all threads after a deadlock, and 13527-PS1-deadlock.odg for a graphical representation of the deadlock.
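
For readers unfamiliar with this class of bug, the general shape of a lock-order deadlock and one common way to avoid it can be sketched as below. This is a generic illustration only: the thread does not spell out which locks were involved, and the sketch is not the change merged for this issue. The names `pri_lock`, `chan_lock`, `"cli"` and `"hangup"` are hypothetical, and Asterisk itself is written in C with its own locking primitives. One thread takes the locks in the order A then B, the other in the order B then A; instead of blocking on the second lock, each thread tries it without blocking and, on failure, briefly gives up the lock it already holds.

```python
import threading
import time


def lock_second_with_backoff(held, wanted, retries=1000, pause_s=0.001):
    """Try to take `wanted` while already holding `held`.

    On failure, release `held` for a moment so the competing thread can
    make progress, then take `held` again and retry. Returns True with
    both locks held, or False with only `held` held.
    """
    for _ in range(retries):
        if wanted.acquire(blocking=False):
            return True
        held.release()
        time.sleep(pause_s)
        held.acquire()
    return False


def worker(first, second, done, name, iterations=50):
    """Repeatedly take `first`, then `second` via the backoff helper."""
    for _ in range(iterations):
        first.acquire()
        try:
            if lock_second_with_backoff(first, second):
                second.release()
        finally:
            first.release()
    done.append(name)


def run_demo():
    """Run two threads that want the same two locks in opposite order."""
    pri_lock, chan_lock = threading.Lock(), threading.Lock()
    done = []
    threads = [
        threading.Thread(target=worker, args=(pri_lock, chan_lock, done, "cli"), daemon=True),
        threading.Thread(target=worker, args=(chan_lock, pri_lock, done, "hangup"), daemon=True),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=30)
    return sorted(done)
```

If the helper were replaced by a plain blocking `second.acquire()`, the two workers could each end up holding one lock while waiting forever for the other, which is the situation a backtrace of all threads makes visible.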

By: Friendly Automation (friendly-automation) 2020-01-08 08:58:06.316-0600

Change 13505 merged by Friendly Automation:
sig_pri:  Fix deadlock caused by sig_pri_queue_hangup

[https://gerrit.asterisk.org/c/asterisk/+/13505|https://gerrit.asterisk.org/c/asterisk/+/13505]

By: Friendly Automation (friendly-automation) 2020-01-08 09:42:33.128-0600

Change 13527 merged by Joshua Colp:
sig_pri:  Fix deadlock caused by sig_pri_queue_hangup

[https://gerrit.asterisk.org/c/asterisk/+/13527|https://gerrit.asterisk.org/c/asterisk/+/13527]

By: Friendly Automation (friendly-automation) 2020-01-08 09:42:46.947-0600

Change 13528 merged by Joshua Colp:
sig_pri:  Fix deadlock caused by sig_pri_queue_hangup

[https://gerrit.asterisk.org/c/asterisk/+/13528|https://gerrit.asterisk.org/c/asterisk/+/13528]

By: Friendly Automation (friendly-automation) 2020-01-08 09:42:59.977-0600

Change 13529 merged by Joshua Colp:
sig_pri:  Fix deadlock caused by sig_pri_queue_hangup

[https://gerrit.asterisk.org/c/asterisk/+/13529|https://gerrit.asterisk.org/c/asterisk/+/13529]