Summary: | ASTERISK-28605: chan_dahdi: Deadlock in Hangup Scenarios with concurrent command pri show span X | ||||
Reporter: | Dirk Wendland (kesselklopfer79) | Labels: | |||
Date Opened: | 2019-10-31 04:17:49 | Date Closed: | 2020-01-08 08:58:05.000-0600 | ||
Priority: | Major | Regression? | |||
Status: | Closed/Complete | Components: | PBX/General | ||
Versions: | 16.6.0 16.6.1 | Frequency of Occurrence | Constant | ||
Related Issues: |
| ||||
Environment: | Asterisk 16.6 CentOS 6 | Attachments: | ( 0) 13527-PS1-bt.log ( 1) 13527-PS1-deadlock.odg ( 2) 13527-PS1-test.txt ( 3) core-asterisk-running-2019-11-08T11-08-15+0100-brief.txt ( 4) core-asterisk-running-2019-11-08T11-08-15+0100-full.txt ( 5) core-asterisk-running-2019-11-08T11-08-15+0100-locks.txt ( 6) core-asterisk-running-2019-11-08T11-08-15+0100-thread1.txt ( 7) currentLocks.txt | ||
Description: | We have a scenario with high call volume (load tests).
Scenario: 4 S2M ports; 60 channels come up in the same second and 60 channels hang up in the same second. Every few seconds we check the ISDN lines with the command pri show span X. After a few iterations, Asterisk stops working. On the console the command can be run only once; after that, the CLI hangs:
localhost*CLI> pri show span 4
Primary D-channel: 109
Status: Up, Active
The next line (Switchtype) is never printed.
-----------------------------
We found the change that introduced the issue: ASTERISK-28525. If we revert that commit, everything works fine. Greetings, Dirk | ||||
Comments: | By: Asterisk Team (asteriskteam) 2019-10-31 04:17:52.743-0500 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.
By: Joshua C. Colp (jcolp) 2019-10-31 04:22:01.252-0500 We suspect that you have a deadlock occurring within Asterisk. Please follow the instructions on the wiki [1] for obtaining debug relevant to a deadlock. Once you have that information, attach it to the issue. Be sure the instructions are followed exactly, as the debug may otherwise not be useful. Thanks! [1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace#GettingaBacktrace-GettingInformationForADeadlock
By: Dirk Wendland (kesselklopfer79) 2019-10-31 04:36:16.458-0500 Hi Joshua, I will try to get that by the end of next week, around 8-11 Nov. Have a nice weekend. Greetings, Dirk
By: Dirk Wendland (kesselklopfer79) 2019-11-08 04:20:28.170-0600 Hi Joshua, I cannot upload the file; it is 30 MB as a tgz.
{NOFORMAT}
[root@localhost tmp]# asterisk -rvvv
Asterisk 16.6.1 (debug build 16.6.1), Copyright (C) 1999 - 2018, Digium, Inc. and others.
Created by Mark Spencer <markster@digium.com>
Asterisk comes with ABSOLUTELY NO WARRANTY; type 'core show warranty' for details.
This is free software, with components licensed under the GNU General Public License version 2 and other licenses; you are welcome to redistribute it under certain conditions. Type 'core show license' for details.
=========================================================================
Connected to Asterisk 16.6.1 (debug build 16.6.1) currently running on localhost (pid = 6771)
localhost*CLI>
localhost*CLI> core show locks

=======================================================================
=== 16.6.1 (debug build 16.6.1)
=== Currently Held Locks
=======================================================================
===
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 0x7fbe7f73d700 LWP:6779 (worker_start started at [ 1111] threadpool.c worker_thread_start())
=== ---> Waiting for Lock #0 (threadpool.c): MUTEX 364 threadpool_execute pool 0x21b6eb0 (1)
        asterisk __ast_pthread_mutex_lock()
        asterisk __ao2_lock()
        asterisk <unknown>()
        asterisk <unknown>()
        asterisk ast_taskprocessor_push()
        asterisk ast_threadpool_push()
        asterisk <unknown>()
        asterisk <unknown>()
=== -------------------------------------------------------------------
===
=======================================================================
    -- Remote UNIX connection
    -- Remote UNIX connection disconnected
localhost*CLI>
{NOFORMAT}
By: Dirk Wendland (kesselklopfer79) 2019-11-08 04:22:13.021-0600 The running Asterisk core is 1.5 GB, so that file will be missing.
By: George Joseph (gjoseph) 2019-12-18 08:37:33.149-0600 We are reverting the change that may have caused this issue; the revert will take effect in Asterisk 13.30, 16.7 and 17.1.
In the meantime, we're trying to determine exactly what the issue was and hope to have a permanent fix in the following releases.
By: George Joseph (gjoseph) 2019-12-19 09:43:48.585-0600 Unfortunately, the backtrace files you attached don't have the debugging symbols included in them. How did you install Asterisk, from source? If so, can you rebuild with the DONT_OPTIMIZE flag set and the BETTER_BACKTRACES flag unset, make sure the binaries aren't stripped and then re-create the issue (not on a production system of course)? If you can do that, run
{noformat}
ast_coredumper --tarball-coredumps --running --no-default-search
{noformat}
The resulting tarball will help us narrow down the original issue. You can then host the tarball on Google Drive, Dropbox, etc. and give us the link.
By: Dirk Wendland (kesselklopfer79) 2019-12-20 03:03:44.186-0600 Hi George, can you please try this link: https://files.starface.de/index.php/s/xpwfMsLDTaQnF9j. It is the whole archive, as a tgz, including the dump. It comes from our test system, not from a production system. Asterisk should be compiled correctly; if not, the next run will take a while because of the holidays. Could you please check that file and give me a short OK or not OK, so that I can try to get a better one if needed :). Greetings, Dirk
By: George Joseph (gjoseph) 2019-12-20 08:15:29.296-0600 It doesn't have the binaries in it, but it does have the functions and line numbers, so that will work. I was finally able to reproduce the issue yesterday, so I can use your new backtraces to validate that it's the same issue. Happy holidays!
By: Frederic LE FOLL (flefoll) 2020-01-07 02:19:40.002-0600 Test for https://gerrit.asterisk.org/c/asterisk/+/13527 Patch Set 1
By: Frederic LE FOLL (flefoll) 2020-01-07 02:19:45.486-0600 Test with Change Set https://gerrit.asterisk.org/c/asterisk/+/13527, Patch Set 1:
Conditions:
- TE820 with span 4 (euroisdn pri_net) connected to span 8 (euroisdn pri_cpe) with a crossover cable
- call generation with call files towards span 4, approx. 10 calls/s
- call answer on span 8 with fast hangup (50 ms)
- 'pri show span 4' every 1 s
See attached file 13527-PS1-test.txt for test conditions. On one of our test servers, the deadlock occurs after a variable time (generally about 10 minutes). On another one (different hardware), no deadlock occurs. Of course, the crossover cable creates a very artificial coincidence between outgoing and incoming calls. See attached files 13527-PS1-bt.log for a backtrace of all threads after a deadlock, and 13527-PS1-deadlock.odg for a graphical representation of the deadlock.
By: Friendly Automation (friendly-automation) 2020-01-08 08:58:06.316-0600 Change 13505 merged by Friendly Automation: sig_pri: Fix deadlock caused by sig_pri_queue_hangup [https://gerrit.asterisk.org/c/asterisk/+/13505|https://gerrit.asterisk.org/c/asterisk/+/13505]
By: Friendly Automation (friendly-automation) 2020-01-08 09:42:33.128-0600 Change 13527 merged by Joshua Colp: sig_pri: Fix deadlock caused by sig_pri_queue_hangup [https://gerrit.asterisk.org/c/asterisk/+/13527|https://gerrit.asterisk.org/c/asterisk/+/13527]
By: Friendly Automation (friendly-automation) 2020-01-08 09:42:46.947-0600 Change 13528 merged by Joshua Colp: sig_pri: Fix deadlock caused by sig_pri_queue_hangup [https://gerrit.asterisk.org/c/asterisk/+/13528|https://gerrit.asterisk.org/c/asterisk/+/13528]
By: Friendly Automation (friendly-automation) 2020-01-08 09:42:59.977-0600 Change 13529 merged by Joshua Colp: sig_pri: Fix deadlock caused by sig_pri_queue_hangup 
[https://gerrit.asterisk.org/c/asterisk/+/13529|https://gerrit.asterisk.org/c/asterisk/+/13529] |
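A deadlock between two concurrent code paths, such as a hangup and a concurrent pri show span, commonly comes from the two paths acquiring the same pair of locks in opposite orders. This report does not spell out which locks were involved (the attached 13527-PS1-deadlock.odg is said to depict the actual cycle). The C sketch below is therefore only a generic illustration under that assumption, not the sig_pri fix merged above. The names span_lock, call_lock, run_demo() and the worker functions are hypothetical. It shows one common avoidance pattern: the path that takes the locks in the reverse order uses pthread_mutex_trylock() and backs off instead of blocking.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical locks: names are illustrative, not taken from sig_pri.c. */
static pthread_mutex_t span_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t call_lock = PTHREAD_MUTEX_INITIALIZER;
static long work_done; /* protected by both locks */

/* "Show span" path: takes span_lock, then call_lock, blocking on both. */
static void lock_span_then_call(void)
{
    pthread_mutex_lock(&span_lock);
    pthread_mutex_lock(&call_lock);
}

/* "Hangup" path: its natural order is call_lock then span_lock, the reverse
 * of the path above. Blocking on span_lock here could deadlock, so it only
 * tries span_lock and, on failure, drops call_lock and starts over. */
static void lock_call_then_span(void)
{
    for (;;) {
        pthread_mutex_lock(&call_lock);
        if (pthread_mutex_trylock(&span_lock) == 0)
            return; /* both locks held */
        /* Back off so the thread holding span_lock can get call_lock. */
        pthread_mutex_unlock(&call_lock);
        sched_yield();
    }
}

static void unlock_both(void)
{
    pthread_mutex_unlock(&call_lock);
    pthread_mutex_unlock(&span_lock);
}

static void *show_worker(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        lock_span_then_call();
        work_done++;
        unlock_both();
    }
    return NULL;
}

static void *hangup_worker(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        lock_call_then_span();
        work_done++;
        unlock_both();
    }
    return NULL;
}

/* Runs both paths concurrently and returns the number of completed
 * critical sections (2 * iterations if neither thread is starved forever). */
long run_demo(int iterations)
{
    pthread_t a, b;
    work_done = 0;
    if (pthread_create(&a, NULL, show_worker, &iterations) != 0)
        return -1;
    if (pthread_create(&b, NULL, hangup_worker, &iterations) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    assert(work_done == 2L * iterations);
    return work_done;
}
```

Calling run_demo(100000) should return 200000. If lock_call_then_span() used a plain blocking pthread_mutex_lock() on span_lock, the two workers could instead block each other forever. The back-off loop can in principle spin under heavy contention, so where the code allows it, giving every path the same fixed lock order is the simpler way to rule out this kind of deadlock.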