[Home]

Summary:ASTERISK-29998: sla: deadlock when calling SLAStation application
Reporter:N A (InterLinked)Labels:
Date Opened:2022-03-31 06:10:16Date Closed:2022-11-28 08:54:44.000-0600
Priority:MinorRegression?
Status:Closed/CompleteComponents:Applications/app_meetme
Versions:18.6.0 Frequency of
Occurrence
Constant
Related
Issues:
Environment:Attachments:( 0) exvvljatlc.txt
Description:As soon as I call the SLAStation application in the dialplan, deadlock ensues almost instantly:

Initially, I get this:

[Mar 30 20:10:16] DEBUG[9760]: channel.c:3073 ast_waitfor_nandfds: Thread LWP 9760 is blocking 'DAHDI/5-1', already blocked by thread LWP 9655 in procedure ast_waitfor_nandfds

An IAX2 trunk call that normally works fails, seemingly as a result of that. I can get a Local channel to at least dial.

Then, the longer one waits, the worse the issue progressively becomes,and it's very clear that there is a deadlock here, between 9862 and 9863:

{noformat}
[Mar 30 20:20:04] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:04] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:04] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:04] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:04] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:04] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:04] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:04] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:04] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:04] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:04] DEBUG[9863]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9863 is blocking 'DAHDI/5-1', already blocked by thread LWP 9862 in procedure ast_waitfor_nandfds
[Mar 30 20:20:04] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:04] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:04] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:04] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:04] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:05] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:05] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:05] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:05] DEBUG[9863]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9863 is blocking 'DAHDI/5-1', already blocked by thread LWP 9862 in procedure ast_waitfor_nandfds
[Mar 30 20:20:05] DEBUG[9863]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9863 is blocking 'DAHDI/5-1', already blocked by thread LWP 9862 in procedure ast_waitfor_nandfds
[Mar 30 20:20:05] DEBUG[9863]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9863 is blocking 'DAHDI/5-1', already blocked by thread LWP 9862 in procedure ast_waitfor_nandfds
[Mar 30 20:20:05] DEBUG[9863]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9863 is blocking 'DAHDI/5-1', already blocked by thread LWP 9862 in procedure ast_waitfor_nandfds
[Mar 30 20:20:05] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
[Mar 30 20:20:05] DEBUG[9862][C-00000001]: chan_dahdi.c:8742 dahdi_read: Thread LWP 9862 is blocking 'DAHDI/5-1', already blocked by thread LWP 9863 in procedure ast_waitfor_nandfds
{noformat}

Interestingly, "core show locks" does not return anything (no locks held).

Finally, to round things off, when I hang up the phone that called SLAStation, I get thousands of these per second in the log:

{noformat}
[Mar 30 20:10:21] DEBUG[9760]: devicestate.c:361 _ast_device_state: No provider found, checking channel drivers for DAHDI - 5
[Mar 30 20:10:21] DEBUG[9760]: devicestate.c:361 _ast_device_state: No provider found, checking channel drivers for DAHDI - 5
[Mar 30 20:10:21] DEBUG[9760]: devicestate.c:361 _ast_device_state: No provider found, checking channel drivers for DAHDI - 5
[Mar 30 20:10:21] DEBUG[9760]: devicestate.c:361 _ast_device_state: No provider found, checking channel drivers for DAHDI - 5
[Mar 30 20:10:21] WARNING[9760]: logger: ***: Log queue threshold (1000) exceeded.  Discarding new messages.
[Mar 30 20:10:21] WARNING[9559]: logger: ***: Logging resumed.  354 messages discarded.
{noformat}

This was initially. When I recompiled with developer options for locks and memory, this doesn't seem to happen anymore. However, the thread blocking still occurs.

minimal sla.conf:
{noformat}
[general]
attemptcallerid = no

[line1]
type = trunk
ringtimeout = 99999
device = Local/s@line1out
barge = yes
hold = open

[station1]
type = station
trunk = line1
device = DAHDI/5
{noformat}
Comments:By: Asterisk Team (asteriskteam) 2022-03-31 06:10:17.503-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: N A (InterLinked) 2022-04-01 18:30:12.176-0500

Actually, deadlock was conservative - I tried this on another system and this causes a crash there:

[2022-04-01 23:27:19]     -- Local/disa@bla-out-00000004;1 answered
[2022-04-01 23:27:19] DEBUG[29440]: channel.c:3065 ast_waitfor_nandfds: Thread LWP 29440 is blocking 'Local/4002702596@astrex-lines-pre-00000003;2', already blocked by thread LWP 29503 in procedure ast_waitfor_nandfds
[2022-04-01 23:27:19] ERROR[29440]: channel.c:3065 ast_waitfor_nandfds: FRACK!, Failed assertion 0 (0)
[2022-04-01 23:27:20] ERROR[29440]:   Got 8 backtrace records
# 0: asterisk __ast_assert_failed()
# 1: asterisk <unknown>()
# 2: asterisk ast_waitfor_nandfds()
# 3: asterisk ast_waitfor_n()
# 4: asterisk <unknown>()
# 5: asterisk <unknown>()
# 6: [0x7f45b9e13fa3] libpthread.so.0 pthread_create.c:487 start_thread()
# 7: [0x7f45b98a7eff] libc.so.6 clone.S:97 clone()

core-full.txt is attached.



By: Friendly Automation (friendly-automation) 2022-11-28 08:54:45.272-0600

Change 19374 merged by Friendly Automation:
sla: Prevent deadlock and crash due to autoservicing.

[https://gerrit.asterisk.org/c/asterisk/+/19374|https://gerrit.asterisk.org/c/asterisk/+/19374]

By: Friendly Automation (friendly-automation) 2022-11-28 11:19:30.180-0600

Change 19308 merged by Friendly Automation:
sla: Prevent deadlock and crash due to autoservicing.

[https://gerrit.asterisk.org/c/asterisk/+/19308|https://gerrit.asterisk.org/c/asterisk/+/19308]

By: Friendly Automation (friendly-automation) 2022-11-28 17:41:49.232-0600

Change 19372 merged by Friendly Automation:
sla: Prevent deadlock and crash due to autoservicing.

[https://gerrit.asterisk.org/c/asterisk/+/19372|https://gerrit.asterisk.org/c/asterisk/+/19372]