
Summary: ASTERISK-28893: pbx_realtime: Cascading deadlock due to ast_autoservice_stop blocking
Reporter: Martin Nyström (martin.nystrom)
Labels:
Date Opened: 2020-05-14 07:28:59
Date Closed: 2020-05-14 08:00:56
Priority: Major
Regression?:
Status: Closed/Complete
Components: Channels/chan_sip/General, PBX/pbx_realtime
Versions: 13.31.0
Frequency of Occurrence: Occasional
Related Issues: duplicates ASTERISK-21228 Deadlock in pbx_find_extension when attempting an autoservice stop due to holding the context lock
Environment: CentOS 7.7.1908, Linux 3.10.0-1062.9.1.el7.x86_64
Attachments: (0) dumps.tar.gz
Description: We seem to repeatedly run into full UDP buffers in our Asterisk. I am trying to figure out whether Asterisk is doing something incorrectly or whether the issue lies at the OS level. When this happens, as you can imagine, all UDP traffic starts to get dropped and an Asterisk restart is required.

Now I could increase the UDP buffers, but it feels like that would only push the issue into the future, and if the root cause is in Asterisk then that solution would not be ideal or long-term.

[root@server ~]# netstat -c --udp -an | grep 5060
udp   213248      0 0.0.0.0:5060            0.0.0.0:*                          
udp   213248      0 0.0.0.0:5060            0.0.0.0:*                          
udp   213248      0 0.0.0.0:5060            0.0.0.0:*                          
udp   213248      0 0.0.0.0:5060            0.0.0.0:*                          
udp   213248      0 0.0.0.0:5060            0.0.0.0:*                          
udp   213248      0 0.0.0.0:5060            0.0.0.0:*  

[root@server ~]# sysctl -a | grep mem
net.core.optmem_max = 20480
net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.core.wmem_default = 212992
net.core.wmem_max = 212992
net.ipv4.igmp_max_memberships = 20
net.ipv4.tcp_mem = 1443162 1924218 2886324
net.ipv4.tcp_rmem = 4096 87380 6291456
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.udp_mem = 1445055 1926743 2890110
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_wmem_min = 4096
sysctl: reading key "net.ipv6.conf.all.stable_secret"
sysctl: reading key "net.ipv6.conf.default.stable_secret"
sysctl: reading key "net.ipv6.conf.ens192.stable_secret"
sysctl: reading key "net.ipv6.conf.ens224.stable_secret"
sysctl: reading key "net.ipv6.conf.lo.stable_secret"
vm.lowmem_reserve_ratio = 256 256 32
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
vm.nr_hugepages_mempolicy = 0
vm.overcommit_memory = 1

Core dumps taken when this happens are attached.
Comments:By: Asterisk Team (asteriskteam) 2020-05-14 07:29:00.954-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Martin Nyström (martin.nystrom) 2020-05-14 07:30:04.118-0500

4.2G core-asterisk-running-2020-05-14T12-09-57+0000
464K core-asterisk-running-2020-05-14T12-09-57+0000-brief.txt
1.7M core-asterisk-running-2020-05-14T12-09-57+0000-full.txt
4.0K core-asterisk-running-2020-05-14T12-09-57+0000-locks.txt
4.0K core-asterisk-running-2020-05-14T12-09-57+0000-thread1.txt

By: Joshua C. Colp (jcolp) 2020-05-14 07:39:51.328-0500

You are using pbx_realtime, which has locking issues and can cause a deadlock. This results in other threads getting blocked, such as the chan_sip thread which handles UDP traffic. It is highly recommended not to use pbx_realtime.

By: Martin Nyström (martin.nystrom) 2020-05-14 07:41:24.628-0500

Do we know that pbx_realtime is the cause, or was it just a wild guess?

By: Asterisk Team (asteriskteam) 2020-05-14 07:41:24.949-0500

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.

By: Joshua C. Colp (jcolp) 2020-05-14 08:00:43.761-0500

I analyzed the backtrace. A channel is deadlocked while trying to stop autoservice as a result of pbx_realtime being in use, the same as in ASTERISK-21228. Other threads are iterating through the channel container, which requires locking the channel that is deadlocked, so they in turn become deadlocked as they wait for that lock. If pbx_realtime were not in use, autoservice would not be in use, that code path would not be executed, and the deadlock would not occur.
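
For illustration only, here is a minimal pthread sketch of the cascading pattern described above. This is not Asterisk source code; the names (channel_lock, dialplan_thread, iterator_thread) are hypothetical, and pause() merely stands in for an autoservice stop that never returns. It shows how one thread holding a channel lock while blocked causes every thread that later needs that lock to stall behind it.

/*
 * Hypothetical sketch of the cascading deadlock pattern, not Asterisk code.
 * Build with: gcc -pthread sketch.c
 *
 * Thread A takes the "channel" lock and then blocks forever (standing in
 * for a blocked autoservice stop). Thread B "iterates the container",
 * which requires the same channel lock, so it blocks behind thread A.
 * Any further thread doing the same also stalls, which is how one stuck
 * channel cascades into many stuck threads.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t channel_lock = PTHREAD_MUTEX_INITIALIZER;

static void *dialplan_thread(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&channel_lock);      /* channel locked for dialplan work */
    printf("dialplan thread: channel locked, now blocking forever\n");
    pause();                                /* stand-in for the blocked autoservice stop */
    pthread_mutex_unlock(&channel_lock);    /* never reached */
    return NULL;
}

static void *iterator_thread(void *arg)
{
    (void) arg;
    sleep(1);                               /* let the dialplan thread take the lock first */
    printf("iterator thread: trying to lock the channel...\n");
    pthread_mutex_lock(&channel_lock);      /* blocks here: the cascade begins */
    printf("iterator thread: never printed\n");
    pthread_mutex_unlock(&channel_lock);
    return NULL;
}

int main(void)
{
    pthread_t a, b;

    pthread_create(&a, NULL, dialplan_thread, NULL);
    pthread_create(&b, NULL, iterator_thread, NULL);

    /* Both joins hang: the process is effectively deadlocked. */
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}

In the report, one of the threads stuck in the iterator role is the chan_sip thread that reads UDP from port 5060, which is why the socket's receive queue fills up in the netstat output above.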