Summary: | ASTERISK-25279: Deadlock using chan_sip | ||
Reporter: | Dmitriy Serov (Demon) | Labels: | |
Date Opened: | 2015-07-24 12:40:22 | Date Closed: | |
Priority: | Major | Regression? | Yes |
Status: | Open/New | Components: | |
Versions: | 13.4.0 | Frequency of Occurrence | Frequent |
Related Issues: | |||
Environment: | Attachments: | ( 0) 2015_07_23__10_11_01.backtrace-threads.log ( 1) 2015_07_24__04_26_01.backtrace-threads.log ( 2) 2015_07_24__17_28_01.backtrace-threads.log ( 3) 2015_07_25__21_40_01.backtrace-threads.txt ( 4) 2015_07_25__21_40_01.locks.txt ( 5) 2015_07_25__22_53_01.locks.txt ( 6) 2015_08_04__20_51_01.backtrace-threads.txt ( 7) 2015_08_04__20_51_01.locks.txt ( 8) 2015_08_05__02_20_01.backtrace-threads.txt ( 9) 2015_08_05__02_20_01.full.tail.txt (10) 2015_08_05__02_20_01.locks.txt (11) 2015_08_05__08_36_01.backtrace-threads.txt (12) 2015_08_05__08_36_01.full.tail.txt (13) 2015_08_05__08_36_01.locks.txt (14) full.txt | |
Description: | I use a watchdog that monitors the netstat Recv-Q value.
When Recv-Q exceeds a threshold, the watchdog captures a backtrace and restarts Asterisk. The deadlock occurs almost every day, sometimes several times a day. Backtraces from the most recent restarts are attached. ice_support is off, STUN support is off, and res_stun_monitor.so is unloaded. | ||
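For illustration only, here is a minimal Python sketch of the kind of Recv-Q check the description refers to. The reporter's actual watchdog script is not included in this issue, so the function names, the 100000-byte threshold, and the sample netstat text are all assumptions; only the 177600-byte value on port 5060 comes from the report.

```python
def recv_q_for_port(netstat_output: str, port: int, proto: str = "udp") -> int:
    """Return the largest Recv-Q among `proto` sockets bound to `port`.

    Assumes the usual net-tools column layout:
    Proto Recv-Q Send-Q Local-Address Foreign-Address [State]
    """
    worst = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banners, headers, and sockets of other protocols.
        if len(fields) < 5 or not fields[0].startswith(proto):
            continue
        if not fields[1].isdigit():
            continue
        # The port is whatever follows the last ':' (works for udp and udp6).
        if fields[3].rsplit(":", 1)[-1] == str(port):
            worst = max(worst, int(fields[1]))
    return worst


def is_stalled(netstat_output: str, port: int = 5060, threshold: int = 100_000) -> bool:
    """Hypothetical trigger: True when the receive queue exceeds `threshold` bytes."""
    return recv_q_for_port(netstat_output, port) > threshold


# Hypothetical netstat text; 177600 is the Recv-Q value quoted in the comments.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp   177600      0 0.0.0.0:5060            0.0.0.0:*
udp        0      0 0.0.0.0:4569            0.0.0.0:*
"""

if __name__ == "__main__":
    print(recv_q_for_port(SAMPLE, 5060), is_stalled(SAMPLE))
```

In a real watchdog the text would come from a command such as `netstat -anu`, and a positive result would be the point at which the backtrace is captured and Asterisk is restarted.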
Comments: | By: Asterisk Team (asteriskteam) 2015-07-24 12:40:24.503-0500 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].
By: Dmitriy Serov (Demon) 2015-07-24 12:43:26.769-0500 Backtraces of the last deadlocks.
By: Rusty Newton (rnewton) 2015-07-24 18:53:36.440-0500 You marked regression, what was the last version this didn't occur in? You may have already answered this in another issue, but are you able to run with DEBUG_THREADS for a time and get the output of "core show locks"?
By: Dmitriy Serov (Demon) 2015-07-25 14:02:02.664-0500 Now with "core show locks" output. The deadlock is in chan_sip (port 5060 with 177600 bytes in Recv-Q). The log does not contain any ERROR.
By: Dmitriy Serov (Demon) 2015-07-25 14:04:23.082-0500 Regression. Before 13.4 asterisk worked for weeks without any deadlocks (except in very rare problems with DNS)
By: Dmitriy Serov (Demon) 2015-07-25 16:47:31.271-0500 The watchdog restarted Asterisk 3 times in the last 5 hours! :(
By: Rusty Newton (rnewton) 2015-07-27 16:42:48.762-0500 Thanks for the additional output. {quote} Regression. Before 13.4 asterisk worked for weeks without any deadlocks (except in very rare problems with DNS) {quote} What was the last version this didn't occur in? That is, what was the *last* version you used *previous* to 13.4 where the issue *did not* occur? Thanks!
By: Rusty Newton (rnewton) 2015-07-27 16:50:11.657-0500 In addition to answering my previous question please also attach a debug log captured leading up to the deadlock. Please follow the instructions here: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information We need the DEBUG channel and VERBOSE channel. Turn them up to 5 if possible. You may want to monitor the log size on your server as it will be quite large.
By: Dmitriy Serov (Demon) 2015-08-04 13:56:54.718-0500 I don't know the exact point in the "full.txt" log. The watchdog triggered at 20:51:01. The log sometimes contains this line: [2015-08-04 20:46:18] ERROR[24901] netsock2.c: getaddrinfo("nm", "(null)", ...): Name or service not known There is no peer/endpoint with host/domain "nm".
By: Rusty Newton (rnewton) 2015-08-04 16:00:07.609-0500 Thanks for the debug log! I'll also ask this question a third time. :) {quote} Regression. Before 13.4 asterisk worked for weeks without any deadlocks (except in very rare problems with DNS) {quote} What was the last version this issue didn't occur in? *That is, what was the last version you used previous to 13.4 where the issue did not occur?*
By: Dmitriy Serov (Demon) 2015-08-05 00:43:41.496-0500 I have used every version starting with 13.1, also updating from git between releases. Of course, something changed, but I don't remember such problems in versions 13.1, 13.2 and (I am less sure) 13.3. On all the latest versions of branch 11, the server ran for months without problems with the same settings (except for the lack of pjsip). Strictly speaking, it is incorrect to call the situation in these backtraces a "deadlock". The watchdog detects a large backlog in the receive queue, which essentially means that packets for clients are no longer being processed.
By: Dmitriy Serov (Demon) 2015-08-05 01:20:50.311-0500 Two additional cases.
By: Dmitriy Serov (Demon) 2015-08-07 05:32:44.270-0500 After changing the watchdog and running it for a couple of days, I found that the situation I described cannot be called a classic deadlock. Processing of incoming UDP packets in chan_sip is temporarily suspended, but when checked a minute later the situation has corrected itself. This is not a deadlock, but the result is degraded or temporarily interrupted voice. |