Summary: ASTERISK-25279: Deadlock using chan_sip
Reporter: Dmitriy Serov (Demon)
Labels:
Date Opened: 2015-07-24 12:40:22
Date Closed:
Priority: Major
Regression? Yes
Status: Open/New
Components:
Versions: 13.4.0
Frequency of Occurrence: Frequent
Related Issues:
Environment:
Attachments:
( 0) 2015_07_23__10_11_01.backtrace-threads.log
( 1) 2015_07_24__04_26_01.backtrace-threads.log
( 2) 2015_07_24__17_28_01.backtrace-threads.log
( 3) 2015_07_25__21_40_01.backtrace-threads.txt
( 4) 2015_07_25__21_40_01.locks.txt
( 5) 2015_07_25__22_53_01.locks.txt
( 6) 2015_08_04__20_51_01.backtrace-threads.txt
( 7) 2015_08_04__20_51_01.locks.txt
( 8) 2015_08_05__02_20_01.backtrace-threads.txt
( 9) 2015_08_05__02_20_01.full.tail.txt
(10) 2015_08_05__02_20_01.locks.txt
(11) 2015_08_05__08_36_01.backtrace-threads.txt
(12) 2015_08_05__08_36_01.full.tail.txt
(13) 2015_08_05__08_36_01.locks.txt
(14) full.txt
Description: I use a watchdog that monitors the netstat Recv-Q value.
When Recv-Q becomes excessive, the watchdog captures a backtrace and restarts Asterisk.
The deadlock occurs almost every day, sometimes several times a day.
Backtraces from the latest restarts are attached.
ice_support is off.
STUN support is off. res_stun_monitor.so is unloaded.
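
For illustration only, the Recv-Q check described above could be sketched roughly as follows. This is not the reporter's actual watchdog script. The function names, the threshold of 100000 bytes, and the sample netstat text (laid out like Linux `netstat -anu` output, using the 177600-byte value reported later in this issue) are all assumptions.

```python
from typing import Optional

# Illustrative sample in the layout of Linux `netstat -anu` (assumed, not from the report).
SAMPLE_NETSTAT = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp   177600      0 0.0.0.0:5060            0.0.0.0:*
udp        0      0 0.0.0.0:4569            0.0.0.0:*
"""


def recv_q_for_port(netstat_output: str, port: int) -> Optional[int]:
    """Return the largest Recv-Q among UDP sockets bound to `port`, or None if absent."""
    values = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect at least: Proto, Recv-Q, Send-Q, Local Address
        if len(fields) < 4 or not fields[0].startswith("udp"):
            continue
        if not fields[1].isdigit():
            continue
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port == str(port):
            values.append(int(fields[1]))
    return max(values) if values else None


def should_restart(netstat_output: str, port: int = 5060,
                   threshold: int = 100_000) -> bool:
    """True when the socket's Recv-Q exceeds the (hypothetical) threshold."""
    recv_q = recv_q_for_port(netstat_output, port)
    return recv_q is not None and recv_q > threshold
```

In a real watchdog the text would come from running netstat, and a positive result would be followed by capturing the backtrace and restarting Asterisk; both steps are left out of this sketch.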
Comments:
By: Asterisk Team (asteriskteam) 2015-07-24 12:40:24.503-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Dmitriy Serov (Demon) 2015-07-24 12:43:26.769-0500

Backtraces of last deadlocks.

By: Rusty Newton (rnewton) 2015-07-24 18:53:36.440-0500

You marked regression, what was the last version this didn't occur in?

You may have already answered this in another issue, but are you able to run with DEBUG_THREADS for a time and get the output of "core show locks" ?

By: Dmitriy Serov (Demon) 2015-07-25 14:02:02.664-0500

Output with "core show locks" is attached.
The deadlock is now in chan_sip (port 5060 with 177600 bytes in Recv-Q).
The log does not contain any ERROR messages.

By: Dmitriy Serov (Demon) 2015-07-25 14:04:23.082-0500

Regression. Before 13.4 asterisk worked for weeks without any deadlocks (except in very rare problems with DNS)

By: Dmitriy Serov (Demon) 2015-07-25 16:47:31.271-0500

The watchdog restarted Asterisk 3 times in the last 5 hours! :(

By: Rusty Newton (rnewton) 2015-07-27 16:42:48.762-0500

Thanks for the additional output
{quote}
Regression. Before 13.4 asterisk worked for weeks without any deadlocks (except in very rare problems with DNS)
{quote}
What was the last version this didn't occur in?

That is, what was the *last* version you used *previous* to 13.4 where the issue *did not* occur?

Thanks!

By: Rusty Newton (rnewton) 2015-07-27 16:50:11.657-0500

In addition to answering my previous question please also attach a debug log captured leading up to the deadlock.

Please follow the instructions here: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

We need the DEBUG channel and VERBOSE channel. Turn them up to 5 if possible. You may want to monitor the log size on your server as it will be quite large.

By: Dmitriy Serov (Demon) 2015-08-04 13:56:54.718-0500

I don't know the exact point in the "full.txt" log. The watchdog triggered at 20:51:01.

The logs sometimes contain this line:
[2015-08-04 20:46:18] ERROR[24901] netsock2.c: getaddrinfo("nm", "(null)", ...): Name or service not known
There is no peer/endpoint with host/domain "nm".

By: Rusty Newton (rnewton) 2015-08-04 16:00:07.609-0500

Thanks for the debug log!

I'll also ask this question a third time. :)

{quote}
   Regression. Before 13.4 asterisk worked for weeks without any deadlocks (except in very rare problems with DNS)
{quote}

What was the last version this issue didn't occur in?

*That is, what was the last version you used previous to 13.4 where the issue did not occur?*

By: Dmitriy Serov (Demon) 2015-08-05 00:43:41.496-0500

I have used every version starting with 13.1, and also updated from git between releases.
Of course, something changed, but I don't remember such problems in versions 13.1, 13.2 and (I am less sure) 13.3.
On all the latest versions of branch 11, the server with the same settings (except for the lack of pjsip) worked for months without problems.

Strictly speaking, it is incorrect to call the situation in these backtraces a deadlock. The watchdog only detects that the receive queue has grown far too large, which essentially means that packets from clients are no longer being processed.


By: Dmitriy Serov (Demon) 2015-08-05 01:20:50.311-0500

Two additional cases are attached.

By: Dmitriy Serov (Demon) 2015-08-07 05:32:44.270-0500

After making some changes to the watchdog and running it for a couple of days, I found that the situation I described cannot be called a classic deadlock.

Processing of incoming UDP packets in chan_sip is temporarily suspended, but when the watchdog checks again a minute later the situation has corrected itself. This is not a deadlock, but the result is degraded or temporarily interrupted audio.
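
As a rough illustration of the distinction drawn above, a watchdog could recheck the queue after a delay before deciding that processing is truly stuck. This is a minimal sketch, not the reporter's modified watchdog. The function name `classify_stall`, the `read_recv_q` callback, and the labels "ok", "transient" and "stuck" are hypothetical; the one-minute delay follows the comment above.

```python
import time
from typing import Callable


def classify_stall(read_recv_q: Callable[[], int],
                   threshold: int,
                   recheck_delay: float = 60.0,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Classify the receive queue state using one delayed recheck.

    read_recv_q: hypothetical callback returning the current Recv-Q in bytes.
    Returns "ok", "transient" (recovered by the recheck) or "stuck".
    """
    if read_recv_q() <= threshold:
        return "ok"
    # Queue is too large: wait, then look again before taking any action.
    sleep(recheck_delay)
    if read_recv_q() <= threshold:
        return "transient"
    return "stuck"
```

With this shape, only a "stuck" result would justify a restart, while "transient" results could simply be logged with a timestamp to correlate with audio problems.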