
Summary: ASTERISK-25435: Asterisk periodically hangs. UDP Recv-Q greatly exceeds zero.
Reporter: Dmitriy Serov (Demon)
Date Opened: 2015-09-30 02:24:05
Date Closed: 2015-10-08 13:14:06
Priority: Major
Regression? No
Status: Closed/Complete
Versions: 13.5.0 13.6.0
Frequency of Occurrence: Frequent
Related Issues: is duplicated by ASTERISK-25653 Deadlock - PJ_ENOMEM errors & high Recv-Q counts when using PJSIP TLS extensions
Attachments:
( 0) 2015_09_29__21_42_01.full.tail.txt
( 1) 2015_09_29__21_42_01.netstat.txt
( 2) 2015_09_29__21_43_01.backtrace-threads.txt
( 3) 2015_09_29__21_43_01.full.tail.txt
( 4) 2015_09_29__21_43_01.locks.txt
( 5) 2015_09_29__21_43_01.netstat.txt
( 6) 2015_09_29__21_44_07.backtrace-threads.txt
( 7) 2015_09_29__21_44_07.full.tail.txt
Description: Asterisk periodically hangs.
The UDP Recv-Q greatly exceeds zero.
No errors appear in the log (such as DNS errors or errors from the getaddr function).
The system behavior is very similar to ASTERISK-25421.
STUN is off.
Backtraces are attached.
Comments:

By: Asterisk Team (asteriskteam) 2015-09-30 02:24:06.253-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Dmitriy Serov (Demon) 2015-09-30 02:28:12.364-0500

A watchdog monitors the Recv-Q length reported by netstat:
21:42:01 - Recv-Q exceeded the threshold (netstat output and full log tail attached)
21:43:01 - Recv-Q still exceeded (netstat output, full log tail, backtrace, and locks attached)
21:44:07 - Recv-Q still exceeded (full log tail and backtrace attached). Asterisk was killed with kill -9.

The situation recurs at least once a day.
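The reporter's watchdog script is not attached. As a minimal sketch, assuming a Linux host: the Recv-Q that netstat shows for a UDP socket comes from the hexadecimal `rx_queue` half of the `tx_queue:rx_queue` column in /proc/net/udp, so a watchdog can read that file directly. The function names and the threshold value below are hypothetical, not taken from the report.

```python
"""Minimal UDP Recv-Q watchdog sketch (assumes the Linux /proc/net/udp format)."""

# Hypothetical alert threshold in bytes; the reporter's actual limit is not stated.
RECV_Q_THRESHOLD = 65536


def parse_udp_recv_q(proc_text, port):
    """Return the summed rx_queue (netstat's Recv-Q) in bytes for all UDP
    sockets whose local port equals `port`, parsed from /proc/net/udp text."""
    total = 0
    for line in proc_text.splitlines()[1:]:  # first row is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed rows
        local_port = int(fields[1].split(":")[1], 16)  # "ADDR:PORT", both hex
        if local_port != port:
            continue
        rx_queue = int(fields[4].split(":")[1], 16)  # "tx_queue:rx_queue", hex
        total += rx_queue
    return total


def recv_q_exceeded(proc_text, port, threshold=RECV_Q_THRESHOLD):
    """True when the Recv-Q for `port` is above `threshold` bytes."""
    return parse_udp_recv_q(proc_text, port) > threshold
```

On a live system one would read /proc/net/udp (and /proc/net/udp6 for IPv6 sockets) periodically and pass its contents with the SIP listening port; 5060 is the usual default but is an assumption here. Parsing the text rather than shelling out to netstat keeps the check cheap enough to run every minute.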


By: Mark Michelson (mmichelson) 2015-10-01 15:44:06.011-0500

It looks like the problem is that the mutex in the send_request_wrapper structure in res_pjsip.c is created with pj_mutex_create_simple(), which creates a non-recursive mutex. The code then tries to lock that mutex recursively (see thread 14 of 2015_09_29__21_43_01.backtrace-threads.txt), so the thread blocks forever. There are two potential solutions here:

1) Create the lock using pj_mutex_create_recursive() so that recursive locking does not cause a deadlock.
2) Use an ast_mutex_t, which is always created recursive. This would also allow the lock to show up in 'core show locks' output.
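Both options come down to using a recursive mutex. The difference can be illustrated with Python's threading module as a stand-in for the C APIs (an analogy, not the res_pjsip code): threading.Lock is non-recursive, like a mutex from pj_mutex_create_simple(), while threading.RLock lets the owning thread lock it again, like a recursive mutex. The helper name try_nested_lock is hypothetical, and a timeout replaces the indefinite block so the example terminates.

```python
import threading


def try_nested_lock(lock, timeout=0.5):
    """Acquire `lock`, then try to acquire it again from the same thread,
    as a re-entrant code path would. Returns True if the nested acquire
    succeeded; False means an untimed acquire would have blocked forever."""
    with lock:
        # The timeout stands in for "blocks forever" so the demo returns.
        got_it = lock.acquire(timeout=timeout)
        if got_it:
            lock.release()  # balance the nested acquire
        return got_it
```

Calling try_nested_lock(threading.Lock()) returns False after the timeout, mirroring the hang in thread 14, while try_nested_lock(threading.RLock()) returns True immediately because the owning thread may re-acquire it.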

By: Richard Mudgett (rmudgett) 2015-10-07 12:42:27.872-0500

A patch to fix the deadlock identified by [~mmichelson] is up on Gerrit:
https://gerrit.asterisk.org/#/c/1412/ v13
https://gerrit.asterisk.org/#/c/1413/ master