
Summary: ASTERISK-25653: Deadlock - PJ_ENOMEM errors & high Recv-Q counts when using PJSIP TLS extensions
Reporter: Phi Tran (pixel)
Labels:
Date Opened: 2015-12-31 20:28:37.000-0600
Date Closed: 2016-04-15 12:05:43
Priority: Major
Regression?
Status: Closed/Complete
Components: pjproject/pjsip
Versions: 13.4.0 13.5.0 13.6.0 13.7.0
Frequency of Occurrence: Constant
Related Issues:
duplicates ASTERISK-25435 Asterisk periodically hangs. UDP Recv-Q greatly exceeds zero.
is duplicated by ASTERISK-25870 Deadlock while using Asterisk over mobile networks
Environment: CentOS 7 (with Asterisk installed separately), FreePBX 13 (with Asterisk bundled)
Attachments:
(0) backtrace-threads.txt
(1) backtrace-threads.txt
(2) core-show-locks.txt
Description: A deadlock occurs when PJSIP TLS connections to Asterisk fail, 8-30 hours after Asterisk is started. The logs show "PJ_ENOMEM" errors right around the time this happens. The time to failure varies; it normally happens when I am moving between Wi-Fi and LTE networks on Bria for iOS v3.5.2. The system has 4GB of RAM.

I first tried this on the FreePBX distro. The issue also happens on a completely separate, clean CentOS 7 install. I followed the Asterisk Secure Calling wiki instructions exactly. Connections are stable when Asterisk is first started, and calls complete successfully over SRTP.

Telnet to TCP port 5061 still succeeds after the deadlock occurs, but no registrations succeed. Bria shows "TLS connection errors" in its logs.
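The telnet result is consistent with how TCP listeners behave in general: the kernel completes the handshake and queues incoming connections and data for a listening socket whether or not the application is calling accept() or reading. A successful telnet to port 5061 therefore only shows that the listening socket still exists, not that Asterisk is processing what arrives, which would also fit Bria reporting TLS connection errors (a TLS handshake needs the application itself to respond). The minimal Python sketch below illustrates the kernel behavior on a throwaway localhost port using plain TCP rather than TLS; the function name and payload are hypothetical.

```python
import socket


def connect_without_accept(payload: bytes = b"REGISTER") -> bytes:
    """Show that TCP connect() succeeds while the server never services the socket.

    A listening socket is created, but accept() is deliberately not called
    before the client connects and sends data. The kernel completes the
    handshake and buffers the bytes on its own; they are read only at the
    end to prove they were waiting in the receive queue.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        server.listen(5)
        port = server.getsockname()[1]

        # Succeeds even though the "application" has not called accept().
        with socket.create_connection(("127.0.0.1", port), timeout=2) as client:
            client.sendall(payload)  # queued by the kernel, not yet read

            # Only now does the server side pick up the connection and read.
            conn, _ = server.accept()
            with conn:
                conn.settimeout(2)
                data = b""
                while len(data) < len(payload):
                    chunk = conn.recv(64)
                    if not chunk:
                        break
                    data += chunk
                return data
```

If the server never reached the accept() and recv() calls, as in a deadlocked process, the client would still have connected and the bytes would simply keep accumulating in the queue.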
Comments:

By: Asterisk Team (asteriskteam) 2015-12-31 20:28:38.915-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Rusty Newton (rnewton) 2016-01-04 08:33:57.238-0600

We suspect that you have a deadlock occurring within Asterisk. Please follow the instructions on the wiki [1] for obtaining debug relevant to a deadlock. Once you have that information, attach it to the issue. Be sure the instructions are followed exactly as the debug may otherwise not be useful.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace#GettingaBacktrace-GettingInformationForADeadlock
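The deadlock procedure referenced above centers on capturing a backtrace of every thread while Asterisk is hung. As a sketch, assuming gdb is installed, you have permission to attach to the process (typically root), and the binary lives at /usr/sbin/asterisk (adjust for your install), the commonly used `gdb -ex "thread apply all bt" --batch` invocation can be wrapped in Python as follows. The helper names are hypothetical, and this does not replace the wiki's full procedure; for example, the `core show locks` output is only available from an Asterisk build compiled with DEBUG_THREADS.

```python
import shutil
import subprocess
from typing import List

# Assumed install path; FreePBX and source builds may differ.
DEFAULT_BINARY = "/usr/sbin/asterisk"


def gdb_backtrace_command(pid: int, binary: str = DEFAULT_BINARY) -> List[str]:
    """Build a gdb invocation that attaches to `pid` non-interactively,
    prints a backtrace for every thread, then detaches and exits."""
    return ["gdb", "-ex", "thread apply all bt", "--batch", binary, str(pid)]


def capture_backtrace(pid: int, out_path: str, binary: str = DEFAULT_BINARY) -> int:
    """Hypothetical helper: write all-thread backtraces to `out_path`.

    Returns gdb's exit status. Attaching requires ptrace permission
    (usually root) and briefly pauses the target process.
    """
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb not found on PATH")
    with open(out_path, "w") as out:
        result = subprocess.run(
            gdb_backtrace_command(pid, binary),
            stdout=out,
            stderr=subprocess.STDOUT,
        )
    return result.returncode
```

A typical call while the deadlock is in progress would look up the PID first, e.g. `capture_backtrace(int(subprocess.check_output(["pidof", "asterisk"]).split()[0]), "/tmp/backtrace-threads.txt")`. Builds without debug symbols (such as DONT_OPTIMIZE) will still produce a backtrace, but with less useful detail.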

By: Rusty Newton (rnewton) 2016-01-04 08:34:44.840-0600

In addition to the deadlock info we will require additional debug logs to continue with triage of your issue. Please follow the instructions on the wiki [1] for how to collect debugging information from Asterisk. For expediency, where possible, attach the debug with a '.txt' file extension so that the debug will be usable for further analysis.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

By: Phi Tran (pixel) 2016-01-04 12:55:10.294-0600

Attached is the backtrace-threads log, captured while the deadlock is happening. I am on the FreePBX distro, so recompiling there will be a bit difficult; I'll see if I can get some other backtraces without recompiling.

By: Phi Tran (pixel) 2016-01-09 11:45:14.325-0600

Here is the output on a clean CentOS 7 server while the deadlock is happening.  Rusty - I sent you an e-mail regarding the logs.

By: Rusty Newton (rnewton) 2016-01-12 19:31:29.585-0600

Just noting here that Phi and I discussed this issue via e-mail. He is now testing with the latest release candidate of 13.7 to confirm whether the issue is fixed.

By: Phi Tran (pixel) 2016-01-29 23:42:17.870-0600

Sorry, this is still happening. It seems to take longer now (a couple of weeks), but it deadlocks again. The Recv-Q values for the connections that were open are in the thousands. What other information should I submit?
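For an established connection, the Recv-Q figure reported by `ss` or `netstat` is the number of bytes the kernel has received but the application has not yet read, so values in the thousands that keep growing suggest that whatever reads those sockets has stopped. On Linux the same counter is exposed as the hex rx_queue column of /proc/net/tcp (and /proc/net/udp, relevant to the UDP variant in ASTERISK-25435). A minimal sketch for extracting it for one local port, assuming that file layout; the function name is hypothetical, and addresses are left as the raw hex strings the kernel prints.

```python
from typing import Iterable, List, Tuple

TCP_LISTEN_STATE = "0A"  # hex state code the kernel prints for LISTEN sockets


def recv_queues(lines: Iterable[str], port: int) -> List[Tuple[str, int]]:
    """Return (remote_address, rx_queue_bytes) for sockets bound to `port`.

    Expects the text format of Linux /proc/net/tcp (or tcp6/udp): the first
    line is a header; field 1 is local_address as HEXIP:HEXPORT, field 2 is
    rem_address, field 3 is the state, and field 4 is tx_queue:rx_queue in hex.
    """
    results = []
    for line in list(lines)[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 5:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port != port:
            continue
        # For LISTEN sockets rx_queue counts pending connections, not bytes.
        if fields[3] == TCP_LISTEN_STATE:
            continue
        rx_queue = int(fields[4].split(":")[1], 16)
        results.append((fields[2], rx_queue))
    return results
```

On a live system this would be called as `recv_queues(open("/proc/net/tcp"), 5061)` (or /proc/net/tcp6 if the TLS transport is bound to an IPv6 address). Sampling it periodically and watching for queues that only grow is one way to spot the hang before registrations start failing.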

By: Asterisk Team (asteriskteam) 2016-01-29 23:42:18.877-0600

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.

By: Denis A. Valeev (dendionx) 2016-03-24 00:58:58.291-0500

I have the same problem.

By: Joshua C. Colp (jcolp) 2016-03-29 11:11:34.230-0500

Changes have gone in since then that may have impacted this issue, so new logs with the requested information would be needed, along with the specific versions of PJSIP and Asterisk in use.

By: Joshua C. Colp (jcolp) 2016-04-01 07:03:26.460-0500

Asterisk is an open source project and anyone can be a developer or contribute to it. While it is technically possible to limit who can view an attachment, doing so is not encouraged because you DRASTICALLY limit the number of people who can help you. The preferred way is to remove any passwords and other sensitive information and attach the file so that all users can see it.

By: Asterisk Team (asteriskteam) 2016-04-15 12:00:01.327-0500

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened, please create a new issue instead. If the new issue is related to this one, a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines