
Summary: ASTERISK-28643: Deadlock, possibly in Parking, maybe in combination with AMI status messages.
Reporter: Steve Sether (stevesether)
Labels:
Date Opened: 2019-12-02 14:35:43.000-0600
Date Closed:
Priority: Critical
Regression?:
Status: Open/New
Components: PBX/General
Versions: 16.6.1
Frequency of Occurrence: Occasional
Related Issues:
Environment: Centos 6.10
Attachments: (0) core_show_locks-2019-11-14-crash1.txt
(1) full-asterisk-crash.gz
Description: Asterisk deadlocks and stops responding to any SIP messages when I park a call. No new incoming calls are accepted, no outgoing calls are accepted, etc. Initial investigation using gdb and the core file suggests it may be related to generating STATUS messages from the AMI. This happened when only one or maybe two calls were on the system. Looking through the lock file, it looks like a simple deadlock between two threads, each of which holds locks the other needs. One of them is in the parking code.

I was able to reproduce it a couple of times initially (and got some debug information out of it, including from a version where core show locks was compiled in). For whatever reason, I can't seem to reproduce it again, even under the same conditions.
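
For readers unfamiliar with the pattern, the two-thread deadlock described above can be sketched as follows. This is a minimal Python illustration, not Asterisk code: the lock names `channel_lock` and `parking_lock`, the thread names, and both functions are hypothetical, and the ticket does not establish which locks are actually involved. The first function uses a non-blocking acquire so the demonstration terminates instead of hanging; the second shows the usual remedy of taking locks in one agreed order.

```python
import threading


def opposite_order_attempt():
    """Two threads each hold one lock, then try the other without blocking.

    Returns {thread_name: got_second_lock}. With blocking acquires this
    interleaving would hang forever, which is the deadlock in question.
    """
    channel_lock = threading.Lock()   # hypothetical name
    parking_lock = threading.Lock()   # hypothetical name
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            # Wait until the peer also holds its first lock.
            both_hold_first.wait()
            ok = second.acquire(blocking=False)
            got_second[name] = ok
            if ok:
                second.release()
            # Keep the first lock held until the peer has made its attempt.
            both_tried_second.wait()

    threads = [
        threading.Thread(target=worker, args=("parking", parking_lock, channel_lock)),
        threading.Thread(target=worker, args=("ami_status", channel_lock, parking_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second


def same_order_run(rounds=1000):
    """Both threads take the locks in the same order, so neither can block the other forever."""
    channel_lock = threading.Lock()
    parking_lock = threading.Lock()
    counter = {"value": 0}

    def worker():
        for _ in range(rounds):
            with channel_lock:
                with parking_lock:
                    counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


if __name__ == "__main__":
    print(opposite_order_attempt())
    print(same_order_run())
```

In the first function neither thread can obtain its second lock, because the peer holds it until both attempts have been made. In the second, the fixed acquisition order means one thread always finishes and releases both locks before the other proceeds.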

Comments:

By: Asterisk Team (asteriskteam) 2019-12-02 14:35:44.763-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Steve Sether (stevesether) 2019-12-02 14:37:11.734-0600

Full debug logs and output from core show locks.

By: Steve Sether (stevesether) 2019-12-02 14:41:06.381-0600

I also have a core dump that I created by sending a SIGSEGV to Asterisk, but it's a 65MB compressed .gz file, which isn't allowed. I can provide the file if you can give me a way to send it.

By: George Joseph (gjoseph) 2019-12-02 16:02:48.316-0600

Do you have a dropbox account or something?  If not, then no worries.  We'll see how far we get with what was provided.  Is the core-show-locks output from the same incident as full-asterisk-crash.gz?


By: Steve Sether (stevesether) 2019-12-02 16:10:47.897-0600

Yes, the core show locks output and the full-asterisk-crash file are from the same incident. I don't have a Dropbox account, but I did upload it to my Google Drive. I shared it with your email address.

By: George Joseph (gjoseph) 2019-12-02 16:17:36.499-0600

I got it but without the binaries I can't do much with it.
Can you run the following on the machine that produced the coredump?
{noformat}
$ sudo /var/lib/asterisk/scripts/ast_coredumper --tarball-coredumps --no-default-search /tmp/core.usi-vf13-mtka.ravon.net-2019-11-14T15_20_29-0600
{noformat}

This will tarball up the coredump, the text files and the asterisk binaries in use.

Now I'll have to see if we've still got a CentOS 6 system hanging around. :)




By: Steve Sether (stevesether) 2019-12-02 16:59:51.848-0600

We actually produce full RPMs, so I can just send you the RPMs we created for this run which might be more helpful.  Otherwise I can give you the binaries from that command.

By: George Joseph (gjoseph) 2019-12-02 20:21:15.185-0600

Please use the command.  The resulting tarball is what we prefer since we all know how to deal with it.


By: Steve Sether (stevesether) 2019-12-03 09:47:39.806-0600

Ok, I ran the script, and the file is now shared with you.

By: George Joseph (gjoseph) 2019-12-03 09:55:08.176-0600

Got it, thanks!  I'll try and take a look later today.

By: Steve Sether (stevesether) 2019-12-04 09:44:25.679-0600

It looks like you've found that this is a known issue, already in your internal tracking system.

Does that mean it'll be fixed in a future release?  Is there any expectation on when?  Also, what conditions will this happen under?  Are there any ways to mitigate the problem?

We'd like to put Asterisk 16 in production, but are rather leery of doing so with this bug since it locks up SIP, and requires a reboot to fix.

By: Steve Sether (stevesether) 2019-12-09 14:39:43.841-0600

I don't mean to be a pest, but is there any update on this with regard to what I mentioned in the above comments?  It's a bit of a critical path item for us.

By: Joshua C. Colp (jcolp) 2019-12-09 14:48:11.430-0600

The issue was cloned into internal tracking. There is no time frame on when this will be resolved, and any updates will be provided here.