
Summary: ASTERISK-27909: cdr: Deadlock with submit_scheduled_batch and submit_unscheduled_batch
Reporter: Denis Lebedev (coredumped)
Labels:
Date Opened: 2018-06-08 07:39:36
Date Closed: 2018-07-02 06:41:36
Priority: Minor
Regression?:
Status: Closed/Complete
Components: CDR/General
Versions: 15.4.0
Frequency of Occurrence: Occasional
Related Issues: is duplicated by ASTERISK-28108 Deadlock in publish_msg (stasis.c)
Environment: CentOS Linux 7 (Core) Linux *** 3.10.0-862.2.3.el7.x86_64 #1 SMP Wed May 9 18:05:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux Asterisk versions: 15.4.0
Attachments: ( 0) gdb.txt
Description: We encountered a deadlock in cdr.c involving these functions:
{noformat}
static int submit_scheduled_batch(const void *data)

static void submit_unscheduled_batch(void)
{noformat}

A previous deadlock was fixed in ASTERISK-21162. That task added the {{cdr_sched_lock}} mutex, and it is essentially this same mutex on which Asterisk gets stuck in a deadlock in subsequent versions.
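
For readers unfamiliar with this class of bug, here is a minimal, generic sketch of how two code paths that share two mutexes can deadlock, and of the usual way to avoid it. It is not the Asterisk code and not the merged fix: {{sched_lock}}, {{batch_lock}}, {{scheduled_path}}, {{unscheduled_path}} and {{run_both_paths}} are hypothetical names chosen for illustration, and the assumption that the real deadlock is a plain lock-order inversion is mine, not something confirmed by gdb.txt.

```c
#include <pthread.h>

#define ITERATIONS 10000

/* Hypothetical locks: one stands in for a scheduler's internal lock,
 * the other for a batch-submission lock such as cdr_sched_lock. */
static pthread_mutex_t sched_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t batch_lock = PTHREAD_MUTEX_INITIALIZER;
static long submitted;

/* Path 1: takes sched_lock, then batch_lock. */
static void *scheduled_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&sched_lock);
        pthread_mutex_lock(&batch_lock);
        submitted++;
        pthread_mutex_unlock(&batch_lock);
        pthread_mutex_unlock(&sched_lock);
    }
    return NULL;
}

/* Path 2: uses the SAME order as path 1. If this function instead took
 * batch_lock first and sched_lock second, each thread could end up
 * holding one lock while waiting forever for the other. */
static void *unscheduled_path(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&sched_lock);
        pthread_mutex_lock(&batch_lock);
        submitted++;
        pthread_mutex_unlock(&batch_lock);
        pthread_mutex_unlock(&sched_lock);
    }
    return NULL;
}

/* Runs both paths concurrently; returns the number of submissions,
 * or -1 if a thread could not be created. */
long run_both_paths(void)
{
    pthread_t a, b;

    submitted = 0;
    if (pthread_create(&a, NULL, scheduled_path, NULL) != 0) {
        return -1;
    }
    if (pthread_create(&b, NULL, unscheduled_path, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return submitted;
}
```

Called from a {{main}} and built with {{-pthread}}, {{run_both_paths()}} completes because both paths acquire the locks in the same order. Swapping the two {{pthread_mutex_lock}} calls in one path is enough to make it hang occasionally, which matches how rarely this kind of problem shows up in practice.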

The problem is quite rare, so it is almost impossible to reproduce under artificial conditions.

Symptoms:
* Asterisk stops flushing CDR items to the DB
* pings to cdr take 5 s (as far as I understand, they time out)
{noformat}
*CLI> core ping taskprocessor subm:cdr_engine-00000003

pinging subm:cdr_engine-00000003 ...
subm:cdr_engine-00000003 ping time: 5.000129 sec
{noformat}
* Asterisk begins to "eat" memory on the host under load
* but it continues to serve incoming call traffic

Also, Asterisk can't be restarted from the CLI.
Comments:

By: Asterisk Team (asteriskteam) 2018-06-08 07:39:38.325-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Denis Lebedev (coredumped) 2018-06-08 07:43:25.291-0500

Thread states are attached in gdb.txt.

By: Matthew Fredrickson (mattf) 2018-06-21 13:13:27.814-0500

Hey Denis, I just pushed up a review for this issue that I think should resolve the deadlock.  Do you think you can try it out?  It's at https://gerrit.asterisk.org/#/c/9270/

Thanks,
Matthew Fredrickson

By: Denis Lebedev (coredumped) 2018-06-26 10:10:10.155-0500

Matthew, hi!

Thanks for the fix! Unfortunately, we don't have a suitable environment for call traffic testing.
As I understand it, you may make some changes after the review by @Richard Mudgett.
Could you please clarify which version (tag name) will contain this fix? A rough estimate is enough.

By: Richard Mudgett (rmudgett) 2018-06-26 11:05:47.773-0500

The fix will go into the 13, 15, and master branches.  The next 15 release will be 15.5.0 which is due to be cut in a couple weeks.  If the patch is merged before the next release is cut then it will be in that release.  Otherwise it will be in the one following.

By: Denis Lebedev (coredumped) 2018-06-26 15:31:44.294-0500

Thanks a lot guys! Well done!
We are waiting for the fix in 15.5.0 :)

By: Friendly Automation (friendly-automation) 2018-07-02 06:41:38.354-0500

Change 9316 merged by Jenkins2:
main/cdr.c: Alleviate CDR deadlock

[https://gerrit.asterisk.org/9316|https://gerrit.asterisk.org/9316]

By: Friendly Automation (friendly-automation) 2018-07-02 06:49:54.428-0500

Change 9270 merged by Jenkins2:
main/cdr.c: Alleviate CDR deadlock

[https://gerrit.asterisk.org/9270|https://gerrit.asterisk.org/9270]

By: Friendly Automation (friendly-automation) 2018-07-02 06:55:32.147-0500

Change 9317 merged by Joshua Colp:
main/cdr.c: Alleviate CDR deadlock

[https://gerrit.asterisk.org/9317|https://gerrit.asterisk.org/9317]