
Summary: ASTERISK-27617: Frack (crash), excessive refcount during Jitterbuffer operation
Reporter: Colin (dixoncb)
Labels: pjsip
Date Opened: 2018-01-24 07:25:12.000-0600
Date Closed: 2020-01-14 11:13:54.000-0600
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Core/Jitterbuffer
Versions: 14.6.0
Frequency of Occurrence: One Time
Related Issues:
Environment: Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-109-generic x86_64)
Attachments:
( 0) CoreDump-brief.txt
( 1) CoreDump-full.txt
( 2) CoreDump-thread1.txt
( 3) jitterbuffer_frack.txt
Description: During an __ast_read operation on a pushed frame (in abstract_jb::hook_event_cb()), the reference count on the ast_frame_subclass resulted in a FRACK and a system crash.

The referenced field which triggered the failed assert was out->subclass.format - a reference to the Asterisk media format. Bumping this causes the EXCESSIVE_REF_COUNT to be exceeded.
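
The mechanism described above can be illustrated with a minimal sketch. This is not the Asterisk implementation: the struct, the function name, and the threshold value are all hypothetical, chosen only to show how a single reference bump on a shared format object can be the one that crosses an "excessive" limit.

```c
#include <stdio.h>

/* Hypothetical threshold; the real EXCESSIVE_REF_COUNT value is not stated in this report. */
#define SKETCH_EXCESSIVE_REF_COUNT 1000

/* Hypothetical stand-in for a shared, reference-counted media format object. */
struct sketch_format {
    int refcount;
    const char *name;
};

/*
 * Adjust the reference count by delta and return the count before the change.
 * *excessive is set to 1 when a positive bump pushes the count past the
 * threshold, and to 0 otherwise.
 */
static int sketch_ref(struct sketch_format *fmt, int delta, int *excessive)
{
    int before = fmt->refcount;

    fmt->refcount += delta;
    *excessive = (delta > 0 && fmt->refcount > SKETCH_EXCESSIVE_REF_COUNT);
    if (*excessive) {
        /* A real system might log a backtrace or assert here; the sketch only reports. */
        fprintf(stderr, "FRACK (sketch): excessive refcount %d on format '%s'\n",
                fmt->refcount, fmt->name);
    }
    return before;
}
```

In this sketch, a caller that takes one reference on an object already sitting at the threshold is the one that gets flagged, even though it did nothing unusual itself.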

This instance of Asterisk was participating in a Stasis application, with around 28 callers. At the time of the crash around 20 were muted, but no bridge or channel operations were being carried out by the Stasis application.

Core dump included as text file.
Comments:By: Asterisk Team (asteriskteam) 2018-01-24 07:25:13.377-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Colin (dixoncb) 2018-01-24 07:26:45.633-0600

Core dump extract.

By: George Joseph (gjoseph) 2018-01-24 07:48:02.458-0600

Thank you for the crash report. However, we need more information to investigate the crash. Please provide:

1. A backtrace generated from a core dump using the instructions provided on the Asterisk wiki [1].
2. Specific steps taken that lead to the crash.
3. All configuration information necessary to reproduce the crash.

Thanks!

[1]: https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

Colin,

Can you get the full backtrace for us using these instructions?

Also is this something you can reproduce?


By: Colin (dixoncb) 2018-01-24 08:30:49.159-0600

We compiled with DONT_OPTIMIZE and BETTER_BACKTRACES, but with neither DEBUG_THREADS nor MALLOC_DEBUG because of performance considerations. I hope the enclosed files will give you the information you requested.

As I said above, the crash came out of a 'clear blue sky', because we weren't carrying out any Stasis operations at the time (I can send you logs for our Stasis application if you wish, which tracks all outgoing commands and incoming notifications for the Asterisk box). We've been running with the same configuration for six months and haven't had a crash like this. It's not something we can reproduce (nor would we want to as this is a production system).

The nature of the unprovoked FRACK (unprovoked at least by our Stasis app) led me to suspect the jitterbuffer, and tracing through the code seems to confirm this, as the reference count is exceeded when the jitterbuffer code references the media type of the frame. We don't have a record of the specific codecs being used, but this is from our pjsip.conf:

allow = alaw
allow = ulaw
allow = gsm



By: George Joseph (gjoseph) 2018-01-24 08:57:38.976-0600

Thanks.  Hang on to that raw coredump for a bit.  I may ask you to generate something else from it.


By: George Joseph (gjoseph) 2018-01-24 11:08:55.769-0600

It looks like ASTERISK-27340, ASTERISK-27238, or ASTERISK-27412 may be the root cause of your issue; all three were fixed in 13.19 and 15.2. Since 14.x is no longer receiving non-security fixes, you'd have to use one of the two supported versions to get the fixes.


By: Colin (dixoncb) 2018-01-26 04:29:30.478-0600

Thank you. We will do the upgrade to 15.2 or later. Perhaps you could explain the connection between the fixed issues and ours, which seems specifically to be caused by an incremented reference count on the field referring to the media format of a frame used by the jitterbuffer. Perhaps one of those issues has produced a situation where there is an excessive reference count and the jitterbuffer media field read was "the straw that broke the camel's back"; only time will tell. I'd like to understand the causal connection, but I think that requires a better knowledge of frame memory management than I currently have.
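
The "straw that broke the camel's back" hypothesis can be sketched as follows. This is an illustration under assumptions, not a description of the linked issues: every name and the limit value are hypothetical. One code path forgets to release its reference on a shared format object, so the count creeps upward until a well-behaved consumer (standing in for the jitterbuffer) takes the reference that crosses the limit.

```c
#include <stddef.h>

/* Hypothetical limit, kept small so the effect is easy to follow. */
#define SKETCH_LIMIT 100

/* Hypothetical shared, reference-counted object (e.g. a media format). */
struct sketch_obj {
    int refcount;
};

/* Hypothetical frame that holds one reference to the shared object. */
struct sketch_frame {
    struct sketch_obj *format;
};

/* Take a reference for a frame. Returns 0 on success, -1 if the bump exceeds the limit. */
static int sketch_frame_init(struct sketch_frame *fr, struct sketch_obj *fmt)
{
    fr->format = fmt;
    fmt->refcount++;
    return fmt->refcount > SKETCH_LIMIT ? -1 : 0;
}

/* Correct cleanup: drop the reference the frame was holding. */
static void sketch_frame_free(struct sketch_frame *fr)
{
    if (fr->format) {
        fr->format->refcount--;
        fr->format = NULL;
    }
}

/* Buggy cleanup: clears the pointer but never drops the reference (the leak). */
static void sketch_frame_free_leaky(struct sketch_frame *fr)
{
    fr->format = NULL;
}

/*
 * Push n frames through the leaky path. Returns the number of frames
 * processed before the limit tripped, or n if it never tripped.
 */
static int sketch_run_leaky(struct sketch_obj *fmt, int n)
{
    for (int i = 0; i < n; i++) {
        struct sketch_frame fr;

        if (sketch_frame_init(&fr, fmt) != 0) {
            return i;
        }
        sketch_frame_free_leaky(&fr);
    }
    return n;
}
```

Under these assumptions, the leaky path can run for a long time without any visible symptom, and the failure is reported at whichever caller next takes a reference, which need not be the caller responsible for the leak.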

By: Colin (dixoncb) 2018-01-26 05:47:42.931-0600

Upgrade done. If you can suggest any test that we might do to check that our issue is fixed, please let me know.

By: George Joseph (gjoseph) 2018-01-26 06:43:04.339-0600

Maybe [~rmudgett] can give you a better explanation.
In the meantime, let's see what happens.


By: Asterisk Team (asteriskteam) 2018-02-09 12:00:02.097-0600

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines