Summary: | ASTERISK-27617: Frack (crash), excessive refcount during Jitterbuffer operation | ||
Reporter: | Colin (dixoncb) | Labels: | pjsip |
Date Opened: | 2018-01-24 07:25:12.000-0600 | Date Closed: | 2020-01-14 11:13:54.000-0600 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Core/Jitterbuffer |
Versions: | 14.6.0 | Frequency of Occurrence | One Time |
Related Issues: | |||
Environment: | Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-109-generic x86_64) | Attachments: | ( 0) CoreDump-brief.txt ( 1) CoreDump-full.txt ( 2) CoreDump-thread1.txt ( 3) jitterbuffer_frack.txt |
Description: | During an __ast_read operation on a pushed frame (in abstract_jb::hook_event_cb()), the reference count on the ast_frame_subclass triggered a FRACK and a system crash.
The field whose reference triggered the failed assertion was out->subclass.format, a reference to the Asterisk media format. Bumping this reference caused EXCESSIVE_REF_COUNT to be exceeded. This instance of Asterisk was participating in a Stasis application with around 28 callers. At the time of the crash around 20 were muted, but the Stasis application was not carrying out any bridge or channel operations. A core dump is included as a text file. | ||
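The failure mode in the description can be illustrated with a toy reference counter. This is only a sketch under stated assumptions: it does not reproduce Asterisk's actual reference counting code, and `toy_format`, `toy_ref`, `toy_leak_until_excessive` and the threshold value `TOY_EXCESSIVE_REF_COUNT` are hypothetical names and numbers chosen for illustration. It shows that the call which trips an excessive-refcount check is simply the one that happens to cross the threshold, so it need not be the code path that leaked the references.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold, chosen for illustration only. */
#define TOY_EXCESSIVE_REF_COUNT 100000

/* Toy stand-in for a shared, reference-counted media format object. */
struct toy_format {
    int refcount;
    const char *name;
};

/*
 * Adjust the reference count by delta and return the previous count.
 * If excessive is non-NULL, *excessive is set to 1 when this particular
 * adjustment crosses the threshold, and to 0 otherwise.
 */
static int toy_ref(struct toy_format *fmt, int delta, int *excessive)
{
    int before;

    assert(fmt != NULL);
    before = fmt->refcount;
    fmt->refcount += delta;

    if (excessive != NULL) {
        *excessive = (before <= TOY_EXCESSIVE_REF_COUNT
                      && fmt->refcount > TOY_EXCESSIVE_REF_COUNT);
    }
    return before;
}

/*
 * Simulate a code path that takes one reference per frame and never
 * releases it. Returns the 1-based frame number whose bump crossed the
 * threshold, or -1 if the threshold was not reached within max_frames.
 */
static int toy_leak_until_excessive(struct toy_format *fmt, int max_frames)
{
    int i;
    int excessive = 0;

    for (i = 0; i < max_frames; i++) {
        toy_ref(fmt, +1, &excessive);
        if (excessive) {
            return i + 1;
        }
    }
    return -1;
}
```

In this sketch a single leaked reference per frame goes unnoticed until the count passes the threshold. The frame that reports the problem is no different from the frames before it, which is why a backtrace taken at the failed assertion can point at an otherwise correct caller.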
Comments: | By: Asterisk Team (asteriskteam) 2018-01-24 07:25:13.377-0600
Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Colin (dixoncb) 2018-01-24 07:26:45.633-0600
Core dump extract.

By: George Joseph (gjoseph) 2018-01-24 07:48:02.458-0600
Thank you for the crash report. However, we need more information to investigate the crash. Please provide:
1. A backtrace generated from a core dump using the instructions provided on the Asterisk wiki [1].
2. Specific steps taken that lead to the crash.
3. All configuration information necessary to reproduce the crash.
Thanks!
[1]: https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace
Colin, can you get the full backtrace for us using these instructions? Also, is this something you can reproduce?

By: Colin (dixoncb) 2018-01-24 08:30:49.159-0600
We compiled with DONT_OPTIMIZE and BETTER_BACKTRACES, but not with DEBUG_THREADS or MALLOC_DEBUG because of performance considerations. I hope the enclosed files will give you the information you requested. As I said above, the crash came out of a 'clear blue sky', because we weren't carrying out any Stasis operations at the time (I can send you logs for our Stasis application if you wish, which tracks all outgoing commands and incoming notifications for the Asterisk box). We've been running with the same configuration for six months and haven't had a crash like this.
It's not something we can reproduce (nor would we want to, as this is a production system). The nature of the unprovoked FRACK (unprovoked at least by our Stasis app) led me to suspect the jitterbuffer, and tracing through the code seems to confirm this, as the reference count is exceeded when the jitterbuffer code references the media type of the frame. We don't have a record of the specific codecs being used, but this is from our pjsip.conf:
allow = alaw
allow = ulaw
allow = gsm

By: George Joseph (gjoseph) 2018-01-24 08:57:38.976-0600
Thanks. Hang on to that raw coredump for a bit. I may ask you to generate something else from it.

By: George Joseph (gjoseph) 2018-01-24 11:08:55.769-0600
It looks like ASTERISK-27340, ASTERISK-27238, or ASTERISK-27412 may be root causes of your issue, and they were all fixed in 13.19 and 15.2. Since 14.x is no longer receiving non-security fixes, you'd have to use one of the 2 supported versions to get the fixes.

By: Colin (dixoncb) 2018-01-26 04:29:30.478-0600
Thank you. We will do the upgrade to 15.2 or later. Perhaps you could explain the connection between the fixed issues and ours, which seems specifically to be caused by an incremented reference count on the field referring to the media format of a frame used by the jitterbuffer. Perhaps one of those issues has produced a situation where there is an excessive reference count and the jitterbuffer media field read was "the straw that broke the camel's back"; only time will tell. I'd like to understand the causal connection, but I think that requires a better knowledge of frame memory management than I currently have.

By: Colin (dixoncb) 2018-01-26 05:47:42.931-0600
Upgrade done. If you can suggest any test that we might do to check that our issue is fixed, please let me know.

By: George Joseph (gjoseph) 2018-01-26 06:43:04.339-0600
Maybe [~rmudgett] can give you a better explanation. In the meantime, let's see what happens.

By: Asterisk Team (asteriskteam) 2018-02-09 12:00:02.097-0600
Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened, please create a new issue instead. If the new issue is related to this one, a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].
[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines |