Summary: ASTERISK-29296: ari: Bridge is partially invalid when it shouldn't be
Reporter: BJ Weschke (bweschke)
Labels:
Date Opened: 2021-02-16 15:37:29.000-0600
Date Closed:
Priority: Minor
Regression?:
Status: Open/New
Components: Resources/res_stasis
Versions: 18.2.0 18.4.0
Frequency of Occurrence: One Time
Related Issues: is related to ASTERISK-29187 stasis.c: FRACK!, Failed assertion bad magic number 0x0 for object in publish_msg
Environment: Ubuntu 18
Attachments:
( 0) asterisk-18-2-frack-core-dump-20210708.tar.bz2
( 1) ASTERISK-29296-full.txt
( 2) asterisk-frack.log.bz2
( 3) full.20210708.bz2
Description: Upon trying to add a channel to a bridge while testing Asterisk 18.2 in our QA environment, we started seeing these messages in the full log (see the attached log beginning at 14:09:42). After that point, anything related to the channel or the bridge in question seems to stream these errors into the log. The testers then gracefully shut down the Asterisk instance to clear the issue. We have not been able to reproduce this by repeating the use case that caused the issue in the first place.
Comments:

By: Asterisk Team (asteriskteam) 2021-02-16 15:37:30.285-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

Please note that by submitting data, code, or documentation to Sangoma through JIRA, you accept the Terms of Use present at [https://www.asterisk.org/terms-of-use/|https://www.asterisk.org/terms-of-use/].

By: BJ Weschke (bweschke) 2021-02-16 15:40:31.011-0600

Log file with errors

By: Kevin Harwell (kharwell) 2021-02-16 18:09:30.127-0600

Please explain the steps involved in the scenario that led to the issue. Include ARI commands as well.

Also, please post the relevant dialplan and _pjsip.conf_ configurations involved.

By: Kevin Harwell (kharwell) 2021-02-16 18:11:14.717-0600

If this happens again please execute the {{ast_coredumper}} script [1] against the running Asterisk, and post the resulting backtraces and other files here.

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace#GettingaBacktrace-Runningast_coredumperfordeadlocks,taskprocessorbackups,etc.

By: Asterisk Team (asteriskteam) 2021-03-03 12:00:00.673-0600

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened, please create a new issue instead. If the new issue is related to this one, a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

By: Asterisk Team (asteriskteam) 2021-07-07 13:22:38.864-0500

This issue has been reopened as a result of your commenting on it as the reporter. It will be triaged once again as applicable.

By: BJ Weschke (bweschke) 2021-07-07 13:24:49.841-0500

@kharwell This happened again today and we were able to capture a core dump while the instance was still running. It is an 8.4 MB bz2 tarball. Can you let me know where you'd like me to upload it?

By: BJ Weschke (bweschke) 2021-07-07 13:34:05.981-0500

Scratch that. It appears that one of our support folks restarted the server instance before we could capture a core dump of the unhealthy running process. It happened on June 22 and again today with 18.4, so presumably it'll happen again, and hopefully we can get a useful core dump the next time around. Please let us know where you'd like us to send it once we have it.

By: Kevin Harwell (kharwell) 2021-07-07 13:44:34.830-0500

When you run the {{ast_coredumper}} script [[1|https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace#GettingaBacktrace-ast_coredumper]] it'll output several files. For example:
{noformat}
Processing core
Creating core-thread1.txt
Creating core-brief.txt
Creating core-info.txt
Creating core-full.txt
Creating core-locks.txt
{noformat}
These *.txt files should be small enough to attach to this issue. For now those files should be sufficient unless further indicated.

By: BJ Weschke (bweschke) 2021-07-08 16:50:38.211-0500

We were "fortunate" enough to have a crash/core dump occur on one of our 18.2 QA instances this afternoon. Full log leading up attached.

By: Joshua C. Colp (jcolp) 2021-07-09 09:03:20.267-0500

Can you attach the full log leading up to it, from the point at which the bridge is created so the lifetime of the topic can be followed?

By: BJ Weschke (bweschke) 2021-07-09 11:50:51.303-0500

attached.

By: George Joseph (gjoseph) 2021-07-12 08:15:29.102-0500

Do you still have the raw coredump file?  If so, can you run ast_coredumper again as follows:
{{$ sudo ast_coredumper --tarball-coredumps --no-default-search <path_to_coredump>}}

The result will be fairly large so could you host it somewhere (DropBox, Google Drive, Live, etc) and send the link to asteriskteam @ sangoma.com with the subject "ASTERISK-29296: Coredump Tarball"

Thanks.


By: BJ Weschke (bweschke) 2021-07-12 09:56:35.515-0500

I'm sorry. I don't have it any longer. This particular host is an AWS EC2 instance that recycles itself every evening which results in /tmp getting cleared out.

By: George Joseph (gjoseph) 2021-07-12 12:19:20.302-0500

No worries, thanks. If the issue does happen again and you can catch it, use the ast_coredumper command in my last comment to capture everything.

By: BJ Weschke (bweschke) 2021-08-09 15:12:34.711-0500

Sent over a couple of tarballs from an incident that occurred yesterday afternoon on 18.4, along with the requested ast_coredumper run mentioned above.

By: Joshua C. Colp (jcolp) 2021-08-09 15:39:03.628-0500

The files have been downloaded and associated with the issue. You can remove them from Dropbox.