
Summary: ASTERISK-27262: res_ari: Leaking eventfds when using ARI Dial
Reporter: Thomas Wirum Larsen (wirum)
Labels:
Date Opened: 2017-09-08 07:05:34
Date Closed: 2020-01-14 11:13:46.000-0600
Priority: Major
Regression? No
Status: Closed/Complete
Components: Resources/res_ari
Versions: 14.6.1
Frequency of Occurrence: Constant
Related Issues:
Environment: Ubuntu 17.04
Attachments:
Description: When using the ARI4Java bindings and their dial() method, Asterisk leaks (fails to close?) an eventfd, which leads to a crash once the OS's open-file limit (ulimit) is reached.

This behavior is only seen when using dial(). When using originate(), Asterisk behaves correctly.

This bug is reproducible on Asterisk 13.x and 14.x using Ari4Java 0.4.3 and 0.4.4 and ARI protocol versions 1.7 through 1.10. We have been unable to test the 2.0.0 or 3.0.0 protocol variants, so we cannot confirm whether this behavior exists in the 2.x and 3.x protocol branches.

While the actual bug may be in Ari4Java, it is arguable that a REST interface should not be able to make the host leak file descriptors, as this is a potential DoS attack vector (or worse). This is thus clearly a bug in Asterisk's implementation of ARI and requires immediate attention.

We have filed an issue with Ari4Java as well.
Comments:
By: Asterisk Team (asteriskteam) 2017-09-08 07:05:36.597-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Joshua C. Colp (jcolp) 2017-09-08 07:28:12.908-0500

Thank you for taking the time to report this bug and helping to make Asterisk better. Unfortunately, we cannot work on this bug because your description did not include enough information. Please read over the Asterisk Issue Guidelines [1] which discusses the information necessary for your issue to be resolved and the format that information needs to be in. We would be grateful if you would then provide a more complete description of the problem. At a minimum, we need:

1. The specific steps or actions you took that caused you to encounter the problem.
2. The behavior you expected and the location of documentation that led you to that expectation.
3. The behavior you actually encountered.

To demonstrate the issue in detail, please include Asterisk log files generated per the instructions on the wiki [2]. If applicable, please ensure that protocol-level trace debugging is enabled, e.g., 'sip set debug on' if the issue involves chan_sip, and configuration information such as dialplan and channel configuration.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

[2] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information



By: Thomas Wirum Larsen (wirum) 2017-09-08 07:57:26.193-0500

Added an strace log and /var/log/asterisk/messages.
Also added sip.conf and extensions.conf:
https://github.com/zicada/bug-ASTERISK-27262

***************************************************************
Relevant code/documentation in ARI4Java:

// Create the outgoing channel in the "tsip" Stasis app (not dialed yet)
ari.channels().create(
        destination,  // String endpoint
        "tsip",       // String app
        "",           // String appargs
        secondChId,   // String channelId
        "",           // String otherChannelId
        "",           // String originator
        "");          // String formats

// Dial the channel created above; 30 is the timeout
ari.channels().dial( secondChId, destination, 30 );
****************************************************************

When a call comes into our Stasis application, we answer it and place a new outgoing call using ari.channels().create() and ari.channels().dial().
This alone leaves one eventfd file descriptor hanging (never closed).

When the outbound call is answered, we join the calls by calling ari.bridges().addChannel() for each channel. This creates a second hanging file descriptor.

The commands used are:

ari.channels().create( destination, "tsip", "", secondChId, "", "", "");
ari.channels().dial( secondChId, destination, 30 );

ari.bridges().create( "mixing,dtmf_events", bridgeId, bridgeId );
ari.bridges().addChannel( bridgeId, firstChId, "" );
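For reference, the create and dial calls above correspond to ARI REST requests. The sketch below only builds the request lines, assuming the endpoints POST /ari/channels/create and POST /ari/channels/{channelId}/dial, and assuming Ari4Java's dial(channelId, caller, timeout) maps its arguments onto the caller and timeout query parameters. The class name, the HTTP client, authentication and the example values ("SIP/1000", "second-leg") are hypothetical and left out or made up for illustration. Requires Java 11 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class AriRequests {

    // Builds "path?key=value&..." with URL-encoded values, skipping empty ones
    // (the Ari4Java calls above pass "" for unused optional parameters).
    public static String buildRequest(String path, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (e.getValue() == null || e.getValue().isEmpty()) {
                continue;
            }
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        String q = query.toString();
        return q.isEmpty() ? path : path + "?" + q;
    }

    // Assumed ARI endpoint: POST /ari/channels/create (creates the channel, does not dial it).
    public static String createRequest(String endpoint, String app, String channelId) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("endpoint", endpoint);
        params.put("app", app);
        params.put("channelId", channelId);
        return buildRequest("/ari/channels/create", params);
    }

    // Assumed ARI endpoint: POST /ari/channels/{channelId}/dial (dials a created channel).
    public static String dialRequest(String channelId, String caller, int timeoutSeconds) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("caller", caller);
        params.put("timeout", Integer.toString(timeoutSeconds));
        String path = "/ari/channels/" + URLEncoder.encode(channelId, StandardCharsets.UTF_8) + "/dial";
        return buildRequest(path, params);
    }

    public static void main(String[] args) {
        // Hypothetical values mirroring create(destination, "tsip", ..., secondChId, ...)
        // and dial(secondChId, destination, 30) from the report.
        System.out.println("POST " + createRequest("SIP/1000", "tsip", "second-leg"));
        System.out.println("POST " + dialRequest("second-leg", "SIP/1000", 30));
    }
}
```

Sending these requests with a plain HTTP client against a test system, and checking the eventfd count after each dial, may help separate an Asterisk-side leak from an Ari4Java-side one.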

************************************************************************************
Expected outcome: the output of
'lsof -p <pid of asterisk> | grep eventfd | wc -l'
should not grow each time dial() is invoked. Actual outcome: it keeps growing until it hits the OS's ulimit and Asterisk crashes.
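The lsof pipeline above can also be approximated by reading /proc directly. A minimal sketch, assuming Linux, where each open eventfd appears in /proc/<pid>/fd as a symlink whose target starts with anon_inode:[eventfd, and assuming the program has permission to inspect the Asterisk process (same user or root). The class and method names are hypothetical. Requires Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class EventfdCounter {

    // Counts descriptors whose /proc symlink target identifies an eventfd.
    // Matched by prefix to be lenient about the exact suffix.
    public static long countEventfds(List<String> linkTargets) {
        return linkTargets.stream()
                .filter(t -> t.startsWith("anon_inode:[eventfd"))
                .count();
    }

    // Reads the symlink target of every open descriptor of the given process.
    public static List<String> readFdTargets(String pid) throws IOException {
        List<String> targets = new ArrayList<>();
        try (Stream<Path> fds = Files.list(Paths.get("/proc", pid, "fd"))) {
            for (Path fd : (Iterable<Path>) fds::iterator) {
                try {
                    targets.add(Files.readSymbolicLink(fd).toString());
                } catch (IOException e) {
                    // Descriptor closed between listing and reading; skip it.
                }
            }
        }
        return targets;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java EventfdCounter.java <pid of asterisk>");
            return;
        }
        System.out.println(countEventfds(readFdTargets(args[0])));
    }
}
```

Running it before and after each dial() shows whether the count grows, in the same way as the lsof command.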

By: Joshua C. Colp (jcolp) 2017-09-08 08:04:19.626-0500

Are the channels being hung up - be it by what has been called or by yourself? Are you ever destroying the bridge you've created?

By: Joshua C. Colp (jcolp) 2017-09-08 08:11:12.246-0500

Does "core show channels" and "bridge show all" in the CLI show a ton of channels and bridges?

By: Thomas Wirum Larsen (wirum) 2017-09-08 08:14:41.404-0500

One channel is hung up by the A or C party; the application hangs up the other leg (so there is no need to call removeChannel() on the bridge).
The bridges are created initially and are kept as a pool in our app.

This might be related:
ASTERISK-26718
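The bridge pool described in this comment can be pictured as a fixed set of bridge IDs that are checked out per call and returned once both legs are gone. This is a hypothetical sketch of that lifecycle only: the class and its methods are not from the reporter's application or from Ari4Java, and it makes no ARI calls. Because bridges are reused rather than destroyed, the bridge count itself should stay constant while calls come and go.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class BridgePool {
    private final Deque<String> idle = new ArrayDeque<>();

    // The bridge IDs are assumed to have been created once at startup
    // (e.g. with ari.bridges().create(...)) and are never destroyed.
    public BridgePool(List<String> bridgeIds) {
        idle.addAll(bridgeIds);
    }

    // Hands out an idle bridge for a new call, or null if all are in use.
    public synchronized String acquire() {
        return idle.pollFirst();
    }

    // Returns a bridge to the pool after both legs are hung up; per the
    // reporter, hung-up channels leave the bridge without removeChannel().
    public synchronized void release(String bridgeId) {
        idle.addLast(bridgeId);
    }

    public synchronized int idleCount() {
        return idle.size();
    }
}
```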

By: Thomas Wirum Larsen (wirum) 2017-09-08 08:21:50.373-0500

'core show channels' does not reflect the number of stuck eventfds; it shows the expected result.
'bridge show all' shows the 10 bridges we create and keep track of in our pool, as expected.

Put differently, everything looks good and works as expected until we crash with "too many open files". It was only after some troubleshooting that we noticed the leaking fds.
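To see how close Asterisk is to the "too many open files" failure, the soft open-file limit of the running process can be read from /proc/<pid>/limits and compared with the number of entries in /proc/<pid>/fd. A minimal sketch, assuming Linux and the usual layout of that file (a "Max open files" row followed by soft limit, hard limit and units columns); the class name is hypothetical. Requires Java 9 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.OptionalLong;
import java.util.stream.Stream;

public class OpenFileLimit {

    // Extracts the soft "Max open files" limit from the lines of /proc/<pid>/limits.
    // Returns empty if the row is missing or the limit is "unlimited".
    public static OptionalLong softNofileLimit(List<String> limitsLines) {
        for (String line : limitsLines) {
            if (line.startsWith("Max open files")) {
                String[] cols = line.trim().split("\\s+");
                // cols: Max, open, files, <soft>, <hard>, files
                if (cols.length >= 5 && !cols[3].equals("unlimited")) {
                    return OptionalLong.of(Long.parseLong(cols[3]));
                }
                return OptionalLong.empty();
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OpenFileLimit.java <pid of asterisk>");
            return;
        }
        Path procDir = Paths.get("/proc", args[0]);
        OptionalLong limit = softNofileLimit(Files.readAllLines(procDir.resolve("limits")));
        long open;
        try (Stream<Path> fds = Files.list(procDir.resolve("fd"))) {
            open = fds.count();
        }
        System.out.println("open descriptors: " + open);
        if (limit.isPresent()) {
            System.out.println("remaining before soft limit: " + (limit.getAsLong() - open));
        } else {
            System.out.println("no finite soft open-file limit found");
        }
    }
}
```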


By: Joshua C. Colp (jcolp) 2017-09-11 05:54:02.489-0500

Can you confirm for sure this also impacts 13? I ask because 13 has no "dial" method, while 14 and above does.

By: Thomas Wirum Larsen (wirum) 2017-09-13 09:24:35.207-0500

We may have misread logs relating to 13 while testing / troubleshooting.
Please ignore this impacting 13.

By: Rusty Newton (rnewton) 2017-09-14 16:21:20.259-0500

Please attach all of your debug and configuration files directly to this issue as per the guidelines: https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

Make sure to include a backtrace collected as described here:
https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace

By: Asterisk Team (asteriskteam) 2017-09-29 12:00:02.265-0500

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidelines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines