
Summary: ASTERISK-28237: "FRACK!, Failed assertion bad magic number" happens when unsubscribe an application from an event source
Reporter: Lucas Tardioli Silveira (ltardioli)
Labels: pjsip
Date Opened: 2019-01-09 10:07:42.000-0600
Date Closed: 2021-05-26 10:34:44
Priority: Minor
Regression?
Status: Closed/Complete
Components: Core/Stasis
Versions: 13.18.0 16.1.1
Frequency of Occurrence: Constant
Related Issues: is duplicated by ASTERISK-29409 ARI / endpoint subscription : FRACK when removing an endpoint
Environment:
Attachments:
( 0) ari.conf
( 1) ari-test2
( 2) core-brief.txt
( 3) core-full.txt
( 4) debug_frack1
Description: I'm facing this issue when I try to unsubscribe an application.

Scenario:
I have multiple applications connecting to my Asterisk server.
Every time my application connects to Asterisk, I send a subscribe request like:

POST /ari/applications/my-app1/subscription?eventSource=endpoint:PJSIP
(for the other applications I can have my-app2, my-app3 and so on...)

And when I disconnect from Asterisk I send:

DELETE /ari/applications/my-app1/subscription?eventSource=endpoint:PJSIP

The problem happens when I have at least two applications running and one of them needs to disconnect. As soon as I send the unsubscribe request, the "FRACK" happens.

This was happening on Asterisk 13.18-cert2. I then updated to the latest version, 16.1.1, but it still happens.

Comments:By: Asterisk Team (asteriskteam) 2019-01-09 10:07:43.646-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Lucas Tardioli Silveira (ltardioli) 2019-01-09 10:11:37.585-0600

I've followed these instructions to get more info: https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

By: Kevin Harwell (kharwell) 2019-01-15 15:54:28.695-0600

Was able to replicate, and collect a better backtrace. Steps to reproduce:

1. Start Asterisk with appropriate configuration (attached the _ari.conf_ I used)
2. Start two applications named _foo_ and _bar_ using something like [wscat.py|https://github.com/leedm777/wscat-py/blob/master/wscat/wscat.py] or similar. Be sure to start application _foo_ first, then _bar_ second (seemed like it wouldn't crash the other way around).
3. Execute the attached [^ari-test2] script

By: Evgenios Muratidis (Evgenios) 2021-03-30 04:39:51.371-0500

Hello everybody. I believe I have found how to fix this problem in the current version. The exception is caused by several related situations:
1. In file res/stasis/messaging.c, function "messaging_app_subscribe_endpoint", the macro "RAII_VAR" is used for the local variable "sub" of type "message_subscription", so the reference counter of the created object is decremented every time this function exits. This is not obvious and must be taken into account.

2. Given the above, in the same function "messaging_app_subscribe_endpoint" the subscription's reference counter must always be 2 or more (when more than one application is subscribed to it). Otherwise the subscription object will self-destruct when the function exits, and any subsequent access to it will throw an exception (from the check of the "magic" field, which must equal a certain static value).

3. When more than one application subscribes to a subscription, the subscription's reference counter must increase. Otherwise, when the second application unsubscribes, the subscription will no longer exist (it self-destructed when the first application unsubscribed from it). To solve this, I propose adding the following lines to the function "messaging_app_subscribe_endpoint":

...

if (AST_VECTOR_APPEND(&sub->applications, tuple)) {
    ast_debug(3, "TST: app name: %s failed to add to subscription \n", app_name);
    ao2_ref(tuple, -1);
    ao2_unlock(sub);
    return -1;
}

// <= added lines begin
/* If the subscription has more than one application subscribed to it, we need to bump the subscription's ref counter */
if (AST_VECTOR_SIZE(&sub->applications) > 1) {
    ao2_bump(sub);
}
// <= added lines end

ao2_unlock(sub);

...
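
To illustrate points 1 and 2, here is a minimal toy model of a scope-bound reference and a "magic" check. It is not Asterisk code: `toy_obj`, `TOY_MAGIC`, `toy_ref`, `toy_cleanup` and `toy_use_in_scope` are hypothetical names, the magic value is made up, and the sketch assumes a compiler that supports the GCC/Clang `cleanup` variable attribute. "Destruction" only clears the magic field (the storage belongs to the caller), so the bad access can be shown without undefined behavior.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_MAGIC 0x600DF00Du /* hypothetical marker value, not Asterisk's */

struct toy_obj {
    unsigned int magic; /* TOY_MAGIC while alive, 0 once "destroyed" */
    int refcount;
};

static void toy_init(struct toy_obj *obj)
{
    obj->magic = TOY_MAGIC;
    obj->refcount = 1; /* the creator's reference */
}

/* Adjust the counter; returns the new count, or -1 on a bad magic number. */
static int toy_ref(struct toy_obj *obj, int delta)
{
    if (obj->magic != TOY_MAGIC) {
        fprintf(stderr, "bad magic number\n");
        return -1;
    }
    obj->refcount += delta;
    if (obj->refcount <= 0) {
        obj->magic = 0; /* mark as destroyed; storage is caller-owned */
    }
    return obj->refcount;
}

/* Cleanup handler: runs automatically when the annotated variable leaves scope. */
static void toy_cleanup(struct toy_obj **p)
{
    if (*p) {
        toy_ref(*p, -1);
    }
}

/* The scoped variable drops one reference on EVERY exit from this function. */
static int toy_use_in_scope(struct toy_obj *obj, int take_ref)
{
    __attribute__((cleanup(toy_cleanup))) struct toy_obj *scoped = obj;

    if (take_ref) {
        toy_ref(scoped, +1); /* balanced by the cleanup handler */
    }
    return scoped->refcount; /* evaluated before the cleanup handler runs */
}

/* Walks through the balanced and the unbalanced case. */
static int toy_demo(void)
{
    struct toy_obj obj;

    toy_init(&obj);
    assert(toy_use_in_scope(&obj, 1) == 2); /* 1 + 1 inside the scope */
    assert(obj.refcount == 1);              /* back to 1 after scope exit */
    toy_use_in_scope(&obj, 0);              /* no extra ref: count falls to 0 */
    return toy_ref(&obj, +1);               /* later access fails the magic check */
}
```

Calling `toy_demo()` from a `main` function exercises both paths: the first call is balanced, while the second lets the scope-exit handler drop the only remaining reference, so the next access is rejected.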

4. This approach works for application subscriptions to technology subscriptions. However, when you subscribe to a specific resource, the code follows a different path: the subscription is looked up not in the list of technology subscriptions but in a separate container. Testing showed that each time a new application subscribed to an existing subscription for a specific resource, a new subscription was created (with the same parameters) while the previous one remained in memory. And when any application unsubscribed from such a subscription (with only one application subscribed to it), the subscription was not destroyed. This leads to a memory leak. The cause appears to be a typo in the parameter passed to the macro "ao2_find". To solve this, I propose replacing the following line in file res/stasis/messaging.c, function "get_subscription":

...

if (endpoint && !ast_strlen_zero(ast_endpoint_get_resource(endpoint))) {
    // <= replace line start
    // sub = ao2_find(endpoint_subscriptions, endpoint, OBJ_SEARCH_KEY); // we need to pass a key, not object
    sub = ao2_find(endpoint_subscriptions, ast_endpoint_get_id(endpoint), OBJ_SEARCH_KEY);
    // <= replace line end
} else {
    int i;
...

5. Also in the same function ("get_subscription"), when an application tries to subscribe to a subscription with a different technology, the first subscription created is returned instead, which later also leads to an exception. To solve this, I propose resetting the subscription pointer to NULL at the end of the loop body:

...

if (endpoint && !ast_strlen_zero(ast_endpoint_get_resource(endpoint))) {
    // <= replace line start
    // sub = ao2_find(endpoint_subscriptions, endpoint, OBJ_SEARCH_KEY); // we need to pass a key, not object
    sub = ao2_find(endpoint_subscriptions, ast_endpoint_get_id(endpoint), OBJ_SEARCH_KEY);
    // <= replace line end
} else {
    int i;

    ast_rwlock_rdlock(&tech_subscriptions_lock);
    for (i = 0; i < AST_VECTOR_SIZE(&tech_subscriptions); i++) {
        sub = AST_VECTOR_GET(&tech_subscriptions, i);

        if (sub && !strcmp(sub->token, endpoint ? ast_endpoint_get_tech(endpoint) : TECH_WILDCARD)) {
            ao2_bump(sub);
            break;
        }
        // <= add line start
        sub = NULL; /* We need to reset pointer at this line */
        // <= add line end
    }
    ast_rwlock_unlock(&tech_subscriptions_lock);
}

...
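
The loop problem in point 5 can be reproduced in isolation. The sketch below is a simplified stand-in, not the Asterisk function: `toy_sub`, `toy_find_buggy` and `toy_find_fixed` are hypothetical names, and a plain array replaces the vector. It shows that when no element matches, the search variable still points at the last element inspected unless it is reset inside the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_sub {
    const char *token; /* technology name, e.g. "PJSIP" */
};

/* Buggy search: "sub" keeps the last inspected element when nothing matches. */
static struct toy_sub *toy_find_buggy(struct toy_sub **subs, size_t n, const char *token)
{
    struct toy_sub *sub = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        sub = subs[i];
        if (sub && !strcmp(sub->token, token)) {
            break;
        }
    }
    return sub;
}

/* Fixed search: reset the pointer at the end of every non-matching iteration. */
static struct toy_sub *toy_find_fixed(struct toy_sub **subs, size_t n, const char *token)
{
    struct toy_sub *sub = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        sub = subs[i];
        if (sub && !strcmp(sub->token, token)) {
            break;
        }
        sub = NULL; /* no match on this element */
    }
    return sub;
}

/* Returns 1 if the two variants behave as described above. */
static int toy_find_demo(void)
{
    struct toy_sub pjsip = { "PJSIP" };
    struct toy_sub *subs[] = { &pjsip };

    assert(toy_find_buggy(subs, 1, "IAX2") == &pjsip); /* wrong subscription returned */
    assert(toy_find_fixed(subs, 1, "IAX2") == NULL);   /* correctly reports no match */
    assert(toy_find_fixed(subs, 1, "PJSIP") == &pjsip);
    return 1;
}
```

With a single stored subscription, the last element inspected is also the first one created, which matches the behavior described in point 5.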

6. After fixing the previous problems, it became possible to subscribe several applications to the same subscription for a specific resource. However, when the last application tried to unsubscribe from such a subscription, an exception was thrown. When the last application unsubscribed, the call to the macro "ao2_unlink" decreased the subscription's reference counter by 1 (2-1 = 1). The counter was then decremented by 1 again (1-1 = 0, so the object self-destructed). Finally, on exit from the function, the macro "RAII_VAR" tried to decrement the reference counter of the (no longer existing) subscription once more, which led to the exception. To solve this, I propose adding the following line to the function "messaging_app_unsubscribe_endpoint":

...

AST_VECTOR_REMOVE_CMP_UNORDERED(&sub->applications, app_name, application_tuple_cmp, ao2_cleanup);
if (AST_VECTOR_SIZE(&sub->applications) == 0) {
    if (endpoint && !ast_strlen_zero(ast_endpoint_get_resource(endpoint))) {
        // <= added comment start
        /* NOTE: The following call decreases the subscription's ref counter */
        // <= added comment end
        ao2_unlink(endpoint_subscriptions, sub);
        // <= added lines start
        /* NOTE: Because the call above decreases the subscription's ref counter, we bump it again */
        ao2_bump(sub);
        // <= added lines end
    } else {
        ast_rwlock_wrlock(&tech_subscriptions_lock);
        AST_VECTOR_REMOVE_CMP_UNORDERED(&tech_subscriptions, endpoint ? ast_endpoint_get_id(endpoint) : TECH_WILDCARD,
            messaging_subscription_cmp, AST_VECTOR_ELEM_CLEANUP_NOOP);
        ast_rwlock_unlock(&tech_subscriptions_lock);
    }
}
ao2_unlock(sub);

// <= added comment start
/* NOTE: Now it is safe to decrease the subscription's ref counter as usual */
ao2_ref(sub, -1);
// <= added comment end

ast_debug(3, "App '%s' unsubscribed to messages from endpoint '%s'\n", app_name, endpoint ? ast_endpoint_get_id(endpoint) : "-- ALL --");
ast_test_suite_event_notify("StasisMessagingSubscription", "SubState: Unsubscribed\r\nAppName: %s\r\nToken: %s\r\n",
    app_name, endpoint ? ast_endpoint_get_id(endpoint) : "ALL");

...
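
The counter arithmetic in point 6 can be traced with a small model. This is only a sketch of the sequence as described in that point, not a trace of Asterisk itself: `toy_trace`, `toy_unref` and `toy_unsubscribe_last_app` are hypothetical names, and the starting count of 2 is taken from the "(2-1 = 1)" step above.

```c
#include <assert.h>

struct toy_trace {
    int refs;           /* modeled reference count */
    int bad_decrements; /* decrements attempted after the count reached 0 */
};

/* Drop one reference, recording any attempt made on a destroyed object. */
static void toy_unref(struct toy_trace *t)
{
    if (t->refs <= 0) {
        t->bad_decrements++; /* this is where a magic check would fire */
        return;
    }
    t->refs--;
}

/* Models the last application unsubscribing from a resource subscription. */
static int toy_unsubscribe_last_app(int bump_after_unlink)
{
    struct toy_trace t = { 2, 0 }; /* starting count per point 6 */

    toy_unref(&t);      /* unlink from the container: 2 -> 1 */
    if (bump_after_unlink) {
        t.refs++;       /* proposed extra bump: 1 -> 2 */
    }
    toy_unref(&t);      /* explicit decrement */
    toy_unref(&t);      /* scope-exit cleanup */
    return t.bad_decrements;
}

/* Returns 1 if the model reproduces both outcomes. */
static int toy_trace_demo(void)
{
    assert(toy_unsubscribe_last_app(0) == 1); /* 2 -> 1 -> 0, then one bad decrement */
    assert(toy_unsubscribe_last_app(1) == 0); /* 2 -> 1 -> 2 -> 1 -> 0, balanced */
    return 1;
}
```

Without the bump, the third decrement lands on an object whose count is already 0. With it, the three decrements are matched by three references.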

After applying all of the fixes above, I saw no further problems with subscriptions and applications. Good day.

By: George Joseph (gjoseph) 2021-03-30 06:50:31.150-0500

Hi,

If you sign a license agreement you can submit a patch yourself.
Read the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process] page on the Wiki then click the [Sign a License Agreement|https://issues.asterisk.org/jira/secure/DigiumLicense.jspa] link at the top of the page to get started.


By: Evgenios Muratidis (Evgenios) 2021-04-13 03:38:56.164-0500

Hi, George
I just submitted the patch on your recommendation, and it passed the automated checks successfully.

By: Evgenios Muratidis (Evgenios) 2021-05-24 08:37:38.399-0500

Hi George! Could you please look at the Gerrit reviews for this issue? I think the current reviewer does not understand the changes I suggested in the submitted patch. Please help!

By: George Joseph (gjoseph) 2021-05-24 08:46:17.224-0500

OK, I'll look.


By: Evgenios Muratidis (Evgenios) 2021-05-24 09:44:52.106-0500

George, we (the Gerrit reviewer and I) have now worked out which changes will be applied. Thanks.

By: Friendly Automation (friendly-automation) 2021-05-26 10:34:45.711-0500

Change 15762 merged by Friendly Automation:
stasis: Fix "FRACK!, Failed assertion bad magic number" when unsubscribing

[https://gerrit.asterisk.org/c/asterisk/+/15762|https://gerrit.asterisk.org/c/asterisk/+/15762]

By: Friendly Automation (friendly-automation) 2021-05-26 11:10:17.986-0500

Change 15931 merged by George Joseph:
stasis: Fix "FRACK!, Failed assertion bad magic number" when unsubscribing

[https://gerrit.asterisk.org/c/asterisk/+/15931|https://gerrit.asterisk.org/c/asterisk/+/15931]

By: Friendly Automation (friendly-automation) 2021-05-26 11:14:02.496-0500

Change 15932 merged by George Joseph:
stasis: Fix "FRACK!, Failed assertion bad magic number" when unsubscribing

[https://gerrit.asterisk.org/c/asterisk/+/15932|https://gerrit.asterisk.org/c/asterisk/+/15932]