[Home]

Summary:ASTERISK-27121: res_pjsip_mwi: Memory leak on reload
Reporter:Sergej Kasumovic (sergej)Labels:pjsip
Date Opened:2017-07-11 02:19:10Date Closed:2019-07-12 09:23:35
Priority:MajorRegression?
Status:Closed/CompleteComponents:Resources/res_pjsip_mwi
Versions:13.17.0 Frequency of
Occurrence
Constant
Related
Issues:
is related toASTERISK-28222 Regression: MWI polling no longer works
is related toASTERISK-28443 app_voicemail: remove dependency on stasis cache
Environment:Attachments:( 0) refs.txt
Description:There seems to be a memory leak around stasis_cache which can be easily replicated with reload of res_pjsip_mwi module. I have tracked it down with REF_DEBUG config option. This is on git version of Asterisk 13 branch (cad74cdd8f).

My test case includes around 30 PJSIP endpoints with defined mailboxes option.
The number of endpoints will affect the impact of memory leak due to number of active MWI subscriptions.

Basically, here is the replicable scenario:
Fetch RSS of Asterisk process
Perform 200 reloads of res_pjsip_mwi
Fetch RSS of Asterisk process again

{noformat}
ps -o rss -p $(pidof -s asterisk)
 RSS
69048

for _ in $(seq 1 200); do asterisk -rx 'module reload res_pjsip_mwi.so' >/dev/null; done

ps -o rss -p $(pidof -s asterisk)
 RSS
90460
{noformat}

By repeating the for loop, RSS of Asterisk process simply grows.

The problem seems to be around subscribe and unsubscribe message that got cached in stasis_cache. Each subscribe message is cached. However, on unsubscribe message, subscribe message is replaced with unsubscribe message and is never freed as the id of new subscription messages changes.

I cannot know for sure what is an ideal fix for this, but so far I was able to stop it by doing one of the following:
1) Do not cache stasis_subscription_change_type() messages in stasis_cache: caching_topic_exec. It seems app_voicemail is reading these messages from cache when starting polling thread, so not sure if this is ideal solution.
2) On unsubscribe message received in caching_topic_exec, make sure to remove subscription and cleanup the received message
3) Add another stasis_cache_clear_type() message in main/stasis.c for type stasis_subscription_change_type when doing unsubscribe

When such fix is performed RSS does grow a bit, but then it slows down considerably.
Comments:By: Asterisk Team (asteriskteam) 2017-07-11 02:19:11.321-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Benjamin Keith Ford (bford) 2017-07-12 14:16:19.129-0500

[~sergej], I do see RSS increase when reloading res_pjsip_mwi.so, but to a smaller degree. Can you provide your ref count log? There is a wiki page [1] with instructions how to obtain a ref count log.

[1] https://wiki.asterisk.org/wiki/display/AST/Reference+Count+Debugging

By: Sergej Kasumovic (sergej) 2017-07-13 01:39:51.510-0500

Sure, here is the snippet output.
So unmodified version:
{noformat}
==== Leaked Object 0x21a81c8 history ====
[12352] stasis.c:354 stasis_topic_create: +1 100@default - [**constructor**]
[12352] stasis.c:929 stasis_forward_all: +1  - [1]
[12352] stasis.c:498 internal_stasis_subscribe: +1  - [2]
[12352] stasis.c:969 subscription_change_alloc: +1  - [3]
[12352] stasis.c:828 publish_msg: +1  - [4]
[12352] stasis.c:838 publish_msg: -1  - [5]
[12352] stasis.c:828 publish_msg: +1  - [4]
[12352] stasis.c:838 publish_msg: -1  - [5]
— reload —
[12687] stasis.c:541 stasis_unsubscribe: +1  - [4]
[12687] stasis.c:969 subscription_change_alloc: +1  - [5] <- important step (called from stasis_unsubscribe)
[12687] stasis.c:828 publish_msg: +1  - [6]
[12687] stasis.c:954 subscription_change_dtor: -1  - [7] <- destructor for subscription_change_alloc for subscription (not unsubscribe)
[12687] stasis.c:838 publish_msg: -1  - [6]
[12687] stasis.c:541 _dtor_topic: -1  - [5]
[12687] stasis.c:409 subscription_dtor: -1  - [4]
[12687] stasis.c:498 internal_stasis_subscribe: +1  - [3]
[12687] stasis.c:969 subscription_change_alloc: +1  - [4]
[12687] stasis.c:828 publish_msg: +1  - [5]
[12687] stasis.c:838 publish_msg: -1  - [6]
— reload —
[12687] stasis.c:541 stasis_unsubscribe: +1  - [5]
[12687] stasis.c:969 subscription_change_alloc: +1  - [6]
[12687] stasis.c:828 publish_msg: +1  - [7]
[12687] stasis.c:954 subscription_change_dtor: -1  - [8]
[12687] stasis.c:838 publish_msg: -1  - [7]
[12687] stasis.c:541 _dtor_topic: -1  - [6]
[12687] stasis.c:409 subscription_dtor: -1  - [5]
[12687] stasis.c:498 internal_stasis_subscribe: +1  - [4]
[12687] stasis.c:969 subscription_change_alloc: +1  - [5]
[12687] stasis.c:828 publish_msg: +1  - [6]
[12687] stasis.c:838 publish_msg: -1  - [7]
— reload —
[12687] stasis.c:541 stasis_unsubscribe: +1  - [6]
[12687] stasis.c:969 subscription_change_alloc: +1  - [7]
[12687] stasis.c:828 publish_msg: +1  - [8]
[12687] stasis.c:954 subscription_change_dtor: -1  - [9]
[12687] stasis.c:838 publish_msg: -1  - [8]
[12687] stasis.c:541 _dtor_topic: -1  - [7]
[12687] stasis.c:409 subscription_dtor: -1  - [6]
[12687] stasis.c:498 internal_stasis_subscribe: +1  - [5]
[12687] stasis.c:969 subscription_change_alloc: +1  - [6]
[12687] stasis.c:828 publish_msg: +1  - [7]
[12687] stasis.c:838 publish_msg: -1  - [8]
— reload —
[12687] stasis.c:541 stasis_unsubscribe: +1  - [7]
[12687] stasis.c:969 subscription_change_alloc: +1  - [8]
[12687] stasis.c:828 publish_msg: +1  - [9]
[12687] stasis.c:954 subscription_change_dtor: -1  - [10]
[12687] stasis.c:838 publish_msg: -1  - [9]
[12687] stasis.c:541 _dtor_topic: -1  - [8]
[12687] stasis.c:409 subscription_dtor: -1  - [7]
[12687] stasis.c:498 internal_stasis_subscribe: +1  - [6]
[12687] stasis.c:969 subscription_change_alloc: +1  - [7]
[12687] stasis.c:828 publish_msg: +1  - [8]
[12687] stasis.c:838 publish_msg: -1  - [9]
— reload —
[12687] stasis.c:541 stasis_unsubscribe: +1  - [8]
[12687] stasis.c:969 subscription_change_alloc: +1  - [9]
[12687] stasis.c:828 publish_msg: +1  - [10]
[12687] stasis.c:954 subscription_change_dtor: -1  - [11]
[12687] stasis.c:838 publish_msg: -1  - [10]
[12687] stasis.c:541 _dtor_topic: -1  - [9]
[12687] stasis.c:409 subscription_dtor: -1  - [8]
[12687] stasis.c:498 internal_stasis_subscribe: +1  - [7]
[12687] stasis.c:969 subscription_change_alloc: +1  - [8]
[12687] stasis.c:828 publish_msg: +1  - [9]
[12687] stasis.c:838 publish_msg: -1  - [10]
— reload —
[12687] stasis.c:541 stasis_unsubscribe: +1  - [9]
[12687] stasis.c:969 subscription_change_alloc: +1  - [10]
[12687] stasis.c:828 publish_msg: +1  - [11]
[12687] stasis.c:954 subscription_change_dtor: -1  - [12]
[12687] stasis.c:838 publish_msg: -1  - [11]
[12687] stasis.c:541 _dtor_topic: -1  - [10]
[12687] stasis.c:409 subscription_dtor: -1  - [9]
[12687] stasis.c:498 internal_stasis_subscribe: +1  - [8]
[12687] stasis.c:969 subscription_change_alloc: +1  - [9]
[12687] stasis.c:828 publish_msg: +1  - [10]
[12687] stasis.c:838 publish_msg: -1  - [11]
{noformat}

As you can see reference count keeps growing, due to the fact destructor for unsubscribe part of subscription_change_alloc is not happening.

Now, if we modify stasis_cache a bit:
{noformat}
@@ -803,6 +805,11 @@ static void caching_topic_exec(void *data, struct stasis_subscription *sub,
       const struct ast_eid *msg_eid;
       const char *msg_id;

+       msg_type = stasis_message_type(message);
+       if (msg_type == stasis_subscription_change_type()) {
+               return;
+       }
+
       ast_assert(caching_topic != NULL);
       ast_assert(caching_topic->topic != NULL);
       ast_assert(caching_topic->cache != NULL);
{noformat}

Modified version:
{noformat}
==== Leaked Object 0x2197528 history ====
[18861] stasis.c:354 stasis_topic_create: +1 100@default - [**constructor**]
[18861] stasis.c:932 stasis_forward_all: +1  - [1]
[18861] stasis.c:498 internal_stasis_subscribe: +1  - [2]
[18861] stasis.c:973 subscription_change_alloc: +1  - [3]
[18861] stasis.c:831 publish_msg: +1  - [4]
[18861] stasis.c:841 publish_msg: -1  - [5]
[19003] stasis.c:958 subscription_change_dtor: -1  - [4] <- destructor for subscription_change_alloc (subscribe)
[18861] stasis.c:831 publish_msg: +1  - [3]
[18861] stasis.c:841 publish_msg: -1  - [4]
— reload —
[19015] stasis.c:541 stasis_unsubscribe: +1  - [3]
[19015] stasis.c:973 subscription_change_alloc: +1  - [4]
[19015] stasis.c:831 publish_msg: +1  - [5]
[19015] stasis.c:841 publish_msg: -1  - [6]
[19015] stasis.c:541 _dtor_topic: -1  - [5]
[19002] stasis.c:958 subscription_change_dtor: -1  - [4] <- destructor for subscription_change_alloc (unsubscribe)
[19015] stasis.c:409 subscription_dtor: -1  - [3]
[19015] stasis.c:498 internal_stasis_subscribe: +1  - [2]
[19015] stasis.c:973 subscription_change_alloc: +1  - [3]
[19015] stasis.c:831 publish_msg: +1  - [4]
[19015] stasis.c:841 publish_msg: -1  - [5]
[19002] stasis.c:958 subscription_change_dtor: -1  - [4]
— reload —
[19015] stasis.c:541 stasis_unsubscribe: +1  - [3]
[19015] stasis.c:973 subscription_change_alloc: +1  - [4]
[19015] stasis.c:831 publish_msg: +1  - [5]
[19015] stasis.c:841 publish_msg: -1  - [6]
[19015] stasis.c:541 _dtor_topic: -1  - [5]
[19074] stasis.c:958 subscription_change_dtor: -1  - [4]
[19015] stasis.c:409 subscription_dtor: -1  - [3]
[19015] stasis.c:498 internal_stasis_subscribe: +1  - [2]
[19015] stasis.c:973 subscription_change_alloc: +1  - [3]
[19015] stasis.c:831 publish_msg: +1  - [4]
[19015] stasis.c:841 publish_msg: -1  - [5]
[19002] stasis.c:958 subscription_change_dtor: -1  - [4]
— reload —
[19015] stasis.c:541 stasis_unsubscribe: +1  - [3]
[19015] stasis.c:973 subscription_change_alloc: +1  - [4]
[19015] stasis.c:831 publish_msg: +1  - [5]
[19015] stasis.c:841 publish_msg: -1  - [6]
[19015] stasis.c:541 _dtor_topic: -1  - [5]
[19002] stasis.c:958 subscription_change_dtor: -1  - [4]
[19002] stasis.c:409 subscription_dtor: -1  - [3]
[19015] stasis.c:498 internal_stasis_subscribe: +1  - [2]
[19015] stasis.c:973 subscription_change_alloc: +1  - [3]
[19015] stasis.c:831 publish_msg: +1  - [4]
[19015] stasis.c:841 publish_msg: -1  - [5]
[19002] stasis.c:958 subscription_change_dtor: -1  - [4]
— reload —
[19015] stasis.c:541 stasis_unsubscribe: +1  - [3]
[19015] stasis.c:973 subscription_change_alloc: +1  - [4]
[19015] stasis.c:831 publish_msg: +1  - [5]
[19015] stasis.c:841 publish_msg: -1  - [6]
[19015] stasis.c:541 _dtor_topic: -1  - [5]
[19002] stasis.c:958 subscription_change_dtor: -1  - [4]
[19002] stasis.c:409 subscription_dtor: -1  - [3]
[19015] stasis.c:498 internal_stasis_subscribe: +1  - [2]
[19015] stasis.c:973 subscription_change_alloc: +1  - [3]
[19015] stasis.c:831 publish_msg: +1  - [4]
[19015] stasis.c:841 publish_msg: -1  - [5]
[19002] stasis.c:958 subscription_change_dtor: -1  - [4]
{noformat}

Some notes about configuration:
- Static pjsip.conf
- Have enough PJSIP endpoints with unsolicited MWI turned on (the more you have, it will increase due to number of MWIs)

By: Benjamin Keith Ford (bford) 2017-07-13 09:37:13.691-0500

We have enough information now to open the issue. Thanks for the report!

By: Corey Farrell (coreyfarrell) 2017-10-18 14:44:05.921-0500

Can you provide an example pjsip.conf that will reproduce this error?  Also let me know if this leak requires SIP devices to actually be connected to Asterisk or if just having them configured is enough.

By: Sergej Kasumovic (sergej) 2017-10-19 08:00:48.984-0500

Corey,
Here is the simplest possible configuration:
{noformat}
[udp]
protocol=udp
type=transport
bind=0.0.0.0:5060

; Create 100+ endpoints like this
[100]
type=endpoint
mailboxes=100@default
aors=100

[100]
type=aor

[101]
type=endpoint
mailboxes=101@default
aors=101

[101]
type=aor

;
; Make additional endpoints with a script ...
;
{noformat}

Nothing has to be registered.

Possible you could configure such mailboxes in voicemail.conf to avoid errors.  
Then repeat 200 reloads couple of times and you will see how RSS grows.

By: Corey Farrell (coreyfarrell) 2017-10-20 11:15:58.571-0500

I'm able to reproduce the issue and your patch seems to resolve it.  Do you want to post your change to gerrit [1]?  Posting to gerrit is the only way to get attention to it so others can tell us if this is the correct fix (this is slightly above my knowledge of stasis).

Note in the future please attach patches rather than posting as an inline comment.  Near the top of the page you click 'More' and then Attach files.  Even better when you have a patch that you have tested you can just post it to gerrit.  I only saw this bug because I was searching for PJSIP leak related JIRA issues.

[1] https://wiki.asterisk.org/wiki/display/AST/Code+Review

By: Friendly Automation (friendly-automation) 2018-09-07 05:40:45.652-0500

Change 10024 merged by Joshua Colp:
stasis_cache: Prune stasis_subscription_change messages

[https://gerrit.asterisk.org/10024|https://gerrit.asterisk.org/10024]

By: Friendly Automation (friendly-automation) 2018-09-07 05:40:54.513-0500

Change 10025 merged by Joshua Colp:
stasis_cache: Prune stasis_subscription_change messages

[https://gerrit.asterisk.org/10025|https://gerrit.asterisk.org/10025]

By: Friendly Automation (friendly-automation) 2018-09-07 05:41:05.517-0500

Change 10023 merged by Joshua Colp:
stasis_cache: Prune stasis_subscription_change messages

[https://gerrit.asterisk.org/10023|https://gerrit.asterisk.org/10023]

By: Friendly Automation (friendly-automation) 2018-09-07 05:41:15.603-0500

Change 10026 merged by Joshua Colp:
stasis_cache: Prune stasis_subscription_change messages

[https://gerrit.asterisk.org/10026|https://gerrit.asterisk.org/10026]

By: Corey Farrell (coreyfarrell) 2018-09-12 11:14:35.393-0500

Sorry I didn't get a chance to test the new patch before it was merged.  The patch actually made the leak worse, now every testsuite test leaks.  I've attached refs.txt from tests/agi/nominal.  I haven't determined the cause yet.

By: Friendly Automation (friendly-automation) 2018-09-20 04:53:48.968-0500

Change 10132 merged by Joshua Colp:
app_voicemail: Remove need to subscribe to stasis

[https://gerrit.asterisk.org/10132|https://gerrit.asterisk.org/10132]

By: Friendly Automation (friendly-automation) 2018-09-20 04:54:02.407-0500

Change 10135 merged by Joshua Colp:
app_voicemail: Remove need to subscribe to stasis

[https://gerrit.asterisk.org/10135|https://gerrit.asterisk.org/10135]

By: Friendly Automation (friendly-automation) 2018-09-20 04:54:12.866-0500

Change 10133 merged by Joshua Colp:
app_voicemail: Remove need to subscribe to stasis

[https://gerrit.asterisk.org/10133|https://gerrit.asterisk.org/10133]

By: Friendly Automation (friendly-automation) 2018-09-20 04:54:27.294-0500

Change 10134 merged by Joshua Colp:
app_voicemail: Remove need to subscribe to stasis

[https://gerrit.asterisk.org/10134|https://gerrit.asterisk.org/10134]

By: Friendly Automation (friendly-automation) 2018-09-20 08:59:42.922-0500

Change 10136 merged by Joshua Colp:
stasis_cache:  Stop caching stasis subscription change messages

[https://gerrit.asterisk.org/10136|https://gerrit.asterisk.org/10136]

By: Friendly Automation (friendly-automation) 2018-09-20 08:59:54.452-0500

Change 10137 merged by Joshua Colp:
stasis_cache:  Stop caching stasis subscription change messages

[https://gerrit.asterisk.org/10137|https://gerrit.asterisk.org/10137]

By: Friendly Automation (friendly-automation) 2018-09-20 09:21:41.538-0500

Change 10138 merged by Joshua Colp:
stasis_cache:  Stop caching stasis subscription change messages

[https://gerrit.asterisk.org/10138|https://gerrit.asterisk.org/10138]

By: Friendly Automation (friendly-automation) 2018-09-20 09:44:13.482-0500

Change 10139 merged by Joshua Colp:
stasis_cache:  Stop caching stasis subscription change messages

[https://gerrit.asterisk.org/10139|https://gerrit.asterisk.org/10139]

By: Corey Farrell (coreyfarrell) 2018-12-26 10:36:10.042-0600

Re-opened as the fix is about to be reverted due to regressions.

By: Friendly Automation (friendly-automation) 2019-07-12 09:23:37.827-0500

Change 11466 merged by Kevin Harwell:
app_voicemail: Remove dependency on the stasis cache

[https://gerrit.asterisk.org/c/asterisk/+/11466|https://gerrit.asterisk.org/c/asterisk/+/11466]