[Home]

Summary:ASTERISK-27140: Memory leak in sorcery.c – pjsip subscriptions not expired
Reporter:Jacek Konieczny (jkonieczny)Labels:
Date Opened:2017-07-18 06:43:09Date Closed:2020-01-14 11:13:50.000-0600
Priority:MajorRegression?
Status:Closed/CompleteComponents:
Versions:13.15.1 14.6.0 Frequency of
Occurrence
Frequent
Related
Issues:
Environment:Linux 4.4.66, 32-bit asterisk Yealink T41, T23, T27 and Snom 710 SIP phonesAttachments:
Description:We have experienced quite a serious memory leak on a system running Asterisk 13.14.1. Asterisk process would grow until it fails due to OOM problems.

We have enabled memory debugging code in Asterisk, which pointed us to allocation in the 'sorcery.c' file. Here are some of regular measurements:

{noformat}
Jul 18 08:29:05  12293024 bytes in       1871 allocations in file sorcery.c
Jul 18 08:30:05  12427300 bytes in       1879 allocations in file sorcery.c
Jul 18 08:31:05  12494666 bytes in       1885 allocations in file sorcery.c
Jul 18 08:32:05  12662739 bytes in       1897 allocations in file sorcery.c
[...]
Jul 18 13:08:05  53012221 bytes in       4297 allocations in file sorcery.c
Jul 18 13:09:05  53146497 bytes in       4305 allocations in file sorcery.c
Jul 18 13:10:05  53246976 bytes in       4309 allocations in file sorcery.c
{noformat}

Further investigation lead us to the abnormal count of pjsip subscriptions:
{noformat}
CLI> pjsip show subscriptions inbound
Endpoint: <Endpoint/Caller-ID.............................................>
Resource: <Resource/Event.................................................>
 Expiry: <Expiry>  <Call-id..............................................>
===========================================================================

Endpoint: pauu9/"phone:pauu9" <phone_pauu9>
Resource: 603/dialog
 Expiry:        0  0_3532174332@X.X.X.X

Endpoint: 8wmgf/"phone:8wmgf" <phone_8wmgf>
Resource: 603/dialog
 Expiry:        0  0_1381764142@X.X.X.X
[...]
Endpoint: waqiy/"phone:waqiy" <phone_waqiy>
Resource: 400/dialog
 Expiry:     1748  0_710460239@X.X.X.X

1603 active subscriptions
{noformat}

We have seen even over 10000 active subscriptions there. Most of them with 'Expiry: 0' (similar to ASTERISK-26821).

The system would also occasionally crash due to ASTERISK-27057, so I hoped Asterisk upgrade will fix the problem.

Unfortunately, we are able to reproduce the memory/subscription leak on Asterisk 14.6.0 in our test system. It is not that dramatic there, though, as the system load is much lower.

{noformat}
qemu*CLI> pjsip show subscriptions inbound
Endpoint: <Endpoint/Caller-ID.............................................>
Resource: <Resource/Event.................................................>
 Expiry: <Expiry>  <Call-id..............................................>
===========================================================================

Endpoint: vfd5v/"phone:vfd5v" <phone_vfd5v>
Resource: 313/dialog
 Expiry:        0  0_3957488405@192.168.1.141

Endpoint: rcumw/"phone:rcumw" <phone_rcumw>
Resource: rcumw/message-summary
 Expiry:     2228  0_3402140230@192.168.1.100

Endpoint: rcumw/"phone:rcumw" <phone_rcumw>
Resource: 555/dialog
 Expiry:        0  0_2653972870@192.168.1.100

Endpoint: rcumw/"phone:rcumw" <phone_rcumw>
Resource: 327/dialog
 Expiry:        0  0_3105449958@192.168.1.100

Endpoint: rcumw/"phone:rcumw" <phone_rcumw>
Resource: 307/dialog
 Expiry:        0  0_3959991489@192.168.1.100

Endpoint: rcumw/"phone:rcumw" <phone_rcumw>
Resource: 313/dialog
 Expiry:        0  0_3788627546@192.168.1.100

Endpoint: rcumw/"phone:rcumw" <phone_rcumw>
Resource: 400/dialog
 Expiry:        0  0_2717320225@192.168.1.100

[...]

Endpoint: rcumw/"phone:rcumw" <phone_rcumw>
Resource: 400/dialog
 Expiry:     1786  0_1665913924@192.168.1.100

185 active subscriptions
{noformat}

This took just ~3 hours.

This test system has only 20 endpoints and only couple of them use SIP subscription to couple of others. Most entries on the list are duplicates, with 'Expiry: 0'.

{{sorcery.c}} allocations were also increasing:
{noformat}
  2569589 bytes in        857 allocations in file sorcery.c
[...]
  5994851 bytes in       1061 allocations in file sorcery.c
{noformat}

Step to reproduce:
* Set up Asterisk PBX with a few SIP phones
* Configure BLF on the sip phones to SIP-subscribe other endpoints
* Have some of the subscribed extensions unavailable (assigned SIP phones offline)
* wait until first entries are to be expired (Expiry: 0  in 'pjsip show subscriptions inbound')

Expected results: expired entries disappear after some time.
Actual result: expired entries stay in the 'pjsip show subscriptions inbound'

Note: we have reproduced it only in our specific Asterisk configuration, as we have no means to quickly reproduce it in and isolated simple Asterisk setup (our test systems are designed to test our product, not Asterisk alone).

I guess the way we use hints in asterisk dial plan may be related to the problem, so here is a sample:

{noformat}
[ext-local]
exten => 303,hint,${HINT(1581dca5-e34d-371c-aed0-09002af06d06@users)}
exten => 313,hint,${HINT(56d1d500-09dd-34ec-b40a-d65c065a186e@users)}
exten => 330,hint,${HINT(a4f5591b-78f2-3d9c-980e-4d6fa097b924@users)}
exten => 300,hint,PJSIP/ooz9t
...

[users]
exten => 56d1d500-09dd-34ec-b40a-d65c065a186e,hint,PJSIP/jfikn&custom:313,CustomPresence:313
exten => 1581dca5-e34d-371c-aed0-09002af06d06,hint,PJSIP/rcumw&custom:401,CustomPresence:401
exten => c3d62505-11dc-36db-9aa3-508693b6a3c0,hint,PJSIP/fgd6r,CustomPresence:327
...
{noformat}

We may attempt to gather more debugging information next week, if needed,
Comments:By: Asterisk Team (asteriskteam) 2017-07-18 06:43:10.148-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Rusty Newton (rnewton) 2017-07-20 13:56:28.025-0500

For us to investigate the issue with good odds of resolving it, we need to be able to reproduce it.

If you can build an isolated, simple test environment to reproduce the issue and then provide instructions we can follow - that will be the fastest way to resolution.

In the meantime if you can attach a debug log captured during a smaller scale test that exhibits the issue, that may help.

https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

We need the verbose and debug channels turned up to 5 or higher.

Along with that, follow the instructions here and provide the indicated output:

https://wiki.asterisk.org/wiki/display/AST/Memory+Leak+Debugging

Thanks!

By: Jacek Konieczny (jkonieczny) 2017-07-28 07:39:14.314-0500

I was able to reproduce the problem on a vanilla Asterisk 14.6 with a simple config.

Once I got dozens of duplicate stale subscriptions overnight. Now, after a restart I have only one non-expiring duplicate.

I am trying to prepare a simple reproducible test case, but I still have no idea what is the trigger. A bug is definitely there in the Asterisk code.

By: Rusty Newton (rnewton) 2017-07-28 08:59:11.124-0500

Even if you can't reliably reproduce - if you can gather the debug logs along with malloc_debug output that may be helpful.

By: Asterisk Team (asteriskteam) 2017-08-12 12:00:01.797-0500

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidlines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines