[Home]

Summary:ASTERISK-28894: new memory leak when updating 16.9.0 to 16.10.0
Reporter:Matthias Hensler (cubbi)Labels:fax
Date Opened:2020-05-15 03:56:18Date Closed:2020-06-04 12:00:01
Priority:MajorRegression?Yes
Status:Closed/CompleteComponents:.Release/Targets
Versions:16.10.0 Frequency of
Occurrence
Constant
Related
Issues:
Environment:Linux 64-bit, CentOS 7, Kernel 3.10.0-957.5.1.el7.x86_64Attachments:( 0) Memory_Summary.zip
( 1) memory.zip
( 2) memory-month.png
Description:We currently follow the Asterisk 16 mainline, updating whenever a new version is available. After the recent update from 16.9.0 to 16.10.0 we see a steady increase in memory consumption, needing to restart the process regularly.

Unfortunately I cannot pinpoint to a specific component, so I need some suggestions on how to hunt this down.

I already lookup up know issues, but the only reported leak so far on 16.10.0 is ASTERISK-28870. However, I am pretty sure that we hit a different issue, as I recompiled our 16.10.0 with the mentioned patchset (commit 7fbfbe7) and still see the issue.

Involved components in our setup are pretty straightforward, since we only route sip-calls while billing them. Beside that we are use fax, but no additional features like conferencing, voicemail, etc. The system routes a couple of 10,000 calls a day with a maximum of around 200 calls in parallel.

Used components in detail:
* chan_sip (our progress to migrate to res_pjsip is ongoing)
* cdr_mysql (yes, deprecation status is known)
* res_fax_spandsp (used for in- and outbound fax on g711a and t38)

Used codecs are g711a and g722 only (plus t38 for fax).

The dialplan uses an external generated extension.conf in ascii-format (no database, or ael/lua involved). All external scripting is done via FastAGI-calls from the dialplan.

Beside that there a several connections to the AMI manager per minute to collect call stats.

Asterisk is compiled from an unmodified sourcecode and installed under a separate prefix per version (in detail: "./configure --prefix=/usr/local/packages/asterisk-16.10.0-7fbfbe7-issue28870 --disable-xmldoc --with-jansson-bundled", followed by a "make menuconfig" to enable cdr_mysql and opus-codec translater (which is not used at the moment) and then "make" and "make install").

When looking at the memory usage the Asteriskprocess will increase by around 200-300 MBytes per day. The increase is only noticeable in the daytime when we have to most calls, in the nighttime were calls are few there is no increase. Therefore I would rule out any issues with the manager AMI, since connections are constant throughout the day.

I apologise for just reporting  the memory leak without giving more hints at to which component might be affected. I will study the changeset from 16.9.0 to 16.10.0 more carefully if something spots the eye. If you need more details or give me any more hints on how to debug further I will happily provide (currently the process is still in a leak state and I have no pressure to restart within the next days. So if any CLI-commands are needed for checking some detailed status, just ask).
Comments:By: Asterisk Team (asteriskteam) 2020-05-15 03:56:20.554-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Matthias Hensler (cubbi) 2020-05-15 03:59:04.190-0500

To visualize the problem I attached the monthly memory statistic of the system.

To the left (approx. 2.5 weeks) you see the consumption of memory for an unmodified 16.9.0. Then the update to 16.10.0 follows with static increase of memory. At the begin of this week I recompiled 16.10.0 with the patch provided from ASTERISK-28870, which had no effect.

By: Ross Beer (rossbeer) 2020-05-15 04:49:53.300-0500

I am seeing the same issue, but have not yet identified the cause

By: Joshua C. Colp (jcolp) 2020-05-15 04:58:58.675-0500

Can you build an environment running a similar load with the MALLOC_DEBUG option enabled? It will keep track of the allocations and can make it quite easy to see where memory is leaking from. It provides the CLI commands:

memory show summary
memory show allocations

To show the information.

By: Matthias Hensler (cubbi) 2020-05-15 05:11:50.879-0500

I will prepare a debug-enabled build and report back later (will take several hours till I can switch versions in our maintenance window).

By: Ross Beer (rossbeer) 2020-05-15 08:06:52.457-0500

I am attaching the output from a server that is doing very little work with calls passing between two sip endpoints.

I have done a few calls to queues also as our queue server is currently using over 10GB of memory just running Asterisk. It's only been up for around 12 hours. An earlier asterisk version is also running queues and runs at around 4-5GB and has been up for months processing the same number of calls.

By: Sean Bright (seanbright) 2020-05-18 13:46:34.015-0500

[~cubbi], do you use ICE? If so, could you see if the patch attached to ASTERISK-28904 resolves the leak?

By: Matthias Hensler (cubbi) 2020-05-18 14:02:30.640-0500

Memory allocations.

mem-summary-20200515-2303.txt is the memory just after startup.
mem-summary-20200518-2055.txt is the memory some days later (after 35717 calls and 8 calls still running).

At this point asterisk already consumes around 3.2 GB ram (virtual, 750 MB Res).

By: Matthias Hensler (cubbi) 2020-05-18 14:06:35.763-0500

Some quick observation: there seems to be a leak at least in res_rtp_asterisk.c:
mem-summary-20200515-2303.txt:     20324 bytes in         65 allocations in file res_rtp_asterisk.c
mem-summary-20200516-0830.txt:    171676 bytes in        591 allocations in file res_rtp_asterisk.c
mem-summary-20200517-1109.txt:    754680 bytes in       4306 allocations in file res_rtp_asterisk.c
mem-summary-20200517-1129.txt:    636064 bytes in       3942 allocations in file res_rtp_asterisk.c
mem-summary-20200518-1053.txt:   9082322 bytes in      45111 allocations in file res_rtp_asterisk.c
mem-summary-20200518-1524.txt:  12640772 bytes in      76112 allocations in file res_rtp_asterisk.c
mem-summary-20200518-1706.txt:  11985152 bytes in      79534 allocations in file res_rtp_asterisk.c
mem-summary-20200518-1833.txt:   9742000 bytes in      73829 allocations in file res_rtp_asterisk.c
mem-summary-20200518-2055.txt:   9580584 bytes in      73412 allocations in file res_rtp_asterisk.c

stream.c and format_cap.c also increases:
mem-summary-20200515-2303.txt:      1440 bytes in         18 allocations in file stream.c
mem-summary-20200516-0830.txt:    223728 bytes in       2817 allocations in file stream.c
mem-summary-20200517-1109.txt:   1129648 bytes in      14175 allocations in file stream.c
mem-summary-20200517-1129.txt:   1141760 bytes in      14306 allocations in file stream.c
mem-summary-20200518-1053.txt:   4215416 bytes in      53559 allocations in file stream.c
mem-summary-20200518-1524.txt:   7841552 bytes in      98777 allocations in file stream.c
mem-summary-20200518-1706.txt:   8909888 bytes in     111854 allocations in file stream.c
mem-summary-20200518-1833.txt:   9143808 bytes in     114386 allocations in file stream.c
mem-summary-20200518-2055.txt:   9244928 bytes in     115616 allocations in file stream.c

mem-summary-20200515-2303.txt:   1388432 bytes in       7740 allocations in file format_cap.c
mem-summary-20200516-0830.txt:   1637272 bytes in      11531 allocations in file format_cap.c
mem-summary-20200517-1109.txt:   2919080 bytes in      29322 allocations in file format_cap.c
mem-summary-20200517-1129.txt:   1987976 bytes in      22160 allocations in file format_cap.c
mem-summary-20200518-1053.txt:   5151400 bytes in      71128 allocations in file format_cap.c
mem-summary-20200518-1524.txt:   6091200 bytes in     109867 allocations in file format_cap.c
mem-summary-20200518-1706.txt:   6312952 bytes in     121837 allocations in file format_cap.c
mem-summary-20200518-1833.txt:   6069936 bytes in     122639 allocations in file format_cap.c
mem-summary-20200518-2055.txt:   6046680 bytes in     123462 allocations in file format_cap.c

Please note: this is a vanilla build of 16.10.0, without(!) the patch from ASTERISK-28870 this time.

By: Matthias Hensler (cubbi) 2020-05-18 14:11:00.728-0500

[~seanbright]: could you elaborate? We did not explicitly enable ICE-support (I think you are referring to rtp.conf in that case?), and I think ICE is disabled by default.

By: Matthias Hensler (cubbi) 2020-05-18 14:12:54.375-0500

@Asterisk-Team: the process with the leak and malloc-debugger is still running, so please let me know if you need any further data. I also can recompile with any patch provided (I will give ASTERISK-28904 a try if needed). Just note that I only can restart our instance at around midnight GMT usually.

By: Sean Bright (seanbright) 2020-05-18 14:15:22.061-0500

[~cubbi], if you don't use ICE then it would not have an effect. If you were using ICE, ASTERISK-28904 fixes a memory leak.

By: Matthias Hensler (cubbi) 2020-05-18 14:18:54.530-0500

[~seanbright] sorry, I have to correct myself. While we did not explicitly enable ice-support, I just saw that the comment on our rtp.conf is from an older Asterisk-version, and today ICE is indeed enabled by default.

Checking on the CLI it is enabled indeed:
sip*CLI> rtp show settings


General Settings:
----------------
 Port start:      10000
 Port end:        50000
 Checksums:       Yes
 DTMF Timeout:    1200
 Strict RTP:      Yes
 Probation:       4 frames
 ICE support:     Yes
sip*CLI>

By: Sean Bright (seanbright) 2020-05-18 14:26:43.365-0500

Then it couldn't hurt to give [the patch|https://gerrit.asterisk.org/c/asterisk/+/14430] a try

By: Matthias Hensler (cubbi) 2020-05-18 16:04:54.629-0500

OK, so for your information: I just rebuilt 16.10.0 again with malloc-debug and now both patches from ASTERISK-28870 and ASTERISK-28904:
[root@sip asterisk-16.10.0]# patch -p1 < ../7fbfbe7.diff·
patching file main/stream.c
Hunk #2 succeeded at 589 (offset -7 lines).
[root@sip asterisk-16.10.0]# patch -p1 < ../862b9fc.diff·
patching file res/res_rtp_asterisk.c
[root@sip asterisk-16.10.0]# ./configure --prefix=/usr/local/packages/asterisk-16.10.0-malloc-28870-28904 --disable-xmldoc --with-jansson-bundled

I just started a new instance and think that I can provided updated memory-statistics within the next 20 hours.

If the already provided memory-statistics already give you some more clue please let me know.

By: Joshua C. Colp (jcolp) 2020-05-19 04:18:26.063-0500

After looking at the summary nothing stood out to me, so we'll see after applying those patches.

By: Matthias Hensler (cubbi) 2020-05-19 13:06:34.305-0500

OK, at the moment it is looking a bit better. The new build has processed 28k calls so far, and I cannot see a significant increase in memory consumption, at least compared to the previous build. However I would like to watch the situation some 24 hours more, as I am not sure how much ramp up time is needed and how many caches and reusable memory segments are consumed.

I will provide a new zip with memory summaries tomorrow then.

By: Matthias Hensler (cubbi) 2020-05-20 12:44:00.442-0500

Well, good news. I did not see any significant memory increase today (currently 56k calls which is more with the first malloc-debug build on where the problem still was present). Therefore the patch from ASTERISK-28904 seems to have done the job.

I will have a close eye on our asteriskprocess in the next days, but if the fix will make it into 16.11.0 (or 16.10.1), I think this issue can be closed for the moment.

Thanks for your help and keep up the good work.

By: Joshua C. Colp (jcolp) 2020-05-20 12:50:44.595-0500

The fix has been merged and will be in 16.11.0. I'll leave this in waiting for feedback for now, it will automatically close after 2 weeks in case I forget or if you encounter feedback that does not show an improvement in the next few days commenting will send it back into triage.

By: Asterisk Team (asteriskteam) 2020-06-04 12:00:00.448-0500

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidlines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines