Summary: ASTERISK-28197: stasis: ast_endpoint struct holds the channel_ids of channels past destruction in certain cases
Reporter: Mohit Dhiman (mohitdhiman)
Labels: bridge-application endpoints.c memory-leak
Date Opened: 2018-12-05 08:21:01.000-0600
Date Closed: 2019-01-14 08:27:53.000-0600
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Core/Channels Core/Stasis
Versions: 13.21.0
Frequency of Occurrence: Constant
Related Issues: is related to ASTERISK-28180 ami: High memory (increased upto 15 GBs) while pushing 500 calls continuously for 7 hours
Environment: CentOS 7
Attachments: (0) originate_bridge_channel_count_test.txt
Description: For every call I make through Asterisk to an endpoint, the {{ast_endpoint}} struct adds that channel to its {{channel_ids}} {{ao2_container}}. After the channel is destroyed, its ID should be removed from {{channel_ids}}, but the {{channel_count}} for the endpoint keeps increasing with every call I make to that endpoint.
I verified this by adding warning logs to the {{ast_endpoint_snapshot_create}} function in {{main/endpoints.c}}; each log line prints the channel count for that endpoint.
[Dec 5 17:14:45] WARNING[7397] endpoints.c: channel count: 107069 max channel: -1
[Dec 5 17:14:45] WARNING[7397] endpoints.c: channel count: 107070 max channel: -1
[Dec 5 17:14:45] WARNING[7397] endpoints.c: channel count: 107071 max channel: -1
[Dec 5 17:14:45] WARNING[7397] endpoints.c: channel count: 107072 max channel: -1

Here, channel count corresponds to
{{channel_count = ao2_container_count(endpoint->channel_ids);}}

Also, there are never more than 400 channels at any instant.
Comments:
By: Asterisk Team (asteriskteam) 2018-12-05 08:21:03.267-0600

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Mohit Dhiman (mohitdhiman) 2018-12-06 00:10:43.394-0600

The channel count is not always increasing: after a few calls it decreases to some extent, but then it starts increasing again. The rate of decrease is much lower than the rate of increase, and it is not constant.

By: Mohit Dhiman (mohitdhiman) 2018-12-06 00:23:31.627-0600

The attached file originate_bridge_channel_count_test.txt is the actual console output, with the added warning logs that give the channel count every time an endpoint snapshot is created.

*The dialplan was as simple as:*
{code:title=extensions.conf}
[test]
exten => h,1,Hangup()

exten => originate,1,set(GLOBAL(channelorig)=${CHANNEL})
exten => originate,2,wait(20)

exten => bridge,1,Bridge(${channelorig})
{code}

*CLI commands to make a call:*
{code:title=cli commands}
originate SIP/201 extension originate@test
originate SIP/201 extension bridge@test
{code}

Endpoint *SIP/201* is another Asterisk instance running on another machine; it just receives the call and puts it into the {{Wait}} application.

By: Mohit Dhiman (mohitdhiman) 2018-12-06 01:29:49.594-0600

I also found that, since a call has two channels, on hangup only the channel_id of the channel that was executing the dialplan (the PBX channel) is removed from the ast_endpoint's channel_ids list; the channel_id of the other channel in the call is not.

An interesting point to note is that when I use the {{Dial}} application (instead of {{Originate}} and {{Bridge}}) for calling, everything works fine: the channel count returns to 0 after every call.

By: Mohit Dhiman (mohitdhiman) 2018-12-06 14:21:32.975-0600

This issue caused the memory leak I reported in ASTERISK-28180.

By: Mohit Dhiman (mohitdhiman) 2018-12-11 11:30:43.686-0600

I did some debugging and found that {{endpoint_cache_clear}} is the route callback for an endpoint that is responsible for cleaning up channel_ids.

In the case of the {{Bridge}} application, when {{bridge_exec}} in {{features.c}} is invoked, it calls {{ast_channel_unref(current_dest_chan)}} to drop the channel reference if the channel is not in the bridge, which gives the following call stack:
{quote}
#0  topic_remove_subscription (topic=0x7fbf2c008290, sub=0x36095a0) at stasis.c:711
#1  0x00000000005c8a22 in stasis_forward_cancel (forward=0x7fbf2c009fa8) at stasis.c:925
#2  0x00000000004cd126 in ast_channel_internal_cleanup (chan=0x7fbf2c0243d0) at channel_internal_api.c:1553
#3  0x00000000004af818 in ast_channel_destructor (obj=0x7fbf2c0243d0) at channel.c:2363
#4  0x000000000045b94b in internal_ao2_ref (user_data=0x7fbf2c0243d0, delta=-1, file=0x61cb0b "astobj2.c", line=518, func=0x61cd21 <__FUNCTION__.8693> "__ao2_ref")
   at astobj2.c:451
#5  0x000000000045bc2e in __ao2_ref (user_data=0x7fbf2c0243d0, delta=-1) at astobj2.c:518
#6  0x000000000050c26a in bridge_exec (chan=0x7fbf2c00a4d0, data=0x7fbfb3ffc460 "SIP/201-00000000") at features.c:1129
{quote}

Finally, the call to {{stasis_forward_cancel}} removes 4 subscribers from the non-PBX channel's topic, and one of those 4 subscribers carries the {{endpoint_cache_clear}} callback, which is needed to remove channel_ids from {{struct ast_endpoint}}.
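
The observation above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Asterisk code: {{toy_endpoint}}, {{toy_channel}}, and every {{toy_*}} function are hypothetical stand-ins, and the stasis topic and forward machinery is reduced to a single pointer. It shows why an ID stays in the endpoint's set when the forward is cancelled before the final cache-clear message is published.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_IDS 8

/* Hypothetical stand-in for struct ast_endpoint: only the set of channel ids. */
struct toy_endpoint {
    const char *channel_ids[TOY_MAX_IDS];
    size_t count;
};

/* Hypothetical stand-in for a channel whose topic forwards to an endpoint. */
struct toy_channel {
    const char *id;
    struct toy_endpoint *forward_to; /* NULL once the forward is cancelled */
};

/* Plays the role of the endpoint_cache_clear callback: drop the id from the set. */
static void toy_endpoint_cache_clear(struct toy_endpoint *ep, const char *id)
{
    for (size_t i = 0; i < ep->count; i++) {
        if (strcmp(ep->channel_ids[i], id) == 0) {
            ep->channel_ids[i] = ep->channel_ids[--ep->count];
            return;
        }
    }
}

/* Register the channel with the endpoint and set up the forward. */
static void toy_channel_attach(struct toy_channel *ch, const char *id,
                               struct toy_endpoint *ep)
{
    ch->id = id;
    ch->forward_to = ep;
    if (ep->count < TOY_MAX_IDS) {
        ep->channel_ids[ep->count++] = id;
    }
}

/* Models cancelling the forward: the endpoint no longer hears this channel. */
static void toy_forward_cancel(struct toy_channel *ch)
{
    ch->forward_to = NULL;
}

/* The final message only reaches the endpoint if the forward still exists. */
static void toy_publish_cache_clear(struct toy_channel *ch)
{
    if (ch->forward_to != NULL) {
        toy_endpoint_cache_clear(ch->forward_to, ch->id);
    }
}

/* Run one channel lifetime and report how many ids the endpoint still holds. */
static size_t toy_ids_left(bool cancel_forward_first)
{
    struct toy_endpoint ep = { {0}, 0 };
    struct toy_channel ch;

    toy_channel_attach(&ch, "SIP/201-00000000", &ep);
    if (cancel_forward_first) {
        toy_forward_cancel(&ch);
    }
    toy_publish_cache_clear(&ch);
    return ep.count;
}
```

In this model {{toy_ids_left(false)}} leaves the set empty, while {{toy_ids_left(true)}} leaves one stale ID behind, which matches the growth seen in the warning logs.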

By: Mohit Dhiman (mohitdhiman) 2018-12-16 23:05:05.743-0600

I tried to prevent the removal of subscribers from the non-PBX channel by adding a flag to ast_channel. This stopped the channel_ids count from increasing forever, but now the subscribers are never removed from the channel.

By: Mohit Dhiman (mohitdhiman) 2018-12-24 13:52:10.427-0600

I think I found the problem. During bridging, Asterisk creates another channel (the yanked channel, or original channel), transfers the state from the initial channel (the clone channel) to this newly created channel, and then hangs up the initial channel after swapping all the state between them.
The initial channel structure was created with {{ast_channel_alloc_with_endpoint}}, which populates the channel's {{endpoint_forward}} field. This field holds information about the endpoint topics (which carry the {{endpoint_cache_clear}} callback). The new channel created during bridging, however, is created with {{ast_channel_alloc}}, which doesn't populate {{endpoint_forward}}. After masquerading, when the initial channel hangs up, the information in its {{endpoint_forward}} field dies with it.

I tried swapping the value of the {{endpoint_forward}} field between the two channels in {{channel_do_masquerade(dest, source)}}, and everything seems to work fine.

I'll test this change under heavy load and hope for better results.
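
To make the effect of that swap concrete, here is a minimal, self-contained sketch. It is not the Asterisk patch: all {{toy_*}} names are hypothetical, the endpoint is reduced to a set of IDs, and the model assumes that masquerade moves the unique ID to the surviving channel and that a channel can only clean up its ID while it holds a forward to the endpoint.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_IDS 8

/* Hypothetical stand-in for struct ast_endpoint. */
struct toy_endpoint {
    const char *channel_ids[TOY_MAX_IDS];
    size_t count;
};

/* Hypothetical stand-in for a channel; endpoint_forward is NULL when unset. */
struct toy_channel {
    const char *uniqueid;
    struct toy_endpoint *endpoint_forward;
};

static void toy_endpoint_remove(struct toy_endpoint *ep, const char *id)
{
    for (size_t i = 0; i < ep->count; i++) {
        if (strcmp(ep->channel_ids[i], id) == 0) {
            ep->channel_ids[i] = ep->channel_ids[--ep->count];
            return;
        }
    }
}

/* Models allocation with an endpoint: forward is set and the id is tracked. */
static struct toy_channel toy_alloc_with_endpoint(const char *id,
                                                  struct toy_endpoint *ep)
{
    struct toy_channel ch = { id, ep };
    if (ep->count < TOY_MAX_IDS) {
        ep->channel_ids[ep->count++] = id;
    }
    return ch;
}

/* Models plain allocation: no forward to any endpoint. */
static struct toy_channel toy_alloc(const char *id)
{
    struct toy_channel ch = { id, NULL };
    return ch;
}

/* Swap identity between the channels; optionally swap endpoint_forward too. */
static void toy_masquerade(struct toy_channel *dest, struct toy_channel *source,
                           bool swap_forward)
{
    const char *id = dest->uniqueid;
    dest->uniqueid = source->uniqueid;
    source->uniqueid = id;

    if (swap_forward) {
        struct toy_endpoint *fwd = dest->endpoint_forward;
        dest->endpoint_forward = source->endpoint_forward;
        source->endpoint_forward = fwd;
    }
}

/* Cleanup reaches the endpoint only through the channel's own forward. */
static void toy_hangup(struct toy_channel *ch)
{
    if (ch->endpoint_forward != NULL) {
        toy_endpoint_remove(ch->endpoint_forward, ch->uniqueid);
        ch->endpoint_forward = NULL;
    }
}

/* One bridged call: returns the number of ids left on the endpoint. */
static size_t toy_leftover_after_call(bool swap_forward)
{
    struct toy_endpoint ep = { {0}, 0 };
    struct toy_channel initial = toy_alloc_with_endpoint("SIP/201-00000000", &ep);
    struct toy_channel yanked = toy_alloc("yanked");

    toy_masquerade(&yanked, &initial, swap_forward);
    toy_hangup(&initial); /* the initial (clone) channel is hung up first */
    toy_hangup(&yanked);  /* the surviving channel ends later */
    return ep.count;
}
```

Without the swap, {{toy_leftover_after_call(false)}} leaves one ID on the endpoint per call; with the swap, {{toy_leftover_after_call(true)}} leaves none.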

By: Mohit Dhiman (mohitdhiman) 2018-12-25 10:04:06.038-0600

It worked. I ran a load of around 700 to 800 channels at any instant for 10 continuous hours, and there is no memory leak at all. The {{channel_ids}} count no longer keeps growing: as soon as a channel dies, the {{channel_ids}} count decreases.

By: Friendly Automation (friendly-automation) 2019-01-14 08:27:54.790-0600

Change 10872 merged by Joshua C. Colp:
stasis/endpoint: Fix memory leak of channel_ids in ast_endpoint structure.

[https://gerrit.asterisk.org/10872|https://gerrit.asterisk.org/10872]

By: Friendly Automation (friendly-automation) 2019-01-14 08:28:14.989-0600

Change 10862 merged by Friendly Automation:
stasis/endpoint: Fix memory leak of channel_ids in ast_endpoint structure.

[https://gerrit.asterisk.org/10862|https://gerrit.asterisk.org/10862]

By: Friendly Automation (friendly-automation) 2019-01-14 08:28:16.813-0600

Change 10859 merged by Joshua C. Colp:
stasis/endpoint: Fix memory leak of channel_ids in ast_endpoint structure.

[https://gerrit.asterisk.org/10859|https://gerrit.asterisk.org/10859]