Summary: ASTERISK-22417: [patch] RTP ports left open after making calls using SIPTAPI
Reporter: Patrick Beaumont (pbeaumonthatsoff)
Labels: patch
Date Opened: 2013-08-28 07:51:32
Date Closed:
Priority: Major
Regression?:
Status: Open/New
Components: Channels/chan_sip/General
Versions: 11.4.0, 13.18.4
Frequency of Occurrence: Constant
Related Issues:
Environment: Linux 3.2.0-40-virtual #64-Ubuntu SMP i686 i686 i386 GNU/Linux, Ubuntu 12.04
Attachments: (0) asterisklog.txt, (1) patch.diff
Description:
Ports in use at the start:
{noformat}
root@CN1:~# netstat -tunap
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:5038            0.0.0.0:*               LISTEN      428/asterisk    
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      313/lighttpd    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      129/sshd        
tcp        0      0 10.0.3.101:22           10.0.3.1:60572          ESTABLISHED 339/sshd: steve [pr
tcp6       0      0 :::22                   :::*                    LISTEN      129/sshd        
udp        0      0 0.0.0.0:30000           0.0.0.0:*                           428/asterisk    
udp        0      0 10.0.3.101:30001        0.0.0.0:*                           428/asterisk
{noformat}

First I initiate a call using dialer.exe through the application Siptapi.
My desk phone rings.
I pick it up.
My mobile phone then rings (the destination I set in dialer.exe).
I answer on my mobile phone.
I place the call on hold using my desk phone.
I initiate another call to my mobile using dialer.exe through the application Siptapi.
My desk phone indicates I have a call waiting.
I hang up the call I have on hold and my desk phone starts ringing.
I pick up my desk phone and my mobile phone starts ringing.
I answer the call on my mobile.
I hang up the call on my mobile.
I hang up my desk phone.

Ports in use at the end:
{noformat}
root@CN1:~# netstat -tunap
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:5038            0.0.0.0:*               LISTEN      428/asterisk    
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      313/lighttpd    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      129/sshd        
tcp        0      0 10.0.3.101:22           10.0.3.1:60572          ESTABLISHED 339/sshd: steve [pr
tcp6       0      0 :::22                   :::*                    LISTEN      129/sshd        
udp        0      0 0.0.0.0:30000           0.0.0.0:*                           428/asterisk    
udp        0      0 10.0.3.101:30001        0.0.0.0:*                           428/asterisk    
udp        0      0 10.0.3.101:30034        0.0.0.0:*                           428/asterisk    
udp        0      0 10.0.3.101:30035        0.0.0.0:*                           428/asterisk  
{noformat}

This is very repeatable, and if performed enough times Asterisk will complain about no longer being able to allocate RTP ports and will start rejecting calls.
Performing a core restart appears to be the only way to release the ports.
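
The before/after comparison above can be automated. The following is a minimal sketch in Python, assuming `netstat -tunap` output in the column layout shown above. The function name `asterisk_udp_ports` is made up for this illustration, and the two snapshots are trimmed to the UDP rows from this report.

```python
def asterisk_udp_ports(netstat_output):
    """Return the set of local UDP ports owned by an asterisk process."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # A UDP row without a State value has six fields:
        # proto, recv-q, send-q, local address, foreign address, pid/program
        if len(fields) < 6 or not fields[0].startswith("udp"):
            continue
        if not fields[-1].endswith("/asterisk"):
            continue
        # The local address looks like "10.0.3.101:30034"; keep the port part.
        ports.add(int(fields[3].rsplit(":", 1)[1]))
    return ports


before = """\
udp        0      0 0.0.0.0:30000           0.0.0.0:*                           428/asterisk
udp        0      0 10.0.3.101:30001        0.0.0.0:*                           428/asterisk
"""

after = """\
udp        0      0 0.0.0.0:30000           0.0.0.0:*                           428/asterisk
udp        0      0 10.0.3.101:30001        0.0.0.0:*                           428/asterisk
udp        0      0 10.0.3.101:30034        0.0.0.0:*                           428/asterisk
udp        0      0 10.0.3.101:30035        0.0.0.0:*                           428/asterisk
"""

# Ports present after the calls ended that were not open beforehand.
leaked = sorted(asterisk_udp_ports(after) - asterisk_udp_ports(before))
print(leaked)  # [30034, 30035]
```

Run against snapshots taken with no calls in progress, any port left in `leaked` is a candidate for the leak described here.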
Comments:
By: Michael L. Young (elguero) 2013-08-28 11:17:04.917-0500

Not sure yet what would be causing this.  But a few observations.

Out of curiosity, can you fix this:
{noformat}
[Aug 28 13:29:04] VERBOSE[468][C-00000003] pbx.c:     -- Executing [s@macro-outDialRecord:6] MixMonitor("SIP/290-00000004", "/data/callRecording/Outgoing/00441142994032-07845552273-2013-08-28-13-29-04.WAV,bw(-4)v(-4)") in new stack
{noformat}

Change the lowercase w to an uppercase W for the options to MixMonitor.

You are running Asterisk in a virtual machine?

Also, noticed this in your logs:

{noformat}
[Aug 28 13:29:09] WARNING[430] asterisk.c: The canary is no more.  He has ceased to be!  He's expired and gone to meet his maker!  He's a stiff!  Bereft of life, he rests in peace.  His metabolic processes are now history!  He's off the twig!  He's kicked the bucket.  He's shuffled off his mortal coil, run down the curtain, and joined the bleeding choir invisible!!  THIS is an EX-CANARY.  (Reducing priority)
{noformat}

Some info on astcanary:
{quote}
This process serves a similar purpose, though with the realtime priority being the reason. When a thread starts running away with the processor, it is typically difficult to tell what thread caused the problem, as the machine acts as if it is locked up (in fact, what has happened is that Asterisk runs at a higher priority than even the login shell, so the runaway thread hogs all available CPU time).

If that happens, this canary process will cease to get any process time, which we can monitor with a realtime thread in Asterisk. Should that happen, that monitoring thread may take immediate action to slow down Asterisk to regular priority, thus allowing an administrator to login to the system and restart Asterisk or perhaps take another course of action (such as retrieving a backtrace to let the developers know what precisely went wrong).
{quote}


By: Patrick Beaumont (pbeaumonthatsoff) 2013-08-28 11:37:30.351-0500

Hi Michael.

Thanks for the advice. The canary issue is solved. I inherited the server from someone else and they were trying to set real-time priority inside a virtual machine, which as far as I understand won't work. I've removed the -p from their startup scripts.

I've corrected the MixMonitor issue as well. Thanks for pointing that out.

By: Rusty Newton (rnewton) 2013-09-04 19:49:42.387-0500

Patrick, after making those changes, does the RTP port leak still appear to exist?

By: Patrick Beaumont (pbeaumonthatsoff) 2013-09-05 05:04:12.179-0500

Hi Rusty.

Thanks for following up on this.

First of all the config changes didn't appear to make any difference to the problem. The ports were still being left open.

Secondly, I've been doing my own investigation into the code and think I've identified the problem and a very crude solution.

From what I can tell, the issue is that when the initial INVITE comes from Siptapi, Asterisk creates an RTP engine for the audio between itself and Siptapi and attaches it to the SIP channel.
Siptapi then sends a REFER command, which changes the SIP channel's method from INVITE to REFER. This seems to upset Asterisk's internal reference counting.
When an INVITE channel is destroyed, stop_media_flows is called at some point within chan_sip.c. This decrements the references to the RTP engine, allowing it to be destroyed when the SIP channel is destroyed. For some reason stop_media_flows is not being called for the initial INVITE channel between Siptapi and Asterisk. I'm guessing it's something to do with the channel method changing from INVITE to REFER, but I can't be sure at this stage.
The end result is that when __sip_destroy runs, it removes all references to the SIP channel, but the RTP engine never gets destroyed and so leaves the ports open.

As a crude fix, I've added another call to stop_media_flows in the __sip_destroy function, which seems to solve the problem and, in the limited testing I've done, hasn't introduced any side effects. I'll be testing it more thoroughly next week and will report back.

I can't say it is a particularly good solution as it ignores the real issue of why the reference count of the RTP engine gets out of sync in the first place.
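
To make the suspected mechanism easier to follow, here is a toy model in Python. It is not Asterisk code and only mirrors the hypothesis described above. The classes `RtpInstance` and `Dialog`, the `crude_fix` flag, and the assumption that exactly one extra reference is held while media is flowing are all invented for illustration. Only the names stop_media_flows and __sip_destroy (modeled here as `destroy`) come from chan_sip.c.

```python
class RtpInstance:
    """Toy stand-in for an RTP engine instance that owns a UDP port."""

    def __init__(self, port):
        self.port = port
        self.refs = 0
        self.port_open = True

    def ref(self):
        self.refs += 1

    def unref(self):
        self.refs -= 1
        if self.refs == 0:
            # Last reference dropped: the instance is destroyed, port released.
            self.port_open = False


class Dialog:
    """Toy stand-in for a SIP dialog (hypothetical, heavily simplified)."""

    def __init__(self, rtp):
        self.method = "INVITE"
        self.rtp = rtp
        rtp.ref()  # reference held by the dialog itself
        rtp.ref()  # assumed extra reference held while media is flowing
        self.media_running = True

    def stop_media_flows(self):
        # Idempotent: the media reference is released only once.
        if self.media_running:
            self.media_running = False
            self.rtp.unref()

    def destroy(self, crude_fix=False):
        # Models __sip_destroy; crude_fix adds the extra stop_media_flows call.
        if crude_fix:
            self.stop_media_flows()
        self.rtp.unref()
        self.rtp = None


def port_open_after_teardown(changed_to_refer, crude_fix):
    rtp = RtpInstance(30034)
    dialog = Dialog(rtp)
    if changed_to_refer:
        # Suspected path: the method changes and stop_media_flows is skipped.
        dialog.method = "REFER"
    else:
        dialog.stop_media_flows()
    dialog.destroy(crude_fix=crude_fix)
    return rtp.port_open


print(port_open_after_teardown(False, False))  # False: normal INVITE teardown
print(port_open_after_teardown(True, False))   # True: port left open
print(port_open_after_teardown(True, True))    # False: crude fix releases it
```

In this model the extra call is harmless on the normal path because releasing the media reference is idempotent. Whether that also holds in chan_sip.c is exactly what further testing would need to confirm.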

If anyone wants more information about what I've been trying and experimenting with in relation to this problem please let me know. I'll report back towards the end of next week with how the fix has behaved in the real world.

By: Rusty Newton (rnewton) 2013-09-05 15:01:33.358-0500

Thanks for looking into it Patrick.

An attempt at a patch, especially if it appears to solve the issue, is always welcome.

After you have tested the patch, you can sign the contributor's license agreement and then attach the patch to this issue. Then the issue will wait for an Asterisk developer to review it and see what is going on. You can find more info on all that here: https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines#AsteriskIssueGuidelines-PatchandCodesubmission

Another very helpful thing would be if you can determine a very simple way to reproduce the issue without using siptapi and describe the steps here. That would allow others to test your patch and confirm the issue and the fix.

Thanks again!

By: Patrick Beaumont (pbeaumonthatsoff) 2013-09-07 12:18:02.090-0500

I've put the "patch" in to the production server with the problem. I'll be monitoring it over the next week and let you know how that goes.

If you want to reproduce this without using Siptapi I would have thought you could use any "click to dial" software that uses an INVITE then a REFER to initiate the call. I haven't tried any others myself but if I get time over the next week I'll look for some.

By: Rusty Newton (rnewton) 2013-09-26 16:41:30.402-0500

bq. I've put the "patch" in to the production server with the problem. I'll be monitoring it over the next week and let you know how that goes.

Thanks! It's been a couple weeks, how has the patch worked out?

bq. If you want to reproduce this without using Siptapi I would have thought you could use any "click to dial" software that uses an INVITE then a REFER to initiate the call. I haven't tried any others myself but if I get time over the next week I'll look for some.

Probably could have, but it is an issue of time. I'm working with hundreds of issues, and I don't have any "click to dial" software set up for testing with at the moment. The burden is mostly on the reporter here to help us out. We definitely appreciate it.

By: Patrick Beaumont (pbeaumonthatsoff) 2013-09-27 03:12:42.248-0500

Hi Rusty.

Apologies for the delay in getting back to you. Things are a bit hectic at work.

The good news is that I haven't noticed any side effects from my crude patch and the customer is no longer reporting any problems. I've rolled the patch out to the rest of my Asterisk servers (about 50 of them) and have not seen any other side effects. That patch is essentially:
channels/chan_sip.c

\[Edit: mjordan\]

Inline patch removed. Please attach the patch to the issue in diff format after signing a license contributor agreement. Thanks!


I must stress that it is a fairly crude fix. The real issue seems to be that stop_media_flows isn't called earlier in the channel's life cycle, but I haven't really had the time to properly track down why.

By: Rusty Newton (rnewton) 2013-10-18 15:50:58.402-0500

Thanks for the feedback.

We can't take an inline patch for various reasons, see here: https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines#AsteriskIssueGuidelines-PatchandCodesubmission

Please, when you get a moment, verify you have signed the submission agreement, check the patch against the coding guidelines (linked from the page above), and attach the patch. Then we'll see about pushing it into Asterisk.

Thanks again for the report and contributing your fix back to the project!

By: Patrick Beaumont (pbeaumonthatsoff) 2013-11-15 06:23:58.522-0600

I've attached a patch. Let me know if it is of any use to you.

By: Rusty Newton (rnewton) 2013-11-21 09:28:48.436-0600

Thanks. Attaching your asterisklog file to the issue, so it will always be available.

By: Rusty Newton (rnewton) 2013-11-21 09:33:02.369-0600

Opening this issue up. Even though we haven't had others see it, there is an issue at least for you, and you have provided a patch that fixes it in your case along with a full log showing what is going on when the issue occurs.

By: Corey Farrell (coreyfarrell) 2014-11-04 09:07:13.437-0600

[~pbeaumonthatsoff]: If you can provide a log file with sip debug enabled showing the packets received/sent for a single call, I can likely put together a SIPp scenario to test this.  Please do this with an unpatched copy of the current Asterisk 11 release.  If you know how to work with SIPp and have time to provide a scenario XML file that would make things go faster.

By: Walter Doekes (wdoekes) 2014-11-04 09:16:15.764-0600

Could this be one or more of these?
ASTERISK-22436
ASTERISK-15879


By: Private Name (falves11) 2015-03-05 12:33:05.190-0600

I am trying to test the patch, but it does not apply to the current version 11.
Also, it is trying to modify a nonexistent file.


By: Private Name (falves11) 2015-03-05 12:37:21.159-0600

I tested the patch and the exact same ports remain open after the calls are closed.
Something is not working.