
Summary: ASTERISK-15795: [patch] endless wait for RTP in certain scenarios
Reporter: Walter Doekes (wdoekes)
Labels:
Date Opened: 2010-03-12 02:34:24.000-0600
Date Closed: 2012-09-14 03:25:29
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions: 1.8.7.1
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) ast1424-17012-sdp-direction-passive-POC.diff
( 1) ast1430rc3-17012-sdp-direction-passive-POC.diff
( 2) issue15795_rtp_prodding_1.8.patch
( 3) issue15795_rtp_prodding_WIP.patch
Description: A Dialogic machine that I'm peering with has the ability to go into passive mode for RTP traffic (adding a=direction:passive to the SDP body). I haven't figured out why/when it does so, but when it does -- and it always does in my current setup -- everyone ends up waiting for data and no RTP stream gets set up.

****** STEPS TO REPRODUCE ******

When asterisk bridges a call from X to Y through itself, the following sequence of SIP INVITE and 200 OK occurs (A is the asterisk machine):

(1) X INVITEs A (with a=direction:passive in the SDP)
(2) A INVITEs Y
(3) Y 200 OKs A (with a=direction:passive in the SDP)
(4) A 200 OKs X

Both X and Y wait for data. They've marked that they'll be waiting for asterisk to make the first move. Asterisk also waits for data. The result: no media.
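
The stall in the sequence above can be expressed as a small check. What follows is a minimal Python sketch, not Asterisk code: the names `sdp_direction` and `media_will_stall` are hypothetical, the addresses are documentation placeholders, and it assumes the attribute appears as a plain `a=direction:<value>` line in the SDP body.

```python
from typing import Optional


def sdp_direction(sdp: str) -> Optional[str]:
    """Return the value of the first a=direction: attribute, or None."""
    for line in sdp.splitlines():
        line = line.strip()
        if line.lower().startswith("a=direction:"):
            return line.split(":", 1)[1].strip().lower()
    return None


def media_will_stall(sdp_from_x: str, sdp_from_y: str,
                     bridge_sends_first: bool) -> bool:
    """True when both endpoints are passive and the bridge never sends first."""
    both_passive = (
        sdp_direction(sdp_from_x) == "passive"
        and sdp_direction(sdp_from_y) == "passive"
    )
    return both_passive and not bridge_sends_first


# Hypothetical SDP bodies for steps (1) and (3); only a=direction matters here.
OFFER_FROM_X = "\r\n".join([
    "v=0",
    "o=- 1 1 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 40000 RTP/AVP 0",
    "a=direction:passive",
])
ANSWER_FROM_Y = OFFER_FROM_X.replace("192.0.2.10", "192.0.2.20")

if __name__ == "__main__":
    # The bridge in the middle waits as well, so nobody sends first.
    print(media_will_stall(OFFER_FROM_X, ANSWER_FROM_Y, bridge_sends_first=False))
```

The check only covers the case described here, where both sides announce passive mode explicitly in the SDP.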

****** ADDITIONAL INFORMATION ******

The issue occurs on 1.4.24. I've browsed the 1.6 source and the bug reports and haven't found any indications that in newer versions this issue is tackled. After filing this report, I'll check 1.4.30 to be on the safe side.

I've created a patch which forces asterisk to send some data. It's merely a proof-of-concept and lacks the following:
(1) I don't know what a reasonable audio frame looks like, that should be fixed.
(2) The 1000 ms wait after ast_answer is arbitrary but increases the chance that the peer has seen and parsed our 200 OK with our session information. It would probably be better to begin sending data right away, and keep doing it until we get the first replies.
(3) The fix need possibly only be applied if both peers have sent "a=direction:passive". I'm not sure whose job it is to begin communicating.
(4) I don't know if the locking I've used is right or if I need to free anything after the ast_sched_add.

It does however seem to work and not conflict with the regular calls that are not affected by the problem.
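
Point (2) above suggests sending right away and repeating until the first reply arrives, rather than waiting a fixed 1000 ms. A minimal Python sketch of that timing logic follows. It is not what the attached patches do: `prod_schedule`, the 20 ms interval, and the 5000 ms give-up limit are all assumptions chosen for illustration.

```python
from typing import List, Optional


def prod_schedule(first_rx_ms: Optional[int],
                  interval_ms: int = 20,
                  give_up_ms: int = 5000) -> List[int]:
    """Return the times (ms after answer) at which a prod frame is sent.

    first_rx_ms is when the first inbound RTP packet arrives, or None if
    the peer never sends anything. Prodding stops at the first scheduled
    tick at or after that arrival, or when give_up_ms is reached.
    """
    times = []
    now = 0
    while now < give_up_ms:
        if first_rx_ms is not None and first_rx_ms <= now:
            break  # the peer has started sending; no more prods needed
        times.append(now)
        now += interval_ms
    return times


if __name__ == "__main__":
    # Peer starts sending 50 ms after the answer.
    print(prod_schedule(50))
```

Stopping on the first inbound packet keeps the prod frames out of calls that already have media, and the give-up limit bounds the work when the peer never answers.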


Regards,
Walter Doekes
OSSO B.V.
Comments:

By: Walter Doekes (wdoekes) 2010-03-12 03:52:57.000-0600

And indeed, the problem occurs with 1.4.30rc3 as well. And the proof-of-concept fix still works.

(Unrelated, but worthwhile mentioning: 1.4.29.1 does not work in my test setup. It could be that 302 handling is broken somehow because after setting up the second call it doesn't 200 OK the first.)

By: Walter Doekes (wdoekes) 2010-03-12 07:51:51.000-0600

And as to where the a=direction:passive came from. "Comedia mode" was set to Passive ( http://excelsupport.dialogic.com/imgpubs/webhelp/ipbearerentry.htm ).

Disabling this feature resolves the issue I was having. That means that the issue is now only a theoretical one for me: yes, there is something that could be fixed, and no, there will probably be very few people needing it.

Regards,
Walter

By: Leif Madsen (lmadsen) 2010-03-15 11:21:16

I'm marking this issue as Confirmed as you've provided a couple of patches, but based on your notes it is not yet Ready for Testing.

Thanks for the submission!

By: Walter Doekes (wdoekes) 2010-03-16 03:39:54

Thanks Leif.

I was hoping that someone with knowledge of audioframes/RTP and with ast_sched_* experience could look at it to give me some pointers on how to fix it properly.

By: Leif Madsen (lmadsen) 2010-03-16 13:54:38

Sounds good. The next step is likely to send it to reviewboard as several developers have time allocated each month to do reviews, and that should hopefully get you the information to move you forward again. Thanks!

By: frawd (frawd) 2010-04-27 05:20:28

I have the same kind of issue trying to bridge two SIP calls from and to a Nortel CS2K proxy. None of those channels ever send RTP, probably waiting for the other one to send first in some type of "Comedia mode", so the call is muted until someone hangs up. There is no "a=direction" SDP in my case.

My current workaround is to Answer() the originating channel so Asterisk starts sending RTP, to force the originating channel to send some RTP too. The bridge then works okay.

wdoekes: Please post this to the review board. I think it's the way to go but to be frank I'm quite scared to try your patch in production right now.

By: Walter Doekes (wdoekes) 2010-04-27 05:26:31

frawd: I've been meaning to post it on the reviewboard yes. I'll go make some time for that.

And yes, DO NOT run it on production. The patch has a race condition if you hang up after approximately one second which can cause asterisk to crash.

By: frawd (frawd) 2010-04-27 06:57:47

Ok, I'm monitoring this issue and will be happy to test when you believe it can be run in production.

By: frawd (frawd) 2010-04-27 08:10:04

A few questions/comments:
- Could the 'prod' be done only if nothing is received in a certain amount of time after answer (see __ast_answer in channel.c, "Didn't receive a media frame")?
- Does this only apply to SIP/RTP or can it be safely done in the global ast_answer in channel.c?
- In that case, should this only be done in case of audio streams, or does this apply to any RTP stream (video/image/..)?
- Maybe a configuration parameter (global and/or per-line) could activate the RTP 'prodding'.

I also saw a way to generate a silent audio frame instead of putting garbage in, maybe it could help:
http://lists.digium.com/pipermail/asterisk-dev/2009-May/038416.html
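
For illustration, here is a minimal Python sketch of what a silent prod packet could look like on the wire, assuming G.711 mu-law (RTP payload type 0) with 20 ms frames at 8000 Hz. It is independent of the approach in the linked post and of the attached patches; the function name `silent_pcmu_rtp_packet` and the example sequence number, timestamp, and SSRC are hypothetical.

```python
import struct

ULAW_SILENCE = 0xFF      # G.711 mu-law encoding of a zero-amplitude sample
SAMPLES_PER_FRAME = 160  # 20 ms of audio at 8000 Hz, one byte per sample


def silent_pcmu_rtp_packet(seq: int, timestamp: int, ssrc: int) -> bytes:
    """Build one RTP packet (12-byte fixed header) carrying mu-law silence."""
    header = struct.pack(
        "!BBHII",
        0x80,                    # version 2, no padding, no extension, no CSRCs
        0x00,                    # marker bit 0, payload type 0 (PCMU)
        seq & 0xFFFF,            # 16-bit sequence number
        timestamp & 0xFFFFFFFF,  # 32-bit timestamp in sample units
        ssrc & 0xFFFFFFFF,       # 32-bit synchronization source identifier
    )
    return header + bytes([ULAW_SILENCE]) * SAMPLES_PER_FRAME


if __name__ == "__main__":
    packet = silent_pcmu_rtp_packet(seq=1, timestamp=160, ssrc=0x12345678)
    print(len(packet))
```

Other codecs need a different payload; A-law, for example, encodes silence with a different byte value, so the constant above only applies to mu-law.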

By: Walter Doekes (wdoekes) 2011-12-09 04:25:13.614-0600

I just ran this issue again, but a little bit differently:

(1) If no res_timing_* module is loaded, indications (playtones) and other locally generated RTP fail to start. The prodding fix fixes this, but loading res_timing_timerfd.so fixes this as well.

(2) If I'm using directmedia=no and I'm bridging a call between two peers that also have directmedia=no and no res_timing_* modules, the RTP is stalled too. The prodding fix fixes this.

I've tested the fix against 1.6.2.20. But I've verified that 1.8.7.1 needs some kind of fix (the same probably, but ported to 1.8) too.

By: Walter Doekes (wdoekes) 2012-09-14 03:25:29.222-0500

I'm thinking this probably already works.

I've seen similar issues where I needed Asterisk to begin sending, which worked, but only if one of the timing resource modules was loaded.

Closing now to reduce open bugs.