[Home]

Summary:ASTERISK-04152: Increasing delay over time on non-Zap channels in MeetMe
Reporter:Tony Mountifield (softins)Labels:
Date Opened:2005-05-12 17:00:18Date Closed:2011-06-07 14:10:49
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Applications/app_meetme
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) zaptel-ztdummy-0004252.diff.txt
Description:Over a period of time, participants on SIP or IAX channels in a MeetMe conference experience increasing delay in hearing the conference.

By instrumenting the reads and writes in the main loop of conf_run() I discovered that data was appearing on the pseudo-fd more often than it was appearing on the incoming channel. This data was being written to the channel, gradually building up a backlog in the channel driver.

My patches (for STABLE and HEAD) provide an interlock between the reading and writing of the channel, so that the same number of frames are written to the channel as are read from it. Each time a frame is read, the channel becomes eligible to have one frame written to it. The frame in fr, if it cannot be written immediately, will pend until the next incoming frame is read. If further data arrives from the pseudo-fd in the meantime, it will overwrite the pending frame. The effect of all this is to avoid the buildup of a backlog going out to the channel.

Please folks, try these patches and feedback here.

****** ADDITIONAL INFORMATION ******

Disclaimer on file.

Please note that this is NOT a duplicate of bug ASTERISK-3519, but a completely separate problem. ASTERISK-3519 concerns delay introduced due to the sounds on joining a conference. This report concerns the gradual increase in delay over time.
Comments:By: Andrew Kohlsmith (akohlsmith) 2005-05-12 17:23:05

This sounds like a band-aid solution; why are there more frames appearing than there are in reality?

By: petersv (petersv) 2005-05-13 02:01:10

The clocks of the conference (zaptel-derived) and the sip connection may be slightly off. They are bound to be, since they are not synchronized.

From the problem description it sounds like Asterisk is clocking the outbound rtp stream from the inbound stream. I am not familiar enough with sip/rtp to tell if this is required or not.

Intuitivly it would make more sense to send the data from an internal clock source (such as the zaptel timer) independent of the received data.

By: Tony Mountifield (softins) 2005-05-13 02:59:11

Andrew - yes it would be good to find out why more data is appearing on the pseudo-fd, but in the meantime this fix prevents such behaviour from causing the delay problem, which is a quick and useful win.

Peter - Asterisk does indeed clock the outbound RTP stream from the inbound one, which is exactly why SIP phones must have Silence Suppression turned off. Clocking data from an internal clock source is what it does at the moment, i.e. when the pseudo-fd says data is ready, it is copied and sent.

One issue I haven't investigated yet is whether ast_waitfor_nandfds() is reporting the pseudo-fd ready more often than it ought, resulting in spurious duplicate frames. Even if it isn't, clock drift is worth guarding against, and only has an effect when necessary.

By: Tony Mountifield (softins) 2005-05-13 03:33:56

On further testing, it looks like I made assumptions that were true for the SIP channels I was using, but appear not to be true for OH323. I haven't tried IAX yet.

By: petersv (petersv) 2005-05-13 13:42:40

softins: Hm, how can a delay accumulate if the data is sent when the intenral source instructs it to? I guess I am not clear on where the delay builds up.
Do you mean that the sip channel only transmits when packets are received and at the same time the conference subsystem queues frames for transmission from the internal clock?

If that is the case - why is asterisk clocking outgoing media from the incoming media from the same device? Would it not be more natural to drive the outgoing part of a channel from the timing of the source it is bridged (or the conference in this case) instead?

By: Tony Mountifield (softins) 2005-05-13 13:59:03

Peter - my understanding is that, for SIP at least, the outgoing RTP stream is clocked by the incoming RTP stream. So over time, the number of bytes sent to the phone cannot exceed the number of bytes received from it.

If the meetme conference is sending MORE data to the channel than is coming in from that channel, the excess data will build up in the outgoing queue, causing the audio to be delayed by the length of the queue.

I guess there must be an upper limit on the amount of queued data in the SIP channel. oej?

I think IAX allows several seconds worth of data to build up, from an experience the other day and a report from a user, but I haven't had time to try that yet - fighting with H.323 at the moment! :-(

By: petersv (petersv) 2005-05-15 05:33:59

Why is an outgoing stream clocked from the incoming stream on the same channel? That seems backwards to me. Would it not make more sense if the rtp outbound stream was driven by the bridged channel or internally for synthezised audio (files etc)?

By: Tony Mountifield (softins) 2005-05-17 03:28:29

I've discovered that another independent cause of delays is an inaccurate ztdummy driver under kernel 2.6. Please see ASTERISK-4199 for details.

Note that that issue does not supersede this one, as this one also applies when using hardware timing.

By: Clod Patry (junky) 2005-06-09 18:13:42

Can i ask why there's: /* break; */ ?
If you dont need it, why not just take off that code?

By: Michael Jerris (mikej) 2005-06-10 06:02:36

If silence suppression is in use on the phone sending to us, we will not get packets all the time potentially, causing the same type of delay buildup with this method.  Correct?

By: Tony Mountifield (softins) 2005-06-10 12:18:51

That's probably correct, but I thought Asterisk didn't support Silence Suppression.

Regarding the /* break; */, that was just in the original code that I copied.

By: Michael Jerris (mikej) 2005-06-19 14:06:53

Or we would have the same sort of buildup if we were loosing packets on the connection incomming to us.  Isn't there a better way to clock this, and even better, occasionally just dump the packets if the queue gets too long.  Why are we not sending everything we have queued up every time we check, this would keep any backed up queue from getting created in the first place.

By: Michael Jerris (mikej) 2005-07-20 19:30:50

can we use the scheduler to somehow schedule an occasional flush to address this?

By: Tony Mountifield (softins) 2005-07-21 03:04:10

The question is what the scheduler would flush? I don't think a ZT conf flush on the pseudo channel would help, because I think the delay builds up in each outbound SIP channel.

Perhaps OEJ could say whether it is possible for a SIP outbound audio queue to grow large if it is given too much data?

I'm on vacation at the moment, so can't look at anything till August.

By: Olle Johansson (oej) 2005-07-21 03:50:20

Oh, I don't know much about the inner workings of RTP. I work mostly with SIP signalling, not the actual audio... Sorry.

By: H. Jones (blackfox) 2005-09-03 16:01:20

The currently listed patch for "HEAD" applies successfully to Asterisk 1.2.0-beta1 but fails to correct the issue as in earlier versions.

Has the root cause of this been found?


By: Tony Mountifield (softins) 2005-09-03 16:22:02

Blackfox: I'm not aware of the root cause having been found.

Are you saying that the patch helped when applied to previous versions, but no longer helps when applied to 1.2 beta1, even though it successfully patches?

By: H. Jones (blackfox) 2005-09-03 16:23:49

Correct.

The RTC-patch (http://bugs.digium.com/view.php?id=4301) for the ztdummy module is included in 1.2.0-beta1, and  "0003599: Audio delay in MeetMe using SIP when not 'q' mode" (http://bugs.digium.com/view.php?id=3599) I don't believe is the real issue.

I can go back and check this patch against 1.0.9 if desired; I recall it working properly on 1.0 with this fix for sure.



By: H. Jones (blackfox) 2005-09-21 14:13:33

I have determined that the symptoms described in this bug were solved for me (YMMV, works for me) by bug 0004301 being fixed.  However, 0004301 seems to have a bug in the fix itself, so this bug manifested.

In 0004301 and Asterisk 1.2.0-beta1, ztdummy.c:48 reads:
#if LINUX_VERSION_CODE >= VERSION_CODE(2,6,13)

This implies to me it's checking for kernel version 2.6.13.  However, as part of my RHEL4-based system, the kernel in use is 2.6.9.  By changing "2,6,13" to "2,6,1", the #if succeeds and RTC is properly used, and the symptoms disappear.

Note, my comments here do NOT include the attached patches to modify app_meetme.c.  Only zaptel/ztdummy.c was changed.

Patch tested on 1.2.0-beta1 and current CVS.  For previous versions, bug 0004301's patches should be applied first.

Attempting to attach my 1-line patch to this bug.  Please review for accuracy; I'm hoping this wraps this bug up.

By: Andrew Lindh (andrew) 2005-09-21 15:41:10

I had the same delay problem with the old ZTdummy (no digium hardware). The old USB timing would cause a large delay in a few minutes. After I used the RTC patch (changing the version or "#if 1") solved the problem. So the solution is to run a version zaptel that uses correct timing. RTC or digium hardware timing.

Run the zttest program and see what you get. Normally I see >99.99% on digium hardware timing and >99.95% on RTC. Anything less and the delay will get worse quicker. Digium hardware can still be a problem if there are interrupt problems on a system. Run the tests, if it's low then that's a big problem.

By: Chih-Wei Huang (cwhuang) 2005-10-13 05:39:54

Any update about this bug?
I encountered the delay problem too (using 1.2-beta1.)
The delay occurs when three or more participants in the conference.
The three participants in my conference are H.323, SIP and MGCP IP phones. Strangely, when talking from SIP or MGCP phones, there is
no obvious delay could be heard in H.323 phone.
But when H.323 phone talked, a gradually increasing delay could
be sensible from SIP or MGCP phones.

Since I'm using zaptel-1.2-beta1, the issue in bug 4301 is resolved (I believed). I have tried to apply app_meetme-HEAD.diff,
but the situation was worse. The sound became choppy in SIP and MGCP phones. The delay seems still exists, but I'm not sure since the sound is too choppy to be heard.

Any idea?

By: Serge Vecher (serge-v) 2005-10-13 08:45:03

cwhuang, can you please try the patch in bug 5374 (http://bugs.digium.com/view.php?id=5374) to see if that affects the problem?

By: Chih-Wei Huang (cwhuang) 2005-10-16 23:25:37

Hi vechers, I tested that patch with and without app_meetme-HEAD.diff, but I couldn't see any difference.

By: Serge Vecher (serge-v) 2005-10-18 09:44:04

cwhuang, please confirm that the patch you've tried is 2005-10-04-3-asynchronous.patch. Also, please report your setup: what distro/kernel version you have, what if any hardware, and what channels do clients connect to Meetme with. Thanks



By: Chih-Wei Huang (cwhuang) 2005-10-18 22:38:38

Yes, I used 2005-10-04-3-asynchronous.patch. In fact I have read it. It patches four files: channel.c, app_milliwatt.c, app_sms.c and app_chanspy.c, right? Since I don't use module milliwatt, sms and chanspy, I assumed it only affect channel.c. Correct me if I'm wrong.

I'm using Scientific Linux SL Release 4.0, with its default kernel 2.6.9. CPU is Mobile AMD Geode(t}=0~{ processor, as shown by cat /proc/cpuinfo. I have three types of clients: SIP, MGCP and H.323.
SIP and MGCP are the native channel drivers, while H.323 use chan_oh323 from inaccessnetworks. The codec is G.723.1 from Intel IPP library.

The Asterisk version is 1.2-beta1. Should I try the latest CVS?

By: Kevin P. Fleming (kpfleming) 2005-11-08 18:33:15.000-0600

This problem should now be solved in CVS HEAD. The previous code was unable to handle slight timing diferences between the conference timing source and the VOIP channels; the new version should be able to handle this without trouble.

If you still experience this issue after upgrading, please reopen this bug.

By: Tony Mountifield (softins) 2005-11-09 06:27:49.000-0600

As the original submitter of this bug, I just wanted to say that I've now concluded the patches I submitted for this particular issue are not correct, as they make invalid assumptions, such as assuming the incoming frames are the same size as the outgoing ones, which is not always the case.

Please could the meetme patches on this bug be removed?

This issue is much better solved with the asynchronous patch in bug ASTERISK-5230, which I have tried with great success.

By: BJ Weschke (bweschke) 2005-11-09 15:10:05.000-0600

meetme patches removed and bug is now closed based on softins' note.

By: Digium Subversion (svnbot) 2008-01-15 15:55:01.000-0600

Repository: asterisk
Revision: 7033

U   trunk/ChangeLog
U   trunk/apps/app_meetme.c
U   trunk/configs/meetme.conf.sample

------------------------------------------------------------------------
r7033 | kpfleming | 2008-01-15 15:55:00 -0600 (Tue, 15 Jan 2008) | 2 lines

issues ASTERISK-3519 and ASTERISK-4152

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=7033