Based on: http://www.dragonflybsd.org/docs/nanosleep/

The improvements to kernel timekeeping appear to apply directly to FreeBSD 4.x, based on my own quick testing.

Fix:

/usr/src/sys/kern/kern_clock.c
325c325
<           / tick + 1;
---
>           / tick;
328c328
<           + ((unsigned long)usec + (tick - 1)) / tick + 1;
---
>           + ((unsigned long)usec + (tick - 1)) / tick;

/usr/src/sys/kern/kern_time.c
232c232
<       int error;
---
>       int error, sleepticks;
241a242
>       sleepticks = tvtohz(&tv);
243c244
<           tvtohz(&tv));
---
>           (sleepticks < 1)? 1 : sleepticks);
252c253,254
<       *rmt = ts;
---
>       rmt->tv_sec = ts.tv_sec;
>       rmt->tv_nsec = ts.tv_nsec;
258c260,261
<       ts3 = ts;
---
>       ts3.tv_sec = ts.tv_sec;
>       ts3.tv_nsec = ts.tv_nsec;
260a264,265
>       if (tv.tv_sec == 0 && tv.tv_usec < tick)
>               return (0);

/usr/src/sys/i386/isa/clock.c
113c113,114
< #define TIMER_DIV(x) ((timer_freq + (x) / 2) / (x))
---
> #define TIMER_DIV(x) (timer_freq / (x))
> #define FRAC_ADJUST(x) (timer_freq - ((timer_freq / (x)) * (x)))
141a143
> u_int timer0_frac_freq;
204a207,209
>       int phase;
>       int delta;
>
215a221,236
>
>       phase = 1000000 / timer0_frac_freq;
>       delta = timecounter->tc_microtime.tv_usec % phase;
> #if 1
>       disable_intr();
>       if (delta < (phase >> 1)) {
>               outb(TIMER_CNTR0, timer0_max_count & 0xff);
>               outb(TIMER_CNTR0, timer0_max_count >> 8);
>       } else {
>               outb(TIMER_CNTR0, (timer0_max_count +1) & 0xff);
>               outb(TIMER_CNTR0, (timer0_max_count +1) >> 8);
>               ++i8254_offset;
>       }
>       enable_intr();
> #endif
>
236a258
>       timer0_frac_freq = new_rate;
247,248c269,270
<       if ((timer0_prescaler_count += timer0_max_count)
<           >= hardclock_max_count) {
---
>       timer0_prescaler_count += timer0_max_count;
>       if (timer0_prescaler_count >= hardclock_max_count) {
689a712
>       timer0_frac_freq = intr_freq;
1221c1244
<       count = timer0_max_count - ((high << 8) | low);
---
>       count = timer0_max_count + 1 - ((high << 8) | low);

Note: the diffs above are just the code. Proper credit must be given to Paul Herman and Matt Dillon. In addition, the DFly patches listed at the
source URL contain comments indicating what the code is doing.

Sample post-patch test data:

0.000020
0.000226
0.000336
0.000284
0.000234
0.000187
0.000132
0.000082
0.000035
0.000242
0.000348
0.000295
0.000246
0.000192
0.000137
0.000090
0.000043
0.000252
0.000361
0.000307

A sawtooth is still present, but the accuracy is MUCH better. I suspect that my hack application of the PLL function isn't correct, or that my P133 is slow enough that I'm observing some other latencies. I have observed occasional negative offsets, which according to the article are strictly forbidden by RFCs, so please check my work. I believe they were the result of my playing with a hz value too high for the machine to reasonably handle, and they are not occurring with saner values for hz.

How-To-Repeat:

/*
 * Copyright (c) 2003 Paul Herman
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND ANY
 * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
 * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

/*
 * $DragonFly: site/data/docs/nanosleep/wakeup_latency.c,v 1.1 2004/01/22 21:55:58 justin Exp $
 */

#include <sys/time.h>
#include <sys/resource.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define ONE_SECOND      1000000L

int count = 200;
int debug = 0;

int
main (int ac, char **av)
{
    long s;
    double diff;
    struct timeval tv1, tv2;

    if (ac > 1 && av[1])
        count = strtol(av[1], NULL, 10);

    while (count--) {
        gettimeofday(&tv1, NULL);

        /*
         * Calculate the number of microseconds to sleep so we
         * can wakeup right when the second hand hits zero.
         *
         * The latency for the following two statements is minimal.
         * On a > 1.0GHz machine, the subtraction is done in a few
         * nanoseconds, and the syscall to usleep/nanosleep is usually
         * less than 800 ns or 0.8 us.
         */
        s = ONE_SECOND - tv1.tv_usec;
        usleep(s);

        gettimeofday(&tv2, NULL);

        /* Latency = actual wakeup time minus intended wakeup time, in seconds. */
        diff = (double)(tv2.tv_usec - (tv1.tv_usec + s))/1e6;
        diff += (double)(tv2.tv_sec - tv1.tv_sec);
        if (debug)
            printf("(%ld.%.6ld) ", (long)tv2.tv_sec, (long)tv2.tv_usec);
        printf("%.6f\n", diff);
    }
    return 0;
}

4.11 returns the following sample data:

0.016126
0.016146
0.016162
0.016181
0.016199
0.016218
0.016238
0.016259
0.016274
0.016292
0.016310
0.016342
0.016359
0.016366
Testing with wakeup_latency.c on a 5.3-Rel box shows the same symptom set. I've not yet tested the proposed fix on 5-x. I will try duplicating this issue on 6-current as well to nail down the problem scope.
Joshua Coombs wrote:
> The following reply was made to PR kern/79339; it has been noted by GNATS.
>
> From: "Joshua Coombs" <jcoombs@gwi.net>
> To: <freebsd-gnats-submit@FreeBSD.org>,
>     "Joshua Coombs" <jcoombs@gwi.net>
> Cc:
> Subject: Re: kern/79339: [patch] Kernel time code sync with improvements from DragonFly
> Date: Wed, 30 Mar 2005 09:33:59 -0500
>
> Testing with wakeup_latency.c on a 5.3-Rel box shows the same symptom set.
> I've not yet tested the proposed fix on 5-x. I will try duplicating this
> issue on 6-current as well to nail down the problem scope.

Please also look at what's actually in DragonFly's CVS repository. Your PR is based on the original patch, while the code in DragonFly is more sophisticated. Namely, tvtohz() was split into two functions, tvtohz_low() and tvtohz_high(), which replace the original function depending on the context tvtohz() appears in.

From this I conclude that the original patch is insufficient (likely to break parts of the kernel), and that integrating this improvement into FreeBSD might not be as easy and straightforward as it appears to be at first glance. On the other hand, with some effort it ought to be doable.

   Uwe
--
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
gemini@geminix.org  |  http://www.escapebox.net
On Thu, 31 Mar 2005, Uwe Doering wrote:

> Joshua Coombs wrote:
>> Testing with wakeup_latency.c on a 5.3-Rel box shows the same symptom set.
>> I've not yet tested the proposed fix on 5-x. I will try duplicating this
>> issue on 6-current as well to nail down the problem scope.
>
> Please also look at what's actually in DragonFly's CVS repository. Your PR
> is based on the original patch, while the code in DragonFly is more
> sophisticated. Namely, tvtohz() was split into two functions, tvtohz_low()
> and tvtohz_high(), which replace the original function depending on the
> context tvtohz() appears in.
>
> From this I conclude that the original patch is insufficient (likely to break
> parts of the kernel), and that integrating this improvement into FreeBSD
> might not be as easy and straightforward as it appears to be at first glance.
> On the other hand, with some effort it ought to be doable.

Indeed. Here is a discussion of some of the bugs in the patch:

% >Fix:
% /usr/src/sys/kern/kern_clock.c
% 325c325
% <           / tick + 1;
% ---
% >           / tick;
% 328c328
% <           + ((unsigned long)usec + (tick - 1)) / tick + 1;
% ---
% >           + ((unsigned long)usec + (tick - 1)) / tick;

This breaks all callers of tvtohz() except the one that is changed in the patch to expect this API change. The comment before tvtohz() still says that tvtohz() adds 1.

% /usr/src/sys/kern/kern_time.c
% 232c232
% <       int error;
% ---
% >       int error, sleepticks;
% 241a242
% >       sleepticks = tvtohz(&tv);
% 243c244
% <           tvtohz(&tv));
% ---
% >           (sleepticks < 1)? 1 : sleepticks);

This is more or less correct. 1 should be subtracted from tvtohz() in callers that do a careful comparison of the times before and after the sleep so that they can tell if the sleep time has completely expired. The function here (nanosleep1()) is not quite such a caller. It does a sloppy comparison of times, using getnanouptime() instead of nanouptime().
getnanouptime() has a resolution of 1/ticktock_hz, where ticktock_hz is approximately min(hz, 1000) (normally just hz), so there is a possible error of 2/ticktock_hz in the comparison. I think all the errors go the same way, so the maximum error is 1/ticktock_hz. The extra tick added by tvtohz() accidentally compensates for this error. Synchronization effects may reduce (or increase?) the error. The first getnanouptime() is unsynchronized, but ones done just after timeout returns are synced with clock interrupts, so they give a fairly accurate time every hz/ticktock_hz hardclock interrupts. Anyway, if 1 is subtracted from tvtohz(), then nanouptime() should be used to avoid these errors.

There are many other callers like nanosleep1(): the ones for select(2), poll(2) and setitimer(2). These all depend on tvtohz() adding 1 to ensure that they sleep for the specified interval, and they all do sloppy comparisons like nanosleep1(), so they all need similar changes if you want timeouts to be synchronized with 1/HZ second boundaries as perfectly as possible.

% 252c253,254
% <       *rmt = ts;
% ---
% >       rmt->tv_sec = ts.tv_sec;
% >       rmt->tv_nsec = ts.tv_nsec;
% 258c260,261
% <       ts3 = ts;
% ---
% >       ts3.tv_sec = ts.tv_sec;
% >       ts3.tv_nsec = ts.tv_nsec;

These changes just introduce style bugs.

% 260a264,265
% >       if (tv.tv_sec == 0 && tv.tv_usec < tick)
% >               return (0);

This can't be right. We have just not-so-carefully checked whether the time has expired, and only get here when it hasn't. (tv.tv_sec == 0 && tv.tv_usec < tick) means that we would have preferred the sleep time to be less than 1 tick. We had to request a sleep of exactly 1 tick because less than 1 is impossible (this is with 1 subtracted from tvtohz()). Sleeping for exactly 1 tick is also impossible, so we have woken up after an interval of anywhere between 0+epsilon and (1-epsilon+latency) ticks. The interval may be significantly smaller or larger than `tv' and we must go back to sleep if it is smaller.
The above change breaks this. I think the problem that this change is supposed to fix is related to the tick period not being exactly 1/HZ seconds.

Also, to avoid sleeping longer than necessary, we should try to wake up 1 tick early and then decide whether to sleep another tick or 2 to finish. Note that although tvtohz() always rounds up, physical sleep intervals are always shorter than the specified timeout, so waking up 1 tick early is very common for unsynchronized sleeps. Thus if we subtract 1 from tvtohz(), we often wake up 1 tick early as a side effect, which is what we want, but there is a problem: suppose that everything is in perfect sync, but the hardclock interrupt period is slightly less than 1/HZ seconds. Then we may wake up 5 usec or so early and decide to go back to sleep, giving a large error. Changes later in the patch are related to this. I think we shouldn't do anything special here except possibly return early if `tv' is very small. Going around the loop in nanosleep1() an extra time is a small pessimization. Using nanouptime() to get the decision of whether to loop right is a pessimization too, but it is relatively small.

% /usr/src/sys/i386/isa/clock.c
% 113c113,114
% < #define TIMER_DIV(x) ((timer_freq + (x) / 2) / (x))
% ---
% > #define TIMER_DIV(x) (timer_freq / (x))
% > #define FRAC_ADJUST(x) (timer_freq - ((timer_freq / (x)) * (x)))

Reducing TIMER_DIV() unconditionally would be harmless under FreeBSD. Its rounding to nearest dates from when there was little more than hardclock ticks for timekeeping. Now HZ and the hardclock interrupt frequency are almost unrelated to timekeeping.

% 141a143
% > u_int timer0_frac_freq;
% 204a207,209
% >       int phase;
% >       int delta;
% >
% 215a221,236
% >
% >       phase = 1000000 / timer0_frac_freq;
% >       delta = timecounter->tc_microtime.tv_usec % phase;

tc_microtime.tv_usec is not quite the right thing to use here. It is updated every tick or two so it might be up to date, but it has unnecessary jitter.
microtime() would give a more accurate timestamp. I think microtime() and not microuptime() is the correct function to use here, since we want to sync with the real time. OTOH, nanosleep1() and friends use the uptime, so they must be looked at some more to determine the effects of using different time scales on syncing. I think the synchronization done here is honored by nanosleep1() despite the different scales, and sync is only lost when the clock is changed using settimeofday() (then everything gets out of sync).

% > #if 1
% >       disable_intr();

The clock should be read inside this critical section.

% >       if (delta < (phase >> 1)) {
% >               outb(TIMER_CNTR0, timer0_max_count & 0xff);
% >               outb(TIMER_CNTR0, timer0_max_count >> 8);
% >       } else {
% >               outb(TIMER_CNTR0, (timer0_max_count +1) & 0xff);
% >               outb(TIMER_CNTR0, (timer0_max_count +1) >> 8);
% >               ++i8254_offset;
% >       }

I think i8254_offset needs to be reinitialized every time the maximum count is reprogrammed. This is not done in set_timer_freq(); however, most callers of set_timer_freq() initialize or update the i8254 timecounter immediately after, and testing shows that this reduces lost ticks to an acceptable value (usually, and hopefully always, < 10). Correctly reprogramming the i8254 on every interrupt is harder. Losing even 1 tick per interrupt is too much, but I think the above can sometimes lose 100 (if clkintr() is delayed for that long, which can easily happen especially in RELENG_4 since clkintr() is not a fast interrupt handler there). See nearby code that calls i8254_get_timecount() inside a critical section for a way to reduce the error to at most 5 ticks. It takes about 5 ticks just to read the counter. This is still far too large to do on every clock tick. All of this only matters if the i8254 is used for timekeeping.
% >       enable_intr();
% > #endif
% >
% 236a258
% >       timer0_frac_freq = new_rate;
% 247,248c269,270
% <       if ((timer0_prescaler_count += timer0_max_count)
% <           >= hardclock_max_count) {
% ---
% >       timer0_prescaler_count += timer0_max_count;
% >       if (timer0_prescaler_count >= hardclock_max_count) {

This change is purely stylistic.

% 689a712
% >       timer0_frac_freq = intr_freq;

The changes seem to be too simple to give a PLL. I didn't check the details for this.

% 1221c1244
% <       count = timer0_max_count - ((high << 8) | low);
% ---
% >       count = timer0_max_count + 1 - ((high << 8) | low);

Always adding 1 here seems to be wrong. Shouldn't you only add 1 if timer0_max_count isn't actually the max count, i.e., when the max count has been programmed to be 1 more than usual? All references to timer0_max_count are potentially wrong when timer0_max_count isn't actually the max count. You add 1 to i8254_offset in the above; this seems to be to adjust for 1 of the references being wrong, but it doesn't seem to adjust for `count' being 1 too large.

% A sawtooth is still present, but the accuracy is MUCH better. I suspect my
% hack application of the PLL function isn't correct or my P133 is slow enough
% that I'm observing some other latencies. I have observed occasional negative
% offsets, which according to the article are strictly forbidden by RFCs, so
% please check my work. I believe they were the result of my playing with a hz
% value too high for the machine to reasonably handle, and are not occurring
% with saner values for hz.

I only agree with the non-hardware changes (don't sleep for an extra tick in nanosleep1() and friends if this is easy to avoid). All that perfect sync of real time with the hardclock() clock gives is the possibility of waking up on precisely 1/HZ boundaries relative to real time (with whole seconds being boundaries). System activity lengthens sleeps by indeterminate amounts except on unloaded systems.
The average error for a random sleep on an unloaded system would still be 0.5/HZ (or 1.5/HZ without the nanosleep1() change).

Bruce
State Changed From-To: open->suspended Mark as suspended. Followups to the original posting seem to indicate that the patches cannot be accepted as-is.
For bugs matching the following conditions: - Status == In Progress - Assignee == "bugs@FreeBSD.org" - Last Modified Year <= 2017 Do - Set Status to "Open"
Is this still an issue?
Keyword: patch or patch-ready – in lieu of summary line prefix: [patch] * bulk change for the keyword * summary lines may be edited manually (not in bulk). Keyword descriptions and search interface: <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>