select(),nanosleep(),poll() sleep one tick longer, _except_ when previous similar system call was interrupted by a signal or by i/o conditions on the polled file descriptors. I find this rather buggy and inconsequent. It badly breaks any code which tries to hook both timers and input polling on a select() loop and defeats benchmark programs which test for the accuracy of timeouts. This is due to tvtohz (sys/kern/kern_clock.c) which always adds one to its result to allow for the current tick to expire. Instead of adding one it should be better testing for a non-zero result. I modified like this my 4.4 kernel long time ago and have seen so far no trouble at all.Same thing more recently with 5.0. Fix: patch to kern_clock.c and other places which assume that tvtohz adds one follows. notes: tvtohz has been renamed to hzto in NetBSD but is the same. of course one can write ticks = (sec * hz) + (((unsigned long)usec + (tick - 1)) / tick) ?: 1; but this is a GCC extension. -------------------------------8x----------------------------------- How-To-Repeat: This little program prints the timeout accuracy of the select() system call as "[+-] sec usec".It catches SIGQUIT and polls for console input. It use to give: unpatched (regular) kernel: + 0 s 9938 us --input + 0 s 9959 us + 0 s 9972 us + 0 s 9930 us ^\ --interrupt + 0 s 9954 us + 0 s 9954 us patched kernel: - 0 s 44 us ^\ --interrupt - 0 s 3 us - 0 s 100 us --input - 0 s 44 us - 0 s 47 us -------------------------------8x----------------------------------- #include <sys/types.h> #include <sys/time.h> #include <sys/sysctl.h> #include <unistd.h> #include <errno.h> #include <signal.h> #include <time.h> #include <err.h> #include <stdio.h> fd_set rdfs; int ms; void sig(int unused) {} int main(int argc,char **argv) { int k,mib[2]; struct timeval tv,savtv,otv; struct clockinfo clock; char buf[256]; signal(SIGQUIT,sig); if (argc > 1) ms = atoi(argv[1]); if (!ms) ms = 500; savtv.tv_usec = ms % 1000 * 1000; savtv.tv_sec = ms / 1000; mib[0] = CTL_KERN; mib[1] = KERN_CLOCKRATE; k = sizeof clock; if (sysctl(mib,2,&clock,&k,NULL,0)) err(1,"sysctl"); savtv.tv_usec -= savtv.tv_usec % clock.tick; redo: tv = savtv; gettimeofday(&otv,NULL); /* a quick fix is to substract one tick from tv here but this will break things when compiled on other OS ( linux for instance ) */ for(;;) { FD_SET(0,&rdfs); k = select(1,&rdfs,NULL,NULL,&tv); gettimeofday(&tv,NULL); timersub(&tv,&otv,&tv); switch(k) { case 0: if (timercmp(&tv,&savtv,<)) { timersub(&savtv,&tv,&tv);k = '-'; } else { timersub(&tv,&savtv,&tv);k = '+'; } printf("%c%4d s %10d us\n",k,tv.tv_sec,tv.tv_usec); goto redo; case -1: if (errno != EINTR) err(1,"select"); printf("\t--interrupt\n"); break; default: read(0,buf,256); printf("\t--input\n"); break; } timersub(&savtv,&tv,&tv); if (tv.tv_sec < 0) timerclear(&tv); } } -------------------------------8x-----------------------------------
On Fri, 6 Dec 2002, Enache Adrian wrote: > >Description: > select(),nanosleep(),poll() sleep one tick longer, _except_ > when previous similar system call was interrupted by a signal > or by i/o conditions on the polled file descriptors. > > I find this rather buggy and inconsequent. > It badly breaks any code which tries to hook both timers > and input polling on a select() loop and defeats benchmark > programs which test for the accuracy of timeouts. > > This is due to tvtohz (sys/kern/kern_clock.c) which always > adds one to its result to allow for the current tick to > expire. Instead of adding one it should be better testing > for a non-zero result. tvtohz() actually adds 1 to ensure that the sleep time is at least the specified time. This gives a minimum sleep time of 0.0 ticks longer than the specified sleep time (as required by standards) and a maximum sleep time of about 1.0 tick longer than the specified sleep time, with an average sleep time of about 0.5 ticks longer. Not adding 1 would give a {min,max,avg} sleep time of about {-1.0,0.0,-0.5} ticks longer than specified; i.e., it would be shorter than specified (broken) in most cases. Individual syscalls could handle this by sleeping for the shorter time and checking whether the timeout has actually expired when they wake up, but most don't, and this would be a pessimization in most cases. Of the above syscalls, only nanosleep() checks that the timeout has actually expired. It does this mainly for the technical reason that very large timeouts (ones greater than INT_MAX ticks) are required to work and may even be useful (especially if HZ is large). Checking is a pessimization in at least the following cases, because we will just have to wait again after wasting time waking up to check: - most short timeouts that are a multiple of `tick'. - half or the cases for random timeouts of 0.5 ticks (since the current tick will expire before the timeout half the time), and proportionally for all random timeouts of 1 tick or shorter. > >How-To-Repeat: > This little program prints the timeout accuracy of the > select() system call as "[+-] sec usec".It catches SIGQUIT > and polls for console input. > It use to give: > > unpatched (regular) kernel: > + 0 s 9938 us > > --input > + 0 s 9959 us > + 0 s 9972 us > + 0 s 9930 us > ^\ --interrupt > + 0 s 9954 us > + 0 s 9954 us This is the correct behaviour. We are mostly in sync with the clock interrupt, and the default timeout is a multiple of clock.tick, so we must sleep for an extra tick (less epsilon) for the timeout to expire. select() would have to go back to sleep for an extra tick if tvtohz() didn't add 1. > patched kernel: > - 0 s 44 us > ^\ --interrupt > - 0 s 3 us > - 0 s 100 us > > --input > - 0 s 44 us > - 0 s 47 us This is incorrect. The times are the values ((actual timeout) - (specified timeout)). These should be >= 0. The errors are small here because the timeout is a multiple of clock.tick and we are mostly in sync with clock interrupts. In general the error from not adding 1 is up to 1 tick (see above). > -------------------------------8x----------------------------------- > >Fix: > patch to kern_clock.c and other places which assume that > tvtohz adds one follows. This mainly undoes some FreeBSD fixes. > notes: > tvtohz has been renamed to hzto in NetBSD but is the same. Actually, FreeBSD renamed hzto to tvtohz. > redo: > tv = savtv; gettimeofday(&otv,NULL); > /* a quick fix is to substract one tick from tv here > but this will break things when compiled on other > OS ( linux for instance ) */ > for(;;) { > FD_SET(0,&rdfs); > k = select(1,&rdfs,NULL,NULL,&tv); > gettimeofday(&tv,NULL); I thought that Linux gets timeouts right -- at least sleep() used to work except in very early versions of Linux-0.x. Perhaps it uses clock interrupts to get to get close to the timeout and then udelay() to get closer if the residual timeout is small. In general, the residual could be up to `tick' usec, but in cases that are in sync with the clock like your test program, the residual would be small (about 44 usec) and udelay() could reasonably handle it. Bruce
On Sun, 8 Dec 2002, Enache Adrian wrote: > On Sat, Dec 07, 2002 at 11:57:40PM +1100, Bruce Evans wrote: > > tvtohz() actually adds 1 to ensure that the sleep time is at least the > > specified time. This gives a minimum sleep time of 0.0 ticks longer > > than the specified sleep time (as required by standards) and a maximum > > sleep time of about 1.0 tick longer than the specified sleep time, > > with an average sleep time of about 0.5 ticks longer. Not adding 1 > > would give a {min,max,avg} sleep time of about {-1.0,0.0,-0.5} ticks > > longer than specified; i.e., it would be shorter than specified (broken) > > in most cases. > ... > > and we are mostly in sync with clock interrupts. In general the error > > from not adding 1 is up to 1 tick (see above). > The average sleep time seems to be with ~ 0.995 longer on an idle I think you mean 0.009950. > machine - much longer on a loaded one where ticks are lost. Hopefully ticks won't be really lost (in the kernel), although they can be and are lost due to bugs. Ticks man be lost as far as applications can see of course. > For the error to be down to -1 tick with an average of -0.5 ticks > (when not adding 1) the tick value must be very small - this doesn't > seem to happen with the standard 100hz. Under high load the error > tends to be positive. It does (would) happen for random timeouts. If you look at the clock in a loop then you normally see timeouts of ((1 tick) - epsilon) (when not adding 1) or ((2 ticks) - epsilon) (when adding 1) because the first one syncs with the clock. E.g., nanotime({1 nsec}) normally returns at a time epsilon after a clock tick occurs, so a subsequent nanotime({1 nsec}) will wait a full clock tick or two (less epsilon) (if nanotome() is implemented using clock ticks even for short intervals). > When the user has asked for something not multiple of tick the > value will be correctly rounded up by tvtohz() and the error will > be positive too. No. E.g., if the interval is 9999 usec then it will be rounded up to 1 tick (10000 usec). The next clock tick may occur after 1 usec, so if 1 tick is not added then the timeout will return after 1 usec (plus epsilon) instead of after the requested 9999 usec. > > I thought that Linux gets timeouts right -- at least sleep() used to > > work except in very early versions of Linux-0.x. Perhaps it uses > ... > Only nanosleep() works like in FreeBSD ( one tick longer by just > adding one - this is why many recommend to use select() instead > of usleep() ...) This is a bug in Linux :-). > Does it exist a standard specification for this case ? > A quick browsing through susv3 shows that only nanosleep() > is required to do so: > functions/select.html: > "If the timeout parameter is not a null pointer, > it specifies a maximum interval to wait for the > selection to complete." > functions/nanosleep.html > "But, except for the case of being interrupted by > a signal, the suspension time shall not be less than > the time specified by rqtp, as measured by the > system clock CLOCK_REALTIME" select() and its timeouts are now specified in POSIX, not quite as clearly as this. From draft7 of POSIX.1-200x: %%% 31187 limitations on the granularity of timeout intervals. If the requested timeout interval requires a 31188 finer granularity than the implementation supports, the actual timeout interval shall be rounded 31189 up to the next supported value. %%% Bruce
On Sat, Dec 07, 2002 at 11:57:40PM +1100, Bruce Evans wrote: > tvtohz() actually adds 1 to ensure that the sleep time is at least the > specified time. This gives a minimum sleep time of 0.0 ticks longer > than the specified sleep time (as required by standards) and a maximum > sleep time of about 1.0 tick longer than the specified sleep time, > with an average sleep time of about 0.5 ticks longer. Not adding 1 > would give a {min,max,avg} sleep time of about {-1.0,0.0,-0.5} ticks > longer than specified; i.e., it would be shorter than specified (broken) > in most cases. ... > and we are mostly in sync with clock interrupts. In general the error > from not adding 1 is up to 1 tick (see above). The average sleep time seems to be with ~ 0.995 longer on an idle machine - much longer on a loaded one where ticks are lost. For the error to be down to -1 tick with an average of -0.5 ticks (when not adding 1) the tick value must be very small - this doesn't seem to happen with the standard 100hz. Under high load the error tends to be positive. When the user has asked for something not multiple of tick the value will be correctly rounded up by tvtohz() and the error will be positive too. > I thought that Linux gets timeouts right -- at least sleep() used to > work except in very early versions of Linux-0.x. Perhaps it uses > clock interrupts to get to get close to the timeout and then > udelay() to get closer if the residual timeout is small. In general, > the residual could be up to `tick' usec, but in cases that are in sync > with the clock like your test program, the residual would be small > (about 44 usec) and udelay() could reasonably handle it. Oh no. Linux doesn't seem to call udelay() to handle the residual timeout. Only nanosleep() works like in FreeBSD ( one tick longer by just adding one - this is why many recommend to use select() instead of usleep() ...) from kernel/timer.c expire = timespec_to_jiffies(&t) + (t.tv_sec || t.tv_nsec); The silly test program, when run - with some minor modifications ----------------------- +#include <sys/param.h> - struct clockinfo clock; - mib[0] = CTL_KERN; mib[1] = KERN_CLOCKRATE; - k = sizeof clock; - if (sysctl(mib,2,&clock,&k,NULL,0)) err(1,"sysctl"); - savtv.tv_usec -= savtv.tv_usec % clock.tick; + savtv.tv_usec -= savtv.tv_usec % (1000000 / HZ); _______________________ on linux 2.4.18 says: - 0 s 171 us - 0 s 181 us - 0 s 184 us --input - 0 s 169 us - 0 s 180 us According to you it is broken to sleep less than specified. Does it exist a standard specification for this case ? A quick browsing through susv3 shows that only nanosleep() is required to do so: functions/select.html: "If the timeout parameter is not a null pointer, it specifies a maximum interval to wait for the selection to complete." functions/nanosleep.html "But, except for the case of being interrupted by a signal, the suspension time shall not be less than the time specified by rqtp, as measured by the system clock CLOCK_REALTIME" but I'll keep reading on ... thank you & best regards Adi
State Changed From-To: open->feedback Please comment on whether or not this is in fact a bug and then throw back in to the pool. Thanks.
Responsible Changed From-To: freebsd-bugs->bde Please comment on whether or not this is in fact a bug and then throw back in to the pool. Thanks.
State Changed From-To: feedback->suspended This constitutes a RFE and no one is available to work on it right now.
Responsible Changed From-To: bde->freebsd-bugs This constitutes a RFE and no one is available to work on it right now.
For bugs matching the following conditions: - Status == In Progress - Assignee == "bugs@FreeBSD.org" - Last Modified Year <= 2017 Do - Set Status to "Open"