| Summary: | select(2) timer inaccurate, especially with -pthread | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Mikhail Teterin <mi> |
| Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | CC: | lawlopez |
| Priority: | Normal | ||
| Version: | 3.2-STABLE | ||
| Hardware: | Any | ||
| OS: | Any | ||
State Changed From-To: open->closed The kernel implementation of sleep() uses tvtohz() to calculate the number of ticks that tsleep() should sleep. tvtohz() adds one tick to allow for the possibility that the current tick is almost expired. select() guarantees that it will sleep at least as long as the timeout before returning an empty result (0). If the number of ticks passed to tsleep() were reduced by one, this guarantee would no longer be possible. This is not a bug; it is an artifact of correct implementation. If you need higher resolution, use nanosleep(). jasone@FreeBSD.ORG once stated: =Synopsis: select(2) timer inaccurate, especially with -pthread = =State-Changed-From-To: open->closed =State-Changed-By: jasone =State-Changed-When: Mon Dec 27 12:39:03 PST 1999 =State-Changed-Why: =The kernel implementation of sleep() uses tvtohz() to calculate the =number of ticks that tsleep() should sleep. tvtohz() adds one tick to =allow for the possibility that the current tick is almost expired. =select() guarantees that it will sleep at least as long as the timeout =before returning an empty result (0). This is NOT what the man page states: If timeout is a non-nil pointer, it specifies a maximum interval to wait for the selection to complete. =If the number of ticks passed to tsleep() were reduced by one, this =guarantee would no longer be possible. = =This is not a bug; it is an artifact of correct implementation. If you =need higher resolution, use nanosleep(). In particular, TCL's after(n) implementation relies on this fact... People responsible for the TCL port may need to take a note... Because in TCL 8.x's unix/tclUnixEvent.c there is a function Tcl_Sleep(ms) and the comments in it say: /* * The only trick here is that select appears to return early * under some conditions, so we have to check to make sure that * the right amount of time really has elapsed. If it's too * early, go back to sleep again. */ Please, consider reviewing this again without the implementation details obstructing the view. :-) -mi On Mon, 27 Dec 1999, Mikhail Teterin wrote: > jasone@FreeBSD.ORG once stated: > =select() guarantees that it will sleep at least as long as the timeout > =before returning an empty result (0). > > This is NOT what the man page states: > > If timeout is a non-nil pointer, it specifies a maximum > interval to wait for the selection to complete. This is a bug in the man page. It is so poorly worded that it is broken. "maximum" here means "minimum" in the case where no selected event occurs. Returning before the specified interval has elapsed would not be useful, so the system "waits" until it has elapsed and then returns to the application at a convenient later time. Some related man pages, e.g., FreeBSD's setitimer(2) and The Single Unix Spec's select(2) clarify this a little by saying that the requested timeout may be rounded up to an implementation-defined next-supported value. > > =If the number of ticks passed to tsleep() were reduced by one, this > =guarantee would no longer be possible. > = > =This is not a bug; it is an artifact of correct implementation. If you > =need higher resolution, use nanosleep(). There seems to be a problem with both the pthread and the kernel adding one tick. Bruce On Tue, Dec 28, 1999 at 03:18:25PM +1100, Bruce Evans wrote:
> There seems to be a problem with both the pthread and the kernel adding one
> tick.
I didn't look into this aspect of the report, but I'm guessing that the
doubling of the error is due to call conversion in libc_r causing select()
to actually be called twice.
Jason
Bruce Evans once stated: =On Mon, 27 Dec 1999, Mikhail Teterin wrote: = => jasone@FreeBSD.ORG once stated: = => =select() guarantees that it will sleep at least as long as the timeout => =before returning an empty result (0). => => This is NOT what the man page states: => => If timeout is a non-nil pointer, it specifies a maximum => interval to wait for the selection to complete. = =This is a bug in the man page. It is so poorly worded that it is =broken. The Solaris man-page says the same (man -s 3c select): If timeout is not a NULL pointer, it specifies a maximum interval to wait for the selection to complete. And Linux (man 2 select): timeout is an upper bound on the amount of time elapsed before select returns. Are both of them wrong too?.. I'm sure TCL developers saw more selects then me, and they only account for the possibility of an _earlier_ return from select. => =If the number of ticks passed to tsleep() were reduced by one, this => =guarantee would no longer be possible. => = => =This is not a bug; it is an artifact of correct implementation. If => =you need higher resolution, use nanosleep(). = =There seems to be a problem with both the pthread and the kernel adding =one tick. Yes... With all due respect, IMHO, this PR should be re-opened. -mi On Mon, 27 Dec 1999, Mikhail Teterin wrote:
> Bruce Evans once stated:
>
> =On Mon, 27 Dec 1999, Mikhail Teterin wrote:
> => This is NOT what the man page states:
> =>
> => If timeout is a non-nil pointer, it specifies a maximum
> => interval to wait for the selection to complete.
> =
> =This is a bug in the man page. It is so poorly worded that it is
> =broken.
>
> The Solaris man-page says the same (man -s 3c select):
>
> If timeout is not a NULL pointer, it specifies a maximum
> interval to wait for the selection to complete.
>
> And Linux (man 2 select):
>
> timeout is an upper bound on the amount of time elapsed
> before select returns.
>
> Are both of them wrong too?.. I'm sure TCL developers saw more selects
Yes. The Linux one is completely broken, since it appaers to guarantee
a maximum time before the _return_. Only very fast hard realtime systems
can guarantee that anything happens in an interval of 1us.
Bruce
FreeBSD: => => If timeout is a non-nil pointer, it specifies a maximum => => interval to wait for the selection to complete. Solaris: => If timeout is not a NULL pointer, it specifies a maximum => interval to wait for the selection to complete. Linux: => timeout is an upper bound on the amount of time elapsed => before select returns. Irix (http://reality.sgi.com/cgi-bin/getman/?select): If timeout is a non-zero pointer, it specifies a maximum interval to wait for the selection to complete. BSDI (http://www.bsdi.com/bsdi-man?proto=1.1&apropos=0&msection=2&query=select): If timeout is a non-NULL pointer, it specifies a maximum interval to wait for the selection to complete. Perl 5.004 (http://man.cs.wisc.edu/cgi-bin/man?OS=alpha&command=select): TIMEOUT is the maximum amount of time to wait before returning an empty list. It seems, all of the documentation (including the FreeBSD's) agrees. The FreeBSD's implementation disagrees... -mi I did not see a single reply, so I'm replying to myself. Shouldn't the PR be resurrected? -mi mi once stated: =From mi Tue Dec 28 09:55:47 1999 =Subject: Re: kern/13644 =In-Reply-To: <Pine.BSF.4.10.9912281659210.9558-100000@alphplex.bde.org> from = Bruce Evans at "Dec 28, 1999 05:03:47 pm" =To: Bruce Evans <bde@zeta.org.au> =Date: Tue, 28 Dec 1999 09:55:47 -0500 (EST) =CC: Mikhail Teterin <mi@kot.ne.mediaone.net>, jasone@FreeBSD.ORG, = freebsd-bugs@FreeBSD.ORG, FreeBSD-gnats-submit@FreeBSD.ORG, = lawlopez@cisco.com, jseger@FreeBSD.ORG =X-Mailer: ELM [version 2.4ME+ PL60 (25)] =FreeBSD: ==> => If timeout is a non-nil pointer, it specifies a maximum ==> => interval to wait for the selection to complete. = =Solaris: ==> If timeout is not a NULL pointer, it specifies a maximum ==> interval to wait for the selection to complete. = =Linux: ==> timeout is an upper bound on the amount of time elapsed ==> before select returns. = =Irix (http://reality.sgi.com/cgi-bin/getman/?select): = If timeout is a non-zero pointer, it specifies a maximum = interval to wait for the selection to complete. = =BSDI (http://www.bsdi.com/bsdi-man?proto=1.1&apropos=0&msection=2&query=select): = If timeout is a non-NULL pointer, it specifies a maximum = interval to wait for the selection to complete. = =Perl 5.004 (http://man.cs.wisc.edu/cgi-bin/man?OS=alpha&command=select): = TIMEOUT is the maximum amount of time to wait before = returning an empty list. = =It seems, all of the documentation (including the FreeBSD's) agrees. The =FreeBSD's implementation disagrees... = = -mi On 2000-Jan-24 03:37:19 +1100, Mikhail Teterin <mi@kot.ne.mediaone.net> wrote: >=> =FreeBSD: >=> ==> => If timeout is a non-nil pointer, it specifies a maximum >=> ==> => interval to wait for the selection to complete. ... >It appears, that you, as well as other developers, speak from the >implementation point of view. I only look at the man-page. The man page >says, the time out is the UPPER limit. Note that the man page talks about waiting for the _selection_ to complete. It does not refer to returning from the select(2) call. And the behaviour is exactly as documented: when the specified interval is complete, the process will return to the run queue for normal scheduling (if it hasn't previously found a ready FD). Unix is not a real-time OS, so once a process is in the run queue, an arbitrary period can expire before the process is actually run. The only cases where a select(2) (or poll(2)) system call will return before a specified period are: 1) A signal was received 2) One of the specified file descriptors became ready. >sorts of other man-pages from all sorts of other vendors, who all say >(almost) the same thing: > > that the timeout is indeed the UPPER limit, and not the LOWER. Please provide a test program and results from these other vendors demonstrating that their select() will return before the specified time limit in the absence of any other event. It's probably worthwhile adding a comment to select(2) similar to that in sleep(3), noting that "system activity may lengthen the sleep by an indeterminate amount." Peter On 2000-Jan-25 04:02:28 +1100, Mikhail Teterin <mi@aldan.algebra.com> wrote: > It does not explain the consistent 9-10 msecs excess. _That_ might be a bug. The select(2) delay is converted from a struct timeval into ticks by tvtohz(). This routine rounds up to the next tick and then adds 1 to the result (in both -stable and -current). The `+1' would seem to be incorrect - even if you want to allow for the current tick, a more reasonable estimate would be 0.5, not 1. Based on a quick check, both Solaris 2.5 and Digital Unix (aka Tru64) 4.0D don't include this one tick over-estimate. I believe one (only) of the following two patches should be applied: Index: kern_clock.c =================================================================== RCS file: /home/CVSROOT/src/sys/kern/kern_clock.c,v retrieving revision 1.104 diff -u -r1.104 kern_clock.c --- kern_clock.c 1999/12/08 10:02:12 1.104 +++ kern_clock.c 2000/01/24 21:51:27 @@ -271,9 +271,8 @@ * If the number of usecs in the whole seconds part of the time * difference fits in a long, then the total number of usecs will * fit in an unsigned long. Compute the total and convert it to - * ticks, rounding up and adding 1 to allow for the current tick - * to expire. Rounding also depends on unsigned long arithmetic - * to avoid overflow. + * ticks, rounding up. Rounding also depends on unsigned long + * arithmetic to avoid overflow. * * Otherwise, if the number of ticks in the whole seconds part of * the time difference fits in a long, then convert the parts to @@ -305,10 +304,10 @@ ticks = 1; } else if (sec <= LONG_MAX / 1000000) ticks = (sec * 1000000 + (unsigned long)usec + (tick - 1)) - / tick + 1; + / tick; else if (sec <= LONG_MAX / hz) ticks = sec * hz - + ((unsigned long)usec + (tick - 1)) / tick + 1; + + ((unsigned long)usec + (tick - 1)) / tick; else ticks = LONG_MAX; if (ticks > INT_MAX) Index: sys_generic.c =================================================================== RCS file: /home/CVSROOT/src/sys/kern/sys_generic.c,v retrieving revision 1.54 diff -u -r1.54 sys_generic.c --- sys_generic.c 2000/01/14 02:53:25 1.54 +++ sys_generic.c 2000/01/24 21:50:26 @@ -687,7 +687,10 @@ ttv = atv; timevalsub(&ttv, &rtv); timo = ttv.tv_sec > 24 * 60 * 60 ? - 24 * 60 * 60 * hz : tvtohz(&ttv); + 24 * 60 * 60 * hz + 1 : tvtohz(&ttv); + /* compensate for over-estimate in tvtohz() */ + if (timo > 1) + timo-- } s = splhigh(); if ((p->p_flag & P_SELECT) == 0 || nselcoll != ncoll) { @@ -818,7 +821,10 @@ ttv = atv; timevalsub(&ttv, &rtv); timo = ttv.tv_sec > 24 * 60 * 60 ? - 24 * 60 * 60 * hz : tvtohz(&ttv); + 24 * 60 * 60 * hz + 1 : tvtohz(&ttv); + /* compensate for over-estimate in tvtohz() */ + if (timo > 1) + timo-- } s = splhigh(); if ((p->p_flag & P_SELECT) == 0 || nselcoll != ncoll) { Peter |
The select's man-page says: If timeout is a non-nil pointer, it specifies a maximum interval to wait for the selection to complete. If timeout is a nil pointer, the select blocks indefinitely. To effect a poll, the timeout argument should be non-nil, pointing to a zero-valued timeval structure. In fact, the select on an empty list of file descriptors -- as done in TCL's Tcl_Sleep, for example, will always add about 10 msec and another 10 if compiled with -pthread. Fix: The numbers appear to be constant, so one can specify a smaller timeout to actually get it right. But the values will have to be different with and without the -pthread flag. How-To-Repeat: Consider a program: #include <stdio.h> #include <sys/types.h> #include <sys/time.h> #include <unistd.h> struct timeval t1, t2, timeout; main() { timeout.tv_sec = 0; for(timeout.tv_usec = 2000; timeout.tv_usec < 40000; timeout.tv_usec += 1000) { if(gettimeofday(&t1, NULL)) perror("gettimeofday"); select(0, NULL, NULL, NULL, &timeout); if(gettimeofday(&t2, NULL)) perror("gettimeofday"); printf("Slept for %d instead of %d microseconds\n", t2.tv_usec - t1.tv_usec + (t2.tv_sec - t1.tv_sec)*1000*1000, timeout.tv_usec); } if(gettimeofday(&t1, NULL)) perror("gettimeofday"); if(gettimeofday(&t2, NULL)) perror("gettimeofday"); printf("The gettimeofday overhead is %d usec\n", t2.tv_usec - t1.tv_usec); } The output of this program compiled as usual is: Slept for 11248 instead of 2000 microseconds Slept for 19507 instead of 3000 microseconds Slept for 20103 instead of 4000 microseconds Slept for 19939 instead of 5000 microseconds Slept for 19910 instead of 6000 microseconds Slept for 19984 instead of 7000 microseconds Slept for 19986 instead of 8000 microseconds Slept for 20058 instead of 9000 microseconds Slept for 19980 instead of 10000 microseconds Slept for 29905 instead of 11000 microseconds Slept for 30010 instead of 12000 microseconds Slept for 29970 instead of 13000 microseconds Slept for 29988 instead of 14000 microseconds Slept for 30045 instead of 15000 microseconds Slept for 30011 instead of 16000 microseconds Slept for 29887 instead of 17000 microseconds Slept for 29991 instead of 18000 microseconds Slept for 29970 instead of 19000 microseconds Slept for 30067 instead of 20000 microseconds Slept for 39947 instead of 21000 microseconds Slept for 40019 instead of 22000 microseconds Slept for 39910 instead of 23000 microseconds Slept for 39981 instead of 24000 microseconds Slept for 40055 instead of 25000 microseconds Slept for 39914 instead of 26000 microseconds Slept for 40057 instead of 27000 microseconds Slept for 39910 instead of 28000 microseconds Slept for 39979 instead of 29000 microseconds Slept for 40054 instead of 30000 microseconds Slept for 49973 instead of 31000 microseconds Slept for 49993 instead of 32000 microseconds Slept for 50002 instead of 33000 microseconds Slept for 50000 instead of 34000 microseconds Slept for 49972 instead of 35000 microseconds Slept for 49994 instead of 36000 microseconds Slept for 50069 instead of 37000 microseconds Slept for 49904 instead of 38000 microseconds Slept for 49966 instead of 39000 microseconds The gettimeofday overhead is 5 usec and with the ``-pthread'' flag: Slept for 21687 instead of 2000 microseconds Slept for 29719 instead of 3000 microseconds Slept for 30010 instead of 4000 microseconds Slept for 29971 instead of 5000 microseconds Slept for 29986 instead of 6000 microseconds Slept for 30050 instead of 7000 microseconds Slept for 29987 instead of 8000 microseconds Slept for 29911 instead of 9000 microseconds Slept for 29974 instead of 10000 microseconds Slept for 29990 instead of 11000 microseconds Slept for 40072 instead of 12000 microseconds Slept for 40015 instead of 13000 microseconds Slept for 39904 instead of 14000 microseconds Slept for 39974 instead of 15000 microseconds Slept for 39987 instead of 16000 microseconds Slept for 39989 instead of 17000 microseconds Slept for 40187 instead of 18000 microseconds Slept for 39783 instead of 19000 microseconds Slept for 39970 instead of 20000 microseconds Slept for 39987 instead of 21000 microseconds Slept for 49996 instead of 22000 microseconds Slept for 49967 instead of 23000 microseconds Slept for 49996 instead of 24000 microseconds Slept for 49972 instead of 25000 microseconds Slept for 50143 instead of 26000 microseconds Slept for 49825 instead of 27000 microseconds Slept for 49994 instead of 28000 microseconds Slept for 49973 instead of 29000 microseconds Slept for 49996 instead of 30000 microseconds Slept for 49970 instead of 31000 microseconds Slept for 59979 instead of 32000 microseconds Slept for 59991 instead of 33000 microseconds Slept for 60000 instead of 34000 microseconds Slept for 60000 instead of 35000 microseconds Slept for 59979 instead of 36000 microseconds Slept for 59983 instead of 37000 microseconds Slept for 59994 instead of 38000 microseconds Slept for 60119 instead of 39000 microseconds The gettimeofday overhead is 5 usec Using usleep instead of select gives pretty similar numbers :(