Bug 76972 - [kernel] [patch] 64-bit integer overflow computing user cpu time in calcru() in kern_resource.c
Summary: [kernel] [patch] 64-bit integer overflow computing user cpu time in calcru() ...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Conrad Meyer
URL:
Keywords: patch, patch-ready
Duplicates: 17842 78957 227689
Depends on:
Blocks:
 
Reported: 2005-02-01 17:30 UTC by cal
Modified: 2019-02-14 19:08 UTC (History)
6 users

See Also:


Attachments
kern_resource.c.patch.txt (1.41 KB, text/plain; charset=UTF-8)
2012-08-02 14:42 UTC, Andrey Zonov

Description cal 2005-02-01 17:30:17 UTC
the line numbers below are for 5.1RELEASE - for other releases, only the line
numbers are different - the code is the same, so the problem is the same (i've
observed it on 5.1R, 5.2.1R, and 5.3R systems, seen it in the code on 4.10R
systems, and i have been told that the same problem occurs on all FreeBSD
systems since BSD4.4-Lite (!))

in FreeBSD 5.1-RELEASE (and 5.2.1R, and 5.3R, and 4.10R), in file
/usr/src/sys/kern/kern_resource.c, lines 657-750 define a function calcru()
("calculate resource usage"), and there is a bug in that function - line 713
in that file is

	uu = (tu * ut) / tt;

which is trying to compute user cpu time (uu) as a proportion (since ut<=tt)
of total cpu time (tu), based on the fraction of statclock ticks (ut / tt)
that occur in user time - the problem is that for my application, the values
of tt and ut (which are counts of statclock ticks) are very large and the
value of the intermediate product overflows, even with 64-bit arithmetic,
leading to incorrect values (i reported the symptoms to freebsd-hackers and a
very helpful description and localization of the problem to calcru() was
provided by peter jeremy, who also wrote that it is a very old problem, but
only partly known because it is so rarely encountered)

Fix: 

<mostly>

it turns out that this problem can be (partly) fixed easily, by simply
re-ordering the computations (this idea is due to Doug Ambrisko), replacing
lines 711-714 (sorry i don't speak "patch" very well yet):

	/* Subdivide tu. */
	uu = (tu * ut) / tt;
	su = (tu * st) / tt;
	iu = tu - uu - su;

of the file with the following lines (tt is defined to be ut+st+it earlier in
the file, so we want to insist that tu = uu+su+iu):

	/* Subdivide tu. */
	su = (tu * st) / tt;
	iu = (tu * it) / tt;
	uu = tu - (su + iu);

since st and it are expected to be much smaller than ut for most long
programs, the overflow likelihood is greatly reduced - but i think that we can
actually do much better - see the full mathematical analysis at

	http://www.cs.umd.edu/~cal/math/overflow-analysis.txt

which also explains the value of the constant TOOBIG (yup, there is a proof
there that shows how well the suggested fix *should* work 8-))

this problem can be fixed (more completely) by replacing those same lines with
the following lines:

	/* Subdivide tu. be careful of overflow */
	/* we know 0 <= st, it, ut <= tt and 1 <= tt = st+it+ut */
#define TOOBIG 48592008 /* 2^35 * sqrt(2) / 10^3 */
	if (tt >= TOOBIG && tu >= tt)
		{
		u_int64_t q, r;
		q = tu / tt;
		r = tu % tt;
		su = (q * st) + (r * st) / tt;
		iu = (q * it) + (r * it) / tt;
		}
	else	{
		su = (tu * st) / tt;
		iu = (tu * it) / tt;
		}
	uu = tu - (su + iu);

the constant TOOBIG is designed so the calcru() function does not use the
complicated form of the equations when the numbers are guaranteed to be small
enough not to overflow in the intermediate products

this change does not solve all the arithmetic problems (especially when either
st or it is very large), but this much suffices for my needs -

my assumptions for the analysis are as follows (if any of them is incorrect,
please tell me):
	tu is total time in microseconds
	tt is total 128hz ticks (tt = ut + st + it)
	st is system 128hz ticks
	it is interrupt 128hz ticks
	ut is user 128hz ticks
i assume that all of these counters are non-negative and increasing throughout
the life of the process -
i assume therefore that (i use "~" to mean "very roughly near")
	tu ~ tt * 10^6/128
strictly because of the clock rates (machines with other kinds of clock rate
ratios will need a different analysis, but the same kind of idea should work
there, too, possibly with different constant values for TOOBIG) -

in my case ut ~ tt (i have utilizations for this particular program
usually well above 90%, often 99%, for weeks at a time), so we see
overflow in the old computation when
	tu * ut >= 2^64,
which happens when tt is about
		~ 3.8*10^5 seconds
		~ 105.45 hours
this is the phenomenon i observed

visible effect of making the change

for long programs (that is, programs that take more than 388 days or so of
either system or interrupt cpu time), this change may not help, and will
certainly not help after long enough, as shown in the analysis

for medium length programs (that is, programs that take more than 105 hours or
so of either system or interrupt cpu time, but less than 388 days or so each
of system and interrupt cpu time), this change adds both tests from the if
statement to every call to calcru(), and does in fact fix the problem

for short programs (that is, programs that take less than 105 hours or so each
of system and interrupt cpu time), it adds the first test in the if statement
to every call to calcru(), and does not otherwise affect the correctness of
the computation

<philosophy> instrumentation is costly, and correct instrumentation is even
more costly, but it's worth it, every time, to get the right answer
</philosophy> (unless, of course, you don't want or need it 8-))

both of these changes are being tested (05jan26), and both of them pass the
first error threshold - the experimental results and analysis can be found on
the web

	http://www.cs.umd.edu/~cal/math/overflow-analysis.txt

more later,
cal

Dr. Christopher Landauer
Aerospace Integration Science Center
The Aerospace Corporation, Mail Stop M6/214
P.O.Box 92957
Los Angeles, California 90009-2957, USA
cal@aero.org
+1 (310) 336-1361
How-To-Repeat: 
use time in csh or /usr/bin/time or getrusage() to time any program that takes
more than a week or so of user or system cpu time to run (they all exhibit the
same error of "losing" the user or system time, whichever is larger, and
therefore incorrectly reporting the utilization percentage)
Comment 1 Bruce Evans 2005-02-02 09:02:23 UTC
On Tue, 1 Feb 2005, Chris Landauer wrote:

> >Description:
> in FreeBSD 5.1-RELEASE (and 5.2.1R, and 5.3R, and 4.10R), in file
> /usr/src/sys/kern/kern_resource.c, lines 657-750 define a function calcru()
> ("calculate resource usage"), and there is a bug in that function - line 713
> in that file is
>
> 	uu = (tu * ut) / tt;
>
> which is trying to compute user cpu time (uu) as a proportion (since ut<=tt)
> of total cpu time (tu), based on the fraction of statclock ticks (ut / tt)
> that occur in user time - the problem is that for my application, the values
> of tt and ut (which are counts of statclock ticks) are very large and the
> value of the intermediate product overflows, even with 64-bit arithmetic,
> leading to incorrect values (i reported the symptoms to freebsd-hackers and a
> very helpful description and localization of the problem to calcru() was
> provided by peter jeremy, who also wrote that it is a very old problem, but
> only partly known because it is so rarely encountered)

PR 17842 is about this too.  It has the 105-hours magic number but otherwise
less detail, so I superseded it with this PR.

> >Fix:
> <mostly>
> ...
> this problem can be fixed (more completely) by replacing those same lines with
> the following lines:
>
> 	/* Subdivide tu. be careful of overflow */
> 	/* we know 0 <= st, it, ut <= tt and 1 <= tt = st+it+ut */
> #define TOOBIG 48592008 /* 2^35 * sqrt(2) / 10^3 */
> 	if (tt >= TOOBIG && tu >= tt)
> 		{
> 		u_int64_t q, r;
> 		q = tu / tt;
> 		r = tu % tt;
> 		su = (q * st) + (r * st) / tt;
> 		iu = (q * it) + (r * it) / tt;
> 		}
> 	else	{
> 		su = (tu * st) / tt;
> 		iu = (tu * it) / tt;
> 		}
> 	uu = tu - (su + iu);

Seems basically OK.

> the constant TOOBIG is designed so the calcru() function does not use the
> complicated form of the equations when the numbers are guaranteed to be small
> enough not to overflow in the intermediate products

I think you should use 2^32 for TOOBIG so that the only assumption on
stathz is that it is <= 10^6.  2^32 microseconds is a bit over an hour
and most programs won't run that long.  The tu >= tt check is automatically
satisfied if stathz <= 10^6 (less epsilon?), but it is reasonable to check
that both terms of the multiplication are < 2^32.  You can even make
the fix have negative inefficiency for the usual case on 32-bit machines
by casting the terms down to uint32_t when they are < 2^32.  For i386's
in particular, 32x32->64 bit multiplication is a single instruction and
gcc would use it if the terms were uint32_t.

> ...
> my assumptions for the analysis are as follows (if any of them is incorrect,
> please tell me):
> 	tu is total time in microseconds
> 	tt is total 128hz ticks (tt = ut + st + it)
> 	st is system 128hz ticks
> 	it is interrupt 128hz ticks
> 	ut is user 128hz ticks
> i assume that all of these counters are non-negative and increasing throughout
> the life of the process -
> i assume therefore that (i use "~" to mean "very roughly near")
> 	tu ~ tt * 10^6/128
> strictly because of the clock rates (machines with other kinds of clock rate
> ratios will need a different analysis, but the same kind of idea should work
> there, too, possibly with different constant values for TOOBIG) -

Seems OK.  stathz is machine-dependent -- better avoid tricky assumptions
about it (see above).  It seems to be the same as hz on sparc64's, so it
might be much larger than 128 (however, the scheduler depends on it being
about 128).

> for long programs (that is, programs that take more than 388 days or so of
> either system or interrupt cpu time), this change may not help, and will
> certainly not help after long enough, as shown in the analysis

Hmm, 388 days is 2^32 statclock ticks at 128Hz and the limit of 2^32
is sort of new.  BTW, I've always thought it stupid for the tick counters
to be 64 bits.  Values larger than 2^32 in them never
came close to actually working, since overflow occurs in the multiplication
by a factor of (388 days) / (105 hours) sooner than 32-bit counters would
overflow.  The counters are only (?) used by calcru(), so their 64-bitness
is not useful anywhere.

Bruce
Comment 2 Andrey Zonov 2012-08-02 14:42:21 UTC
Hi,

I ran into this some time ago.

I suggest a simple patch that fixed the problem for me.

$ ps -o lstart,time,systime,usertime -p 63292
STARTED                               TIME  SYSTIME      USERTIME
Sat Jun 16 13:49:20 2012     1017686:25.32 67:05.75 1017619:19.56

-- 
Andrey Zonov
Comment 3 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:58:26 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 4 Ed Maste freebsd_committer 2018-04-22 23:25:03 UTC
*** Bug 17842 has been marked as a duplicate of this bug. ***
Comment 5 Ed Maste freebsd_committer 2018-04-22 23:26:27 UTC
*** Bug 78957 has been marked as a duplicate of this bug. ***
Comment 6 Conrad Meyer freebsd_committer 2019-02-10 20:58:13 UTC
I wrote a small program to empirically validate the proposed patch under some assumptions.

"tu" (total time in microseconds) is drawn from [1, ~1000 years] (0 is uninteresting).

"tt" (total time in ticks) is tu / 1000.

"ut" or "st" is <= tt.

With randomly drawn tu and ut/st in the given ranges (not precisely uniform due to lack of arc4random_uniform64, but close to uniform distribution), I observe overflow in something like 99.999976% of the input range with the current algorithm and 0% (absolutely zero) instances of overflow with the proposed patch.  Also, the result is correct, and not truncated, in all cases with the bounds given above.

I believe it fixes the issue and can be committed.
Comment 7 Conrad Meyer freebsd_committer 2019-02-10 20:59:43 UTC
(It can be found at https://people.freebsd.org/~cem/pr76972_test.c .  If I made any mistake, let me know.)
Comment 8 Conrad Meyer freebsd_committer 2019-02-10 22:56:58 UTC
*** Bug 227689 has been marked as a duplicate of this bug. ***
Comment 9 Conrad Meyer freebsd_committer 2019-02-10 23:00:30 UTC
(In reply to Conrad Meyer from comment #6)
Intuitively, b/c is in the (real) range [0, 1], so floor(a*b/c) should always be representable without overflow.  The trick is mostly in not losing all precision (i.e., a*(b/c) = 0 or a).
Comment 10 commit-hook freebsd_committer 2019-02-10 23:08:26 UTC
A commit references this bug:

Author: cem
Date: Sun Feb 10 23:07:47 UTC 2019
New revision: 343985
URL: https://svnweb.freebsd.org/changeset/base/343985

Log:
  Prevent overflow for usertime/systime in calcru1()

  PR:		76972 and duplicates
  Reported by:	Dr. Christopher Landauer <cal AT aero.org>,
  		Steinar Haug <sthaug AT nethelp.no>
  Submitted by:	Andrey Zonov <andrey AT zonov.org> (earlier version)
  MFC after:	2 weeks

Changes:
  head/sys/kern/kern_resource.c
Comment 11 commit-hook freebsd_committer 2019-02-14 19:08:02 UTC
A commit references this bug:

Author: bde
Date: Thu Feb 14 19:07:09 UTC 2019
New revision: 344133
URL: https://svnweb.freebsd.org/changeset/base/344133

Log:
  Finish the fix for overflow in calcru1().

  The previous fix was unnecessarily very slow up to 105 hours where the
  simple formula used previously worked, and unnecessarily slow by a factor
  of about 5/3 up to 388 days, and didn't work above 388 days.  388 days is
  not a long time, since it is a reasonable uptime, and for processes the
  times being calculated are aggregated over all threads, so with N CPUs
  running the same thread a runtime of 388 days is reachable after only
  388 / N physical days.

  The PRs document overflow at 388 days, but don't try to fix it.

  Use the simple formula up to 76 hours.  Then use a complicated general
  method that reduces to the simple formula up to a bit less than 105
  hours, then reduces to the previous method without its extra work up
  to almost 388 days, then does more complicated reductions, usually
  many bits at a time so that this is not slow.  This works up to half
  of maximum representable time (292271 years), with accumulated rounding
  errors of at most 32 usec.

  amd64 can do all this with no avoidable rounding errors in an inline
  asm with 2 instructions, but this is too special to use.  __uint128_t
  can do the same with 100's of instructions on 64-bit arches.  Long
  doubles with at least 64 bits of precision are the easiest method to
  use on i386 userland, but are hard to use in the kernel.

  PR:		76972 and duplicates
  Reviewed by:	kib

Changes:
  head/sys/kern/kern_resource.c