Bug 232914 - kern/kern_resource: Integer overflow in function calcru1
Summary: kern/kern_resource: Integer overflow in function calcru1
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 12.3-STABLE
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-02 17:51 UTC by WHR
Modified: 2024-05-06 04:25 UTC (History)
6 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description WHR 2018-11-02 17:51:30 UTC
The issue is from following code in function calcru1:
su = (tu * st) / tt;
the 'tu * st' is overflowed in my system, causing the 'p_rux.rux_su' is no longer growing for that process.

For example, running ps(1) shows that 'systime' is no longer growing, but 'time' is not yet overflowed and keep growing
$ ps -p 17087 -o comm,time,systime
COMMAND            TIME    SYSTIME
VBoxHeadless 8951:24.70 6339:45.52
$ ps -p 17087 -o comm,time,systime
COMMAND            TIME    SYSTIME
VBoxHeadless 8951:25.72 6339:45.52
$ ps -p 17087 -o comm,time,systime
COMMAND            TIME    SYSTIME
VBoxHeadless 8951:26.54 6339:45.52
$ ps -p 17087 -o comm,time,systime
COMMAND            TIME    SYSTIME
VBoxHeadless 8951:27.60 6339:45.52

Running kgdb(1) shows an overflow is occurred when 'cputick2usec(ruxp->rux_runtime)' multiplying '$p->p_rux.rux_sticks'
(kgdb) p $p
$3949 = (struct proc *) 0xfffff800464f5000
(kgdb) p $p->p_pid
$3950 = 17087
(kgdb) p $p->p_comm
$3951 = "VBoxHeadless\000\000\000\000\000\000\000"
(kgdb) p $p->p_rux
$3952 = {rux_runtime = 1888763265150967, rux_uticks = 866405, rux_sticks = 66653750, rux_iticks = 0, rux_uu = 6893394017, 
  rux_su = 380385524075, rux_tu = 537211849594}
(kgdb) p $p->p_rux
$3953 = {rux_runtime = 1888767209058083, rux_uticks = 866408, rux_sticks = 66653898, rux_iticks = 0, rux_uu = 6893416864, 
  rux_su = 380385524075, rux_tu = 537212971341}
(kgdb) p $p->p_rux
$3954 = {rux_runtime = 1888773035929970, rux_uticks = 866411, rux_sticks = 66654106, rux_iticks = 0, rux_uu = 6893440457, 
  rux_su = 380385524075, rux_tu = 537214628650}
(kgdb) p ((uint64_t)$p->p_rux.rux_runtime * 1000ll / ((cpu_ticks == tc_cpu_ticks ? timehands->th_counter->tc_frequency : cpu_tick_frequency) / 1000ll))
$3955 = 537220825631
(kgdb) p ((uint64_t)$p->p_rux.rux_runtime * 1000ll / ((cpu_ticks == tc_cpu_ticks ? timehands->th_counter->tc_frequency : cpu_tick_frequency) / 1000ll)) * $p->p_rux.rux_sticks / ($p->p_rux.rux_uticks + $p->p_rux.rux_sticks + $p->p_rux.rux_iticks)
$3956 = 257136383464
(kgdb) p ((uint64_t)$p->p_rux.rux_runtime * 1000ll / ((cpu_ticks == tc_cpu_ticks ? timehands->th_counter->tc_frequency : cpu_tick_frequency) / 1000ll))
$3959 = 537288008712
(kgdb) p $p->p_rux.rux_sticks 
$3960 = 66663850
(kgdb) p ((uint64_t)$p->p_rux.rux_runtime * 1000ll / ((cpu_ticks == tc_cpu_ticks ? timehands->th_counter->tc_frequency : cpu_tick_frequency) / 1000ll)) * $p->p_rux.rux_sticks
$3961 = 17371417884613589424

Showing the correct result of 'tu * st' in python(1):
>>> 537288008712*66663850
35817687219575461200L

This result is actually greater than ULLONG_MAX.

Tested on kernel version 10.3-RELEASE-p20 and 11.1-RELEASE.
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2022-07-29 03:03:18 UTC
I believe I've reproduced this on a 12.3-RELEASE-p5 GENERIC amd64 virtualbox virtual machine.

After suspending the guest and upgrading virtualbox (on host) and resuming, the following was logged in /var/log/messages:

Jul 28 00:34:10 123-RELEASE-p5-amd64-9e36 kernel: calcru: runtime went backwards from 5425487903787745 usec to 18518189771 usec for pid 11 (idle)

Additionally, the following is observed in top output:

  PID   JID USERNAME    THR PRI NICE   SIZE    RES SWAP STATE    C   TIME    WCPU COMMAND
   11     0 root          2 155 ki31     0B    32K   0B RUN      0    ??? 199.70% [idle]

Originally running `ps -p` on the idle process id showed time and system not changing:

[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 17087 -o comm,time,systime
COMMAND TIME SYSTIME                                                                                                                                     [koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime
COMMAND           TIME        SYSTIME
idle    90424922:45.67 90424922:45.67
[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime
COMMAND           TIME        SYSTIME
idle    90424922:45.67 90424922:45.67
[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime
COMMAND           TIME        SYSTIME
idle    90424922:45.67 90424922:45.67
[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime
COMMAND           TIME        SYSTIME
idle    90424922:45.67 90424922:45.67
[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime
COMMAND           TIME        SYSTIME
idle    90424922:45.67 90424922:45.67


But after a small period of time started changing again:

[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime
COMMAND           TIME        SYSTIME
idle    90424939:04.68 90424939:04.68
[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime
COMMAND           TIME        SYSTIME
idle    90424939:05.76 90424939:05.76
[koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime
COMMAND           TIME        SYSTIME
idle    90424939:06.43 90424939:06.43

Looking at kern/kern_resource history:

1) Conrad (cem) resolved a calcru1 overflow in base 23e5e43ccd0f via bug 76972 (Bruce followed up with base 23e5e43ccd0f)

2) Simon (sjg) disabled the message for virtual machines in base bacb140f31aa via review D33902 (unsure whether this was eventually MFC'd or not, the commit didn't include MFC tag)

There may be behaviour improvements possible in this case, particularly for the '???' reported times in top. In particular:

1) Had the message been disabled in my (VM) case, I would not have been able to correlate/isolate the cause via dmesg

2) What could we report for runtime in the event of overflow instead of '???' What do other OS's do?

With dynamic frequency/power scaling being the norm these days, increasing heterogeneity in system hardware (big.little etc) in the future, and given this message appears to only be an (intended) 'report' or diagnostic message solely to indicate the state change (and not a fatal error) ...

3) What other ways might FreeBSD more broadly/gracefully handle timing changes going forward?

^Triage: Request feedback from Conrad re (remaining?) calcru1 overflow and Simon/John re virtual machine situation.
Comment 2 WHR 2023-07-22 14:00:35 UTC
I didn't see any kernel message printed in dmesg(8) when this happens, but the issue prevented the affected process being correctly shown in some 'top' implementations. (I uses nmon(1) with linprocfs(4))
Comment 3 Kyle Ambroff-Kao 2024-05-06 00:00:43 UTC
Looks like this was resolved in 13.0.0 with 23e5e43ccd0fc

https://cgit.freebsd.org/src/commit/?id=23e5e43ccd0fc

This appears to be a duplicate of bug 76972.

So I think this can be closed.
Comment 4 Mark Linimon freebsd_committer freebsd_triage 2024-05-06 04:24:52 UTC
^Triage: close as per comment #1.
Comment 5 Mark Linimon freebsd_committer freebsd_triage 2024-05-06 04:25:18 UTC
(In reply to Mark Linimon from comment #4)
Er, comment #3.