The issue is from following code in function calcru1: su = (tu * st) / tt; the 'tu * st' is overflowed in my system, causing the 'p_rux.rux_su' is no longer growing for that process. For example, running ps(1) shows that 'systime' is no longer growing, but 'time' is not yet overflowed and keep growing $ ps -p 17087 -o comm,time,systime COMMAND TIME SYSTIME VBoxHeadless 8951:24.70 6339:45.52 $ ps -p 17087 -o comm,time,systime COMMAND TIME SYSTIME VBoxHeadless 8951:25.72 6339:45.52 $ ps -p 17087 -o comm,time,systime COMMAND TIME SYSTIME VBoxHeadless 8951:26.54 6339:45.52 $ ps -p 17087 -o comm,time,systime COMMAND TIME SYSTIME VBoxHeadless 8951:27.60 6339:45.52 Running kgdb(1) shows an overflow is occurred when 'cputick2usec(ruxp->rux_runtime)' multiplying '$p->p_rux.rux_sticks' (kgdb) p $p $3949 = (struct proc *) 0xfffff800464f5000 (kgdb) p $p->p_pid $3950 = 17087 (kgdb) p $p->p_comm $3951 = "VBoxHeadless\000\000\000\000\000\000\000" (kgdb) p $p->p_rux $3952 = {rux_runtime = 1888763265150967, rux_uticks = 866405, rux_sticks = 66653750, rux_iticks = 0, rux_uu = 6893394017, rux_su = 380385524075, rux_tu = 537211849594} (kgdb) p $p->p_rux $3953 = {rux_runtime = 1888767209058083, rux_uticks = 866408, rux_sticks = 66653898, rux_iticks = 0, rux_uu = 6893416864, rux_su = 380385524075, rux_tu = 537212971341} (kgdb) p $p->p_rux $3954 = {rux_runtime = 1888773035929970, rux_uticks = 866411, rux_sticks = 66654106, rux_iticks = 0, rux_uu = 6893440457, rux_su = 380385524075, rux_tu = 537214628650} (kgdb) p ((uint64_t)$p->p_rux.rux_runtime * 1000ll / ((cpu_ticks == tc_cpu_ticks ? timehands->th_counter->tc_frequency : cpu_tick_frequency) / 1000ll)) $3955 = 537220825631 (kgdb) p ((uint64_t)$p->p_rux.rux_runtime * 1000ll / ((cpu_ticks == tc_cpu_ticks ? timehands->th_counter->tc_frequency : cpu_tick_frequency) / 1000ll)) * $p->p_rux.rux_sticks / ($p->p_rux.rux_uticks + $p->p_rux.rux_sticks + $p->p_rux.rux_iticks) $3956 = 257136383464 (kgdb) p ((uint64_t)$p->p_rux.rux_runtime * 1000ll / ((cpu_ticks == tc_cpu_ticks ? timehands->th_counter->tc_frequency : cpu_tick_frequency) / 1000ll)) $3959 = 537288008712 (kgdb) p $p->p_rux.rux_sticks $3960 = 66663850 (kgdb) p ((uint64_t)$p->p_rux.rux_runtime * 1000ll / ((cpu_ticks == tc_cpu_ticks ? timehands->th_counter->tc_frequency : cpu_tick_frequency) / 1000ll)) * $p->p_rux.rux_sticks $3961 = 17371417884613589424 Showing the correct result of 'tu * st' in python(1): >>> 537288008712*66663850 35817687219575461200L This result is actually greater than ULLONG_MAX. Tested on kernel version 10.3-RELEASE-p20 and 11.1-RELEASE.
I believe I've reproduced this on a 12.3-RELEASE-p5 GENERIC amd64 virtualbox virtual machine. After suspending the guest and upgrading virtualbox (on host) and resuming, the following was logged in /var/log/messages: Jul 28 00:34:10 123-RELEASE-p5-amd64-9e36 kernel: calcru: runtime went backwards from 5425487903787745 usec to 18518189771 usec for pid 11 (idle) Additionally, the following is observed in top output: PID JID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME WCPU COMMAND 11 0 root 2 155 ki31 0B 32K 0B RUN 0 ??? 199.70% [idle] Originally running `ps -p` on the idle process id showed time and system not changing: [koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 17087 -o comm,time,systime COMMAND TIME SYSTIME [koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime COMMAND TIME SYSTIME idle 90424922:45.67 90424922:45.67 [koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime COMMAND TIME SYSTIME idle 90424922:45.67 90424922:45.67 [koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime COMMAND TIME SYSTIME idle 90424922:45.67 90424922:45.67 [koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime COMMAND TIME SYSTIME idle 90424922:45.67 90424922:45.67 [koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime COMMAND TIME SYSTIME idle 90424922:45.67 90424922:45.67 But after a small period of time started changing again: [koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime COMMAND TIME SYSTIME idle 90424939:04.68 90424939:04.68 [koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime COMMAND TIME SYSTIME idle 90424939:05.76 90424939:05.76 [koobs@123-RELEASE-p5-amd64-9e36:/usr/ports] ps -p 11 -o comm,time,systime COMMAND TIME SYSTIME idle 90424939:06.43 90424939:06.43 Looking at kern/kern_resource history: 1) Conrad (cem) resolved a calcru1 overflow in base 23e5e43ccd0f via bug 76972 (Bruce followed up with base 23e5e43ccd0f) 2) Simon (sjg) disabled the message for virtual machines in base bacb140f31aa via review D33902 (unsure whether this was eventually MFC'd or not, the commit didn't include MFC tag) There may be behaviour improvements possible in this case, particularly for the '???' reported times in top. In particular: 1) Had the message been disabled in my (VM) case, I would not have been able to correlate/isolate the cause via dmesg 2) What could we report for runtime in the event of overflow instead of '???' What do other OS's do? With dynamic frequency/power scaling being the norm these days, increasing heterogeneity in system hardware (big.little etc) in the future, and given this message appears to only be an (intended) 'report' or diagnostic message solely to indicate the state change (and not a fatal error) ... 3) What other ways might FreeBSD more broadly/gracefully handle timing changes going forward? ^Triage: Request feedback from Conrad re (remaining?) calcru1 overflow and Simon/John re virtual machine situation.
I didn't see any kernel message printed in dmesg(8) when this happens, but the issue prevented the affected process being correctly shown in some 'top' implementations. (I uses nmon(1) with linprocfs(4))
Looks like this was resolved in 13.0.0 with 23e5e43ccd0fc https://cgit.freebsd.org/src/commit/?id=23e5e43ccd0fc This appears to be a duplicate of bug 76972. So I think this can be closed.
^Triage: close as per comment #1.
(In reply to Mark Linimon from comment #4) Er, comment #3.