This is a duplicate of https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=171811, created at the request of gonzo@ so that the old one from 2012 can be closed. I don't know whether the original reporter (ben@desync.com) is still around, but I am, so I'll do it. All information from the old issue is still relevant and affects all "supported" versions of FreeBSD in 2019. More info on how to reproduce: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=171811#c3
Can you describe the problem afresh and provide some reproduction steps?
RACCT pcpu metrics are incorrect when processes exit quickly.

What I expect: a maximum value of 100 for a process confined to one core.
What I get: 100 x 256 (25600).

This is not an artificial or abstract case. For example, the behavior is easy to see when running 'make config' for autotools, which launches a lot of short-lived commands (e.g. env BATCH=no make -C /usr/ports/misc/mc clean configure). This makes it impossible to use any external billing based on RACCT.

How to reproduce (we use cpuset here to create load on only one core, so we should see pcpu=100 for the jail, assuming the jail does nothing else):

1) Run jail1.
2) Execute any fast/light external command (e.g. /bin/ls) in a loop. To make it more convincing, create a trivial utility:
---
#include <stdio.h>

int main()
{
	return 0;
}
---
Write an execution loop and drop it into the jail, e.g. /root/run.sh:
---
#!/bin/sh
while [ 1 ]; do
	/root/a.out > /dev/null
done
---
Run this script inside the jail via cpuset:

cpuset -c -l 0 /bin/sh /root/run.sh

After this, 'top -P' shows something like:
---
182 processes: 2 running, 180 sleeping
CPU 0: 34.1% user, 0.0% nice, 65.9% system, 0.0% interrupt, 0.0% idle
CPU 1: 0.5% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.5% idle
CPU 2: 3.1% user, 0.0% nice, 1.2% system, 0.0% interrupt, 95.7% idle
CPU 3: 0.0% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.6% idle
CPU 4: 1.2% user, 0.0% nice, 0.8% system, 0.0% interrupt, 98.1% idle
CPU 5: 0.8% user, 0.0% nice, 0.4% system, 0.0% interrupt, 98.8% idle
CPU 6: 1.2% user, 0.0% nice, 0.0% system, 0.0% interrupt, 98.8% idle
CPU 7: 0.4% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.2% idle
...
  PID USERNAME THR PRI NICE   SIZE   RES STATE C  TIME   WCPU COMMAND
41437 root       1  76    0 11408K 2232K CPU0  0  0:07 12.79% sh
...
---
Only one core is busy. However, if we look at RACCT from the host side, we see the following:

freebsd:~ # rctl -u jail:jail1 | grep pcpu
pcpu=25600
freebsd:~ # rctl -u jail:jail1 | grep pcpu
pcpu=25600
freebsd:~ # rctl -u jail:jail1 | grep pcpu
pcpu=25600
I've created a patch here: https://reviews.freebsd.org/D30632 which seems to fix this problem.
(In reply to cyril from comment #3) I confirm - everything is fine now. Tested on: FreeBSD 14.0-CURRENT #0 main-n247127-1976e079544-dirty amd64
(In reply to Oleg Ginzburg from comment #4) markj discovered that the above patch is actually not correct. I have made another patch that modifies how pcpu is calculated. It is now based on the elapsed cputime value divided by the elapsed realtime value, rather than aggregating the pcpu of all processes in a jail. You can try the patch here: https://reviews.freebsd.org/D30878
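The calculation described in this comment can be sketched roughly as follows. This is only a minimal illustration and not the code from D30878: the function name pcpu_from_deltas, its parameters, and the microsecond units are assumptions made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not taken from kern_racct.c): percentage of one CPU
 * consumed over a sampling interval, computed from cumulative CPU time and
 * wall-clock readings taken at the start and end of the interval.
 * All values are in microseconds.
 */
unsigned
pcpu_from_deltas(uint64_t cpu_prev_us, uint64_t cpu_now_us,
    uint64_t wall_prev_us, uint64_t wall_now_us)
{
	uint64_t cpu_delta, wall_delta;

	/* No elapsed real time, or counters went backwards: report 0. */
	if (wall_now_us <= wall_prev_us || cpu_now_us < cpu_prev_us)
		return (0);

	cpu_delta = cpu_now_us - cpu_prev_us;
	wall_delta = wall_now_us - wall_prev_us;

	/* Elapsed CPU time divided by elapsed real time, as a percentage. */
	return ((unsigned)(cpu_delta * 100 / wall_delta));
}
```

With this kind of ratio, a jail pinned to a single core cannot report more than 100, because it cannot accumulate more CPU time than the real time that elapsed, no matter how many short-lived processes ran during the interval.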
(In reply to cyril from comment #5) I've been running this patch on 14.2 and 14.3 for the last year. It has been working fine. Graphed CPU readings appear in line with expectations. Can this make it into 15, or one of the 14.x releases?
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c72188d85a793c7610208beafb83af544de6e3b7

commit c72188d85a793c7610208beafb83af544de6e3b7
Author:     Cyril Zhang <cyril@freebsdfoundation.org>
AuthorDate: 2025-08-05 23:20:56 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2025-08-05 23:33:55 +0000

    racct: Improve handling of the pcpu resource

    The previous scheme would inflate the CPU consumption of short-lived processes. For containers (e.g., processes, jails), the total pcpu usage was computed as a sum of the pcpu usage of all constituent threads, which makes little sense for a decaying average. Instead, aggregate wallclock time of all on-CPU threads and compute the pcpu resource as a decaying average as the sum. This gives much more reasonable and accurate values in various simple tests.

    PR: 235556
    Reviewed by: markj
    MFC after: 1 month
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D30878

 sys/kern/kern_racct.c | 307 +++++++++++++++++++++++++------------------------
 sys/sys/proc.h        |   1 -
 sys/sys/racct.h       |   6 +-
 3 files changed, 156 insertions(+), 158 deletions(-)
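The idea in the commit message (sum on-CPU time over the whole container first, then apply a decaying average to that sum) can be illustrated with a small sketch. This is not the code from sys/kern/kern_racct.c: the function name pcpu_decay_update, the use of floating point, and the decay parameter are all assumptions made for the example, and the kernel's actual decay constant is not given in this thread.

```c
/*
 * Hypothetical sketch (not the kernel implementation): fold one sampling
 * interval into an exponentially decaying %CPU average for a container.
 *
 * avg          previous average, in percent of one CPU
 * oncpu_us     on-CPU time summed over all threads in the container
 *              during the interval, in microseconds
 * interval_us  length of the interval, in microseconds
 * decay        weight kept from the previous average, 0 <= decay < 1
 *              (an assumed parameter, chosen by the caller)
 */
double
pcpu_decay_update(double avg, double oncpu_us, double interval_us,
    double decay)
{
	double sample;

	/* A zero-length interval carries no information; keep the average. */
	if (interval_us <= 0.0)
		return (avg);

	/* Usage during this interval, as a percentage of one CPU. */
	sample = 100.0 * oncpu_us / interval_us;

	/* Blend the new sample into the running average. */
	return (decay * avg + (1.0 - decay) * sample);
}
```

Because the per-interval sample is built from aggregated on-CPU time rather than from a sum of per-process averages, a container that keeps one core fully busy converges toward 100 in this model and never overshoots it.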
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2c3fb4cd509623f9510c62cde81d3e76b441435c

commit 2c3fb4cd509623f9510c62cde81d3e76b441435c
Author:     Cyril Zhang <cyril@freebsdfoundation.org>
AuthorDate: 2025-08-05 23:20:56 +0000
Commit:     Olivier Certner <olce@FreeBSD.org>
CommitDate: 2026-01-06 19:24:14 +0000

    racct: Improve handling of the pcpu resource

    The previous scheme would inflate the CPU consumption of short-lived processes. For containers (e.g., processes, jails), the total pcpu usage was computed as a sum of the pcpu usage of all constituent threads, which makes little sense for a decaying average. Instead, aggregate wallclock time of all on-CPU threads and compute the pcpu resource as a decaying average as the sum. This gives much more reasonable and accurate values in various simple tests.

    PR: 235556
    Reviewed by: markj
    MFC after: 1 month
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D30878

    (cherry picked from commit c72188d85a793c7610208beafb83af544de6e3b7)

    Changes for this MFC to stable/14:
    - Removal of process swap out (thread stacks swapped out) was done only in 'main' and stable/15 and prior to the original commit, causing a conflict with a test on P_INMEM. In this MFC, the %CPU is not forced to 0 anymore when P_INMEM is not set (contrary to what ps(1) explicitly does, which we find dubious). The %CPU decay will take care of that in a more accurate manner (for processes that have just been swapped out).
    - Commit 9530c6f082ada9e6 ("racct: Simplify skipping idle process in the throttling daemon") was MFCed before this one although it occurred after it. Consequently, adding the 'idle' variable in racctd() was removed from this MFC, as the other original commit removes it.

 sys/kern/kern_racct.c | 306 ++++++++++++++++++++++++--------------------------
 sys/sys/proc.h        |   2 +-
 sys/sys/racct.h       |   6 +-
 3 files changed, 150 insertions(+), 164 deletions(-)