Bug 235556 - rctl(8) pcpu/cputime is too high
Summary: rctl(8) pcpu/cputime is too high
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Many People
Assignee: Mark Johnston
URL: https://reviews.freebsd.org/D30878
Keywords:
Depends on:
Blocks:
 
Reported: 2019-02-06 15:51 UTC by Oleg Ginzburg
Modified: 2026-01-06 19:26 UTC
6 users

See Also:
linimon: mfc-stable14?


Description Oleg Ginzburg 2019-02-06 15:51:17 UTC
This is a duplicate of https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=171811, created at the request of gonzo@ so that we can close the old one from 2012.

I don't know what happened to the original reporter (ben@desync.com), but I'm still around, so I'll do it.

All information from the old issue is relevant and affects all the "supported" versions of FreeBSD in 2019:

More information on how to reproduce: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=171811#c3
Comment 1 Allan Jude freebsd_committer freebsd_triage 2019-02-07 03:41:04 UTC
Can you freshly describe the problem and provide some reproduction steps?
Comment 2 Oleg Ginzburg 2019-02-07 10:44:54 UTC
RACCT pcpu metrics are incorrect when processes end quickly.

What I expect:
 maximum value for a single-core process: 100

What I get:
 100x256 (25600)

This is not an artificial or contrived situation. For example, the behavior is easy to observe while an autotools 'configure' step runs, since it launches many short-lived commands (e.g. env BATCH=no make -C /usr/ports/misc/mc clean configure). This makes it impossible to build any external billing on top of RACCT.

How to reproduce (we use cpuset(1) here to load only one core, so the jail should show pcpu=100, assuming it does nothing else):

1) Start jail1.
2) Run any fast, lightweight external command (e.g. /bin/ls) in a loop.
To make the effect more convincing, create a trivial utility:

---
#include <stdio.h>

int main(void)
{
	/* Exit immediately: only process creation and teardown are exercised. */
	return 0;
}
---

Write a loop that runs it and put it inside the jail, e.g. /root/run.sh:
---
#!/bin/sh

while true; do
	/root/a.out > /dev/null
done
---

Run this script inside the jail via cpuset:

cpuset -c -l 0 /bin/sh /root/run.sh


After this, 'top -P' shows something like:
---
182 processes: 2 running, 180 sleeping
CPU 0: 34.1% user,  0.0% nice, 65.9% system,  0.0% interrupt,  0.0% idle
CPU 1:  0.5% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.5% idle
CPU 2:  3.1% user,  0.0% nice,  1.2% system,  0.0% interrupt, 95.7% idle
CPU 3:  0.0% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.6% idle
CPU 4:  1.2% user,  0.0% nice,  0.8% system,  0.0% interrupt, 98.1% idle
CPU 5:  0.8% user,  0.0% nice,  0.4% system,  0.0% interrupt, 98.8% idle
CPU 6:  1.2% user,  0.0% nice,  0.0% system,  0.0% interrupt, 98.8% idle
CPU 7:  0.4% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.2% idle
...

  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
41437 root             1  76    0 11408K  2232K CPU0    0   0:07  12.79% sh
...
---


Only one core is busy. However, if we look at RACCT from the host side, we see the following:

freebsd:~ # rctl -u jail:jail1 | grep pcpu
pcpu=25600
freebsd:~ # rctl -u jail:jail1 | grep pcpu
pcpu=25600
freebsd:~ # rctl -u jail:jail1 | grep pcpu
pcpu=25600
Comment 3 cyril 2021-06-03 21:00:11 UTC
I've created a patch here: https://reviews.freebsd.org/D30632 which seems to fix this problem.
Comment 4 Oleg Ginzburg 2021-06-04 09:29:54 UTC
(In reply to cyril from comment #3)

I can confirm: everything works now.
Tested on: FreeBSD 14.0-CURRENT #0 main-n247127-1976e079544-dirty amd64
Comment 5 cyril 2021-07-13 19:50:14 UTC
(In reply to Oleg Ginzburg from comment #4)

markj discovered that the above patch is actually not correct. I have made another patch that modifies how pcpu is calculated. It is now based on the elapsed cputime value divided by the elapsed realtime value, rather than aggregating the pcpu of all processes in a jail. You can try the patch here: https://reviews.freebsd.org/D30878
Comment 6 jSML4ThWwBID69YC 2025-08-05 20:51:17 UTC
(In reply to cyril from comment #5)

I've been running this patch on 14.2 and 14.3 for the last year. It has been working fine. Graphed CPU readings are in line with expectations.

Can this make it into 15, or one of the 14.x releases?
Comment 7 commit-hook freebsd_committer freebsd_triage 2025-08-05 23:58:36 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c72188d85a793c7610208beafb83af544de6e3b7

commit c72188d85a793c7610208beafb83af544de6e3b7
Author:     Cyril Zhang <cyril@freebsdfoundation.org>
AuthorDate: 2025-08-05 23:20:56 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2025-08-05 23:33:55 +0000

    racct: Improve handling of the pcpu resource

    The previous scheme would inflate the CPU consumption of short-lived
    processes.  For containers (e.g., processes, jails), the total pcpu
    usage was computed as a sum of the pcpu usage of all constituent
    threads, which makes little sense for a decaying average.

    Instead, aggregate wallclock time of all on-CPU threads and compute the
    pcpu resource as a decaying average as the sum.  This gives much more
    reasonable and accurate values in various simple tests.

    PR:             235556
    Reviewed by:    markj
    MFC after:      1 month
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D30878

 sys/kern/kern_racct.c | 307 +++++++++++++++++++++++++-------------------------
 sys/sys/proc.h        |   1 -
 sys/sys/racct.h       |   6 +-
 3 files changed, 156 insertions(+), 158 deletions(-)
Comment 8 commit-hook freebsd_committer freebsd_triage 2026-01-06 19:26:26 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2c3fb4cd509623f9510c62cde81d3e76b441435c

commit 2c3fb4cd509623f9510c62cde81d3e76b441435c
Author:     Cyril Zhang <cyril@freebsdfoundation.org>
AuthorDate: 2025-08-05 23:20:56 +0000
Commit:     Olivier Certner <olce@FreeBSD.org>
CommitDate: 2026-01-06 19:24:14 +0000

    racct: Improve handling of the pcpu resource

    The previous scheme would inflate the CPU consumption of short-lived
    processes.  For containers (e.g., processes, jails), the total pcpu
    usage was computed as a sum of the pcpu usage of all constituent
    threads, which makes little sense for a decaying average.

    Instead, aggregate wallclock time of all on-CPU threads and compute the
    pcpu resource as a decaying average as the sum.  This gives much more
    reasonable and accurate values in various simple tests.

    PR:             235556
    Reviewed by:    markj
    MFC after:      1 month
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D30878

    (cherry picked from commit c72188d85a793c7610208beafb83af544de6e3b7)

    Changes for this MFC to stable/14:
    - Removal of process swap out (thread stacks swapped out) was done only
      in 'main' and stable/15 and prior to the original commit, causing
      a conflict with a test on P_INMEM.  In this MFC, the %CPU is not
      forced to 0 anymore when P_INMEM is not set (contrary to what ps(1)
      explicitly does, which we find dubious).  The %CPU decay will take
      care of that in a more accurate manner (for processes that have just
      been swapped out).
    - Commit 9530c6f082ada9e6 ("racct: Simplify skipping idle process in the
      throttling daemon") was MFCed before this one although it occurred
      after it.  Consequently, adding the 'idle' variable in racctd() was
      removed from this MFC, as the other original commit removes it.

 sys/kern/kern_racct.c | 306 ++++++++++++++++++++++++--------------------------
 sys/sys/proc.h        |   2 +-
 sys/sys/racct.h       |   6 +-
 3 files changed, 150 insertions(+), 164 deletions(-)