Bug 247602

Summary: top occasionally shows unrealistic WCPU values
Product: Base System
Reporter: Jeremy Chadwick <jdc>
Component: bin
Assignee: freebsd-bugs (Nobody) <bugs>
Status: New
Severity: Affects Only Me
Priority: ---
Version: 11.4-RELEASE
Hardware: amd64
OS: Any

Description Jeremy Chadwick 2020-06-27 22:39:59 UTC
Preface: this is a different problem from the one in PR 135823 (threading is not involved here, at least not as far as the processes themselves go) and from the one in PR 236096 (I use SCHED_ULE). (I've picked 11.4-RELEASE for the Version field in Bugzilla because that's the closest to stable/11 I can get.)

While building world (make -j2 buildworld) on a 2-core stable/11 (r358258) amd64 VM and running top -s 1, I noticed that on very rare, but recurring, occasions some processes showed completely ridiculous WCPU values. The number of affected processes varied (sometimes an entire page's worth, other times only a few). Most of these processes do not have threads, in case that's relevant. Example:

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 1211 halbot        1  20    0 56524K 50092K select  1   0:01 1830297421127995.00% perl /home/halbot/hal/halbot.pl /home/halbot/hal/halbot.json
  764 jdc           1  20    0 13208K  5748K select  0   0:00 1830297421127995.00% sshd: jdc@pts/0 (sshd)
  486 unbound       1  20    0 19320K  9672K kqread  1   0:00 1830297421127995.00% /usr/local/sbin/unbound -c /usr/local/etc/unbound/unbound.conf
  905 root          1  20    0  5024K  1372K select  1   0:00 1830297421127995.00% make -m /usr/src/share/mk -f Makefile.inc1 TARGET=amd64 TARGET_ARCH=amd64 buildworld
  881 root          1  20    0  5024K   984K select  1   0:00 1830297421127995.00% make -j2 buildworld
 4091 root          1  52    0  5024K  1780K select  1   0:00 1830297421127995.00% make DIRPRFX=usr.bin/clang/llvm-tblgen/ all
  465 root          1  20    0  6428K  1996K select  1   0:00 1830297421127995.00% /usr/sbin/syslogd -4 -s -s -cc
 4240 root          1  80    0   156M   135M CPU1    1   0:03 100.71% /usr/bin/c++ -cc1 -triple x86_64-unknown-freebsd11.3 -emit-obj -disable-free -disable-llvm-verifier -discard-value
 4237 root          1  80    0   164M   142M RUN     0   0:04  99.76% /usr/bin/c++ -cc1 -triple x86_64-unknown-freebsd11.3 -emit-obj -disable-free -disable-llvm-verifier -discard-value
 4216 jdc           1  20    0  7928K  2900K CPU0    0   0:00   0.09% top
  568 root          1  20    0   105M 95940K select  1   0:02   0.03% /usr/local/bin/perl -T -w /usr/local/bin/spamd -4 -c -d -r /var/run/spamd/spamd.pid
 1493 jdc           1  20    0 13208K  5776K select  1   0:00   0.02% sshd: jdc@pts/1 (sshd)
  656 root          1  20    0   145M  6172K kqread  0   0:00   0.01% php-fpm: master process (/usr/local/etc/php-fpm.conf) (php-fpm)
...


This looks like an integer overflow, which makes me wonder whether the bug is in the kernel or in top itself. (I'm not sure whether this should be Component: bin or Component: kern.)

I'd suggest pulling jhb@ in on this one, since he is very familiar with parts of top's code, especially the parts relating to WCPU.