If the hardware and kernel support hyperthreading but hyperthreading is disabled (which is the default), top always reports at least 50% idle time even when all CPUs are completely busy.

Fix:

The appended patch to usr.bin/top/machine.c checks whether hyperthreading is possible but disabled and, if so, subtracts ticks from cp_time[CP_IDLE].

===================================================================
RCS file: RCS/machine.c,v
retrieving revision 1.1

How-To-Repeat:

Ensure hyperthreading is disabled:

    # sysctl machdep.hyperthreading_allowed=0
    machdep.hyperthreading_allowed: 0 -> 0

Note that if sysctl says "unknown oid", your test system does not support hyperthreading and you need to find a different system that does.

Launch top and then start some CPU-bound processes; observe that idle never goes below 50%.

Another way to see this effect is to look at how many ticks are tallied in the cp_time vector over one second. Here's a single-processor Pentium III:

    CPU: Intel(R) Pentium(R) III CPU family 1266MHz (1266.07-MHz 686-class CPU)
      Origin = "GenuineIntel"  Id = 0x6b1  Stepping = 1
      Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
    real memory  = 2147418112 (2047 MB)
    avail memory = 2096291840 (1999 MB)
    [...]
    cpu0: <ACPI CPU> on acpi0

It sees about 133 ticks in a second:

    % sysctl kern.cp_time ; sleep 1 ; sysctl kern.cp_time
    kern.cp_time: 386 142 8685 210 9844275
    kern.cp_time: 386 142 8686 210 9844408

Here's a dual-processor Xeon system with hyperthreading (4 logical CPUs):

    CPU: Intel(R) Xeon(TM) CPU 3.20GHz (3200.13-MHz 686-class CPU)
      Origin = "GenuineIntel"  Id = 0xf34  Stepping = 4
      Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
      Features2=0x441d<SSE3,RSVD2,MON,DS_CPL,CNTX-ID,<b14>>
      AMD Features=0x20000000<LM>
      Logical CPUs per core: 2
    real memory  = 2146893824 (2047 MB)
    avail memory = 2095759360 (1998 MB)
    FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
     cpu0 (BSP): APIC ID:  0
     cpu1 (AP): APIC ID:  1
     cpu2 (AP): APIC ID:  6
     cpu3 (AP): APIC ID:  7

It sees about 534 ticks in a second:

    % sysctl kern.cp_time ; sleep 1 ; sysctl kern.cp_time
    kern.cp_time: 31410710 498316 11249090 2058151 1597290413
    kern.cp_time: 31410710 498316 11249091 2058151 1597290946

So with 4 times as many logical processors, we see 4 times as many ticks. What's interesting is that the number of ticks per second is independent of the setting of hyperthreading_allowed. But when hyperthreading is disabled, two of the processors can only contribute CP_IDLE ticks while the other two can contribute all types of ticks. As a result, at least half of all ticks are always counted as idle.
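The arithmetic behind the 50% floor, and the kind of adjustment described under Fix, can be sketched in C. This is only an illustration, not the machine.c patch from this PR: the helper names discount_halted_idle and idle_permille are hypothetical, and the sketch assumes that every logical CPU contributes an equal share of the ticks in the sampling interval.

```c
#include <assert.h>

/* Index order mirrors the kern.cp_time vector: user, nice, sys, intr, idle. */
enum { CP_USER, CP_NICE, CP_SYS, CP_INTR, CP_IDLE, CPUSTATES };

/*
 * Hypothetical helper (not the patch from this PR): given the tick deltas
 * for one sampling interval, remove the idle ticks contributed by `halted`
 * logical CPUs out of `ncpus`.  Assumes each logical CPU contributes an
 * equal share of the total ticks.
 */
static void
discount_halted_idle(long delta[CPUSTATES], int ncpus, int halted)
{
	long total = 0, share;
	int i;

	assert(ncpus > 0 && halted >= 0 && halted < ncpus);
	for (i = 0; i < CPUSTATES; i++)
		total += delta[i];
	share = total * halted / ncpus;
	/* Never subtract more idle ticks than were actually recorded. */
	if (share > delta[CP_IDLE])
		share = delta[CP_IDLE];
	delta[CP_IDLE] -= share;
}

/* Idle share of the interval in tenths of a percent (500 means 50.0%). */
static long
idle_permille(const long delta[CPUSTATES])
{
	long total = 0;
	int i;

	for (i = 0; i < CPUSTATES; i++)
		total += delta[i];
	return (total == 0 ? 0 : delta[CP_IDLE] * 1000 / total);
}
```

As a made-up example on the 4-CPU Xeon, take a one-second delta of {267, 0, 0, 0, 267}: two logical CPUs halted and the other two fully busy in user mode. The raw idle figure is 500 (50.0%); after discount_halted_idle(delta, 4, 2) the idle count is 0 and the two usable CPUs show as fully busy.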
Responsible Changed
From-To: freebsd-bugs->jhb

Assign this PR to John Baldwin, as he has been involved in development in these areas and may be able to comment on the best way to fix this. My intuition is that we should think about changing the kernel measurement and reporting bits rather than top, as other tools will otherwise remain incorrect even if top is fixed.
Yes, the place to fix this is in the kernel, not in userland applications. We have a hack at work for 4.x at least to address this, but a more proper fix is needed and requires us to have good knowledge in the kernel of online vs offline CPUs.

--- //depot/vendor/freebsd_4/src/sys/kern/kern_clock.c	2003/08/22 15:39:19
+++ //depot/yahoo/ybsd_4/src/sys/kern/kern_clock.c	2005/02/01 08:02:41
@@ -376,6 +375,11 @@
 	}
 }
 
+#ifdef SMP
+/* XXXHACK */
+extern int hlt_cpus_mask;
+#endif
+
 /*
  * Statistics clock.  Grab profile sample, and if divider reaches 0,
  * do process and kernel statistics.  Most of the statistics are only
@@ -450,6 +450,15 @@
 		 * so that we know how much of its real time was spent
 		 * in ``non-process'' (i.e., interrupt) work.
 		 */
+#ifdef SMP
+		/*
+		 * XXXHACK: If this is a halted CPU, then don't count it
+		 * in the statistics.
+		 */
+		if (hlt_cpus_mask & 1 << cpuid)
+			p = NULL;
+		else {
+#endif
 		p = curproc;
 		if (CLKF_INTR(frame)) {
 			if (p != NULL)
@@ -460,6 +469,9 @@
 				cp_time[CP_SYS]++;
 		} else
 			cp_time[CP_IDLE]++;
+#ifdef SMP
+		}
+#endif
 	}
 	pscnt = psdiv;
 
-- 
John Baldwin
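The effect of the kernel-side hack can be modelled in user space. The sketch below is not kernel code: charge_tick is a hypothetical stand-in for the per-CPU accounting step in the statistics clock, and the choice of which logical CPUs are halted is an assumption made for illustration. It only shows how testing a halted-CPU bitmask keeps those CPUs out of cp_time entirely.

```c
#include <assert.h>

/* Index order mirrors the kern.cp_time vector: user, nice, sys, intr, idle. */
enum { CP_USER, CP_NICE, CP_SYS, CP_INTR, CP_IDLE, CPUSTATES };

/*
 * Hypothetical user-space model of one statistics-clock tick.
 * If bit `cpuid` is set in `hlt_mask`, the CPU is treated as halted and
 * the tick is dropped; otherwise it is charged to `state`.
 * Returns 1 if the tick was counted, 0 if it was skipped.
 */
static int
charge_tick(long cp_time[CPUSTATES], unsigned int hlt_mask, int cpuid,
    int state)
{
	assert(cpuid >= 0 && cpuid < 32);
	assert(state >= 0 && state < CPUSTATES);

	/* Same test as the diff, with the shift parenthesized for clarity. */
	if (hlt_mask & (1u << cpuid))
		return (0);
	cp_time[state]++;
	return (1);
}
```

With a mask of 0xa (logical CPUs 1 and 3 halted, an assumed layout), one tick per CPU in which CPUs 0 and 2 are busy in user mode leaves cp_time[CP_USER] at 2 and cp_time[CP_IDLE] at 0, so any tool reading the vector sees 0% idle without needing its own correction.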
State Changed
From-To: open->closed

HEAD no longer allows dynamic disabling of HTT, which "fixes" this. Eventually we will have true online/offline CPU support, and we will ensure top(1) works properly with offline CPUs once that happens.