Bug 109413 - [patch] top(1) shows at least 50% idle when hyperthreading is disabled
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 6.2-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: John Baldwin
Reported: 2007-02-21 23:50 UTC by leres
Modified: 2011-07-07 20:07 UTC
Attachments
file.diff (2.25 KB, patch)
2007-02-21 23:50 UTC, leres

Description leres 2007-02-21 23:50:08 UTC
	If the hardware and kernel support hyperthreading but
	hyperthreading is disabled (which is the default), top
	always reports at least 50% idle time even when all CPUs are
	completely busy.

Fix: The appended patch to usr.bin/top/machine.c checks to see
	if hyperthreading is possible but disabled and if so,
	subtracts ticks from cp_time[CP_IDLE].

===================================================================
RCS file: RCS/machine.c,v
retrieving revision 1.1
How-To-Repeat: 	Ensure hyperthreading is disabled:

	    # sysctl machdep.hyperthreading_allowed=0
	    machdep.hyperthreading_allowed: 0 -> 0

	Note that if sysctl says "unknown oid" then your test system
	does not support hyperthreading and you need to find a
	different system that does.

	Launch top and then start some cpu bound processes; observe
	that idle never goes below 50%.

	Another way to see this effect is to look at the cp_time
	vector and at the number of ticks tallied in a second.

	Here's a single processor Pentium III:

	    CPU: Intel(R) Pentium(R) III CPU family      1266MHz (1266.07-MHz 686-class CPU)
	      Origin = "GenuineIntel"  Id = 0x6b1  Stepping = 1
	      Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
	    real memory  = 2147418112 (2047 MB)
	    avail memory = 2096291840 (1999 MB)
	    [...]
	    cpu0: <ACPI CPU> on acpi0

	It sees about 133 ticks in a second:

	    % sysctl kern.cp_time ; sleep 1 ; sysctl kern.cp_time
	    kern.cp_time: 386 142 8685 210 9844275
	    kern.cp_time: 386 142 8686 210 9844408

	Here's a dual processor Xeon system with hyperthreading (4
	logical CPUs):

	    CPU: Intel(R) Xeon(TM) CPU 3.20GHz (3200.13-MHz 686-class CPU)
	      Origin = "GenuineIntel"  Id = 0xf34  Stepping = 4
	      Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
	      Features2=0x441d<SSE3,RSVD2,MON,DS_CPL,CNTX-ID,<b14>>
	      AMD Features=0x20000000<LM>
	      Logical CPUs per core: 2
	    real memory  = 2146893824 (2047 MB)
	    avail memory = 2095759360 (1998 MB)
	    FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
	     cpu0 (BSP): APIC ID:  0
	     cpu1 (AP): APIC ID:  1
	     cpu2 (AP): APIC ID:  6
	     cpu3 (AP): APIC ID:  7

	It sees about 534 ticks in a second:

	    % sysctl kern.cp_time ; sleep 1 ; sysctl kern.cp_time
	    kern.cp_time: 31410710 498316 11249090 2058151 1597290413
	    kern.cp_time: 31410710 498316 11249091 2058151 1597290946

	So with 4 times as many logical processors, we see 4 times
	as many ticks. What's interesting is that the number of
	ticks per second is independent of the setting of
	hyperthreading_allowed. But when hyperthreading is disabled,
	two of the processors can only contribute to CP_IDLE ticks
	while the other two can contribute to all types of ticks.
Comment 1 Robert Watson freebsd_committer 2007-06-25 12:43:34 UTC
Responsible Changed
From-To: freebsd-bugs->jhb

Assign this PR to John Baldwin as he has been involved in development in 
these areas and may be able to comment on the best way to fix this.  My 
intuition is that we should think about changing the kernel measurement 
and reporting bits rather than top, as other tools will otherwise remain 
incorrect even if top is fixed.
Comment 2 john 2007-06-25 19:38:26 UTC
Yes, the place to fix this is in the kernel, not in userland applications.  We 
have a hack at work for 4.x at least to address this, but a more proper fix 
is needed and requires us to have good knowledge in the kernel of online vs 
offline CPUs.

--- //depot/vendor/freebsd_4/src/sys/kern/kern_clock.c	2003/08/22 15:39:19
+++ //depot/yahoo/ybsd_4/src/sys/kern/kern_clock.c	2005/02/01 08:02:41
@@ -376,6 +375,11 @@
 	}
 }
 
+#ifdef SMP
+/* XXXHACK */
+extern int hlt_cpus_mask;
+#endif
+
 /*
  * Statistics clock.  Grab profile sample, and if divider reaches 0,
  * do process and kernel statistics.  Most of the statistics are only
@@ -450,6 +450,15 @@
 		 * so that we know how much of its real time was spent
 		 * in ``non-process'' (i.e., interrupt) work.
 		 */
+#ifdef SMP
+		/*
+		 * XXXHACK: If this is a halted CPU, then don't count it
+		 * in the statistics.
+		 */
+		if (hlt_cpus_mask & 1 << cpuid)
+			p = NULL;
+		else {
+#endif
 		p = curproc;
 		if (CLKF_INTR(frame)) {
 			if (p != NULL)
@@ -460,6 +469,9 @@
 			cp_time[CP_SYS]++;
 		} else
 			cp_time[CP_IDLE]++;
+#ifdef SMP
+		}
+#endif
 	}
 	pscnt = psdiv;
 

-- 
John Baldwin
Comment 3 John Baldwin freebsd_committer 2011-07-07 20:06:49 UTC
State Changed
From-To: open->closed

HEAD no longer allows dynamic disabling of HTT which "fixes" this. 
Eventually we will have true online/offline CPU support and we will 
ensure top(1) works properly with offline CPUs once that happens.