Bug 192316 - Invariant TSC gets misdetected on Intel Core 2 Duo processors, resulting in sluggish system behavior
Summary: Invariant TSC gets misdetected on Intel Core 2 Duo processors, resulting in s...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: John Baldwin
Depends on:
Reported: 2014-08-01 19:13 UTC by Jan Kokemüller
Modified: 2015-04-03 16:22 UTC (History)
6 users (show)

See Also:

suggested patch (2.09 KB, patch)
2014-08-01 19:14 UTC, Jan Kokemüller
no flags Details | Diff
acpidump of a Lenovo SL510 (132.60 KB, application/gzip)
2014-09-17 16:15 UTC, Jan Kokemüller
no flags Details
always_disable.patch (3.50 KB, patch)
2014-10-24 15:06 UTC, John Baldwin
no flags Details | Diff
only_disable_c2.patch (7.72 KB, patch)
2014-11-25 15:48 UTC, John Baldwin
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jan Kokemüller 2014-08-01 19:13:23 UTC

On this Intel Core 2 Duo T6570 processor, the TSC timer gets a very high priority, even though it is not invariant. If there is power saving enabled and the processor goes to lower C-states, this results in a sluggish system.

Thread on freebsd-current:

There are some Intel models hard coded as TSC invariant in the TSC detection logic, even if they are not. Suggested fix: just check if the TSC_INVARIANT processor bit is set or not. I've attached a patch.

Comment 1 Jan Kokemüller 2014-08-01 19:14:03 UTC
Created attachment 145227 [details]
suggested patch
Comment 2 John Baldwin freebsd_committer freebsd_triage 2014-08-12 19:35:00 UTC
Alexander added the Intel CPUID checks in http://svnweb.freebsd.org/base?view=revision&revision=185460

Jung-uk added the AMD CPUID check in http://svnweb.freebsd.org/base?view=revision&revision=184146

I think there is some confusion as to what it means to be invariant.  The now official definition is "C-, P-, and T-state" invariant.  However, the 'tsc_is_invariant' variable only includes 'P- and T-state' invariant, and the system even copes with this by disabling Cx states if the TSC is used when it is not C-state invariant.

In terms of what is in the SDM, this is how Intel defines an Invariant TSC in the February 2014 version of Volume 3 of the SDM includes this quote about the availability of an Invariant TSC:

17.13.1 Invariant TSC
The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor’s support for invariant TSC is indicated by CPUID.80000007H:EDX[8].
The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource.

However, earlier in the text a separate section notes the CPUs that support a P-state invariant (constant-rate) TSC:

Processor families increment the time-stamp counter differently:
• For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4 processors, Intel Xeon processors (family [0FH], models [00H, 01H, or 02H]); and for P6 family processors: the time-stamp counter increments with every internal processor clock cycle.
The internal processor clock cycle is determined by the current core-clock to bus-clock ratio. Intel® SpeedStep® technology transitions may also impact the processor clock.
• For Pentium 4 processors, Intel Xeon processors (family [0FH], models [03H and higher]); for Intel Core Solo and Intel Core Duo processors (family [06H], model [0EH]); for the Intel Xeon processor 5100 series and Intel Core 2 Duo processors (family [06H], model [0FH]); for Intel Core 2 and Intel Xeon processors (family [06H], DisplayModel [17H]); for Intel Atom processors (family [06H],
DisplayModel [1CH]): the time-stamp counter increments at a constant rate. That rate may be set by the maximum core-clock to bus-clock ratio of the processor or may be set by the maximum resolved frequency at which the processor is booted. The maximum resolved frequency may differ from the maximum qualified frequency of the processor, see Section 18.14.5 for more detail. On certain processors, the TSC frequency may not be the same as the frequency in the brand string.
The specific processor configuration determines the behavior. Constant TSC behavior ensures that the duration of each clock tick is uniform and supports the use of the TSC as a wall clock timer even if the processor core changes frequency. This is the architectural behavior moving forward.

I think the problem is that you probably have C1E enabled and so we are not setting 'cpu_can_deep_sleep' in your case.  We have special magic for C1E on AMD, but not Intel.  And even then, the special magic results in us turning off the power savings of C1E so it becomes a dumb C1 (as well as explicitly disable C3 from being used).

I expect most Intel CPUs of this vintage are probably using C1E by default (since it defaults to on in the BIOS), so I would be inclined to basically go with this patch (but without the AMD changes) and to also remove the TC_FLAGS_C3STOP flag since it will no longer be used (and corresponding code in tc_windup()).  This will leave the 'deep_sleep' stuff as only applying to the LAPIC timer and not to the TSC.
Comment 3 Jan Kokemüller 2014-09-17 16:15:39 UTC
Created attachment 147407 [details]
acpidump of a Lenovo SL510

Thanks for the detailed analysis. I've had another look at how my laptop (Lenovo SL510) handles C-states. By default the OS queries the _CST method of the firmware and this results in the following C-states:

dev.cpu.0.cx_supported: C1/1/1 C2/2/57
dev.cpu.1.cx_supported: C1/1/1 C2/2/57

So this means that cpu_can_deep_sleep does not get set. When I insert a "return (ENXIO);" at the start of the acpi_cpu_cx_cst function in sys/dev/acpica/acpi_cpu.c the OS falls back to FADT/P_BLK parsing and I get the following:

dev.cpu.0.cx_supported: C1/1/0 C2/2/1 C3/3/57
dev.cpu.1.cx_supported: C1/1/0 C2/2/1 C3/3/57

... which looks closer to the actual hardware to me. I've attached the acpidumps. The _CST method is in ssdt5.dsl. My laptop seems to fall into one of the cases at lines 115, 318, or 531. I wonder why they do this weird remapping of C-states?
Comment 4 John Baldwin freebsd_committer freebsd_triage 2014-10-24 14:58:31 UTC
I have a possible patch that will disable Cx entirely if you force the use of TSC on one of these CPUs.  However, can you do one more test for me.  Can you let your system use _CST as normal but restrict your C states to only using C1?  I want to make sure that C1E also stops the TSC.  (I'm pretty sure it does.)  I think we don't use C2 by default, so if you haven't explicitly changed something in sysctl.conf or rc.conf to enable C2 and you are having problems with the TSC as a timecounter, that would count as a test.
Comment 5 John Baldwin freebsd_committer freebsd_triage 2014-10-24 15:06:29 UTC
Created attachment 148617 [details]
Comment 6 Jan Kokemüller 2014-10-24 18:34:24 UTC
Your patch works well! TSC gets a priority of -1000 as expected. When I manually switch back to TSC, there are still no slowdowns as C-states get disabled.

I forgot to mention that there is no obvious C1E switch in the BIOS. There is a "CPU Power Management" switch that seems to disable all C-states except C1. The description says:

"Enables or disables the power saving feature that stops the microprocessor clock automatically when there are no system activities. Normally it is not necessary to change this setting."

On an unpatched kernel there are no slowdowns when I set this to "Disabled", both with TSC and HPET as kern.timecounter.hardware. FreeBSD then reports only C1 in cx_supported.

If I have "CPU Power Management" enabled, which is the default, there are no slowdowns if I set dev.cpu.{0,1}.cx_lowest to "C1". Currently, higher C-states are enabled per default. When I boot a recent installer of -current from a USB stick, slowdowns happen.
Comment 7 John Baldwin freebsd_committer freebsd_triage 2014-11-25 15:48:58 UTC
Created attachment 149826 [details]

Based on your feedback, my previous patch was too aggressive.  I had assumed C1/C1E was also busted for you, but it sounds like it isn't.  Please try this updated patch.  It should disable C2 but permit you to still use C1.
Comment 8 Jan Kokemüller 2014-12-10 12:35:50 UTC
Thanks, this patch works great. TSC gets a priority of -1000. After manually switching to TSC the cx_usage_counters keep incrementing in C1 only and there are no slowdowns. Switching back to HPET restores the usage of C2 correctly.
Comment 9 commit-hook freebsd_committer 2015-01-05 20:45:27 UTC
A commit references this bug:

Author: jhb
Date: Mon Jan  5 20:44:47 UTC 2015
New revision: 276724
URL: https://svnweb.freebsd.org/changeset/base/276724

  On some Intel CPUs with a P-state but not C-state invariant TSC the TSC
  may also halt in C2 and not just C3 (it seems that in some cases the BIOS
  advertises its C3 state as a C2 state in _CST).  Just play it safe and
  disable both C2 and C3 states if a user forces the use of the TSC as the
  timecounter on such CPUs.

  PR:		192316
  Differential Revision:	https://reviews.freebsd.org/D1441
  No objection from:	jkim
  MFC after:	1 week

Comment 10 commit-hook freebsd_committer 2015-04-02 01:02:57 UTC
A commit references this bug:

Author: jhb
Date: Thu Apr  2 01:02:50 UTC 2015
New revision: 280973
URL: https://svnweb.freebsd.org/changeset/base/280973

  MFC 276724:
  On some Intel CPUs with a P-state but not C-state invariant TSC the TSC
  may also halt in C2 and not just C3 (it seems that in some cases the BIOS
  advertises its C3 state as a C2 state in _CST).  Just play it safe and
  disable both C2 and C3 states if a user forces the use of the TSC as the
  timecounter on such CPUs.

  PR:		192316

_U  stable/10/
_U  stable/9/sys/
_U  stable/9/sys/dev/
_U  stable/9/sys/sys/