Bug 221621 - Ryzen Threadripper - hwpstate0 - repeated set frequency failed err 6
Summary: Ryzen Threadripper - hwpstate0 - repeated set frequency failed err 6
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 11.1-STABLE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: Conrad Meyer
URL: https://reviews.freebsd.org/D12082
Keywords:
Depends on:
Blocks: 219213
  Show dependency treegraph
 
Reported: 2017-08-19 05:42 UTC by Sean P.R.
Modified: 2017-12-06 02:14 UTC (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Sean P.R. 2017-08-19 05:42:16 UTC
New build with Threadripper 1920X.  The kernel reports the following error repeatedly - ranging from a few seconds to a couple minutes between repeats:
 hwpstate0: set freq failed, err 6

No change if powerd is on or off.


uname -av
FreeBSD gzy 11.1-RELEASE FreeBSD 11.1-RELEASE #0 r321309: Fri Jul 21 02:08:28 UTC 2017     root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

----

sysctl -a | grep hwpstate
hwpstate0: set freq failed, err 6
[repeated / fills buffer]
debug.hwpstate_verbose: 0
dev.hwpstate.0.freq_settings: 3500/4287 3000/3187 2200/1897
dev.hwpstate.0.%parent: cpu0
dev.hwpstate.0.%pnpinfo: 
dev.hwpstate.0.%location: 
dev.hwpstate.0.%driver: hwpstate
dev.hwpstate.0.%desc: Cool`n'Quiet 2.0
dev.hwpstate.%parent: 

----

sysctl -a | grep cpu
kern.smp.cpus: 24
kern.smp.maxcpus: 256
kern.ccpu: 0
  <cpu count="24" mask="ffffff,0,0,0">0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23</cpu>
    <cpu count="6" mask="3f,0,0,0">0, 1, 2, 3, 4, 5</cpu>
      <cpu count="2" mask="3,0,0,0">0, 1</cpu>
      <cpu count="2" mask="c,0,0,0">2, 3</cpu>
      <cpu count="2" mask="30,0,0,0">4, 5</cpu>
    <cpu count="6" mask="fc0,0,0,0">6, 7, 8, 9, 10, 11</cpu>
      <cpu count="2" mask="c0,0,0,0">6, 7</cpu>
      <cpu count="2" mask="300,0,0,0">8, 9</cpu>
      <cpu count="2" mask="c00,0,0,0">10, 11</cpu>
    <cpu count="6" mask="3f000,0,0,0">12, 13, 14, 15, 16, 17</cpu>
      <cpu count="2" mask="3000,0,0,0">12, 13</cpu>
      <cpu count="2" mask="c000,0,0,0">14, 15</cpu>
      <cpu count="2" mask="30000,0,0,0">16, 17</cpu>
    <cpu count="6" mask="fc0000,0,0,0">18, 19, 20, 21, 22, 23</cpu>
      <cpu count="2" mask="c0000,0,0,0">18, 19</cpu>
      <cpu count="2" mask="300000,0,0,0">20, 21</cpu>
      <cpu count="2" mask="c00000,0,0,0">22, 23</cpu>
kern.sched.cpusetsize: 32
kern.pin_pcpu_swi: 0
kern.racct.pcpu_threshold: 1
cpu	HAMMER
device	cpufreq
kern.vt.splash_cpu_duration: 10
kern.vt.splash_cpu_style: 2
kern.vt.splash_ncpu: 0
kern.vt.splash_cpu: 0
vfs.ncpurgeminvnodes: 2048
net.inet.tcp.per_cpu_timers: 0
debug.cpufreq.verbose: 0
debug.cpufreq.lowest: 0
debug.acpi.cpu_unordered: 0
hw.ncpu: 24
hw.acpi.cpu.cx_lowest: C2
dev.cpufreq.0.%parent: cpu0
dev.cpufreq.0.%pnpinfo: 
dev.cpufreq.0.%location: 
dev.cpufreq.0.%driver: cpufreq
dev.cpufreq.0.%desc: 
dev.cpufreq.%parent: 
dev.hwpstate.0.%parent: cpu0
dev.acpi_perf.23.%parent: cpu23
dev.acpi_perf.22.%parent: cpu22
dev.acpi_perf.21.%parent: cpu21
dev.acpi_perf.20.%parent: cpu20
dev.acpi_perf.19.%parent: cpu19
dev.acpi_perf.18.%parent: cpu18
dev.acpi_perf.17.%parent: cpu17
dev.acpi_perf.16.%parent: cpu16
dev.acpi_perf.15.%parent: cpu15
dev.acpi_perf.14.%parent: cpu14
dev.acpi_perf.13.%parent: cpu13
dev.acpi_perf.12.%parent: cpu12
dev.acpi_perf.11.%parent: cpu11
dev.acpi_perf.10.%parent: cpu10
dev.acpi_perf.9.%parent: cpu9
dev.acpi_perf.8.%parent: cpu8
dev.acpi_perf.7.%parent: cpu7
dev.acpi_perf.6.%parent: cpu6
dev.acpi_perf.5.%parent: cpu5
dev.acpi_perf.4.%parent: cpu4
dev.acpi_perf.3.%parent: cpu3
dev.acpi_perf.2.%parent: cpu2
dev.acpi_perf.1.%parent: cpu1
dev.acpi_perf.0.%parent: cpu0
dev.nvme.0.num_cpus_per_ioq: 4
dev.cpu.23.cx_method: C1/hlt C2/io
dev.cpu.23.cx_usage_counters: 2453945 4102905
dev.cpu.23.cx_usage: 37.42% 62.57% last 979us
dev.cpu.23.cx_lowest: C2
dev.cpu.23.cx_supported: C1/1/1 C2/2/400
dev.cpu.23.%parent: acpi0
dev.cpu.23.%pnpinfo: _HID=none _UID=0
dev.cpu.23.%location: handle=\_PR_.C017
dev.cpu.23.%driver: cpu
dev.cpu.23.%desc: ACPI CPU
dev.cpu.22.cx_method: C1/hlt C2/io
dev.cpu.22.cx_usage_counters: 2600457 4181710
dev.cpu.22.cx_usage: 38.34% 61.65% last 3711us
dev.cpu.22.cx_lowest: C2
dev.cpu.22.cx_supported: C1/1/1 C2/2/400
dev.cpu.22.%parent: acpi0
dev.cpu.22.%pnpinfo: _HID=none _UID=0
dev.cpu.22.%location: handle=\_PR_.C016
dev.cpu.22.%driver: cpu
dev.cpu.22.%desc: ACPI CPU
dev.cpu.21.cx_method: C1/hlt C2/io
dev.cpu.21.cx_usage_counters: 2422602 3977879
dev.cpu.21.cx_usage: 37.85% 62.14% last 3908us
dev.cpu.21.cx_lowest: C2
dev.cpu.21.cx_supported: C1/1/1 C2/2/400
dev.cpu.21.%parent: acpi0
dev.cpu.21.%pnpinfo: _HID=none _UID=0
dev.cpu.21.%location: handle=\_PR_.C015
dev.cpu.21.%driver: cpu
dev.cpu.21.%desc: ACPI CPU
dev.cpu.20.cx_method: C1/hlt C2/io
dev.cpu.20.cx_usage_counters: 2536009 4052910
dev.cpu.20.cx_usage: 38.48% 61.51% last 1052us
dev.cpu.20.cx_lowest: C2
dev.cpu.20.cx_supported: C1/1/1 C2/2/400
dev.cpu.20.%parent: acpi0
dev.cpu.20.%pnpinfo: _HID=none _UID=0
dev.cpu.20.%location: handle=\_PR_.C014
dev.cpu.20.%driver: cpu
dev.cpu.20.%desc: ACPI CPU
dev.cpu.19.cx_method: C1/hlt C2/io
dev.cpu.19.cx_usage_counters: 2341676 3899865
dev.cpu.19.cx_usage: 37.51% 62.48% last 2us
dev.cpu.19.cx_lowest: C2
dev.cpu.19.cx_supported: C1/1/1 C2/2/400
dev.cpu.19.%parent: acpi0
dev.cpu.19.%pnpinfo: _HID=none _UID=0
dev.cpu.19.%location: handle=\_PR_.C013
dev.cpu.19.%driver: cpu
dev.cpu.19.%desc: ACPI CPU
dev.cpu.18.cx_method: C1/hlt C2/io
dev.cpu.18.cx_usage_counters: 2434036 3973339
dev.cpu.18.cx_usage: 37.98% 62.01% last 3489us
dev.cpu.18.cx_lowest: C2
dev.cpu.18.cx_supported: C1/1/1 C2/2/400
dev.cpu.18.%parent: acpi0
dev.cpu.18.%pnpinfo: _HID=none _UID=0
dev.cpu.18.%location: handle=\_PR_.C012
dev.cpu.18.%driver: cpu
dev.cpu.18.%desc: ACPI CPU
dev.cpu.17.cx_method: C1/hlt C2/io
dev.cpu.17.cx_usage_counters: 2355975 4002148
dev.cpu.17.cx_usage: 37.05% 62.94% last 1924us
dev.cpu.17.cx_lowest: C2
dev.cpu.17.cx_supported: C1/1/1 C2/2/400
dev.cpu.17.%parent: acpi0
dev.cpu.17.%pnpinfo: _HID=none _UID=0
dev.cpu.17.%location: handle=\_PR_.C011
dev.cpu.17.%driver: cpu
dev.cpu.17.%desc: ACPI CPU
dev.cpu.16.cx_method: C1/hlt C2/io
dev.cpu.16.cx_usage_counters: 2534527 4129061
dev.cpu.16.cx_usage: 38.03% 61.96% last 507us
dev.cpu.16.cx_lowest: C2
dev.cpu.16.cx_supported: C1/1/1 C2/2/400
dev.cpu.16.%parent: acpi0
dev.cpu.16.%pnpinfo: _HID=none _UID=0
dev.cpu.16.%location: handle=\_PR_.C010
dev.cpu.16.%driver: cpu
dev.cpu.16.%desc: ACPI CPU
dev.cpu.15.cx_method: C1/hlt C2/io
dev.cpu.15.cx_usage_counters: 2298008 3937197
dev.cpu.15.cx_usage: 36.85% 63.14% last 1186us
dev.cpu.15.cx_lowest: C2
dev.cpu.15.cx_supported: C1/1/1 C2/2/400
dev.cpu.15.%parent: acpi0
dev.cpu.15.%pnpinfo: _HID=none _UID=0
dev.cpu.15.%location: handle=\_PR_.C00F
dev.cpu.15.%driver: cpu
dev.cpu.15.%desc: ACPI CPU
dev.cpu.14.cx_method: C1/hlt C2/io
dev.cpu.14.cx_usage_counters: 2994036 4853700
dev.cpu.14.cx_usage: 38.15% 61.84% last 1567us
dev.cpu.14.cx_lowest: C2
dev.cpu.14.cx_supported: C1/1/1 C2/2/400
dev.cpu.14.%parent: acpi0
dev.cpu.14.%pnpinfo: _HID=none _UID=0
dev.cpu.14.%location: handle=\_PR_.C00E
dev.cpu.14.%driver: cpu
dev.cpu.14.%desc: ACPI CPU
dev.cpu.13.cx_method: C1/hlt C2/io
dev.cpu.13.cx_usage_counters: 2243953 3861132
dev.cpu.13.cx_usage: 36.75% 63.24% last 2914us
dev.cpu.13.cx_lowest: C2
dev.cpu.13.cx_supported: C1/1/1 C2/2/400
dev.cpu.13.%parent: acpi0
dev.cpu.13.%pnpinfo: _HID=none _UID=0
dev.cpu.13.%location: handle=\_PR_.C00D
dev.cpu.13.%driver: cpu
dev.cpu.13.%desc: ACPI CPU
dev.cpu.12.cx_method: C1/hlt C2/io
dev.cpu.12.cx_usage_counters: 2372562 3947137
dev.cpu.12.cx_usage: 37.54% 62.45% last 1280us
dev.cpu.12.cx_lowest: C2
dev.cpu.12.cx_supported: C1/1/1 C2/2/400
dev.cpu.12.%parent: acpi0
dev.cpu.12.%pnpinfo: _HID=none _UID=0
dev.cpu.12.%location: handle=\_PR_.C00C
dev.cpu.12.%driver: cpu
dev.cpu.12.%desc: ACPI CPU
dev.cpu.11.cx_method: C1/hlt C2/io
dev.cpu.11.cx_usage_counters: 2539566 4104573
dev.cpu.11.cx_usage: 38.22% 61.77% last 2991us
dev.cpu.11.cx_lowest: C2
dev.cpu.11.cx_supported: C1/1/1 C2/2/400
dev.cpu.11.%parent: acpi0
dev.cpu.11.%pnpinfo: _HID=none _UID=0
dev.cpu.11.%location: handle=\_PR_.C00B
dev.cpu.11.%driver: cpu
dev.cpu.11.%desc: ACPI CPU
dev.cpu.10.cx_method: C1/hlt C2/io
dev.cpu.10.cx_usage_counters: 2625097 4190803
dev.cpu.10.cx_usage: 38.51% 61.48% last 2765us
dev.cpu.10.cx_lowest: C2
dev.cpu.10.cx_supported: C1/1/1 C2/2/400
dev.cpu.10.%parent: acpi0
dev.cpu.10.%pnpinfo: _HID=none _UID=0
dev.cpu.10.%location: handle=\_PR_.C00A
dev.cpu.10.%driver: cpu
dev.cpu.10.%desc: ACPI CPU
dev.cpu.9.cx_method: C1/hlt C2/io
dev.cpu.9.cx_usage_counters: 2476575 4019405
dev.cpu.9.cx_usage: 38.12% 61.87% last 3526us
dev.cpu.9.cx_lowest: C2
dev.cpu.9.cx_supported: C1/1/1 C2/2/400
dev.cpu.9.%parent: acpi0
dev.cpu.9.%pnpinfo: _HID=none _UID=0
dev.cpu.9.%location: handle=\_PR_.C009
dev.cpu.9.%driver: cpu
dev.cpu.9.%desc: ACPI CPU
dev.cpu.8.cx_method: C1/hlt C2/io
dev.cpu.8.cx_usage_counters: 2641831 4109578
dev.cpu.8.cx_usage: 39.13% 60.86% last 2049us
dev.cpu.8.cx_lowest: C2
dev.cpu.8.cx_supported: C1/1/1 C2/2/400
dev.cpu.8.%parent: acpi0
dev.cpu.8.%pnpinfo: _HID=none _UID=0
dev.cpu.8.%location: handle=\_PR_.C008
dev.cpu.8.%driver: cpu
dev.cpu.8.%desc: ACPI CPU
dev.cpu.7.cx_method: C1/hlt C2/io
dev.cpu.7.cx_usage_counters: 2409144 3945099
dev.cpu.7.cx_usage: 37.91% 62.08% last 11695us
dev.cpu.7.cx_lowest: C2
dev.cpu.7.cx_supported: C1/1/1 C2/2/400
dev.cpu.7.%parent: acpi0
dev.cpu.7.%pnpinfo: _HID=none _UID=0
dev.cpu.7.%location: handle=\_PR_.C007
dev.cpu.7.%driver: cpu
dev.cpu.7.%desc: ACPI CPU
dev.cpu.6.cx_method: C1/hlt C2/io
dev.cpu.6.cx_usage_counters: 2489036 4013233
dev.cpu.6.cx_usage: 38.27% 61.72% last 2511us
dev.cpu.6.cx_lowest: C2
dev.cpu.6.cx_supported: C1/1/1 C2/2/400
dev.cpu.6.%parent: acpi0
dev.cpu.6.%pnpinfo: _HID=none _UID=0
dev.cpu.6.%location: handle=\_PR_.C006
dev.cpu.6.%driver: cpu
dev.cpu.6.%desc: ACPI CPU
dev.cpu.5.cx_method: C1/hlt C2/io
dev.cpu.5.cx_usage_counters: 2432103 3961948
dev.cpu.5.cx_usage: 38.03% 61.96% last 1532us
dev.cpu.5.cx_lowest: C2
dev.cpu.5.cx_supported: C1/1/1 C2/2/400
dev.cpu.5.%parent: acpi0
dev.cpu.5.%pnpinfo: _HID=none _UID=0
dev.cpu.5.%location: handle=\_PR_.C005
dev.cpu.5.%driver: cpu
dev.cpu.5.%desc: ACPI CPU
dev.cpu.4.cx_method: C1/hlt C2/io
dev.cpu.4.cx_usage_counters: 2902090 4284426
dev.cpu.4.cx_usage: 40.38% 59.61% last 1432us
dev.cpu.4.cx_lowest: C2
dev.cpu.4.cx_supported: C1/1/1 C2/2/400
dev.cpu.4.%parent: acpi0
dev.cpu.4.%pnpinfo: _HID=none _UID=0
dev.cpu.4.%location: handle=\_PR_.C004
dev.cpu.4.%driver: cpu
dev.cpu.4.%desc: ACPI CPU
dev.cpu.3.cx_method: C1/hlt C2/io
dev.cpu.3.cx_usage_counters: 2403177 3900568
dev.cpu.3.cx_usage: 38.12% 61.87% last 143us
dev.cpu.3.cx_lowest: C2
dev.cpu.3.cx_supported: C1/1/1 C2/2/400
dev.cpu.3.%parent: acpi0
dev.cpu.3.%pnpinfo: _HID=none _UID=0
dev.cpu.3.%location: handle=\_PR_.C003
dev.cpu.3.%driver: cpu
dev.cpu.3.%desc: ACPI CPU
dev.cpu.2.cx_method: C1/hlt C2/io
dev.cpu.2.cx_usage_counters: 2492593 3971882
dev.cpu.2.cx_usage: 38.55% 61.44% last 3067us
dev.cpu.2.cx_lowest: C2
dev.cpu.2.cx_supported: C1/1/1 C2/2/400
dev.cpu.2.%parent: acpi0
dev.cpu.2.%pnpinfo: _HID=none _UID=0
dev.cpu.2.%location: handle=\_PR_.C002
dev.cpu.2.%driver: cpu
dev.cpu.2.%desc: ACPI CPU
dev.cpu.1.cx_method: C1/hlt C2/io
dev.cpu.1.cx_usage_counters: 2427833 3862553
dev.cpu.1.cx_usage: 38.59% 61.40% last 3042us
dev.cpu.1.cx_lowest: C2
dev.cpu.1.cx_supported: C1/1/1 C2/2/400
dev.cpu.1.%parent: acpi0
dev.cpu.1.%pnpinfo: _HID=none _UID=0
dev.cpu.1.%location: handle=\_PR_.C001
dev.cpu.1.%driver: cpu
dev.cpu.1.%desc: ACPI CPU
dev.cpu.0.cx_method: C1/hlt C2/io
dev.cpu.0.cx_usage_counters: 34418398 12688334
dev.cpu.0.cx_usage: 73.06% 26.93% last 803us
dev.cpu.0.cx_lowest: C2
dev.cpu.0.cx_supported: C1/1/1 C2/2/400
dev.cpu.0.freq_levels: 3500/4287 3000/3187 2200/1897
dev.cpu.0.freq: 3500
dev.cpu.0.%parent: acpi0
dev.cpu.0.%pnpinfo: _HID=none _UID=0
dev.cpu.0.%location: handle=\_PR_.C000
dev.cpu.0.%driver: cpu
dev.cpu.0.%desc: ACPI CPU
dev.cpu.%parent: 
security.jail.param.cpuset.id: 0
Comment 1 Conrad Meyer freebsd_committer freebsd_triage 2017-08-19 06:19:01 UTC
Well, I don't know what is actually logging that line.  But I expect the problem lies in hwpstates, probably in the family-specific MSR support that does not yet include 17h (Zen):

https://svnweb.freebsd.org/base/head/sys/x86/cpufreq/hwpstate.c?view=markup#l397

I don't have this hardware around yet (waiting on a board to come back in stock), but I know a couple others do.
Comment 2 Conrad Meyer freebsd_committer freebsd_triage 2017-08-19 06:25:00 UTC
By the way, setting 'debug.hwpstate_verbose="1"' in loader.conf may show a little more of what is going on.
Comment 3 Conrad Meyer freebsd_committer freebsd_triage 2017-08-19 06:42:43 UTC
Please try the following patch:

--- a/sys/x86/cpufreq/hwpstate.c
+++ b/sys/x86/cpufreq/hwpstate.c
@@ -83,6 +83,10 @@ __FBSDID("$FreeBSD$");
 #define        AMD_10H_11H_CUR_DID(msr)                (((msr) >> 6) & 0x07)
 #define        AMD_10H_11H_CUR_FID(msr)                ((msr) & 0x3F)

+#define        AMD_17H_CUR_VID(msr)                    (((msr) >> 14) & 0xFF)
+#define        AMD_17H_CUR_DID(msr)                    (((msr) >> 8) & 0x3F)
+#define        AMD_17H_CUR_FID(msr)                    ((msr) & 0xFF)
+
 #define        HWPSTATE_DEBUG(dev, msg...)                     \
        do{                                             \
                if(hwpstate_verbose)                    \
@@ -427,6 +431,15 @@ hwpstate_get_info_from_msr(device_t dev)
                case 0x16:
                        hwpstate_set[i].freq = (100 * (fid + 0x10)) >> did;
                        break;
+               case 0x17:
+                       did = AMD_17H_CUR_DID(msr);
+                       if (did == 0) {
+                               HWPSTATE_DEBUG(dev, "unexpected did: 0\n");
+                               did = 1;
+                       }
+                       fid = AMD_17H_CUR_FID(msr);
+                       hwpstate_set[i].freq = (200 * fid) / did;
+                       break;
                default:
                        HWPSTATE_DEBUG(dev, "get_info_from_msr: AMD family"
                            " 0x%02x CPUs are not supported yet\n", family);

It is based on the Family 17h Reference manual and compiles, but obviously I am unable to test it.
Comment 4 Sean P.R. 2017-08-19 08:28:30 UTC
(In reply to Conrad Meyer from comment #2)

Thanks.  Here's what verbose mode returns.  I'll take the system down to apply the patch in the morning and report back:

2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu0
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu1
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu2
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu3
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu4
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu5
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu6
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu7
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu8
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu9
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu10
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu11
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu12
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu13
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu14
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu15
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu16
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu17
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu18
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu19
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu20
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu21
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu22
2017-08-19 01:24:03	gzy	kernel	hwpstate0: setting P1-state on cpu23
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu0
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu1
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu2
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu3
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu4
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu5
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu6
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu7
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu8
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu9
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu10
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu11
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu12
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu13
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu14
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu15
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu16
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu17
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu18
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu19
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu20
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu21
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu22
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: result  P2-state on cpu23
2017-08-19 01:24:03	gzy	kernel	hwpstate0: error: loop is not enough.
2017-08-19 01:24:03	gzy	kernel	hwpstate0: set freq failed, err 6
Comment 5 Conrad Meyer freebsd_committer freebsd_triage 2017-08-19 18:13:45 UTC
Hm, it's interesting that setting pstates fails like that.  The current code should be valid on Zen.  I don't see any documentation in the reference manual on transition latency, but given this loop is processed sequentially it seems that it never succeeded even after 24 * 0.01 = 0.24 seconds.  That seems like it should be plenty of time to me, but I don't know.

Maybe hardware or some other system is controlling the p-state on your system?
Comment 6 Sean P.R. 2017-08-19 18:56:54 UTC
(In reply to Conrad Meyer from comment #5)
As your last post suspected, that patch didn't change the hwpstate0 error.  

I ran powerd -v and it returns the following messages, if they're helpful.  

I only hesitate in saying that you're almost certainly right that something in the hardware or firmware is controlling the power because of the powerd output below  (says no frequency adjustments happening).  OTOH, the system has definitely been running much (_much_) cooler than I thought it would.

I'll install Windows later today to see if its power management changes the temperature,fan speed, etc. in any way that's noticeably different.

Any other ideas?  And, if not, what's the cleanest way to stop the hwpstate0 messages from filling my log without losing any important functionality?

Thanks again.  Here's the powerd output:

powerd -v
powerd: unable to determine AC line status
load   7%, current freq 3500 MHz ( 0), wanted freq 3390 MHz
load  14%, current freq 3500 MHz ( 0), wanted freq 3284 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 3181 MHz
load   6%, current freq 3500 MHz ( 0), wanted freq 3081 MHz
load  10%, current freq 3500 MHz ( 0), wanted freq 2984 MHz
changing clock speed from 3500 MHz to 3000 MHz
powerd: error setting CPU frequency 3000: Device not configured
load  83%, current freq 3500 MHz ( 0), wanted freq 7000 MHz
load  17%, current freq 3500 MHz ( 0), wanted freq 6781 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 6569 MHz
load  19%, current freq 3500 MHz ( 0), wanted freq 6363 MHz
load   3%, current freq 3500 MHz ( 0), wanted freq 6164 MHz
load  19%, current freq 3500 MHz ( 0), wanted freq 5971 MHz
load   2%, current freq 3500 MHz ( 0), wanted freq 5784 MHz
load   4%, current freq 3500 MHz ( 0), wanted freq 5603 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 5427 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 5257 MHz
load   2%, current freq 3500 MHz ( 0), wanted freq 5092 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 4932 MHz
load   9%, current freq 3500 MHz ( 0), wanted freq 4777 MHz
load  14%, current freq 3500 MHz ( 0), wanted freq 4627 MHz
load   5%, current freq 3500 MHz ( 0), wanted freq 4482 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 4341 MHz
load   5%, current freq 3500 MHz ( 0), wanted freq 4205 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 4073 MHz
load   6%, current freq 3500 MHz ( 0), wanted freq 3945 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 3821 MHz
load   3%, current freq 3500 MHz ( 0), wanted freq 3701 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 3585 MHz
load  14%, current freq 3500 MHz ( 0), wanted freq 3472 MHz
load   5%, current freq 3500 MHz ( 0), wanted freq 3363 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 3257 MHz
load   4%, current freq 3500 MHz ( 0), wanted freq 3155 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 3056 MHz
load   6%, current freq 3500 MHz ( 0), wanted freq 2960 MHz
changing clock speed from 3500 MHz to 3000 MHz
powerd: error setting CPU frequency 3000: Device not configured
load 142%, current freq 3500 MHz ( 0), wanted freq 7000 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 6781 MHz
load   7%, current freq 3500 MHz ( 0), wanted freq 6569 MHz
load   5%, current freq 3500 MHz ( 0), wanted freq 6363 MHz
load   8%, current freq 3500 MHz ( 0), wanted freq 6164 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 5971 MHz
load   6%, current freq 3500 MHz ( 0), wanted freq 5784 MHz
load   3%, current freq 3500 MHz ( 0), wanted freq 5603 MHz
load  11%, current freq 3500 MHz ( 0), wanted freq 5427 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 5257 MHz
load  20%, current freq 3500 MHz ( 0), wanted freq 5092 MHz
load   4%, current freq 3500 MHz ( 0), wanted freq 4932 MHz
load  34%, current freq 3500 MHz ( 0), wanted freq 4932 MHz
load   6%, current freq 3500 MHz ( 0), wanted freq 4777 MHz
load   3%, current freq 3500 MHz ( 0), wanted freq 4627 MHz
load   5%, current freq 3500 MHz ( 0), wanted freq 4482 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 4341 MHz
load   2%, current freq 3500 MHz ( 0), wanted freq 4205 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 4073 MHz
load   6%, current freq 3500 MHz ( 0), wanted freq 3945 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 3821 MHz
load   6%, current freq 3500 MHz ( 0), wanted freq 3701 MHz
load   4%, current freq 3500 MHz ( 0), wanted freq 3585 MHz
load   3%, current freq 3500 MHz ( 0), wanted freq 3472 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 3363 MHz
load  10%, current freq 3500 MHz ( 0), wanted freq 3257 MHz
load   6%, current freq 3500 MHz ( 0), wanted freq 3155 MHz
load  28%, current freq 3500 MHz ( 0), wanted freq 3155 MHz
load   8%, current freq 3500 MHz ( 0), wanted freq 3056 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 2960 MHz
changing clock speed from 3500 MHz to 3000 MHz
powerd: error setting CPU frequency 3000: Device not configured
load 103%, current freq 3500 MHz ( 0), wanted freq 7000 MHz
load   6%, current freq 3500 MHz ( 0), wanted freq 6781 MHz
load   0%, current freq 3500 MHz ( 0), wanted freq 6569 MHz
load  14%, current freq 3500 MHz ( 0), wanted freq 6363 MHz
^Ctotal joules used: 71.757
Comment 7 Conrad Meyer freebsd_committer freebsd_triage 2017-08-19 19:13:02 UTC
(In reply to Sean P.R. from comment #6)
I suspect the repeated logging is coming from powerd's attempts to change the p-state (error 6 == ENXIO == "Device not configured").  If you disable the powerd service, does the log spam go away?  It doesn't seem like powerd is doing anything for now anyway, so disabling it seems harmless.

It's interesting that powerd is reporting the machine in P0, and attempting to transition to P1.  Attempting to set P1 matches what the hwpstate module is logging; however, hwpstate is reporting that the machine is actually in P2 instead!

So it seems there is some discrepancy between what hwpstate sees and what powerd sees.  Perhaps powerd is just seeing stale cached data from hwpstate.

Either way, if the CPU is truly in P2 (2.2GHz base), that would explain the low temperatures.
Comment 8 Sean P.R. 2017-08-19 19:19:24 UTC
(In reply to Conrad Meyer from comment #7)

The hwpstate0 error messages stay the same regardless whether I have powerd disabled or enabled...  Sounds like nothing else to try at the moment then?  thanks again.
Comment 9 Conrad Meyer freebsd_committer freebsd_triage 2017-08-19 19:32:07 UTC
(In reply to Sean P.R. from comment #8)
Yeah, sorry, I don't have any other ideas.  I couldn't find that specific log message so I don't know where it's coming from.
Comment 10 Conrad Meyer freebsd_committer freebsd_triage 2017-08-19 19:48:06 UTC
The error message is the same as https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=167018 .  Unfortunately, the change that fixed it for those processors is already included in CURRENT and 11.1, so it seems we have a different issue here.  Perhaps the MSRC001_0071 register mentioned in https://lists.freebsd.org/pipermail/freebsd-stable/2012-May/067758.html would work better on Zen?  I don't see it documented in the reference manual, though.

Instead, there is MSRC001_0293 "Hardware PState Status," which has HW p-state in bits 22-24?

But the text seems to indicate that the existing register is the right one to check:

> ยง 2.1.5.2.1.2    Core P-state Visibility
>
> MSRC001_0063[CurPstate] reflects the current non-boosted P-state number for
> each core. For example, if Core::X86::Msr::PStateStat[CurPstate]=001b on core
> 1, then core 1 is in the P1 state.

Sorry I'm not more help :-).
Comment 11 mailsed 2017-08-21 10:02:54 UTC
(In reply to Conrad Meyer from comment #10)

Hi,

I want to know where the text is. Because
in https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf ,
There is no content of the section "2.1.3 CPU Power Management".
the next line is the section 2.1.4.
If you have this section's content, you can go forward, I think.
I'm waiting for this section's content.

By the way, the error messages should be gone away, if powerd is not running.
I wonder why :-)?
Comment 12 Conrad Meyer freebsd_committer freebsd_triage 2017-08-21 15:59:23 UTC
(In reply to mailsed from comment #11)
The updated (in-progress, not finalized) version of the PPR on AMD's developer hub has some content in "CPU Power Management," now numbered 2.1.5 (but that may change).
Comment 13 Sean P.R. 2017-08-21 18:30:34 UTC
(In reply to Conrad Meyer from comment #12)

After some playing around yesterday I think your earlier suspicious is right - pretty sure the BIOS is controlling the p-state. 

MB is msi X399 "Gaming Pro Carbon AC."  Uses AMI BIOS.  Same behavior under BIOS versions 1.2 (shipped), 1.3 (official release, flashed), and 1.43 (beta release, flashed)

Since the sensors aren't giving much useful information, I've been focused on watching the fans; I can see that the fans increase and decrease speed according to the load.  Very unscientifically, fan behavior looks about the same under ordinary use as in Windows, where I can see from the MB manufacturer's tool that the p-state varies a bit under load.  Of course, there are also AMD chipset drivers that allow more fine-grained control under Windows, so BIOS control shouldn't be mutually exclusive.

One possible lead:  I am pretty sure that the MB LEDs always read "AA" when booted into FreeBSD.  According to the manual, that means "System has transitioned into ACPI mode. Interrupt controller is in APIC mode." - while the CPU temperature should be displayed "after the CPU has fully booted into the OS."  It's possible I'm misreading "99" as "AA," but I'm pretty sure not.  See PDF page 48 of http://download.msi.com/archive/mnu_exe/E7B09v1.0.zip

Re error messages- In addition to disabling powerd, I turned everything I could find related to AHCI software control (I'm impatient :P).  That fixed it and things are running acceptably* (*in terms of power management; performance is great under FreeBSD and most common tasks are faster than Windows).

Happy to help debug more if you like.


[public service p.s. in case someone having other trouble with this MB happens across this message in a search:  (1) turn off the XHCI hand off or there's a high probability the USB controller will go into a reset loop; and (2) virtualization is turned off by default and BIOS versions < the 1.43 beta inexplicably lack the option to turn it on, so bhyve and vbox will fail until you flash the BIOS and enable "SVM"]
Comment 14 Sean P.R. 2017-08-22 04:02:32 UTC
One more data point - OpenBSD's power management appears to work fine with no errors reported, in case their code's helpful in debugging.  Set to hw.perfpolicy=auto it switches between "low" (2200MHz) and "high" (3500MHz) based on how hard I'm working the machine, or at least that's what the daemon log reports (in a series of short tests).  Manually switching to low keeps things at 2200MHz in the same tests.
Comment 15 Conrad Meyer freebsd_committer freebsd_triage 2017-08-22 04:53:55 UTC
(In reply to Sean P.R. from comment #14)
If someone wants to investigate what OpenBSD does differently here, I think the relevant files are:

https://github.com/openbsd/src/blob/master/sys/arch/amd64/amd64/k1x-pstate.c
https://github.com/openbsd/src/blob/master/sys/dev/acpi/acpicpu.c

(Grep for cpu_setperf(), which is a callback called by the "auto" hw.perfpolicy policy in sched_bsd.c.)
Comment 16 Conrad Meyer freebsd_committer freebsd_triage 2017-09-06 02:09:45 UTC
I've noticed a curious thing on my 1950X.  Setting P0 or P2 works fine.  But P1 does not:

$ sysctl dev.cpu.0.freq=2800
dev.cpu.0.freq: 3400
sysctl: dev.cpu.0.freq=2800: Device not configured
$ sysctl dev.cpu.0.freq=2200
dev.cpu.0.freq: 3400 -> 2200
$ sysctl dev.cpu.0.freq=3400
dev.cpu.0.freq: 2200 -> 3400

Do you observe the same (well, similar) thing on your 1920X?  OpenBSD's power management might "work" by only toggling between P0 and P2.  Or otherwise ignoring the inability to set P1.
Comment 17 Conrad Meyer freebsd_committer freebsd_triage 2017-09-06 02:36:40 UTC
By the way, I also note that the failure to set P1 drives the load average up enough that powerd thinks P0 is correct again.  So we don't just try P2 eventually because of that, I think.  I see the same thing as you (comment 6) (except my P0 is 3400 and P2 is only 2800):

load  10%, current freq 3500 MHz ( 0), wanted freq 2984 MHz
changing clock speed from 3500 MHz to 3000 MHz
powerd: error setting CPU frequency 3000: Device not configured
load  83%, current freq 3500 MHz ( 0), wanted freq 7000 MHz
      ^^^                              ^^^^^^^^^^^^^^^^^^^^
...
load   6%, current freq 3500 MHz ( 0), wanted freq 2960 MHz
changing clock speed from 3500 MHz to 3000 MHz
powerd: error setting CPU frequency 3000: Device not configured
load 142%, current freq 3500 MHz ( 0), wanted freq 7000 MHz
     ^^^^                              ^^^^^^^^^^^^^^^^^^^^
...
load   0%, current freq 3500 MHz ( 0), wanted freq 2960 MHz
changing clock speed from 3500 MHz to 3000 MHz
powerd: error setting CPU frequency 3000: Device not configured
load 103%, current freq 3500 MHz ( 0), wanted freq 7000 MHz
     ^^^^                              ^^^^^^^^^^^^^^^^^^^^
Comment 18 Conrad Meyer freebsd_committer freebsd_triage 2017-09-06 20:02:25 UTC
I've changed hwpstate's mechanism to wait for CPUs to switch frequency levels from DELAY() to sleep()[0].   This allows the CPU to idle instead of spinning for the duration of the failed transition to P1.  This at least allows powerd to switch to P2 eventually:

$ powerd -v
...
// Load is added
...
load  34%, current freq 2200 MHz ( 2), wanted freq 2200 MHz
load 100%, current freq 2200 MHz ( 2), wanted freq 6800 MHz
changing clock speed from 2200 MHz to 3400 MHz
load 100%, current freq 3400 MHz ( 0), wanted freq 6800 MHz
...
// Load is removed
...
load   0%, current freq 3400 MHz ( 0), wanted freq 2877 MHz
load   0%, current freq 3400 MHz ( 0), wanted freq 2787 MHz
changing clock speed from 3400 MHz to 2800 MHz
powerd: error setting CPU frequency 2800: Device not configured
load   2%, current freq 3400 MHz ( 0), wanted freq 2699 MHz
changing clock speed from 3400 MHz to 2800 MHz
powerd: error setting CPU frequency 2800: Device not configured
load   2%, current freq 3400 MHz ( 0), wanted freq 2614 MHz
changing clock speed from 3400 MHz to 2800 MHz
powerd: error setting CPU frequency 2800: Device not configured
load   2%, current freq 3400 MHz ( 0), wanted freq 2532 MHz
changing clock speed from 3400 MHz to 2800 MHz
powerd: error setting CPU frequency 2800: Device not configured
load   2%, current freq 3400 MHz ( 0), wanted freq 2452 MHz
changing clock speed from 3400 MHz to 2800 MHz
powerd: error setting CPU frequency 2800: Device not configured
load   0%, current freq 3400 MHz ( 0), wanted freq 2375 MHz
changing clock speed from 3400 MHz to 2800 MHz
powerd: error setting CPU frequency 2800: Device not configured
load   0%, current freq 3400 MHz ( 0), wanted freq 2300 MHz
changing clock speed from 3400 MHz to 2800 MHz
powerd: error setting CPU frequency 2800: Device not configured
load   0%, current freq 3400 MHz ( 0), wanted freq 2228 MHz
changing clock speed from 3400 MHz to 2800 MHz
powerd: error setting CPU frequency 2800: Device not configured
load   2%, current freq 3400 MHz ( 0), wanted freq 2200 MHz
changing clock speed from 3400 MHz to 2200 MHz
load   0%, current freq 2200 MHz ( 2), wanted freq 2200 MHz

[0]: https://reviews.freebsd.org/D12260
Comment 19 commit-hook freebsd_committer freebsd_triage 2017-09-07 20:21:31 UTC
A commit references this bug:

Author: cem
Date: Thu Sep  7 20:20:12 UTC 2017
New revision: 323286
URL: https://svnweb.freebsd.org/changeset/base/323286

Log:
  cpufreq(4) hwpstate: Yield CPU awaiting frequency change

  It doesn't seem necessary to busy the CPU while waiting to transition
  into a different p-state.

  PR:		221621 (related, but does not completely address)
  Reviewed by:	truckman
  Sponsored by:	Dell EMC Isilon
  Differential Revision:	https://reviews.freebsd.org/D12260

Changes:
  head/sys/x86/cpufreq/hwpstate.c
Comment 20 Sean P.R. 2017-09-30 20:37:28 UTC
(In reply to Conrad Meyer from comment #18)

Sorry I dropped off for a while.  Please let me know if there's something I can help test this weekend.
Comment 21 Conrad Meyer freebsd_committer freebsd_triage 2017-11-30 20:32:56 UTC
Resolved by r326383 + r326407.  Now we simply do not verify if the value set reads back from MSR.