Bug 218264

Summary: amdtemp: Does not recognize AMD Ryzen/Chipset temperature sensors
Product: Base System
Reporter: Nils Beyer <nbe>
Component: kern
Assignee: Conrad Meyer <cem>
Status: Closed FIXED
Severity: Affects Only Me
CC: avg, cem, knz, mjoras, rozhuk.im, truckman
Priority: ---
Flags: koobs: mfc-stable11+
Version: CURRENT
Hardware: amd64
OS: Any
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239607
Attachments:
Description                                        Flags
superiotool dump parser for NCT6779D chips...       none
"devinfo -v" output                                 none
dmesg -a output                                     none

Description Nils Beyer 2017-03-31 16:12:41 UTC
My system: FreeBSD 12.0-CURRENT #0 334829e6c(drm-next)-dirty
My CPU: AMD Ryzen 1700

after kldloading "amdtemp", there are still no temperature values under 
"sysctl dev.cpu".

dmidecode -t processor
---------------------------------------------------------------------------------
# dmidecode 3.0
Scanning /dev/mem for entry point.
SMBIOS 3.0 present.

Handle 0x002D, DMI type 4, 48 bytes
Processor Information
        Socket Designation: AM4
        Type: Central Processor
        Family: <OUT OF SPEC>
        Manufacturer: Advanced Micro Devices, Inc.
        ID: 11 0F 80 00 FF FB 8B 17
        Version: AMD Ryzen 7 1700 Eight-Core Processor          
        Voltage: 1.2 V
        External Clock: 100 MHz
        Max Speed: 3400 MHz
        Current Speed: 3400 MHz
        Status: Populated, Enabled
        Upgrade: <OUT OF SPEC>
        L1 Cache Handle: 0x002A
        L2 Cache Handle: 0x002B
        L3 Cache Handle: 0x002C
        Serial Number: Unknown
        Asset Tag: Unknown
        Part Number: Unknown
        Core Count: 8
        Core Enabled: 8
        Thread Count: 16
        Characteristics:
                64-bit capable
---------------------------------------------------------------------------------



TIA and regards,
Nils
Comment 1 Nils Beyer 2017-06-28 13:59:26 UTC
Created attachment 183887 [details]
superiotool dump parser for NCT6779D chips...
Comment 2 Nils Beyer 2017-06-28 14:01:35 UTC
FWIW, I've changed my mainboard to "ASRock AB350 Pro4". The port "superiotool" is able to read the sensor's data (NCT6779D) - so I've created a quick&dirty parser for its output:
-------------------------------------------------------------------------
#./nct6779d.lua
MB temperature:  38 (hyst=0, crit=75)
CPU temperature: 52 (hyst=75, crit=80)
FAN1:            0
FAN1 pulse:      2
FAN2:            19
FAN2 pulse:      2
FAN3:            255
FAN3 pulse:      2
FAN4:            255
FAN4 pulse:      2
FAN5:            255
FAN5 pulse:      2
FAN6:            255
-------------------------------------------------------------------------
(at the moment 14 cores are busy compiling poudriere stuff)
Comment 3 Don Lewis freebsd_committer freebsd_triage 2017-07-13 00:26:52 UTC
(In reply to Nils Beyer from comment #0)

AMD hasn't released the documentation for this yet.  Linux is in the same boat.
Comment 4 Raphael 'kena' Poss 2017-08-18 10:41:40 UTC
Bug also applies to AMD Ryzen 1800X - I'm affected too.
Comment 5 Conrad Meyer freebsd_committer freebsd_triage 2017-09-01 23:11:41 UTC
(In reply to Nils Beyer from comment #1)
This is amusing, but for context I have a similar nct sensor on my Intel system and it doesn't track the core temperatures at all.

At idle, it reports about 8°C cooler than the package/core sensors.

Under load, it climbs more slowly and reaches a peak ~30°C below the package/core sensors.

Back to idle, it starts falling again while the cores still report 10°C higher.

So I don't think this sensor is sufficient even if we integrated it better.

Fortunately, I was able to find the AMD SB-TSI (sideband temperature sensor) spec which supposedly documents communication with the CPU temperature sensor over SMBus.
Comment 6 Conrad Meyer freebsd_committer freebsd_triage 2017-09-02 17:05:54 UTC
Here's a patch that should add Zen support to amdtemp(4): https://reviews.freebsd.org/D12217

Please give it a spin and let me know what you think.
Comment 7 Raphael 'kena' Poss 2017-09-02 19:45:49 UTC
Ryzen 1800X over here.

# kldload amdsmn
amdsmn0: <AMD Family 17h System Management Network> on hostb0
# kldload amdtemp
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb0
# sysctl -a | grep temp
dev.amdtemp.0.core0.sensor0: 45.1C
dev.amdtemp.0.sensor_offset: 0
dev.amdtemp.0.%parent: hostb0
dev.amdtemp.0.%pnpinfo:
dev.amdtemp.0.%location:
dev.amdtemp.0.%driver: amdtemp
dev.amdtemp.0.%desc: AMD CPU On-Die Thermal Sensors
dev.amdtemp.%parent:
dev.cpu.15.temperature: 45.1C
dev.cpu.14.temperature: 45.1C
dev.cpu.13.temperature: 45.1C
dev.cpu.12.temperature: 45.1C
dev.cpu.11.temperature: 45.1C
dev.cpu.10.temperature: 45.1C
dev.cpu.9.temperature: 45.1C
dev.cpu.8.temperature: 45.1C
dev.cpu.7.temperature: 54.5C
dev.cpu.6.temperature: 54.5C
dev.cpu.5.temperature: 54.5C
dev.cpu.4.temperature: 54.5C
dev.cpu.3.temperature: 54.5C
dev.cpu.2.temperature: 54.5C
dev.cpu.1.temperature: 54.5C
dev.cpu.0.temperature: 54.5C

Seems to work. What's the difference between core0.sensor0 and the per-cpu temperature values?
Comment 8 Conrad Meyer freebsd_committer freebsd_triage 2017-09-02 22:18:16 UTC
(In reply to Raphael 'kena' Poss from comment #7)
> Seems to work. What's the difference between core0.sensor0 and the per-cpu
> temperature values?

No difference -- you're just seeing measurements from slightly different points in time.  sysctl -a just generates enough CPU activity to bring the temperature up from idle on some CPUs.  It's all one single sensor per Zeppelin die (and there's one die in the 1800X).
Comment 9 Raphael 'kena' Poss 2017-09-03 00:42:01 UTC
Brilliant, thanks for the good work.
Comment 10 Raphael 'kena' Poss 2017-09-03 00:49:38 UTC
Huh, could it be that this sensor hardware is shared with previous AMD tech?

Check this out: https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/include/asic_reg/vega10/THM/thm_9_0_offset.h
Comment 11 Don Lewis freebsd_committer freebsd_triage 2017-09-03 01:08:58 UTC
(In reply to Raphael 'kena' Poss from comment #10)
Those files are for AMD GPUs.
Comment 12 Conrad Meyer freebsd_committer freebsd_triage 2017-09-03 15:25:03 UTC
(In reply to Raphael 'kena' Poss from comment #10)
Yeah, it wouldn't be too surprising if it's shared across models.  That said, Vega isn't exactly "previous" AMD tech :-).
Comment 13 Raphael 'kena' Poss 2017-09-03 20:30:51 UTC
So there may be other sensors in there, for example the temperature coefficient.
Comment 14 Conrad Meyer freebsd_committer freebsd_triage 2017-09-03 20:48:54 UTC
(In reply to Raphael 'kena' Poss from comment #13)
CUR_TMP, HTC, and THERM_TRIP match the PPR.  Nothing else is documented between that and what would be 0x80 and 0x81 in the Vega link (not present in Vega).

When I attempt to read from the (undocumented for Zen) Vega THM_TMON0_COEFF (index 0x5e == 0x59978), I get 0x00024068.  Any idea how that field decodes?  Is that sane for that register on Vega?
Comment 15 Raphael 'kena' Poss 2017-09-03 21:09:58 UTC
I'm sorry I can't say, no access to that hardware for now.
Comment 16 Nils Beyer 2017-09-04 07:56:31 UTC
(In reply to Conrad Meyer from comment #6)

thanks a lot for that patch - applied it on 11-STABLE r323151.

Unfortunately, temperature values are all strange:
-------------------------------------------------------------------------------
root@asbach:/usr/src/#sysctl dev.cpu | grep temp
dev.cpu.15.temperature: 255.9C
dev.cpu.14.temperature: 255.9C
dev.cpu.13.temperature: 255.9C
dev.cpu.12.temperature: 255.9C
dev.cpu.11.temperature: 255.9C
dev.cpu.10.temperature: 255.9C
dev.cpu.9.temperature: 255.9C
dev.cpu.8.temperature: 255.9C
dev.cpu.7.temperature: 255.9C
dev.cpu.6.temperature: 255.9C
dev.cpu.5.temperature: 255.9C
dev.cpu.4.temperature: 255.9C
dev.cpu.3.temperature: 255.9C
dev.cpu.2.temperature: 255.9C
dev.cpu.1.temperature: 255.9C
dev.cpu.0.temperature: 255.9C
-------------------------------------------------------------------------------
Comment 17 Ivan Rozhuk 2017-09-04 14:16:24 UTC
Can you also try my patch
https://reviews.freebsd.org/D9759 ?
Comment 18 Nils Beyer 2017-09-04 14:45:42 UTC
(In reply to rozhuk.im from comment #17)

sorry, mea culpa - just back from vacation and too many patches to try.

Ok, for your patch D9759, I get these:
----------------------------------------------------------------------------
amdtemp0: cpu_id = 800f11, AMD_REG_CPUID = 0.
amdtemp0: cpu_id = 800f11, AMD_REG_CPUID = 0.
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb10
amdtemp0: amdtemp_attach: 0:24:3
amdtemp0: CPU have TS: Temperature sensor.
amdtemp0: Found: Reported Temperature Control (RTC).
amdtemp0: Found: Thermaltrip Status (TTS).
amdtemp0: Found: Hardware Thermal Control (HTC), state: enabled.
random: harvesting attach, 8 bytes (4 bits) from amdtemp0
[...]
dev.cpu.15.temperature: 255.9C
dev.cpu.14.temperature: 255.9C
dev.cpu.13.temperature: 255.9C
dev.cpu.12.temperature: 255.9C
dev.cpu.11.temperature: 255.9C
dev.cpu.10.temperature: 255.9C
dev.cpu.9.temperature: 255.9C
dev.cpu.8.temperature: 255.9C
dev.cpu.7.temperature: 255.9C
dev.cpu.6.temperature: 255.9C
dev.cpu.5.temperature: 255.9C
dev.cpu.4.temperature: 255.9C
dev.cpu.3.temperature: 255.9C
dev.cpu.2.temperature: 255.9C
dev.cpu.1.temperature: 255.9C
dev.cpu.0.temperature: 255.9C
----------------------------------------------------------------------------


For Conrad's patch D12217, I get:
----------------------------------------------------------------------------
amdsmn0: <AMD Family 17h System Management Network> on hostb0
random: harvesting attach, 8 bytes (4 bits) from amdsmn0
amdtemp1: <AMD CPU On-Die Thermal Sensors> on hostb0
amdtemp1: Found 16 cores and 1 sensors.
random: harvesting attach, 8 bytes (4 bits) from amdtemp1
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb10
amdtemp0: Found 16 cores and 1 sensors.
random: harvesting attach, 8 bytes (4 bits) from amdtemp0
[...]
dev.cpu.15.temperature: -1
dev.cpu.14.temperature: -1
dev.cpu.13.temperature: -1
dev.cpu.12.temperature: -1
dev.cpu.11.temperature: -1
dev.cpu.10.temperature: -1
dev.cpu.9.temperature: -1
dev.cpu.8.temperature: -1
dev.cpu.7.temperature: -1
dev.cpu.6.temperature: -1
dev.cpu.5.temperature: -1
dev.cpu.4.temperature: -1
dev.cpu.3.temperature: -1
dev.cpu.2.temperature: -1
dev.cpu.1.temperature: -1
dev.cpu.0.temperature: -1
[...]
amdtemp0: amdtemp_gettemp17h: SMN: NULL
amdtemp0: amdtemp_gettemp17h: SMN: NULL
[... same message repeated many times ...]
----------------------------------------------------------------------------

I have _not_ rebooted my system after building the "amdtemp" and "amdsmn" modules if that matters.

Again, sorry for the confusion...
Comment 19 Conrad Meyer freebsd_committer freebsd_triage 2017-09-04 16:33:34 UTC
(In reply to Nils Beyer from comment #16)
It certainly looks like your host bridge PCI device read is failing.  That's a temperature value of 0x7ff (11 bit field), which is consistent with a register read of 0xffffffff.
Comment 20 Nils Beyer 2017-09-04 16:48:56 UTC
(In reply to Conrad Meyer from comment #19)

okay, what can I do to help debugging it?
Comment 21 Conrad Meyer freebsd_committer freebsd_triage 2017-09-04 17:14:08 UTC
(In reply to Nils Beyer from comment #20)
I am not sure, honestly.  Does your motherboard BIOS have any update available?  It might be worth trying that in case the BIOS has done something funky.

Can you try `pciconf -l | grep hostb0` and `pciconf -l | grep hostb10` to find the PCI (D)/B/D/F for those devices?  On my system it's pci0:0:0:0 and pci0:64:0:0.

(Also: it's odd that your system has two host bridges -- I think it's usually one per Zeppelin die on Zen systems.  Maybe the problem is that both of our amdtemp drivers are attaching to the wrong device.)

Once you have those pci names, try:

$ pciconf -r pci0:foo:bar:baz 0x4    (must be root)

Which will print the PCIR_COMMAND status.  While you're at it, can you try:

$ pciconf -w hostb0@pci0:0:0:0 0x60 0x59800 && pciconf -r hostb0@pci0:0:0:0 0x64

(Substituting your first hostbridge PCI (D)/B/D/F for pci0:0:0:0, if it is not 0/0/0, in both commands.)  That's what the amdtemp + amdsmn driver do from the kernel on family 17h.
Comment 22 Conrad Meyer freebsd_committer freebsd_triage 2017-09-04 17:15:20 UTC
(In reply to Conrad Meyer from comment #21)
> $ pciconf -r pci0:foo:bar:baz 0x4    (must be root)

Sorry, that should be:

$ pciconf -r -h pci0:foo:bar:baz 0x4
Comment 23 Nils Beyer 2017-09-04 18:01:12 UTC
(In reply to Conrad Meyer from comment #21)

--------------------------------------------------------------------------------
root@asbach:/root/#pciconf -l | grep hostb0
hostb0@pci0:0:0:0:      class=0x060000 card=0x14501849 chip=0x14501022 rev=0x00 hdr=0x00

root@asbach:/root/#pciconf -l | grep hostb10
hostb10@pci0:0:24:3:    class=0x060000 card=0x00000000 chip=0x14631022 rev=0x00 hdr=0x00

root@asbach:/root/#pciconf -r -h pci0:0:0:0 0x4
0000 

root@asbach:/root/#pciconf -r -h pci0:0:24:3 0x4
0000 

root@asbach:/root/#pciconf -w hostb0@pci0:0:0:0 0x60 0x59800 && pciconf -r hostb0@pci0:0:0:0 0x64
30000fef 

root@asbach:/root/#pciconf -w hostb10@pci0:0:24:3 0x60 0x59800 && pciconf -r hostb10@pci0:0:24:3 0x64
00000000
--------------------------------------------------------------------------------

The command "pciconf -w hostb0@pci0:0:0:0 0x60 0x59800 && pciconf -r hostb0@pci0:0:0:0 0x64"
generates a different output after nearly every execution:

30000fef, 30200fef, 31200fef, 31400fef, 31600fef, 31800fef, 31400fef, 31a00fef, 31400fef

So it seems that there's something like a temperature behind it?
Comment 24 Ivan Rozhuk 2017-09-04 18:15:02 UTC
(In reply to Nils Beyer from comment #18)
Thanks!

Can you retest with updated patch?
Comment 25 Conrad Meyer freebsd_committer freebsd_triage 2017-09-04 18:39:55 UTC
(In reply to Nils Beyer from comment #23)
It seems we are attaching to the wrong host bridge.  Should be the 14501022 one (hostb0).  That explains the bogus value.  I'll fix it when I get home.
Comment 26 Nils Beyer 2017-09-04 18:43:15 UTC
(In reply to rozhuk.im from comment #24)

that looks much better:
----------------------------------------------------------------------------
amdtemp0: cpu_id = 800f11, AMD_REG_CPUID = 0.
amdtemp0: cpu_id = 800f11, AMD_REG_CPUID = 0.
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb10
amdtemp0: amdtemp_attach: 0:24:3
amdtemp0: CPU have TS: Temperature sensor.
amdtemp0: Found: Reported Temperature Control (RTC).
amdtemp0: Found: Thermaltrip Status (TTS).
amdtemp0: Found: Hardware Thermal Control (HTC), state: enabled.
random: harvesting attach, 8 bytes (4 bits) from amdtemp0
[...]
root@asbach:/usr/src/#sysctl dev.cpu | grep temp
dev.cpu.15.temperature: 49.6C
dev.cpu.14.temperature: 49.6C
dev.cpu.13.temperature: 49.6C
dev.cpu.12.temperature: 49.6C
dev.cpu.11.temperature: 49.6C
dev.cpu.10.temperature: 49.6C
dev.cpu.9.temperature: 49.6C
dev.cpu.8.temperature: 49.6C
dev.cpu.7.temperature: 49.6C
dev.cpu.6.temperature: 49.6C
dev.cpu.5.temperature: 49.6C
dev.cpu.4.temperature: 49.6C
dev.cpu.3.temperature: 49.6C
dev.cpu.2.temperature: 49.6C
dev.cpu.1.temperature: 49.6C
dev.cpu.0.temperature: 49.6C
----------------------------------------------------------------------------

*thumbs up*
Comment 27 Nils Beyer 2017-09-04 18:46:20 UTC
(In reply to Conrad Meyer from comment #25)

looking forward to testing it...
Comment 28 Ivan Rozhuk 2017-09-04 22:00:45 UTC
(In reply to Nils Beyer from comment #26)

Thanks!

sysctl dev.amdtemp please!
Comment 29 Conrad Meyer freebsd_committer freebsd_triage 2017-09-05 00:45:54 UTC
(In reply to Nils Beyer from comment #27)
Please try the latest revision: https://reviews.freebsd.org/D12217

Thanks!
Comment 30 Nils Beyer 2017-09-05 06:26:38 UTC
(In reply to rozhuk.im from comment #28)

------------------------------------------------------------------------
root@asbach:/usr/src/#sysctl dev.amdtemp
dev.amdtemp.0.htc.PslApicLoEn: 1
dev.amdtemp.0.htc.PslApicHiEn: 1
dev.amdtemp.0.htc.HtcActSts: 1
dev.amdtemp.0.htc.HtcAct: 1
dev.amdtemp.0.htc.HtcPstateLimit: 7
dev.amdtemp.0.htc.HtcSlewSel: 1
dev.amdtemp.0.htc.HtcLock: 1
dev.amdtemp.0.htc.HtcEn: 1
dev.amdtemp.0.htc.HtcHystLmt: 7.6C
dev.amdtemp.0.htc.HtcTmpLmt: 115.6C
dev.amdtemp.0.tts.core1.sensor1_offset: 0
dev.amdtemp.0.tts.core1.sensor0_offset: 0
dev.amdtemp.0.tts.core1.sensor1: -3.9C
dev.amdtemp.0.tts.core1.sensor0: -3.9C
dev.amdtemp.0.tts.core0.sensor1_offset: 0
dev.amdtemp.0.tts.core0.sensor0_offset: 0
dev.amdtemp.0.tts.core0.sensor1: -3.9C
dev.amdtemp.0.tts.core0.sensor0: -3.9C
dev.amdtemp.0.tts.thermtrip: 0
dev.amdtemp.0.tts.sense: 0
dev.amdtemp.0.tts.enable: 1
dev.amdtemp.0.tts.DiodeOffset: 45
dev.amdtemp.0.tts.TjOffset: 0
dev.amdtemp.0.rtc.sensor_offset: 0
dev.amdtemp.0.rtc.PerStepTimeUp: 15
dev.amdtemp.0.rtc.PerStepTimeDn: 15
dev.amdtemp.0.rtc.TmpMaxDiffUp: 3
dev.amdtemp.0.rtc.TmpSlewDnEn: 1
dev.amdtemp.0.rtc.CurTmpTjSel: -3.3C
dev.amdtemp.0.rtc.CurTmp: 45.7C
dev.amdtemp.0.%parent: hostb10
dev.amdtemp.0.%pnpinfo: 
dev.amdtemp.0.%location: 
dev.amdtemp.0.%driver: amdtemp
dev.amdtemp.0.%desc: AMD CPU On-Die Thermal Sensors
dev.amdtemp.%parent: 
------------------------------------------------------------------------
Comment 31 Nils Beyer 2017-09-05 09:41:13 UTC
(In reply to Conrad Meyer from comment #29)

------------------------------------------------------------------------------
amdsmn0: <AMD Family 17h System Management Network> on hostb0
random: harvesting attach, 8 bytes (4 bits) from amdsmn0
amdtemp1: <AMD CPU On-Die Thermal Sensors> on hostb0
amdtemp1: Found 16 cores and 1 sensors.
random: harvesting attach, 8 bytes (4 bits) from amdtemp1
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb10
amdtemp0: No SMN device found
device_attach: amdtemp0 attach returned 6
[...]
root@asbach:/usr/src/#sysctl dev.cpu | grep temp

root@asbach:/usr/src/#sysctl dev.amdtemp
dev.amdtemp.1.core0.sensor0: 50.0C
dev.amdtemp.1.sensor_offset: 0
dev.amdtemp.1.%parent: hostb0
dev.amdtemp.1.%pnpinfo: 
dev.amdtemp.1.%location: 
dev.amdtemp.1.%driver: amdtemp
dev.amdtemp.1.%desc: AMD CPU On-Die Thermal Sensors
dev.amdtemp.%parent:
------------------------------------------------------------------------------
Comment 32 Conrad Meyer freebsd_committer freebsd_triage 2017-09-05 15:06:10 UTC
(In reply to Nils Beyer from comment #31)
Thanks!  Looks good.
Comment 33 commit-hook freebsd_committer freebsd_triage 2017-09-05 15:20:12 UTC
A commit references this bug:

Author: cem
Date: Tue Sep  5 15:19:14 UTC 2017
New revision: 323185
URL: https://svnweb.freebsd.org/changeset/base/323185

Log:
  amdtemp(4): Add support for Family 17h temperature sensor

  The sensor value is formatted similarly to previous models (same
  bitfield sizes, same units), but must be read off of the internal
  System Management Network (SMN) from the System Management Unit (SMU)
  co-processor.

  PR:		218264
  Reported and tested by:	Nils Beyer <nbe AT renzel.net>
  Reviewed by:	avg (no +1), mjoras, truckman
  Sponsored by:	Dell EMC Isilon
  Differential Revision:	https://reviews.freebsd.org/D12217

Changes:
  head/share/man/man4/amdtemp.4
  head/sys/conf/files.amd64
  head/sys/conf/files.i386
  head/sys/dev/amdtemp/amdtemp.c
Comment 34 Conrad Meyer freebsd_committer freebsd_triage 2017-09-05 15:21:14 UTC
Fixed in (r323184 and) r323185.
Comment 35 Nils Beyer 2017-09-05 15:28:26 UTC
(In reply to Conrad Meyer from comment #34)

I do not have temperature nodes under "dev.cpu":
--------------------------------------------------------------------------------
root@asbach:/usr/src/#sysctl dev.cpu | grep temp
root@asbach:/usr/src/
--------------------------------------------------------------------------------

sorry if that wasn't clear in my response...
Comment 36 Conrad Meyer freebsd_committer freebsd_triage 2017-09-05 15:32:52 UTC
Ah, I missed that.  Thanks, I'll take a look.
Comment 37 Conrad Meyer freebsd_committer freebsd_triage 2017-09-05 15:45:24 UTC
(In reply to Nils Beyer from comment #35)
Sorry about that.  Please try https://people.freebsd.org/~cem/amdtemp-fix-probe.patch .
Comment 38 Conrad Meyer freebsd_committer freebsd_triage 2017-09-05 15:56:52 UTC
(In reply to Conrad Meyer from comment #37)
Scratch that, it doesn't work on my system anyway.
Comment 39 Nils Beyer 2017-09-05 16:26:19 UTC
(In reply to Conrad Meyer from comment #38)

no problem - I do have a temperature value under "dev.amdtemp", so I can monitor that for now...
Comment 40 Conrad Meyer freebsd_committer freebsd_triage 2017-09-05 16:29:02 UTC
Ok, please try https://people.freebsd.org/~cem/amdtemp-fix-probe.patch .  In particular I am hoping that amdtemp will no longer attach to hostb10 as well as hostb0.
Comment 41 Nils Beyer 2017-09-05 16:48:10 UTC
(In reply to Conrad Meyer from comment #40)

unfortunately, still no "dev.cpu" temperature nodes:
----------------------------------------------------------------------------
amdsmn0: <AMD Family 17h System Management Network> on hostb0
random: harvesting attach, 8 bytes (4 bits) from amdsmn0
amdtemp1: <AMD CPU On-Die Thermal Sensors> on hostb0
amdtemp1: Found 16 cores and 1 sensors.
random: harvesting attach, 8 bytes (4 bits) from amdtemp1
[...]
root@asbach:/usr/src/#sysctl dev.cpu | grep temp
root@asbach:/usr/src/#sysctl dev.amdtemp
dev.amdtemp.1.core0.sensor0: 46.7C
dev.amdtemp.1.sensor_offset: 0
dev.amdtemp.1.%parent: hostb0
dev.amdtemp.1.%pnpinfo: 
dev.amdtemp.1.%location: 
dev.amdtemp.1.%driver: amdtemp
dev.amdtemp.1.%desc: AMD CPU On-Die Thermal Sensors
dev.amdtemp.%parent:
----------------------------------------------------------------------------

but no more "device_attach: amdtemp0 attach returned 6"...
Comment 42 Conrad Meyer freebsd_committer freebsd_triage 2017-09-05 19:27:09 UTC
@Nils, can you share the portion of `devinfo -v` tree containing the hostb10 device?

As far as the sysctl issue, avg@ points out it is because amdtemp is getting the 1 unit number instead of 0.
Comment 43 Nils Beyer 2017-09-05 19:37:46 UTC
(In reply to Conrad Meyer from comment #42)

----------------------------------------------------------------------------
        hostb10 pnpinfo vendor=0x1022 device=0x1463 subvendor=0x0000 subdevice=0x0000 class=0x060000 at slot=24 function=3 dbsf=pci0:0:24:3
          amdtemp0
----------------------------------------------------------------------------

Is the unit number changeable somehow?
Comment 44 Conrad Meyer freebsd_committer freebsd_triage 2017-09-05 19:50:36 UTC
(In reply to Nils Beyer from comment #43)
Can you share full devinfo -v?  Sorry. :-)

Have you freshly rebooted since the last time you loaded rozhuk's patched amdtemp?  His driver "attaches" to hostb10 but manually uses the right device by hardcoding some PCI b/s/f.
Comment 45 Nils Beyer 2017-09-05 20:13:03 UTC
(In reply to Conrad Meyer from comment #44)

full "devinfo -v" output attached. Too big to post as text.

Yes, I've rebooted after upgrading to 12-CURRENT with your SVN commit...
Comment 46 Nils Beyer 2017-09-05 20:13:46 UTC
Created attachment 186098 [details]
"devinfo -v" output
Comment 47 Conrad Meyer freebsd_committer freebsd_triage 2017-09-05 20:32:35 UTC
Can you do a clean boot w/ bootverbose?  (`nextboot -o "-v"` if you use UFS, or set it in loader.conf temporarily if you use ZFS.)  It might be useful to see a full dmesg log, including 'kldload amdtemp'.  Thanks for your patience!
Comment 48 commit-hook freebsd_committer freebsd_triage 2017-09-05 20:35:48 UTC
A commit references this bug:

Author: cem
Date: Tue Sep  5 20:35:25 UTC 2017
New revision: 323195
URL: https://svnweb.freebsd.org/changeset/base/323195

Log:
  amdtemp(4): Do not probe not matching hostbridges

  Some systems have hostbs that do not match our PCI device id criteria.
  Detect and ignore these devices in probe.

  PR:		218264
  Sponsored by:	Dell EMC Isilon

Changes:
  head/sys/dev/amdtemp/amdtemp.c
Comment 49 Nils Beyer 2017-09-06 06:27:14 UTC
(In reply to Conrad Meyer from comment #47)

a reboot is not possible at the moment because I'm running a poudriere bulk build to check the system's stability. So, we have to wait until the system freezes/crashes or the build is finished.

But, because I've already enabled verbose boot, I've attached the dmesg output...
Comment 50 Nils Beyer 2017-09-06 06:28:09 UTC
Created attachment 186109 [details]
dmesg -a output
Comment 51 Conrad Meyer freebsd_committer freebsd_triage 2017-09-06 16:09:44 UTC
Ok, dmesg shows rozhuk's driver being loaded first.  I think this will work fine when you get around to doing a clean reboot.  Reopen if that's not the case :-).
Comment 52 Nils Beyer 2017-09-07 07:20:16 UTC
(In reply to Conrad Meyer from comment #51)

you're right; after an unexpected clean reboot, temperature nodes are present now:
--------------------------------------------------------------------------------
root@asbach:/root/#dmesg -a | tail -6 ; sysctl dev.cpu | grep temp
amdsmn0: <AMD Family 17h System Management Network> on hostb0
random: harvesting attach, 8 bytes (4 bits) from amdsmn0
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb0
amdtemp0: Found 16 cores and 1 sensors.
random: harvesting attach, 8 bytes (4 bits) from amdtemp0
dev.cpu.15.temperature: 27.6C
dev.cpu.14.temperature: 27.6C
dev.cpu.13.temperature: 27.6C
dev.cpu.12.temperature: 27.6C
dev.cpu.11.temperature: 27.6C
dev.cpu.10.temperature: 27.6C
dev.cpu.9.temperature: 27.6C
dev.cpu.8.temperature: 27.6C
dev.cpu.7.temperature: 27.6C
dev.cpu.6.temperature: 27.6C
dev.cpu.5.temperature: 27.6C
dev.cpu.4.temperature: 27.6C
dev.cpu.3.temperature: 27.6C
dev.cpu.2.temperature: 27.6C
dev.cpu.1.temperature: 27.6C
dev.cpu.0.temperature: 27.6C
--------------------------------------------------------------------------------

issue solved I'd say - thanks a lot for your help...
Comment 53 Conrad Meyer freebsd_committer freebsd_triage 2017-09-07 07:25:56 UTC
Glad to hear it's working.
Comment 54 commit-hook freebsd_committer freebsd_triage 2018-02-22 00:36:29 UTC
A commit references this bug:

Author: truckman
Date: Thu Feb 22 00:36:13 UTC 2018
New revision: 329767
URL: https://svnweb.freebsd.org/changeset/base/329767

Log:
  MFC r323067, r323184, r323185, r323195, r323196 (by cem)

  ------------------------------------------------------------------------
  r323067 | cem | 2017-08-31 11:39:18 -0700 (Thu, 31 Aug 2017) | 4 lines

  amdtemp.4: Update BKDG URL to current location

  Sponsored by:	Dell EMC Isilon

  ------------------------------------------------------------------------
  r323184 | cem | 2017-09-05 08:13:41 -0700 (Tue, 05 Sep 2017) | 10 lines

  Add smn(4) driver for AMD System Management Network

  AMD Family 17h CPUs have an internal network used to communicate between
  the host CPU and the PSP and SMU coprocessors.  It exposes a simple
  32-bit register space.

  Reviewed by:	avg (no +1), mjoras, truckman
  Sponsored by:	Dell EMC Isilon
  Differential Revision:	https://reviews.freebsd.org/D12217

  ------------------------------------------------------------------------
  r323185 | cem | 2017-09-05 08:19:14 -0700 (Tue, 05 Sep 2017) | 13 lines

  amdtemp(4): Add support for Family 17h temperature sensor

  The sensor value is formatted similarly to previous models (same
  bitfield sizes, same units), but must be read off of the internal
  System Management Network (SMN) from the System Management Unit (SMU)
  co-processor.

  PR:		218264
  Reported and tested by:	Nils Beyer <nbe AT renzel.net>
  Reviewed by:	avg (no +1), mjoras, truckman
  Sponsored by:	Dell EMC Isilon
  Differential Revision:	https://reviews.freebsd.org/D12217

  ------------------------------------------------------------------------
  r323195 | cem | 2017-09-05 13:35:25 -0700 (Tue, 05 Sep 2017) | 8 lines

  amdtemp(4): Do not probe not matching hostbridges

  Some systems have hostbs that do not match our PCI device id criteria.
  Detect and ignore these devices in probe.

  PR:		218264
  Sponsored by:	Dell EMC Isilon

  ------------------------------------------------------------------------
  r323196 | cem | 2017-09-05 14:00:33 -0700 (Tue, 05 Sep 2017) | 8 lines

  amdsmn(4): Do not probe not matching hostbridges

  Similar to r323195, but for amdsmn(4) driver (which borrowed some design).

  Ignore hostbs that do not match our PCI device id criteria.

  Sponsored by:	Dell EMC Isilon

  PR:		218264
  Differential Revision:	https://reviews.freebsd.org/D12217

Changes:
_U  stable/11/
  stable/11/share/man/man4/Makefile
  stable/11/share/man/man4/amdsmn.4
  stable/11/share/man/man4/amdtemp.4
  stable/11/stand/forth/loader.conf
  stable/11/sys/amd64/conf/NOTES
  stable/11/sys/conf/files.amd64
  stable/11/sys/conf/files.i386
  stable/11/sys/dev/amdsmn/
  stable/11/sys/dev/amdsmn/amdsmn.c
  stable/11/sys/dev/amdtemp/amdtemp.c
  stable/11/sys/modules/Makefile
  stable/11/sys/modules/amdsmn/
Comment 57 Kubilay Kocak freebsd_committer freebsd_triage 2019-12-03 04:48:29 UTC
^Triage: Track stable branch merge (from head/12-CURRENT at the time)