My system: FreeBSD 12.0-CURRENT #0 334829e6c(drm-next)-dirty
My CPU: AMD Ryzen 1700

after kldloading "amdtemp", there are still no temperature values under "sysctl dev.cpu".

dmidecode -t processor
---------------------------------------------------------------------------------
# dmidecode 3.0
Scanning /dev/mem for entry point.
SMBIOS 3.0 present.

Handle 0x002D, DMI type 4, 48 bytes
Processor Information
        Socket Designation: AM4
        Type: Central Processor
        Family: <OUT OF SPEC>
        Manufacturer: Advanced Micro Devices, Inc.
        ID: 11 0F 80 00 FF FB 8B 17
        Version: AMD Ryzen 7 1700 Eight-Core Processor
        Voltage: 1.2 V
        External Clock: 100 MHz
        Max Speed: 3400 MHz
        Current Speed: 3400 MHz
        Status: Populated, Enabled
        Upgrade: <OUT OF SPEC>
        L1 Cache Handle: 0x002A
        L2 Cache Handle: 0x002B
        L3 Cache Handle: 0x002C
        Serial Number: Unknown
        Asset Tag: Unknown
        Part Number: Unknown
        Core Count: 8
        Core Enabled: 8
        Thread Count: 16
        Characteristics:
                64-bit capable
---------------------------------------------------------------------------------

TIA and regards,
Nils
Created attachment 183887
superiotool dump parser for NCT6779D chips...
FWIW, I've changed my mainboard to "ASRock AB350 Pro4". The port "superiotool" is able to read the sensor's data (NCT6779D) - so I've created a quick&dirty parser for its output:
-------------------------------------------------------------------------
#./nct6779d.lua
MB temperature: 38 (hyst=0, crit=75)
CPU temperature: 52 (hyst=75, crit=80)
FAN1: 0
FAN1 pulse: 2
FAN2: 19
FAN2 pulse: 2
FAN3: 255
FAN3 pulse: 2
FAN4: 255
FAN4 pulse: 2
FAN5: 255
FAN5 pulse: 2
FAN6: 255
-------------------------------------------------------------------------
(at the moment 14 cores are busy compiling poudriere stuff)
(In reply to Nils Beyer from comment #0) AMD hasn't released the documentation for this yet. Linux is in the same boat.
This bug also applies to the AMD Ryzen 1800X - I'm affected too.
(In reply to Nils Beyer from comment #1) This is amusing, but for context I have a similar nct sensor on my Intel system and it doesn't track the core temperatures at all. At idle, it reports about 8°C cooler than the package/core sensors. Under load, it climbs more slowly and reaches a peak ~30°C below the package/core sensors. Back to idle, it starts falling again while the cores still report 10°C higher. So I don't think this sensor is sufficient even if we integrated it better. Fortunately, I was able to find the AMD SB-TSI (sideband temperature sensor) spec which supposedly documents communication with the CPU temperature sensor over SMBus.
Here's a patch that should add Zen support to amdtemp(4): https://reviews.freebsd.org/D12217 Please give it a spin and let me know what you think.
Ryzen 1800X over here.

# kldload amdsmn
amdsmn0: <AMD Family 17h System Management Network> on hostb0
# kldload amdtemp
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb0
# sysctl -a | grep temp
dev.amdtemp.0.core0.sensor0: 45.1C
dev.amdtemp.0.sensor_offset: 0
dev.amdtemp.0.%parent: hostb0
dev.amdtemp.0.%pnpinfo:
dev.amdtemp.0.%location:
dev.amdtemp.0.%driver: amdtemp
dev.amdtemp.0.%desc: AMD CPU On-Die Thermal Sensors
dev.amdtemp.%parent:
dev.cpu.15.temperature: 45.1C
dev.cpu.14.temperature: 45.1C
dev.cpu.13.temperature: 45.1C
dev.cpu.12.temperature: 45.1C
dev.cpu.11.temperature: 45.1C
dev.cpu.10.temperature: 45.1C
dev.cpu.9.temperature: 45.1C
dev.cpu.8.temperature: 45.1C
dev.cpu.7.temperature: 54.5C
dev.cpu.6.temperature: 54.5C
dev.cpu.5.temperature: 54.5C
dev.cpu.4.temperature: 54.5C
dev.cpu.3.temperature: 54.5C
dev.cpu.2.temperature: 54.5C
dev.cpu.1.temperature: 54.5C
dev.cpu.0.temperature: 54.5C

Seems to work. What's the difference between core0.sensor0 and the per-cpu temperature values?
(In reply to Raphael 'kena' Poss from comment #7)
> Seems to work. What's the difference between core0.sensor0 and the per-cpu
> temperature values?

No difference -- you're just seeing measurements from slightly different points in time. sysctl -a just generates enough CPU activity to bring the temperature up from idle on some CPUs.

It's all one single sensor per Zeppelin die (and there's one die in the 1800X).
Brilliant, thanks for the good work.
Huh, could it be that this sensor hardware is shared with previous AMD tech? Check this out:
https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/include/asic_reg/vega10/THM/thm_9_0_offset.h
(In reply to Raphael 'kena' Poss from comment #10) Those files are for AMD GPUs.
(In reply to Raphael 'kena' Poss from comment #10) Yeah, it wouldn't be too surprising if it's shared across models. That said, Vega isn't exactly "previous" AMD tech :-).
So there are other sensors possibly in there. For example the temperature coefficient.
(In reply to Raphael 'kena' Poss from comment #13) CUR_TMP, HTC, and THERM_TRIP match the PPR. Nothing else is documented between that and what would be 0x80 and 0x81 in the Vega link (not present in Vega). When I attempt to read from the (undocumented for Zen) Vega THM_TMON0_COEFF (index 0x5e == 0x59978), I get 0x00024068. Any idea how that field decodes? Is that sane for that register on Vega?
I'm sorry I can't say, no access to that hardware for now.
(In reply to Conrad Meyer from comment #6)
thanks a lot for that patch - applied it on 11-STABLE r323151. Unfortunately, temperature values are all strange:
-------------------------------------------------------------------------------
root@asbach:/usr/src/#sysctl dev.cpu | grep temp
dev.cpu.15.temperature: 255.9C
dev.cpu.14.temperature: 255.9C
dev.cpu.13.temperature: 255.9C
dev.cpu.12.temperature: 255.9C
dev.cpu.11.temperature: 255.9C
dev.cpu.10.temperature: 255.9C
dev.cpu.9.temperature: 255.9C
dev.cpu.8.temperature: 255.9C
dev.cpu.7.temperature: 255.9C
dev.cpu.6.temperature: 255.9C
dev.cpu.5.temperature: 255.9C
dev.cpu.4.temperature: 255.9C
dev.cpu.3.temperature: 255.9C
dev.cpu.2.temperature: 255.9C
dev.cpu.1.temperature: 255.9C
dev.cpu.0.temperature: 255.9C
-------------------------------------------------------------------------------
Can you also try my patch https://reviews.freebsd.org/D9759 ?
(In reply to rozhuk.im from comment #17)
sorry, mea culpa - back too soon from vacation and too many patches to try. Ok, for your patch D9759, I get these:
----------------------------------------------------------------------------
amdtemp0: cpu_id = 800f11, AMD_REG_CPUID = 0.
amdtemp0: cpu_id = 800f11, AMD_REG_CPUID = 0.
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb10
amdtemp0: amdtemp_attach: 0:24:3
amdtemp0: CPU have TS: Temperature sensor.
amdtemp0: Found: Reported Temperature Control (RTC).
amdtemp0: Found: Thermaltrip Status (TTS).
amdtemp0: Found: Hardware Thermal Control (HTC), state: enabled.
random: harvesting attach, 8 bytes (4 bits) from amdtemp0
[...]
dev.cpu.15.temperature: 255.9C
dev.cpu.14.temperature: 255.9C
dev.cpu.13.temperature: 255.9C
dev.cpu.12.temperature: 255.9C
dev.cpu.11.temperature: 255.9C
dev.cpu.10.temperature: 255.9C
dev.cpu.9.temperature: 255.9C
dev.cpu.8.temperature: 255.9C
dev.cpu.7.temperature: 255.9C
dev.cpu.6.temperature: 255.9C
dev.cpu.5.temperature: 255.9C
dev.cpu.4.temperature: 255.9C
dev.cpu.3.temperature: 255.9C
dev.cpu.2.temperature: 255.9C
dev.cpu.1.temperature: 255.9C
dev.cpu.0.temperature: 255.9C
----------------------------------------------------------------------------

For Conrad's patch D12217, I get:
----------------------------------------------------------------------------
amdsmn0: <AMD Family 17h System Management Network> on hostb0
random: harvesting attach, 8 bytes (4 bits) from amdsmn0
amdtemp1: <AMD CPU On-Die Thermal Sensors> on hostb0
amdtemp1: Found 16 cores and 1 sensors.
random: harvesting attach, 8 bytes (4 bits) from amdtemp1
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb10
amdtemp0: Found 16 cores and 1 sensors.
random: harvesting attach, 8 bytes (4 bits) from amdtemp0
[...]
dev.cpu.15.temperature: -1 dev.cpu.14.temperature: -1 dev.cpu.13.temperature: -1 dev.cpu.12.temperature: -1 dev.cpu.11.temperature: -1 dev.cpu.10.temperature: -1 dev.cpu.9.temperature: -1 dev.cpu.8.temperature: -1 dev.cpu.7.temperature: -1 dev.cpu.6.temperature: -1 dev.cpu.5.temperature: -1 dev.cpu.4.temperature: -1 dev.cpu.3.temperature: -1 dev.cpu.2.temperature: -1 dev.cpu.1.temperature: -1 dev.cpu.0.temperature: -1 [...] amdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULLamdtemp0: amdtemp_gettemp17h: SMN: NULL ---------------------------------------------------------------------------- I have _not_ rebooted my system after building the "amdtemp" and "amdsmn" modules if that matters. Again, sorry for the confusion...
(In reply to Nils Beyer from comment #16) It certainly looks like your host bridge PCI device read is failing. That's a temperature value of 0x7ff (11 bit field), which is consistent with a register read of 0xffffffff.
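The arithmetic behind that diagnosis can be sketched in a few lines. This is only an illustration, not the driver's code: it assumes the current-temperature field sits in the top 11 bits of the 32-bit register (bits 31:21) and counts in steps of 0.125 C, as documented for earlier AMD families. The variable names are made up for the example.

```python
# Assumption: the temperature field is bits 31:21, one count = 0.125 C.
FIELD_SHIFT = 21
FIELD_MASK = 0x7FF      # 11-bit field
STEP_CELSIUS = 0.125

failed_read = 0xFFFFFFFF  # all-ones, as seen when a config read is not answered

field = (failed_read >> FIELD_SHIFT) & FIELD_MASK
celsius = field * STEP_CELSIUS

print(hex(field))           # the saturated 11-bit field
print(f"{celsius:.1f}C")    # same rounding as the sysctl output
```

Under those assumptions an all-ones read saturates the field at 0x7ff, which is 255.875 C and prints as the 255.9C shown in the sysctl output.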
(In reply to Conrad Meyer from comment #19)
okay, what can I do to help debug it?
(In reply to Nils Beyer from comment #20)
I am not sure, honestly. Does your motherboard BIOS have any update available? It might be worth trying that in case the BIOS has done something funky.

Can you try pciconf -l | grep hostb0, and grep hostb10, to find the PCI (D)/B/D/F for those devices? On my system it's pci0:0:0:0 and pci0:64:0:0.

(Also: it's odd that your system has two host bridges -- I think it's usually one per Zeppelin die on Zen systems. Maybe the problem is that both of our amdtemp drivers are attaching to the wrong device.)

Once you have those pci names, try:

$ pciconf -r pci0:foo:bar:baz 0x4 (must be root)

Which will print the PCIR_COMMAND status.

While you're at it, can you try:

$ pciconf -w hostb0@pci0:0:0:0 0x60 0x59800 && pciconf -r hostb0@pci0:0:0:0 0x64

(Substituting your first hostbridge PCI (D)/B/D/F for pci0:0:0:0, if it is not 0/0/0, in both commands.)

That's what the amdtemp + amdsmn drivers do from the kernel on family 17h.
(In reply to Conrad Meyer from comment #21)
> $ pciconf -r pci0:foo:bar:baz 0x4 (must be root)

Sorry, that should be:

$ pciconf -r -h pci0:foo:bar:baz 0x4
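For anyone who wants to repeat the 0x60/0x64 probe from a script, here is a rough userland sketch of the same index/data sequence. It only wraps the two pciconf invocations quoted in this thread; `smn_read`, `Runner` and `_run` are hypothetical names for this example, and it has to run as root on FreeBSD to do anything useful.

```python
import subprocess
from typing import Callable, List

SMN_INDEX_REG = "0x60"  # config-space offset that takes the SMN address
SMN_DATA_REG = "0x64"   # config-space offset that returns the value

Runner = Callable[[List[str]], str]


def _run(cmd: List[str]) -> str:
    """Run a command and return its stdout, raising if it fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def smn_read(selector: str, smn_addr: int, run: Runner = _run) -> int:
    """Write the SMN address to the index register, then read the data register.

    selector is a pciconf device selector such as "hostb0@pci0:0:0:0".
    The runner is injectable so the function can be exercised without hardware.
    """
    run(["pciconf", "-w", selector, SMN_INDEX_REG, hex(smn_addr)])
    out = run(["pciconf", "-r", selector, SMN_DATA_REG])
    return int(out.strip(), 16)
```

Calling `smn_read("hostb0@pci0:0:0:0", 0x59800)` issues the same two commands as the one-liner above. Note that nothing in this sketch stops another accessor from changing the index register between the write and the read, so treat the result as a debugging aid only.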
(In reply to Conrad Meyer from comment #21)
--------------------------------------------------------------------------------
root@asbach:/root/#pciconf -l | grep hostb0
hostb0@pci0:0:0:0: class=0x060000 card=0x14501849 chip=0x14501022 rev=0x00 hdr=0x00
root@asbach:/root/#pciconf -l | grep hostb10
hostb10@pci0:0:24:3: class=0x060000 card=0x00000000 chip=0x14631022 rev=0x00 hdr=0x00
root@asbach:/root/#pciconf -r -h pci0:0:0:0 0x4
0000
root@asbach:/root/#pciconf -r -h pci0:0:24:3 0x4
0000
root@asbach:/root/#pciconf -w hostb0@pci0:0:0:0 0x60 0x59800 && pciconf -r hostb0@pci0:0:0:0 0x64
30000fef
root@asbach:/root/#pciconf -w hostb10@pci0:0:24:3 0x60 0x59800 && pciconf -r hostb10@pci0:0:24:3 0x64
00000000
--------------------------------------------------------------------------------

The command "pciconf -w hostb0@pci0:0:0:0 0x60 0x59800 && pciconf -r hostb0@pci0:0:0:0 0x64" generates a different output after nearly every execution:

30000fef, 30200fef, 31200fef, 31400fef, 31600fef, 31800fef, 31400fef, 31a00fef, 31400fef

So it seems that there's something like a temperature behind it?
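A quick way to check that hunch is to decode the raw reads by hand. The sketch below assumes the register uses the same layout as earlier AMD families (temperature in bits 31:21, 0.125 C per count); that layout is an assumption here, and `curtmp_celsius` is a hypothetical helper, not a function from the driver.

```python
# Raw values copied from the pciconf reads of hostb0 above.
RAW_READS = ["30000fef", "30200fef", "31200fef", "31400fef", "31600fef",
             "31800fef", "31400fef", "31a00fef", "31400fef"]


def curtmp_celsius(reg: int) -> float:
    """Assumed layout: 11-bit field in bits 31:21, 0.125 C per count."""
    return ((reg >> 21) & 0x7FF) * 0.125


temps = [curtmp_celsius(int(raw, 16)) for raw in RAW_READS]

for raw, temp in zip(RAW_READS, temps):
    print(f"0x{raw} -> {temp:.3f} C")
```

With that layout the reads fall between 48.0 C and 49.625 C, which is a believable range for a CPU under load, so the register does look like a temperature.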
(In reply to Nils Beyer from comment #18) Thanks! Can you retest with updated patch?
(In reply to Nils Beyer from comment #23) It seems we are attaching to the wrong host bridge. Should be the 14501022 one (hostb0). That explains the bogus value. I'll fix it when I get home.
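To make the "14501022" reference concrete: the chip= field printed by pciconf -l packs the device ID in the upper 16 bits and the vendor ID in the lower 16 bits. The sketch below splits the two chip values from this thread and picks the bridge with device ID 0x1450. `split_chip` and `matching_bridges` are hypothetical helper names, and the matching rule is only an illustration of the idea, not the driver's probe code.

```python
import re
from typing import List, Tuple

AMD_VENDOR_ID = 0x1022
WANTED_DEVICE_ID = 0x1450  # the bridge identified as the right one in this thread

# Lines copied from the pciconf -l output posted earlier.
PCICONF_LINES = [
    "hostb0@pci0:0:0:0: class=0x060000 card=0x14501849 chip=0x14501022 rev=0x00 hdr=0x00",
    "hostb10@pci0:0:24:3: class=0x060000 card=0x00000000 chip=0x14631022 rev=0x00 hdr=0x00",
]


def split_chip(line: str) -> Tuple[int, int]:
    """Return (vendor, device) from the chip= field of a pciconf -l line."""
    match = re.search(r"chip=0x([0-9a-fA-F]{8})", line)
    if match is None:
        raise ValueError(f"no chip= field in: {line!r}")
    chip = int(match.group(1), 16)
    return chip & 0xFFFF, chip >> 16


def matching_bridges(lines: List[str]) -> List[str]:
    """Names of the devices whose vendor/device pair matches the wanted bridge."""
    names = []
    for line in lines:
        vendor, device = split_chip(line)
        if vendor == AMD_VENDOR_ID and device == WANTED_DEVICE_ID:
            names.append(line.split("@", 1)[0])
    return names


print(matching_bridges(PCICONF_LINES))
```

Both bridges share the AMD vendor ID, so only the device ID tells them apart: hostb0 is 0x1450 and hostb10 is 0x1463.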
(In reply to rozhuk.im from comment #24)
that looks much better:
----------------------------------------------------------------------------
amdtemp0: cpu_id = 800f11, AMD_REG_CPUID = 0.
amdtemp0: cpu_id = 800f11, AMD_REG_CPUID = 0.
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb10
amdtemp0: amdtemp_attach: 0:24:3
amdtemp0: CPU have TS: Temperature sensor.
amdtemp0: Found: Reported Temperature Control (RTC).
amdtemp0: Found: Thermaltrip Status (TTS).
amdtemp0: Found: Hardware Thermal Control (HTC), state: enabled.
random: harvesting attach, 8 bytes (4 bits) from amdtemp0
[...]
root@asbach:/usr/src/#sysctl dev.cpu | grep temp
dev.cpu.15.temperature: 49.6C
dev.cpu.14.temperature: 49.6C
dev.cpu.13.temperature: 49.6C
dev.cpu.12.temperature: 49.6C
dev.cpu.11.temperature: 49.6C
dev.cpu.10.temperature: 49.6C
dev.cpu.9.temperature: 49.6C
dev.cpu.8.temperature: 49.6C
dev.cpu.7.temperature: 49.6C
dev.cpu.6.temperature: 49.6C
dev.cpu.5.temperature: 49.6C
dev.cpu.4.temperature: 49.6C
dev.cpu.3.temperature: 49.6C
dev.cpu.2.temperature: 49.6C
dev.cpu.1.temperature: 49.6C
dev.cpu.0.temperature: 49.6C
----------------------------------------------------------------------------

*thumbs up*
(In reply to Conrad Meyer from comment #25) looking forward to testing it...
(In reply to Nils Beyer from comment #26) Thanks! sysctl dev.amdtemp please!
(In reply to Nils Beyer from comment #27) Please try the latest revision: https://reviews.freebsd.org/D12217 Thanks!
(In reply to rozhuk.im from comment #28)
------------------------------------------------------------------------
root@asbach:/usr/src/#sysctl dev.amdtemp
dev.amdtemp.0.htc.PslApicLoEn: 1
dev.amdtemp.0.htc.PslApicHiEn: 1
dev.amdtemp.0.htc.HtcActSts: 1
dev.amdtemp.0.htc.HtcAct: 1
dev.amdtemp.0.htc.HtcPstateLimit: 7
dev.amdtemp.0.htc.HtcSlewSel: 1
dev.amdtemp.0.htc.HtcLock: 1
dev.amdtemp.0.htc.HtcEn: 1
dev.amdtemp.0.htc.HtcHystLmt: 7.6C
dev.amdtemp.0.htc.HtcTmpLmt: 115.6C
dev.amdtemp.0.tts.core1.sensor1_offset: 0
dev.amdtemp.0.tts.core1.sensor0_offset: 0
dev.amdtemp.0.tts.core1.sensor1: -3.9C
dev.amdtemp.0.tts.core1.sensor0: -3.9C
dev.amdtemp.0.tts.core0.sensor1_offset: 0
dev.amdtemp.0.tts.core0.sensor0_offset: 0
dev.amdtemp.0.tts.core0.sensor1: -3.9C
dev.amdtemp.0.tts.core0.sensor0: -3.9C
dev.amdtemp.0.tts.thermtrip: 0
dev.amdtemp.0.tts.sense: 0
dev.amdtemp.0.tts.enable: 1
dev.amdtemp.0.tts.DiodeOffset: 45
dev.amdtemp.0.tts.TjOffset: 0
dev.amdtemp.0.rtc.sensor_offset: 0
dev.amdtemp.0.rtc.PerStepTimeUp: 15
dev.amdtemp.0.rtc.PerStepTimeDn: 15
dev.amdtemp.0.rtc.TmpMaxDiffUp: 3
dev.amdtemp.0.rtc.TmpSlewDnEn: 1
dev.amdtemp.0.rtc.CurTmpTjSel: -3.3C
dev.amdtemp.0.rtc.CurTmp: 45.7C
dev.amdtemp.0.%parent: hostb10
dev.amdtemp.0.%pnpinfo:
dev.amdtemp.0.%location:
dev.amdtemp.0.%driver: amdtemp
dev.amdtemp.0.%desc: AMD CPU On-Die Thermal Sensors
dev.amdtemp.%parent:
------------------------------------------------------------------------
(In reply to Conrad Meyer from comment #29)
------------------------------------------------------------------------------
amdsmn0: <AMD Family 17h System Management Network> on hostb0
random: harvesting attach, 8 bytes (4 bits) from amdsmn0
amdtemp1: <AMD CPU On-Die Thermal Sensors> on hostb0
amdtemp1: Found 16 cores and 1 sensors.
random: harvesting attach, 8 bytes (4 bits) from amdtemp1
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb10
amdtemp0: No SMN device found
device_attach: amdtemp0 attach returned 6
[...]
root@asbach:/usr/src/#sysctl dev.cpu | grep temp
root@asbach:/usr/src/#sysctl dev.amdtemp
dev.amdtemp.1.core0.sensor0: 50.0C
dev.amdtemp.1.sensor_offset: 0
dev.amdtemp.1.%parent: hostb0
dev.amdtemp.1.%pnpinfo:
dev.amdtemp.1.%location:
dev.amdtemp.1.%driver: amdtemp
dev.amdtemp.1.%desc: AMD CPU On-Die Thermal Sensors
dev.amdtemp.%parent:
------------------------------------------------------------------------------
(In reply to Nils Beyer from comment #31) Thanks! Looks good.
A commit references this bug:

Author: cem
Date: Tue Sep 5 15:19:14 UTC 2017
New revision: 323185
URL: https://svnweb.freebsd.org/changeset/base/323185

Log:
  amdtemp(4): Add support for Family 17h temperature sensor

  The sensor value is formatted similarly to previous models (same bitfield
  sizes, same units), but must be read off of the internal System Management
  Network (SMN) from the System Management Unit (SMU) co-processor.

  PR: 218264
  Reported and tested by: Nils Beyer <nbe AT renzel.net>
  Reviewed by: avg (no +1), mjoras, truckman
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D12217

Changes:
  head/share/man/man4/amdtemp.4
  head/sys/conf/files.amd64
  head/sys/conf/files.i386
  head/sys/dev/amdtemp/amdtemp.c
Fixed in (r323184 and) r323185.
(In reply to Conrad Meyer from comment #34)
I do not have temperature nodes under "dev.cpu":
--------------------------------------------------------------------------------
root@asbach:/usr/src/#sysctl dev.cpu | grep temp
root@asbach:/usr/src/
--------------------------------------------------------------------------------
sorry, if that wasn't clear in my response...
Ah, I missed that. Thanks, I'll take a look.
(In reply to Nils Beyer from comment #35) Sorry about that. Please try https://people.freebsd.org/~cem/amdtemp-fix-probe.patch .
(In reply to Conrad Meyer from comment #37) Scratch that, it doesn't work on my system anyway.
(In reply to Conrad Meyer from comment #38) no problem - I do have a temperature value under "dev.amdtemp", so I can monitor that for now...
Ok, please try https://people.freebsd.org/~cem/amdtemp-fix-probe.patch . In particular I am hoping that amdtemp will no longer attach to hostb10 as well as hostb0.
(In reply to Conrad Meyer from comment #40)
unfortunately, still no "dev.cpu" temperature nodes:
----------------------------------------------------------------------------
amdsmn0: <AMD Family 17h System Management Network> on hostb0
random: harvesting attach, 8 bytes (4 bits) from amdsmn0
amdtemp1: <AMD CPU On-Die Thermal Sensors> on hostb0
amdtemp1: Found 16 cores and 1 sensors.
random: harvesting attach, 8 bytes (4 bits) from amdtemp1
[...]
root@asbach:/usr/src/#sysctl dev.cpu | grep temp
root@asbach:/usr/src/#sysctl dev.amdtemp
dev.amdtemp.1.core0.sensor0: 46.7C
dev.amdtemp.1.sensor_offset: 0
dev.amdtemp.1.%parent: hostb0
dev.amdtemp.1.%pnpinfo:
dev.amdtemp.1.%location:
dev.amdtemp.1.%driver: amdtemp
dev.amdtemp.1.%desc: AMD CPU On-Die Thermal Sensors
dev.amdtemp.%parent:
----------------------------------------------------------------------------
but no more "device_attach: amdtemp0 attach returned 6"...
@Nils, can you share the portion of `devinfo -v` tree containing the hostb10 device? As far as the sysctl issue, avg@ points out it is because amdtemp is getting the 1 unit number instead of 0.
(In reply to Conrad Meyer from comment #42)
----------------------------------------------------------------------------
hostb10 pnpinfo vendor=0x1022 device=0x1463 subvendor=0x0000 subdevice=0x0000 class=0x060000 at slot=24 function=3 dbsf=pci0:0:24:3
  amdtemp0
----------------------------------------------------------------------------
Is the unit number changeable somehow?
(In reply to Nils Beyer from comment #43) Can you share full devinfo -v? Sorry. :-) Have you freshly rebooted since the last time you loaded rozhuk's patched amdtemp? His driver "attaches" to hostb10 but manually uses the right device by hardcoding some PCI b/s/f.
(In reply to Conrad Meyer from comment #44) full "devinfo -v" output attached. Too big to post as text. Yes, I've rebooted after upgrading to 12-CURRENT with your SVN commit...
Created attachment 186098
"devinfo -v" output
Can you do a clean boot w/ bootverbose? (`nextboot -o "-v"` if you use UFS, or set it in loader.conf temporarily if you use ZFS.) It might be useful to see a full dmesg log, including 'kldload amdtemp'. Thanks for your patience!
A commit references this bug:

Author: cem
Date: Tue Sep 5 20:35:25 UTC 2017
New revision: 323195
URL: https://svnweb.freebsd.org/changeset/base/323195

Log:
  amdtemp(4): Do not probe not matching hostbridges

  Some systems have hostbs that do not match our PCI device id criteria.
  Detect and ignore these devices in probe.

  PR: 218264
  Sponsored by: Dell EMC Isilon

Changes:
  head/sys/dev/amdtemp/amdtemp.c
(In reply to Conrad Meyer from comment #47)
a reboot is not possible at the moment because I'm running a poudriere bulk build to check the system's stability. So, we have to wait until the system freezes/crashes or the build is finished.

But, because I've already enabled verbose boot, I've attached the dmesg output...
Created attachment 186109
dmesg -a output
Ok, dmesg shows rozhuk's driver being loaded first. I think this will work fine when you get around to doing a clean reboot. Reopen if that's not the case :-).
(In reply to Conrad Meyer from comment #51)
you're right; after an unexpected clean reboot, temperature nodes are present now:
--------------------------------------------------------------------------------
root@asbach:/root/#dmesg -a | tail -6 ; sysctl dev.cpu | grep temp
amdsmn0: <AMD Family 17h System Management Network> on hostb0
random: harvesting attach, 8 bytes (4 bits) from amdsmn0
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb0
amdtemp0: Found 16 cores and 1 sensors.
random: harvesting attach, 8 bytes (4 bits) from amdtemp0
dev.cpu.15.temperature: 27.6C
dev.cpu.14.temperature: 27.6C
dev.cpu.13.temperature: 27.6C
dev.cpu.12.temperature: 27.6C
dev.cpu.11.temperature: 27.6C
dev.cpu.10.temperature: 27.6C
dev.cpu.9.temperature: 27.6C
dev.cpu.8.temperature: 27.6C
dev.cpu.7.temperature: 27.6C
dev.cpu.6.temperature: 27.6C
dev.cpu.5.temperature: 27.6C
dev.cpu.4.temperature: 27.6C
dev.cpu.3.temperature: 27.6C
dev.cpu.2.temperature: 27.6C
dev.cpu.1.temperature: 27.6C
dev.cpu.0.temperature: 27.6C
--------------------------------------------------------------------------------
issue solved I'd say - thanks a lot for your help...
Glad to hear it's working.
A commit references this bug:

Author: truckman
Date: Thu Feb 22 00:36:13 UTC 2018
New revision: 329767
URL: https://svnweb.freebsd.org/changeset/base/329767

Log:
  MFC r323067, r323184, r323185, r323195, r323196 (by cem)

  ------------------------------------------------------------------------
  r323067 | cem | 2017-08-31 11:39:18 -0700 (Thu, 31 Aug 2017) | 4 lines

  amdtemp.4: Update BKDG URL to current location

  Sponsored by: Dell EMC Isilon

  ------------------------------------------------------------------------
  r323184 | cem | 2017-09-05 08:13:41 -0700 (Tue, 05 Sep 2017) | 10 lines

  Add smn(4) driver for AMD System Management Network

  AMD Family 17h CPUs have an internal network used to communicate between
  the host CPU and the PSP and SMU coprocessors. It exposes a simple 32-bit
  register space.

  Reviewed by: avg (no +1), mjoras, truckman
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D12217

  ------------------------------------------------------------------------
  r323185 | cem | 2017-09-05 08:19:14 -0700 (Tue, 05 Sep 2017) | 13 lines

  amdtemp(4): Add support for Family 17h temperature sensor

  The sensor value is formatted similarly to previous models (same bitfield
  sizes, same units), but must be read off of the internal System Management
  Network (SMN) from the System Management Unit (SMU) co-processor.

  PR: 218264
  Reported and tested by: Nils Beyer <nbe AT renzel.net>
  Reviewed by: avg (no +1), mjoras, truckman
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D12217

  ------------------------------------------------------------------------
  r323195 | cem | 2017-09-05 13:35:25 -0700 (Tue, 05 Sep 2017) | 8 lines

  amdtemp(4): Do not probe not matching hostbridges

  Some systems have hostbs that do not match our PCI device id criteria.
  Detect and ignore these devices in probe.

  PR: 218264
  Sponsored by: Dell EMC Isilon

  ------------------------------------------------------------------------
  r323196 | cem | 2017-09-05 14:00:33 -0700 (Tue, 05 Sep 2017) | 8 lines

  amdsmn(4): Do not probe not matching hostbridges

  Similar to r323195, but for amdsmn(4) driver (which borrowed some design).
  Ignore hostbs that do not match our PCI device id criteria.

  Sponsored by: Dell EMC Isilon
  PR: 218264
  Differential Revision: https://reviews.freebsd.org/D12217

Changes:
_U  stable/11/
  stable/11/share/man/man4/Makefile
  stable/11/share/man/man4/amdsmn.4
  stable/11/share/man/man4/amdtemp.4
  stable/11/stand/forth/loader.conf
  stable/11/sys/amd64/conf/NOTES
  stable/11/sys/conf/files.amd64
  stable/11/sys/conf/files.i386
  stable/11/sys/dev/amdsmn/
  stable/11/sys/dev/amdsmn/amdsmn.c
  stable/11/sys/dev/amdtemp/amdtemp.c
  stable/11/sys/modules/Makefile
  stable/11/sys/modules/amdsmn/
^Triage: Track stable branch merge (from head/12-CURRENT at the time)