Bug 237063 - After "kld amdtemp": devinfo missing cpu31 for Ryzen Threadripper 1950X context
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-04-06 18:13 UTC by Mark Millard
Modified: 2019-06-19 01:27 UTC

See Also:


Attachments
devinfo -r output from 1950X system (17.52 KB, text/plain)
2019-04-06 18:13 UTC, Mark Millard
pciconf -lvcb output for 1950X context (39.16 KB, text/plain)
2019-04-06 18:14 UTC, Mark Millard

Description Mark Millard 2019-04-06 18:13:17 UTC
Created attachment 203430
devinfo -r output from 1950X system

[This could well apply beyond head.]

Beyond cpu31 being missing, it was stated that
amdtemp0 and amdtemp1 were not where they were
initially expected.

# devinfo
nexus0
 cryptosoft0
 vtvga0
 apic0
 ram0
 acpi0
   cpu0
     hwpstate0
     cpufreq0
   cpu1
   cpu2
   cpu3
   cpu4
   cpu5
   cpu6
   cpu7
   cpu8
   cpu9
   cpu10
   cpu11
   cpu12
   cpu13
   cpu14
   cpu15
   cpu16
   cpu17
   cpu18
   cpu19
   cpu20
   cpu21
   cpu22
   cpu23
   cpu24
   cpu25
   cpu26
   cpu27
   cpu28
   cpu29
   cpu30
   pcib0
     pci0
       hostb0
         amdsmn0
         amdtemp0
. . .
   pcib12
     pci12
       hostb23
         amdsmn1
         amdtemp1
. . .
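
For illustration only (this is not output from the reported system): a minimal Python sketch of how a gap like the missing cpu31 could be spotted in saved devinfo text. The function name missing_cpu_units and the abbreviated sample are hypothetical; the expected count of 32 is the hardware thread count of a 1950X.

```python
import re


def missing_cpu_units(devinfo_text, expected_count):
    """Return the sorted cpu unit numbers that do not appear in devinfo output."""
    # Match lines holding only indentation plus "cpuN"; "cpufreq0" does not match.
    pattern = re.compile(r"^[ \t]*cpu(\d+)[ \t]*$", re.MULTILINE)
    found = {int(m.group(1)) for m in pattern.finditer(devinfo_text)}
    return sorted(set(range(expected_count)) - found)


# Abbreviated, made-up sample shaped like the listing above: cpu0..cpu30 only.
sample = "\n".join(
    ["nexus0", " acpi0", "   cpu0", "     hwpstate0", "     cpufreq0"]
    + [f"   cpu{n}" for n in range(1, 31)]
    + ["   pcib0"]
)

print(missing_cpu_units(sample, 32))  # -> [31]
```

The same function could be fed the text of the attached devinfo -r output, since it only looks at lines of the form cpuN.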
Comment 1 Mark Millard 2019-04-06 18:14:29 UTC
Created attachment 203431
pciconf -lvcb output for 1950X context
Comment 2 Conrad Meyer freebsd_committer 2019-04-06 18:23:47 UTC
The important question that never got answered: did you have cpu31 before kldload amdtemp?

The amdtemp driver is attached where it is supposed to be.

Have you updated your BIOS recently, and/or is it up to date?
Comment 3 Conrad Meyer freebsd_committer 2019-04-06 18:25:44 UTC
(I have this exact CPU and my devinfo topology is the same, except I am not missing cpu31...)
Comment 4 Mark Millard 2019-04-06 19:06:57 UTC
(In reply to Conrad Meyer from comment #2)

For reference, from dmesg -a:

FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs
FreeBSD/SMP: 1 package(s) x 2 groups x 2 cache groups x 4 core(s) x 2 hardware threads
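
As a cross-check of the two dmesg lines above, the topology factors multiply out to the reported CPU count: 1 x 2 x 2 x 4 x 2 = 32. A small Python sketch of that arithmetic follows; the helper name cpus_from_topology is hypothetical, and it assumes the line keeps the "N label x N label" shape shown here.

```python
import math
import re


def cpus_from_topology(line):
    """Multiply the leading integers of each ' x '-separated topology factor."""
    # Drop the "FreeBSD/SMP:" prefix, then split the remainder into factors.
    factors_text = line.split(":", 1)[1]
    factors = [
        int(re.match(r"\s*(\d+)", part).group(1))
        for part in factors_text.split(" x ")
    ]
    return math.prod(factors)


topology = (
    "FreeBSD/SMP: 1 package(s) x 2 groups x 2 cache groups "
    "x 4 core(s) x 2 hardware threads"
)
print(cpus_from_topology(topology))  # -> 32
```

So the kernel itself accounts for all 32 hardware threads; only the devinfo listing stops at cpu30.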

Rebooting without any equivalent of "kldload amdtemp": cpu31 is still missing:

# devinfo | less
nexus0
  cryptosoft0
  vtvga0
  apic0
  ram0
  acpi0
    cpu0
      hwpstate0
      cpufreq0
    cpu1
    cpu2
    cpu3
    cpu4
    cpu5
    cpu6
    cpu7
    cpu8
    cpu9
    cpu10
    cpu11
    cpu12
    cpu13
    cpu14
    cpu15
    cpu16
    cpu17
    cpu18
    cpu19
    cpu20
    cpu21
    cpu22
    cpu23
    cpu24
    cpu25
    cpu26
    cpu27
    cpu28
    cpu29
    cpu30
    pcib0
. . .

This is with:

# kldstat
Id Refs Address                Size Name
 1   41 0xffffffff80200000  2dafae8 kernel
 2    1 0xffffffff83222000     2668 intpm.ko
 3    1 0xffffffff83225000      b50 smbus.ko
 4    1 0xffffffff83226000     18a0 uhid.ko
 5    1 0xffffffff83228000     2928 ums.ko
 6    1 0xffffffff8322b000     3ae0 ng_ubt.ko
 7    5 0xffffffff8322f000     9e30 netgraph.ko
 8    1 0xffffffff83239000     91b8 ng_hci.ko
 9    3 0xffffffff83243000      9c0 ng_bluetooth.ko
10    1 0xffffffff83244000     6e20 uftdi.ko
11    1 0xffffffff8324b000     44d8 ucom.ko
12    1 0xffffffff83250000     1aa0 wmt.ko
13    1 0xffffffff83252000     cad0 ng_l2cap.ko
14    1 0xffffffff8325f000    1ba00 ng_btsocket.ko
15    1 0xffffffff8327b000     21c0 ng_socket.ko
16    1 0xffffffff8327e000      acf mac_ntpd.ko
17    1 0xffffffff8327f000     1600 imgact_binmisc.ko

https://www.gigabyte.com/us/Motherboard/X399-AORUS-Gaming-7-rev-10#support-dl-bios
lists F11 for "Update AGESA 1.1.0.1a" with date 2018/10/12 as the most recent. That
is what the system has.

I wonder what the technical BIOS issue would be and what would be
good evidence for reporting such a BIOS defect (presuming that is
what the issue is).

Note: I'm going to lose access to the system for a few(?) days.
Comment 5 Mark Millard 2019-04-06 19:23:18 UTC
(In reply to Mark Millard from comment #4)

Looking with ps I do see 31 for the likes of:

[kernel/softirq_31]
[kernel/if_io_tqg_31]
[kernel/crypto_31]

(I also see 0-30 for them.)

With cpu31 missing from devinfo, I wonder what odd consequences
are to be expected. What else depends on the internals that lead
to cpu31 (not) being found in the devinfo output?
Comment 6 Mark Millard 2019-04-07 05:10:22 UTC
(In reply to Mark Millard from comment #5)

While I do not have access to the system currently, looking
around it appears that acpidump output for the MADT may provide
a way to reasonably report a bios defect that might cause cpu 31
to end up missing. (acpidump output is not FreeBSD specific from
what I can tell. But acpidump is new to me.)

So I have something to explore once I have access to the system
again.
Comment 7 Andriy Gapon freebsd_committer 2019-04-08 07:01:37 UTC
(In reply to Mark Millard from comment #6)
Most likely it's the ACPI Processor object that's missing (or has some problem).
Yes, acpidump -dt would be helpful.
Comment 8 Ed Maste freebsd_committer 2019-04-08 15:19:05 UTC
I have the same processor:

CPU: AMD Ryzen Threadripper 1950X 16-Core Processor  (3393.71-MHz K8-class CPU)
smbios.bios.reldate="07/18/2017"
smbios.bios.vendor="American Megatrends Inc."
smbios.bios.version="F1"
smbios.planar.maker="Gigabyte Technology Co., Ltd."
smbios.planar.product="X399 AORUS Gaming 7"

and have no such issue.
Comment 9 Konstantin Belousov freebsd_committer 2019-04-08 20:39:38 UTC
(In reply to Conrad Meyer from comment #3)
It seems that ACPI misses cpu31 and hostb30, 31.  The latter is the cause of the troubles, if I am reading the code right.

Does temp reading work for cpu30 ?
Comment 10 Mark Millard 2019-04-08 21:47:31 UTC
(In reply to Konstantin Belousov from comment #9)

It will be a few days before I again have access to the system
that showed the problem. But I can show some previously
reported/recorded material for now.

My original list submittal showed the following. I'll note
that there really were 2 temperatures:
dev.amdtemp.[0-1].core0.sensor0 but the only one used and
attributed to cpus was the figure from:
dev.amdtemp.0.core0.sensor0

# sysctl -a | grep "temp.*[0-9]C$"
dev.amdtemp.1.core0.sensor0: 62.0C
dev.amdtemp.0.core0.sensor0: 62.1C
dev.cpu.30.temperature: 62.1C
dev.cpu.29.temperature: 62.1C
dev.cpu.28.temperature: 62.1C
dev.cpu.27.temperature: 62.1C
dev.cpu.26.temperature: 62.1C
dev.cpu.25.temperature: 62.1C
dev.cpu.24.temperature: 62.1C
dev.cpu.23.temperature: 62.1C
dev.cpu.22.temperature: 62.1C
dev.cpu.21.temperature: 62.1C
dev.cpu.20.temperature: 62.1C
dev.cpu.19.temperature: 62.1C
dev.cpu.18.temperature: 62.1C
dev.cpu.17.temperature: 62.1C
dev.cpu.16.temperature: 62.1C
dev.cpu.15.temperature: 62.1C
dev.cpu.14.temperature: 62.1C
dev.cpu.13.temperature: 62.1C
dev.cpu.12.temperature: 62.1C
dev.cpu.11.temperature: 62.1C
dev.cpu.10.temperature: 62.1C
dev.cpu.9.temperature: 62.1C
dev.cpu.8.temperature: 62.1C
dev.cpu.7.temperature: 62.1C
dev.cpu.6.temperature: 62.1C
dev.cpu.5.temperature: 62.1C
dev.cpu.4.temperature: 62.1C
dev.cpu.3.temperature: 62.1C
dev.cpu.2.temperature: 62.1C
dev.cpu.1.temperature: 62.1C
dev.cpu.0.temperature: 62.1C
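
For illustration, a minimal Python sketch of how the observation above (every dev.cpu.N.temperature matching dev.amdtemp.0 rather than dev.amdtemp.1) could be checked against saved sysctl text. The helper name parse_temps is hypothetical and the sample is an abbreviated copy of the listing above.

```python
import re


def parse_temps(sysctl_text):
    """Map sysctl name -> degrees C for lines shaped like 'name: 62.1C'."""
    temps = {}
    for line in sysctl_text.splitlines():
        m = re.match(r"^(\S+):\s*(-?\d+(?:\.\d+)?)C$", line.strip())
        if m:
            temps[m.group(1)] = float(m.group(2))
    return temps


# Abbreviated copy of the sysctl output quoted above.
sample = """dev.amdtemp.1.core0.sensor0: 62.0C
dev.amdtemp.0.core0.sensor0: 62.1C
dev.cpu.30.temperature: 62.1C
dev.cpu.1.temperature: 62.1C
dev.cpu.0.temperature: 62.1C"""

temps = parse_temps(sample)
# Collect the distinct values reported under dev.cpu.*.
cpu_values = {v for k, v in temps.items() if k.startswith("dev.cpu.")}
print(cpu_values == {temps["dev.amdtemp.0.core0.sensor0"]})  # -> True
```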
Comment 11 Mark Millard 2019-04-26 22:25:38 UTC
(In reply to Mark Millard from comment #10)

My lack of access has been longer than originally
expected. But I hope to have access back within a
few days.

I looked and Gigabyte now has a newer F12e BIOS
update shown at:

https://www.gigabyte.com/us/Motherboard/X399-AORUS-Gaming-7-rev-10#support-dl-bios

It also no longer shows the F1 version that Ed
Maste reported he is using on his 1950X. (F2 is
now the oldest listed.)
Comment 12 Mark Millard 2019-05-12 22:32:30 UTC
(In reply to Mark Millard from comment #11)

Access has not yet happened. I'll quit trying to
predict when. But it should eventually happen.
Comment 13 Mark Millard 2019-06-18 07:01:34 UTC
(In reply to Mark Millard from comment #12)

Well, I captured acpidump -dt output and then
updated to firmware F12e. Windows 10 Pro and the
somewhat old Fedora still boot fine. But FreeBSD
hangs, including the older 12-current copies I
still have around, not just the 13-current that
I've been using.

I can still boot all the FreeBSD's under Hyper-V
(same media plugged in the same places either way).

I had hoped to diff the acpidump text files for
F11e vs. F12e for native boots. At this point I'm
not sure what I'll do, or when, about getting
native boots going again.
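
For illustration, a minimal Python sketch of the kind of diff described above, using the standard difflib module on two saved acpidump text captures. The function name, the labels before.asl and after.asl, and the sample Processor lines are all hypothetical; no real acpidump content from either firmware is shown here.

```python
import difflib


def diff_acpi_dumps(old_text, new_text, old_label="before.asl", new_label="after.asl"):
    """Return unified-diff lines between two acpidump text captures."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


# Invented stand-ins for two captures; the second has one more Processor object.
before = (
    "Processor (C000, 0x00, 0x00000810, 0x06) {}\n"
    "Processor (C001, 0x01, 0x00000810, 0x06) {}"
)
after = before + "\nProcessor (C002, 0x02, 0x00000810, 0x06) {}"

diff = diff_acpi_dumps(before, after)
# Keep only added lines, skipping the "+++" file header.
added = [line for line in diff if line.startswith("+") and not line.startswith("+++")]
print("\n".join(diff))
```

With real captures, the two strings would simply be the contents of the saved acpidump -dt files.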
Comment 14 Mark Millard 2019-06-19 01:27:11 UTC
(In reply to Mark Millard from comment #13)

[FYI: the context is head r347549 based.]

Boot verbose for a debug kernel with BIOS F12e
reported (typed from screen picture):

. . .
crypto: cryptosoft0 registers alg ?? flags 0 maxoplen 0
...
acpi0: <AMD> on motherboard
ACPI: 8 ACPI AML tables successfully acquired and loaded
PCIe: Memory mapped configuration base @ 0xf0000000
ioapic0: routing intpin 9 (ISA IRQ 9) to lapic 0 vector 48
acpi0: Power Button (fixed)
acpi0: wakeup code va 0xfffffe0005dff000 pa 0x98000

And that was all that was output.

With Verbose SYSINIT instead it reported:
(the 3 hpt_init(0)'s are not typos)

. . .
   module_register_init(&nvd_mod)... done.
subsystem 3800000
   hpt_init(0)... done.
   hpt_init(0)... done.
   configure_first(0)... done.
   hpt_init(0)... done.
   module_register_init(&cam_moduledata)... done.
   fbd_evh_init(0)... done.
   module_register_init(&ata_moduledata)... done.
   configure(0)... nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
acpi0: <AMD> on motherboard
acpi0: Power Button (fixed)

And that was all that was output for this context.
(Mixing in boot verbose scrolls the SYSINIT
information off screen and so does not show anything
new.)