Summary: | jedec_dimm(4) and imcsmb(4): support of memory controllers in Skylake and newer Intel CPUs | |
---|---|---|---
Product: | Base System | Reporter: | Vladimir Druzenko <vvd>
Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs>
Status: | New | |
Severity: | Affects Many People | CC: | rb, rpokala, rpokala, ruben
Priority: | --- | |
Version: | 12.2-RELEASE | |
Hardware: | amd64 | |
OS: | Any | |
Description
Vladimir Druzenko
2020-08-07 12:07:23 UTC
imcsmb(4) has not been updated to work with *Lake CPUs. I think I started taking a swing at this sometime last year, but eventually put it on hold because I do not have access to systems that both have those CPUs and for which I know the SMBus address map. I'll see if I can dig up my work-in-progress and attach it here. If you can test it for me, then we might be able to finish this off together.

(In reply to Ravi Pokala from comment #1)
Of course I can test! A patch against 12.1 would be best: I'll rebuild the module and load it for testing. But if you need HEAD only, I can boot it from a live USB stick (https://download.freebsd.org/ftp/snapshots/amd64/amd64/ISO-IMAGES/13.0/) and kldload the modules.

Created attachment 217141: Support Skylake-Xeon in imcsmb(4) (take 1)

(In reply to VVD from comment #2)
The attached patch should apply cleanly to both -HEAD and stable/12.

(In reply to Ravi Pokala from comment #3)
Thanks! kldload imcsmb.ko:

    imcsmb_pci0: <Intel Skylake Xeon iMC 0 SMBus controllers> at device 30.5 numa-domain 0 on pci5
    imcsmb0: <iMC SMBus controller> numa-domain 0 on imcsmb_pci0
    smbus1: <System Management Bus> numa-domain 0 on imcsmb0
    smb1: <SMBus generic I/O> on smbus1
    imcsmb1: <iMC SMBus controller> numa-domain 0 on imcsmb_pci0
    smbus2: <System Management Bus> numa-domain 0 on imcsmb1
    smb2: <SMBus generic I/O> on smbus2
    imcsmb_pci1: <Intel Skylake Xeon iMC 1 SMBus controllers> at device 30.6 numa-domain 0 on pci5
    imcsmb2: <iMC SMBus controller> numa-domain 0 on imcsmb_pci1
    smbus3: <System Management Bus> numa-domain 0 on imcsmb2
    smb3: <SMBus generic I/O> on smbus3
    imcsmb3: <iMC SMBus controller> numa-domain 0 on imcsmb_pci1
    smbus4: <System Management Bus> numa-domain 0 on imcsmb3
    smb4: <SMBus generic I/O> on smbus4

But after kldload jedec_dimm.ko:

    sysctl -a | grep jedec | wc -l
    0

Added to /boot/device.hints:

    hint.jedec_dimm.0.at="smbus1"
    hint.jedec_dimm.0.addr="0xa0"
    hint.jedec_dimm.0.slotid="Silkscreen"

Then kldunload jedec_dimm imcsmb and kldload them again: nothing changed.
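For reference, the `0xa0` in the hint above is the 8-bit form of an SPD EEPROM address. DDR4 SPD EEPROMs answer at 7-bit addresses 0x50 through 0x57. smbus(4) takes slave addresses shifted left by one, with bit 0 left for the read/write flag, so those appear as 0xa0 through 0xae.

Below is a minimal sketch of that arithmetic, assuming that convention. `spd_addr8()` and `print_jedec_hint()` are hypothetical helper names, not part of jedec_dimm(4). The second helper prints a hints stanza in the same shape as the one above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * 8-bit smbus(4) address of the SPD EEPROM for DIMM slot 0-7 on one bus.
 * DDR4 SPD EEPROMs answer at 7-bit addresses 0x50-0x57 (chosen by how the
 * slot wires the module's SA0-SA2 pins); smbus(4) wants that value shifted
 * left by one, leaving bit 0 for the read/write flag.
 */
unsigned int
spd_addr8(unsigned int slot)
{
	assert(slot < 8);
	return ((0x50u + slot) << 1);
}

/* Hypothetical helper: print one device.hints stanza for jedec_dimm(4). */
void
print_jedec_hint(FILE *out, int unit, int bus, unsigned int slot,
    const char *slotid)
{
	fprintf(out, "hint.jedec_dimm.%d.at=\"smbus%d\"\n", unit, bus);
	fprintf(out, "hint.jedec_dimm.%d.addr=\"0x%02x\"\n", unit,
	    spd_addr8(slot));
	fprintf(out, "hint.jedec_dimm.%d.slotid=\"%s\"\n", unit, slotid);
}
```

For example, `print_jedec_hint(stdout, 0, 1, 0, "Silkscreen")` prints the same three lines that were added to /boot/device.hints.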
(In reply to VVD from comment #4)
Are you sure that smbus1:0xa0 is the proper bus:address for the DIMM in question? For experimentation purposes, you could configure the kernel environment to probe all possible addresses:

    kldunload imcsmb.ko smbus.ko jedec_dimm.ko
    unit=0
    for bus in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ; do
        for addr in 0xa0 0xa2 0xa4 0xa6 0xa8 0xaa 0xac 0xae ; do
            kenv hint.jedec_dimm.${unit}.at="smbus${bus}"
            kenv hint.jedec_dimm.${unit}.addr="${addr}"
            unit=$(( ${unit} + 1 ))
        done
    done
    kldload /path/to/imcsmb.ko /boot/kernel/smbus.ko /boot/kernel/jedec_dimm.ko

Can you try that and let me know if any of them work? When you're done, you can run it again with `kenv -u` to remove all the extra entries, then configure device.hints with the real values.

(In reply to Ravi Pokala from comment #5)

    imcsmb_pci0: <Intel Skylake Xeon iMC 0 SMBus controllers> at device 30.5 numa-domain 0 on pci5
    imcsmb0: <iMC SMBus controller> numa-domain 0 on imcsmb_pci0
    smbus1: <System Management Bus> numa-domain 0 on imcsmb0
    smb1: <SMBus generic I/O> on smbus1
    smbus1: <unknown device> at addr 0xa0
    smbus1: <unknown device> at addr 0xa2
    smbus1: <unknown device> at addr 0xa4
    smbus1: <unknown device> at addr 0xa6
    smbus1: <unknown device> at addr 0xa8
    smbus1: <unknown device> at addr 0xaa
    smbus1: <unknown device> at addr 0xac
    smbus1: <unknown device> at addr 0xae
    imcsmb1: <iMC SMBus controller> numa-domain 0 on imcsmb_pci0
    smbus2: <System Management Bus> numa-domain 0 on imcsmb1
    smb2: <SMBus generic I/O> on smbus2
    smbus2: <unknown device> at addr 0xa0
    smbus2: <unknown device> at addr 0xa2
    smbus2: <unknown device> at addr 0xa4
    smbus2: <unknown device> at addr 0xa6
    smbus2: <unknown device> at addr 0xa8
    smbus2: <unknown device> at addr 0xaa
    smbus2: <unknown device> at addr 0xac
    smbus2: <unknown device> at addr 0xae
    imcsmb_pci1: <Intel Skylake Xeon iMC 1 SMBus controllers> at device 30.6 numa-domain 0 on pci5
    imcsmb2: <iMC SMBus controller> numa-domain 0 on imcsmb_pci1
    smbus3: <System Management Bus> numa-domain 0 on imcsmb2
    smb3: <SMBus generic I/O> on smbus3
    smbus3: <unknown device> at addr 0xa0
    smbus3: <unknown device> at addr 0xa2
    smbus3: <unknown device> at addr 0xa4
    smbus3: <unknown device> at addr 0xa6
    smbus3: <unknown device> at addr 0xa8
    smbus3: <unknown device> at addr 0xaa
    smbus3: <unknown device> at addr 0xac
    smbus3: <unknown device> at addr 0xae
    imcsmb3: <iMC SMBus controller> numa-domain 0 on imcsmb_pci1
    smbus4: <System Management Bus> numa-domain 0 on imcsmb3
    smb4: <SMBus generic I/O> on smbus4
    smbus4: <unknown device> at addr 0xa0
    smbus4: <unknown device> at addr 0xa2
    smbus4: <unknown device> at addr 0xa4
    smbus4: <unknown device> at addr 0xa6
    smbus4: <unknown device> at addr 0xa8
    smbus4: <unknown device> at addr 0xaa
    smbus4: <unknown device> at addr 0xac
    smbus4: <unknown device> at addr 0xae
    jedec_dimm0: failed to read dram_type
    jedec_dimm1: failed to read dram_type
    jedec_dimm2: failed to read dram_type
    jedec_dimm3: failed to read dram_type
    jedec_dimm4: failed to read dram_type
    jedec_dimm5: failed to read dram_type
    jedec_dimm6: failed to read dram_type
    jedec_dimm7: failed to read dram_type
    imcsmb0: transfer timeout
    jedec_dimm8: failed to read dram_type
    imcsmb0: transfer timeout
    jedec_dimm9: failed to read dram_type
    imcsmb0: transfer timeout
    jedec_dimm10: failed to read dram_type
    imcsmb0: transfer timeout
    jedec_dimm11: failed to read dram_type
    imcsmb0: transfer timeout
    jedec_dimm12: failed to read dram_type
    imcsmb0: transfer timeout
    jedec_dimm13: failed to read dram_type
    imcsmb0: transfer timeout
    jedec_dimm14: failed to read dram_type
    imcsmb0: transfer timeout
    jedec_dimm15: failed to read dram_type
    imcsmb1: transfer timeout
    jedec_dimm16: failed to read dram_type
    imcsmb1: transfer timeout
    jedec_dimm17: failed to read dram_type
    imcsmb1: transfer timeout
    jedec_dimm18: failed to read dram_type
    imcsmb1: transfer timeout
    jedec_dimm19: failed to read dram_type
    imcsmb1: transfer timeout
    jedec_dimm20: failed to read dram_type
    imcsmb1: transfer timeout
    jedec_dimm21: failed to read dram_type
    imcsmb1: transfer timeout
    jedec_dimm22: failed to read dram_type
    imcsmb1: transfer timeout
    jedec_dimm23: failed to read dram_type
    imcsmb2: transfer timeout
    jedec_dimm24: failed to read dram_type
    imcsmb2: transfer timeout
    jedec_dimm25: failed to read dram_type
    imcsmb2: transfer timeout
    jedec_dimm26: failed to read dram_type
    imcsmb2: transfer timeout
    jedec_dimm27: failed to read dram_type
    imcsmb2: transfer timeout
    jedec_dimm28: failed to read dram_type
    imcsmb2: transfer timeout
    jedec_dimm29: failed to read dram_type
    imcsmb2: transfer timeout
    jedec_dimm30: failed to read dram_type
    imcsmb2: transfer timeout
    jedec_dimm31: failed to read dram_type
    imcsmb3: transfer timeout
    jedec_dimm32: failed to read dram_type
    imcsmb3: transfer timeout
    jedec_dimm33: failed to read dram_type
    imcsmb3: transfer timeout
    jedec_dimm34: failed to read dram_type
    imcsmb3: transfer timeout
    jedec_dimm35: failed to read dram_type
    imcsmb3: transfer timeout
    jedec_dimm36: failed to read dram_type
    imcsmb3: transfer timeout
    jedec_dimm37: failed to read dram_type
    imcsmb3: transfer timeout
    jedec_dimm38: failed to read dram_type
    imcsmb3: transfer timeout
    jedec_dimm39: failed to read dram_type

"sysctl -a | grep jedec" is still empty. It looks like the addresses "0xa0 0xa2 0xa4 0xa6 0xa8 0xaa 0xac 0xae" are incorrect. I'm on IRC (freenode and EFnet); we could discuss this faster there.
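Reading this log requires knowing how the probe loop from comment #5 numbers its hints. It increments `unit` once per address, eight addresses per bus. So jedec_dimm N is hinted at smbus(N / 8), at address 0xa0 + 2 * (N % 8).

Under that mapping:
- jedec_dimm0-7 sat on smbus0. In this boot smbus0 is not attached to any imcsmb instance (presumably another SMBus controller), which would explain why those failures carry no imcsmb timeout message.
- jedec_dimm8-39 cover smbus1-smbus4, matching the imcsmb0-imcsmb3 timeouts.

A small sketch of the mapping; the function names are hypothetical:

```c
#include <assert.h>

/* The kenv loop probes 0xa0, 0xa2, ..., 0xae on each bus. */
#define ADDRS_PER_BUS	8

/* smbus unit number that jedec_dimm instance "unit" was hinted at. */
int
probe_hint_bus(int unit)
{
	assert(unit >= 0);
	return (unit / ADDRS_PER_BUS);
}

/* 8-bit SMBus address that jedec_dimm instance "unit" was hinted at. */
unsigned int
probe_hint_addr(int unit)
{
	assert(unit >= 0);
	return (0xa0u + 2u * (unsigned int)(unit % ADDRS_PER_BUS));
}
```

For instance, jedec_dimm8 maps to smbus1:0xa0, the same bus:address as the device.hints entry from comment #4.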
Part of the dmidecode output:

    Handle 0x002B, DMI type 17, 84 bytes
    Memory Device
        Array Handle: 0x0029
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 16384 MB
        Form Factor: DIMM
        Set: None
        Locator: P1-DIMMA1
        Bank Locator: P0_Node0_Channel0_Dimm0
        Type: DDR4
        Type Detail: Synchronous Registered (Buffered)
        Speed: 2666 MT/s
        Manufacturer: Samsung
        Serial Number: 37984D9E
        Asset Tag: P1-DIMMA1_AssetTag (date:17/48)
        Part Number: M393A2K40BB2-CTD
        Rank: 1
        Configured Memory Speed: 2133 MT/s
        Minimum Voltage: 1.2 V
        Maximum Voltage: 1.2 V
        Configured Voltage: 1.2 V
        Memory Technology: DRAM
        Memory Operating Mode Capability: Volatile memory
        Firmware Version: 0000
        Module Manufacturer ID: Bank 1, Hex 0xCE
        Module Product ID: Unknown
        Memory Subsystem Controller Manufacturer ID: Unknown
        Memory Subsystem Controller Product ID: Unknown
        Non-Volatile Size: None
        Volatile Size: 16 GB
        Cache Size: None
        Logical Size: None

Maybe you need remote access to the hardware?

Sorry, this fell off my radar.

The problem here is that the iMC-SMBus controller was not really intended for use by the OS:
- During POST, the memory controller uses it to read the SPD information from the DIMMs and configure itself to use their DRAM.
- During normal operation, the system firmware (the Management Engine?) uses it to read the TSOD temperature from the DIMMs.

The hardware has a BUSY indicator, but it appears to be advisory, and firmware may not honor it. That could allow firmware-initiated operations to stomp on OS-initiated operations.

To top it off, I know Intel board firmware disabled OS access to the iMC-SMBus controllers on *Well CPUs outright, as part of their security-hardening fixes after Spectre/Meltdown; I suspect other board vendors followed suit. It's possible that for *Lake, they disabled it from the start.

The upshot of all this is that the controller might not be usable by the OS on *Lake CPUs.
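The BUSY handshake described above amounts to a bounded poll of a status bit. When the bound runs out, the driver gives up; that is presumably where the "transfer timeout" messages in these logs come from.

Below is a sketch of that general pattern, not imcsmb's actual code. Assumptions:
- The status reader is a caller-supplied callback rather than the driver's real `pci_read_config()` access.
- `PLACEHOLDER_BUSY_BIT` is a made-up bit position.
- `fake_status_read()` and `idle_after()` are hypothetical test helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit position; NOT taken from any Intel datasheet. */
#define PLACEHOLDER_BUSY_BIT	(1u << 28)

/* Caller-supplied status reader; a driver would wrap pci_read_config() here. */
typedef uint32_t (*status_read_fn)(void *ctx);

/*
 * Poll until every bit in busy_mask reads as clear, giving up after
 * max_polls reads. Returns true if the controller was seen idle.
 *
 * BUSY is only advisory: firmware can start its own transaction right
 * after this returns, so a successful wait narrows the race with
 * firmware but does not close it.
 */
bool
wait_until_idle(status_read_fn read_status, void *ctx, uint32_t busy_mask,
    unsigned int max_polls)
{
	assert(read_status != NULL);
	for (unsigned int i = 0; i < max_polls; i++) {
		if ((read_status(ctx) & busy_mask) == 0)
			return (true);
		/* A real driver would DELAY() between reads. */
	}
	return (false);	/* a caller would report a transfer timeout */
}

/* Hypothetical stand-in for the hardware: busy for the first N reads. */
struct fake_status {
	unsigned int busy_reads;
};

uint32_t
fake_status_read(void *ctx)
{
	struct fake_status *fs = ctx;

	if (fs->busy_reads > 0) {
		fs->busy_reads--;
		return (PLACEHOLDER_BUSY_BIT);
	}
	return (0);
}

/* Hypothetical helper: does a device busy for busy_reads reads go idle in time? */
bool
idle_after(unsigned int busy_reads, unsigned int max_polls)
{
	struct fake_status fs = { busy_reads };

	return (wait_until_idle(fake_status_read, &fs, PLACEHOLDER_BUSY_BIT,
	    max_polls));
}
```

The bound is the important design choice. Without `max_polls`, a controller held busy by firmware would hang the caller instead of producing a timeout.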
Try adding this line near the start of imcsmb_transfer():

     	orig_cntl_val = pci_read_config(sc->imcsmb_pci, sc->regs->smb_cntl, 4);
    +	device_printf(sc->dev, "cntl: 0x%08x\n", orig_cntl_val);
     	cntl_val = orig_cntl_val;

I'm particularly interested in bit 26 (0x04000000), SMB_DIS_WRT. If it is set, the BIOS has locked the OS out of the iMC-SMBus controller, and that's game over. :-/

While I appreciate your offer of remote access, I don't have time to dig into this right now, and probably won't for the next few months.

(In reply to Ravi Pokala from comment #10)

    imcsmb_pci0: <Intel Skylake Xeon iMC 0 SMBus controllers> at device 30.5 numa-domain 0 on pci5
    imcsmb0: <iMC SMBus controller> numa-domain 0 on imcsmb_pci0
    smbus0: <System Management Bus> numa-domain 0 on imcsmb0
    smbus0: <unknown device> at addr 0xa0
    smbus0: <unknown device> at addr 0xa2
    smbus0: <unknown device> at addr 0xa4
    smbus0: <unknown device> at addr 0xa6
    smbus0: <unknown device> at addr 0xa8
    smbus0: <unknown device> at addr 0xaa
    smbus0: <unknown device> at addr 0xac
    smbus0: <unknown device> at addr 0xae
    imcsmb1: <iMC SMBus controller> numa-domain 0 on imcsmb_pci0
    smbus1: <System Management Bus> numa-domain 0 on imcsmb1
    smbus1: <unknown device> at addr 0xa0
    smbus1: <unknown device> at addr 0xa2
    smbus1: <unknown device> at addr 0xa4
    smbus1: <unknown device> at addr 0xa6
    smbus1: <unknown device> at addr 0xa8
    smbus1: <unknown device> at addr 0xaa
    smbus1: <unknown device> at addr 0xac
    smbus1: <unknown device> at addr 0xae
    imcsmb_pci1: <Intel Skylake Xeon iMC 1 SMBus controllers> at device 30.6 numa-domain 0 on pci5
    imcsmb2: <iMC SMBus controller> numa-domain 0 on imcsmb_pci1
    smbus2: <System Management Bus> numa-domain 0 on imcsmb2
    smbus2: <unknown device> at addr 0xa0
    smbus2: <unknown device> at addr 0xa2
    smbus2: <unknown device> at addr 0xa4
    smbus2: <unknown device> at addr 0xa6
    smbus2: <unknown device> at addr 0xa8
    smbus2: <unknown device> at addr 0xaa
    smbus2: <unknown device> at addr 0xac
    smbus2: <unknown device> at addr 0xae
    imcsmb3: <iMC SMBus controller> numa-domain 0 on imcsmb_pci1
    smbus3: <System Management Bus> numa-domain 0 on imcsmb3
    smbus3: <unknown device> at addr 0xa0
    smbus3: <unknown device> at addr 0xa2
    smbus3: <unknown device> at addr 0xa4
    smbus3: <unknown device> at addr 0xa6
    smbus3: <unknown device> at addr 0xa8
    smbus3: <unknown device> at addr 0xaa
    smbus3: <unknown device> at addr 0xac
    smbus3: <unknown device> at addr 0xae
    imcsmb0: cntl: 0x00000000
    imcsmb0: transfer timeout
    jedec_dimm0: failed to read dram_type
    imcsmb0: cntl: 0x00000000
    imcsmb0: transfer timeout
    jedec_dimm1: failed to read dram_type
    imcsmb0: cntl: 0x00000000
    imcsmb0: transfer timeout
    jedec_dimm2: failed to read dram_type
    imcsmb0: cntl: 0x00000000
    imcsmb0: transfer timeout
    jedec_dimm3: failed to read dram_type
    imcsmb0: cntl: 0x00000000
    imcsmb0: transfer timeout
    jedec_dimm4: failed to read dram_type
    imcsmb0: cntl: 0x00000000
    imcsmb0: transfer timeout
    jedec_dimm5: failed to read dram_type
    imcsmb0: cntl: 0x00000000
    imcsmb0: transfer timeout
    jedec_dimm6: failed to read dram_type
    imcsmb0: cntl: 0x00000000
    imcsmb0: transfer timeout
    jedec_dimm7: failed to read dram_type
    imcsmb1: cntl: 0x00000000
    imcsmb1: transfer timeout
    jedec_dimm8: failed to read dram_type
    imcsmb1: cntl: 0x00000000
    imcsmb1: transfer timeout
    jedec_dimm9: failed to read dram_type
    imcsmb1: cntl: 0x00000000
    imcsmb1: transfer timeout
    jedec_dimm10: failed to read dram_type
    imcsmb1: cntl: 0x00000000
    imcsmb1: transfer timeout
    jedec_dimm11: failed to read dram_type
    imcsmb1: cntl: 0x00000000
    imcsmb1: transfer timeout
    jedec_dimm12: failed to read dram_type
    imcsmb1: cntl: 0x00000000
    imcsmb1: transfer timeout
    jedec_dimm13: failed to read dram_type
    imcsmb1: cntl: 0x00000000
    imcsmb1: transfer timeout
    jedec_dimm14: failed to read dram_type
    imcsmb1: cntl: 0x00000000
    imcsmb1: transfer timeout
    jedec_dimm15: failed to read dram_type
    imcsmb2: cntl: 0x00000000
    imcsmb2: transfer timeout
    jedec_dimm16: failed to read dram_type
    imcsmb2: cntl: 0x00000000
    imcsmb2: transfer timeout
    jedec_dimm17: failed to read dram_type
    imcsmb2: cntl: 0x00000000
    imcsmb2: transfer timeout
    jedec_dimm18: failed to read dram_type
    imcsmb2: cntl: 0x00000000
    imcsmb2: transfer timeout
    jedec_dimm19: failed to read dram_type
    imcsmb2: cntl: 0x00000000
    imcsmb2: transfer timeout
    jedec_dimm20: failed to read dram_type
    imcsmb2: cntl: 0x00000000
    imcsmb2: transfer timeout
    jedec_dimm21: failed to read dram_type
    imcsmb2: cntl: 0x00000000
    imcsmb2: transfer timeout
    jedec_dimm22: failed to read dram_type
    imcsmb2: cntl: 0x00000000
    imcsmb2: transfer timeout
    jedec_dimm23: failed to read dram_type
    imcsmb3: cntl: 0x00000000
    imcsmb3: transfer timeout
    jedec_dimm24: failed to read dram_type
    imcsmb3: cntl: 0x00000000
    imcsmb3: transfer timeout
    jedec_dimm25: failed to read dram_type
    imcsmb3: cntl: 0x00000000
    imcsmb3: transfer timeout
    jedec_dimm26: failed to read dram_type
    imcsmb3: cntl: 0x00000000
    imcsmb3: transfer timeout
    jedec_dimm27: failed to read dram_type
    imcsmb3: cntl: 0x00000000
    imcsmb3: transfer timeout
    jedec_dimm28: failed to read dram_type
    imcsmb3: cntl: 0x00000000
    imcsmb3: transfer timeout
    jedec_dimm29: failed to read dram_type
    imcsmb3: cntl: 0x00000000
    imcsmb3: transfer timeout
    jedec_dimm30: failed to read dram_type
    imcsmb3: cntl: 0x00000000
    imcsmb3: transfer timeout
    jedec_dimm31: failed to read dram_type
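The lockout test from comment #10 reduces to checking one bit of the value printed by the debug line: bit 26 (0x04000000), SMB_DIS_WRT. Below is a minimal sketch of that check. The macro name follows the comment rather than any driver header, and `imcsmb_os_locked_out()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 26 of the iMC SMBus control value printed as "cntl:" (SMB_DIS_WRT). */
#define SMB_DIS_WRT	(1u << 26)
_Static_assert(SMB_DIS_WRT == 0x04000000u, "SMB_DIS_WRT is bit 26");

/* True if the BIOS has locked the OS out of the iMC-SMBus controller. */
bool
imcsmb_os_locked_out(uint32_t cntl)
{
	return ((cntl & SMB_DIS_WRT) != 0);
}
```

Applied to the logged value, `imcsmb_os_locked_out(0x00000000)` is false, so by this test the BIOS has not set SMB_DIS_WRT. Every transfer still timed out, so the lockout bit alone does not explain the failures.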