Bug 231923 - [pci] AMD Ryzen X370 chipset PCIe bridge failed to allocate initial memory window
Status: Closed Not A Bug
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-10-03 19:14 UTC by Val Packett
Modified: 2019-03-14 20:27 UTC
CC List: 1 user

See Also:


Description Val Packett 2018-10-03 19:14:38 UTC
Running a fairly recent modified CURRENT (actually ALPHA7) from September 26. (I haven't noticed any PCIe-related changes in the commits since then.)

Moved the system from a SATA SSD to an NVMe one. The Mellanox network card installed in the bottom PCIe slot (the one connected to the X370 chipset) stopped working:

pcib8: <ACPI PCI-PCI bridge> irq 32 at device 4.0 on pci3
pcib3: attempting to grow memory window for (0xfdf00000-0xfe0fffff,0x200000)
	front candidate range: 0xfdf00000-0xfe0fffff
pcib8: failed to allocate initial memory window: 0xfdf00000-0xfe0fffff
pcib3: allocated prefetch range (0xf0800000-0xf0ffffff) for rid 24 of pcib8
pcib8:   domain            0
pcib8:   secondary bus     36
pcib8:   subordinate bus   36
pcib8:   prefetched decode 0xf0800000-0xf0ffffff
pcib8: could not get PCI interrupt routing table for \_SB_.PCI0.GPP2.PT02.PT24 - AE_NOT_FOUND
pci8: <ACPI PCI bus> on pcib8
pcib8: allocated bus range (36-36) for rid 0 of pci8
pci8: domain=0, physical bus=36
pcib3: attempting to grow memory window for (0xfe000000-0xfe0fffff,0x100000)
	front candidate range: 0xfe000000-0xfe0fffff
pcib8: failed to allocate initial memory window (0xfe000000-0xfe0fffff,0x100000)
pci8: pci0:36:0:0 bar 0x10 failed to allocate
	map[18]: type Prefetchable Memory, range 64, base 0xf0800000, size 23, memory disabled
pcib8: allocated prefetch range (0xf0800000-0xf0ffffff) for rid 18 of pci0:36:0:0
pcib2: matched entry for 3.0.INTA
pcib2: slot 0 INTA hardwired to IRQ 32
pcib3: slot 4 INTA is routed to irq 32
pcib8: slot 0 INTA is routed to irq 32
pci8: <network, ethernet> at device 0.0 (no driver attached)
[...]
mlx4_core0: <mlx4_core> mem 0xf0800000-0xf0ffffff irq 32 at device 0.0 on pci8
mlx4_core: Mellanox ConnectX core driver v3.4.1 (October 2017)
mlx4_core: Initializing mlx4_core
pcib3: attempting to grow memory window for (0-0xffffffff,0x100000)
	front candidate range: 0xfe100000-0xfe1fffff
	back candidate range: 0xfe300000-0xfe3fffff
pcib8: failed to allocate initial memory window (0-0xffffffff,0x100000)
mlx4_core0: 0x100000 bytes of rid 0x10 res 3 failed (0, 0xffffffffffffffff).
mlx4_core0: Couldn't get PCI resources, aborting
device_attach: mlx4_core0 attach returned 22
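
In case it helps anyone debugging the same thing: the kernel's view of which address ranges each bridge has claimed and handed out can be dumped with devinfo(8). This is just a generic inspection step, nothing specific to this box:

    # Walk the device tree, listing the I/O port, memory, and IRQ
    # resources each node owns; bridge windows show up as memory
    # ranges under the corresponding pcibN node.
    devinfo -r

    # The same resources grouped by type rather than by device, which
    # makes it easier to see what already sits around 0xfdf00000.
    devinfo -u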


The same thing is happening to pcib9, which has one of the XHCI controllers behind it (so some USB3 ports don't work), but that one was failing even before the NVMe drive was installed:

xhci2: <XHCI (generic) USB 3.0 controller> irq 32 at device 0.0 on pci9
pcib3: attempting to grow memory window for (0-0xffffffff,0x100000)
	front candidate range: 0xfe100000-0xfe1fffff
	back candidate range: 0xfe300000-0xfe3fffff
pcib9: failed to allocate initial memory window (0-0xffffffff,0x8000)
xhci2: 0x8000 bytes of rid 0x10 res 3 failed (0, 0xffffffffffffffff).
xhci2: Could not map memory
device_attach: xhci2 attach returned 12


The pciconf output looks like this:

pcib8@pci0:22:4:0:	class=0x060400 card=0x33061b21 chip=0x43b41022 rev=0x02 hdr=0x01
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = '300 Series Chipset PCIe Port'
    class      = bridge
    subclass   = PCI-PCI
    bus range  = 36-36
    window[1c] = type I/O Port, range 32, addr 0xfff000-0xfff, disabled
    window[20] = type Memory, range 32, addr 0xfff00000-0xfffff, disabled
    window[24] = type Prefetchable Memory, range 64, addr 0xf0800000-0xf0ffffff, enabled
    cap 05[50] = MSI supports 1 message, 64 bit 
    cap 01[78] = powerspec 3  supports D0 D3  current D0
    cap 10[80] = PCI-Express 2 downstream port max data 128(512) RO NS
                 link x4(x4) speed 5.0(5.0) ASPM disabled(L0s/L1)
                 slot 1 power limit 26000 mW
    cap 0d[c0] = PCI Bridge card=0x33061b21
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
    ecap 0019[200] = PCIe Sec 1 lane errors 0
    ecap 001e[400] = unknown 1
     Corrected = Receiver Error
none1@pci0:36:0:0:	class=0x020000 card=0x002115b3 chip=0x675015b3 rev=0xb0 hdr=0x00
    vendor     = 'Mellanox Technologies'
    device     = 'MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 64, base 0xfe000000, size 1048576, disabled
    bar   [18] = type Prefetchable Memory, range 64, base 0xf0800000, size 8388608, disabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 03[48] = VPD
    cap 11[9c] = MSI-X supports 128 messages
                 Table in map 0x10[0x7c000], PBA in map 0x10[0x7d000]
    cap 10[60] = PCI-Express 2 endpoint max data 128(256) FLR
                 link x4(x8) speed 5.0(5.0) ASPM disabled(L0s)
    ecap 000e[100] = ARI 1
    ecap 0003[148] = Serial 1 0002c903004d7392


Moving the network card to the middle slot (taking away half the lanes from the GPU in the top slot, but it's not like it needs them) fixed it.
But I physically can't apply that fix to the XHCI controller :)

What's interesting is that the network card *did* work before the NVMe SSD was installed, even though the M.2 NVMe lanes are direct CPU lanes, not chipset lanes.
Comment 1 Conrad Meyer 2018-10-09 04:50:07 UTC
Clearing 'regression' keyword -- this does not appear to be a software regression.
Comment 2 Val Packett 2019-03-13 17:27:24 UTC
Looks like hw.pci.enable_io_modes=0 fixes this.
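
For anyone else trying this: it's a loader tunable documented in pci(4), so the persistent way to set it is in /boot/loader.conf (it can also be set from the loader prompt):

    # /boot/loader.conf
    # pci(4): when set to 0, the PCI code won't enable resources that
    # the BIOS left disabled; the default is 1.
    hw.pci.enable_io_modes=0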

(This was discovered when installing an RX Vega GPU, which was actually panicking with "next resource mismatch", so I went to fiddle with the pci tunables.)

So this might be some weird misconfiguration in firmware.
Comment 3 Val Packett 2019-03-14 20:27:16 UTC
Okay, the real fix was updating the firmware.

Still, odd that Windows 10 never complained about anything and everything worked there…