I've been trying to build a virtualised box with FreeBSD as a guest to do some network stuff, and I needed to pass physical NICs through to the guest. I attempted to pass a port of an i340-T4 through to a FreeBSD guest, but this caused a panic at boot time:

igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> mem 0xe7a00000-0xe7a7ffff,0xe7afc000-0xe7afffff irq 19 at device 0.0 on pci4
panic: resource_list_release: can't find resource
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80b240f7 at kdb_backtrace+0x67
#1 0xffffffff80ad9462 at vpanic+0x182
#2 0xffffffff80ad92d3 at panic+0x43
#3 0xffffffff80b16d5f at resource_list_release+0x1bf
#4 0xffffffff8054a7d4 at igb_attach+0x804
#5 0xffffffff80b160f0 at device_attach+0x420
#6 0xffffffff80b172dd at bus_generic_attach+0x2d
#7 0xffffffff80711145 at pci_attach+0xd5
#8 0xffffffff80b160f0 at device_attach+0x420
#9 0xffffffff80b172dd at bus_generic_attach+0x2d
#10 0xffffffff803c02e1 at acpi_pcib_pci_attach+0xa1
#11 0xffffffff80b160f0 at device_attach+0x420
#12 0xffffffff80b172dd at bus_generic_attach+0x2d
#13 0xffffffff80711145 at pci_attach+0xd5
#14 0xffffffff80b160f0 at device_attach+0x420
#15 0xffffffff80b172dd at bus_generic_attach+0x2d
#16 0xffffffff803bfa1a at acpi_pcib_acpi_attach+0x3ba
#17 0xffffffff80b160f0 at device_attach+0x420

This behaviour is consistent across both the ESXi 6.5 and KVM (on Ubuntu 16.04.2) hypervisors, yet when the same is attempted with Linux (Ubuntu 16.04.2) as the guest, everything seems alright. The problem doesn't arise when the NIC is an i350-T4.

For completeness:

% uname -a
FreeBSD gamma 11.0-RELEASE-p8 FreeBSD 11.0-RELEASE-p8 #0: Wed Feb 22 06:12:04 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
Created attachment 181227 [details] correcting resource managements

This patch may make the panic go away. But I think something is wrong with the settings of your virtualised box, because the igb driver failed to allocate PCI resources for some reason.
The patch didn't seem to resolve anything: igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k-p1> mem 0xe7a00000-0xe7a7ffff,0xe7afc000-0xe7afffff irq 17 at device 0.0 on pci5 panic: resource_list_release: can't find resource cpuid = 0 KDB: stack backtrace: #0 0xffffffff80b24067 at kdb_backtrace+0x67 #1 0xffffffff80ad93d2 at vpanic+0x182 #2 0xffffffff80ad9243 at panic+0x43 #3 0xffffffff80b16ccf at resource_list_release+0x1bf #4 0xffffffff8054a76b at igb_attach+0x80b #5 0xffffffff80b16060 at device_attach+0x420 #6 0xffffffff80b1724d at bus_generic_attach+0x2d #7 0xffffffff807110b5 at pci_attach+0xd5 #8 0xffffffff80b16060 at device_attach+0x420 #9 0xffffffff80b1724d at bus_generic_attach+0x2d #10 0xffffffff803c0261 at acpi_pcib_pci_attach+0xa1 #11 0xffffffff80b16060 at device_attach+0x420 #12 0xffffffff80b1724d at bus_generic_attach+0x2d #13 0xffffffff807110b5 at pci_attach+0xd5 #14 0xffffffff80b16060 at device_attach+0x420 #15 0xffffffff80b1724d at bus_generic_attach+0x2d #16 0xffffffff803bf99a at acpi_pcib_acpi_attach+0x3ba #17 0xffffffff80b16060 at device_attach+0x420 Please _do_not_ blame _hypervisor_ as passthrough works perfectly well in Ubuntu 16.04.2 and Windows 2012 R2 with *inbox* drivers- I was able to successfully update both Ubuntu and Windows over the Internet without a hitch.
(In reply to igor from comment #2) Hmm. I'm sorry. Can you find out which source line "igb_attach+0x804" corresponds to? Please also report some of the lines printed before the panic when booting in verbose mode.
(In reply to Kaho Toshikazu from comment #3) pcib19: <ACPI PCI-PCI bridge> at device 23.0 on pci0 pcib0: allocated type 3 (0xe7a00000-0xe7afffff) for rid 24 of pcib19 pcib19: attempting to allocate 1 MSI vectors (1 supported) msi: routing MSI IRQ 275 to local APIC 0 vector 74 pcib19: using IRQ 275 for MSI pcib19: [GIANT-LOCKED] pcib19: HotPlug command: 0000 -> 102b pcib19: domain 0 pcib19: secondary bus 19 pcib19: subordinate bus 19 pcib19: prefetched decode 0xe7a00000-0xe7afffff pcib19: special decode ISA pcib19: could not get PCI interrupt routing table for \_SB_.PCI0.PE60 - AE_NOT_FOUND pci5: <ACPI PCI bus> on pcib19 pcib19: allocated bus range (19-19) for rid 0 of pci5 pci5: domain=0, physical bus=19 found-> vendor=0x8086, dev=0x150e, revid=0x01 domain=0, bus=19, slot=0, func=0 class=02-00-00, hdrtype=0x00, mfdev=0 cmdreg=0x0003, statreg=0x0010, cachelnsz=16 (dwords) lattimer=0x40 (1920 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=b, irq=7 powerspec 3 supports D0 D3 current D0 MSI supports 1 message, 64 bit, vector masks MSI-X supports 10 messages in map 0x20 map[10]: type Prefetchable Memory, range 64, base rxe7a00000, size 19, enabled pcib19: allocated prefetch range (0xe7a00000-0xe7a7ffff) for rid 10 of pci0:19:0:0 map[20]: type Prefetchable Memory, range 64, base rxe7afc000, size 14, enabled pcib19: allocated prefetch range (0xe7afc000-0xe7afffff) for rid 20 of pci0:19:0:0 pcib0: matched entry for 0.23.INTB pcib0: slot 23 INTB hardwired to IRQ 17 pcib19: slot 0 INTB is routed to irq 17 igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k-p1> mem 0xe7a00000-0xe7a7ffff,0xe7afc000-0xe7afffff irq 17 at device 0.0 on pci5 igb0: attempting to allocate 3 MSI-X vectors (10 supported) panic: resource_list_release: can't find resource cpuid = 0 KDB: stack backtrace: #0 0xffffffff80b24067 at kdb_backtrace+0x67 #1 0xffffffff80ad93d2 at vpanic+0x182 #2 0xffffffff80ad9243 at panic+0x43 #3 0xffffffff80b16ccf at resource_list_release+0x1bf #4 
0xffffffff8054a76b at igb_attach+0x80b #5 0xffffffff80b16060 at device_attach+0x420 #6 0xffffffff80b1724d at bus_generic_attach+0x2d #7 0xffffffff807110b5 at pci_attach+0xd5 #8 0xffffffff80b16060 at device_attach+0x420 #9 0xffffffff80b1724d at bus_generic_attach+0x2d #10 0xffffffff803c0261 at acpi_pcib_pci_attach+0xa1 #11 0xffffffff80b16060 at device_attach+0x420 #12 0xffffffff80b1724d at bus_generic_attach+0x2d #13 0xffffffff807110b5 at pci_attach+0xd5 #14 0xffffffff80b16060 at device_attach+0x420 #15 0xffffffff80b1724d at bus_generic_attach+0x2d #16 0xffffffff803bf99a at acpi_pcib_acpi_attach+0x3ba #17 0xffffffff80b16060 at device_attach+0x420 For reference, this is what lspci -vvv spits out on Linux when a function is passed through: 0b:00.0 Ethernet controller: Intel Corporation 82580 Gigabit Network Connection (rev 01) DeviceName: pciPassthru0 Subsystem: Intel Corporation Ethernet Server Adapter I340-T4 Physical Slot: 192 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 64, Cache Line Size: 64 bytes Interrupt: pin D routed to IRQ 18 Region 0: Memory at e7a00000 (64-bit, prefetchable) [size=512K] Region 4: Memory at e7afc000 (64-bit, prefetchable) [size=16K] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME- Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+ Address: 0000000000000000 Data: 0000 Masking: 00000000 Pending: 00000000 Capabilities: [70] MSI-X: Enable+ Count=10 Masked- Vector table: BAR=4 offset=00000000 PBA: BAR=4 offset=00002000 Capabilities: [a0] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset- DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+ 
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 128 bytes DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- LnkCap: Port #0, Speed 5GT/s, Width x32, ASPM L0s, Exit Latency L0s <64ns, L1 <1us ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp- LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s, Width x32, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [100 v1] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol- UESvrt: DLP- SDES+ TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn- Capabilities: [140 v1] Device Serial Number 90-e2-ba-ff-ff-4e-c3-6c Capabilities: [1a0 v1] Transaction Processing Hints Device specific mode supported Steering table in TPH capability structure Kernel driver in use: igb Kernel modules: igb Let me know if you need anything else :-)
(In reply to igor from comment #4) Thanks for your reports. They reveal the location causing the panic, but I'm sorry that I cannot find why the panic happens. All resource allocations succeed; only the MSI-X setup fails. A workaround I can suggest for this panic is to disable MSI-X. Please try setting this in your /boot/loader.conf:

hw.igb.enable_msix=0
(In reply to Kaho Toshikazu from comment #5) Good news is that the work-around works, but given that MSI-X seems to work in Linux, let's see if we can fix this for FreeBSD. I have sprinkled some trace statements liberally in some places in if_igb.c and pci.c (especially that the latter seems to have been written to fail extraordinarily quietly!) and this is where I got so far, but I've got no idea where PCIB_ALLOC_MSIX(device_get_parent(dev), child, &irq); takes me from pci.c: pcib19: <ACPI PCI-PCI bridge> at device 23.0 on pci0 pcib0: allocated type 3 (0xe7a00000-0xe7afffff) for rid 24 of pcib19 pcib19: attempting to allocate 1 MSI vectors (1 supported) msi: routing MSI IRQ 275 to local APIC 0 vector 74 pcib19: using IRQ 275 for MSI pcib19: [GIANT-LOCKED] pcib19: HotPlug command: 0000 -> 102b pcib19: domain 0 pcib19: secondary bus 19 pcib19: subordinate bus 19 pcib19: prefetched decode 0xe7a00000-0xe7afffff pcib19: special decode ISA pcib19: could not get PCI interrupt routing table for \_SB_.PCI0.PE60 - AE_NOT_FOUND pci5: <ACPI PCI bus> on pcib19 pcib19: allocated bus range (19-19) for rid 0 of pci5 pci5: domain=0, physical bus=19 found-> vendor=0x8086, dev=0x150e, revid=0x01 domain=0, bus=19, slot=0, func=0 class=02-00-00, hdrtype=0x00, mfdev=0 cmdreg=0x0003, statreg=0x0010, cachelnsz=16 (dwords) lattimer=0x40 (1920 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=b, irq=7 powerspec 3 supports D0 D3 current D0 MSI supports 1 message, 64 bit, vector masks MSI-X supports 10 messages in map 0x20 map[10]: type Prefetchable Memory, range 64, base rxe7a00000, size 19, enabled pcib19: allocated prefetch range (0xe7a00000-0xe7a7ffff) for rid 10 of pci0:19:0:0 map[20]: type Prefetchable Memory, range 64, base rxe7afc000, size 14, enabled pcib19: allocated prefetch range (0xe7afc000-0xe7afffff) for rid 20 of pci0:19:0:0 pcib0: matched entry for 0.23.INTB pcib0: slot 23 INTB hardwired to IRQ 17 pcib19: slot 0 INTB is routed to irq 17 igb0: <Intel(R) PRO/1000 Network 
Connection, Version - 2.5.3-k-pX3> mem 0xe7a00000-0xe7a7ffff,0xe7afc000-0xe7afffff irq 17 at device 0.0 on pci5 igb0: X_identify_hardware igb0: X_allocate_pci_resources igb0: got 10 from pci_msix_count() igb0: got 0 BAR() igb0: X_pci_alloc_msix pci5: got to pci_alloc_msix_method pci5: Having a real go here... igb0: attempting to allocate 3 MSI-X vectors (10 supported) pci5: pci_alloc_msix_method: Failied to allocate on i == 0 (REASON: 6) pcib19: was the failed parent igb0: err: 6, msgs: 3, want: 3 igb0: well, that didn't work, X_pci_release_msi panic: resource_list_release: can't find resource cpuid = 0 KDB: stack backtrace: #0 0xffffffff80b24337 at kdb_backtrace+0x67 #1 0xffffffff80ad96a2 at vpanic+0x182 #2 0xffffffff80ad9513 at panic+0x43 #3 0xffffffff80b16f9f at resource_list_release+0x1bf #4 0xffffffff8054a7f0 at igb_attach+0x890 #5 0xffffffff80b16330 at device_attach+0x420 #6 0xffffffff80b1751d at bus_generic_attach+0x2d #7 0xffffffff80711385 at pci_attach+0xd5 #8 0xffffffff80b16330 at device_attach+0x420 #9 0xffffffff80b1751d at bus_generic_attach+0x2d #10 0xffffffff803c0261 at acpi_pcib_pci_attach+0xa1 #11 0xffffffff80b16330 at device_attach+0x420 #12 0xffffffff80b1751d at bus_generic_attach+0x2d #13 0xffffffff80711385 at pci_attach+0xd5 #14 0xffffffff80b16330 at device_attach+0x420 #15 0xffffffff80b1751d at bus_generic_attach+0x2d #16 0xffffffff803bf99a at acpi_pcib_acpi_attach+0x3ba #17 0xffffffff80b16330 at device_attach+0x420
Got a bit further: I edited pcib_alloc_msix(device_t pcib, device_t dev, int *irq) in pci_pci.c to be more noisy on failure (why core is written to fail so silently is really beyond me!):

igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k-pX3> mem 0xe7a00000-0xe7a7ffff,0xe7afc000-0xe7afffff irq 17 at device 0.0 on pci5
igb0: X_identify_hardware
igb0: X_allocate_pci_resources
igb0: got 10 from pci_msix_count()
igb0: got 0 BAR()
igb0: X_pci_alloc_msix
pci5: got to pci_alloc_msix_method
pci5: Having a real go here...
igb0: attempting to allocate 3 MSI-X vectors (10 supported)
igb0: attempting to pcib_alloc_msix
igb0: >>> PCIB_DISABLE_MSIX flag is set!
pci5: pci_alloc_msix_method: Failied to allocate on i == 0 (REASON: 6)
pcib19: was the failed parent

It seems that the PCIB_DISABLE_MSIX flag is set, which is what triggers the ENXIO error.
Even more digging to see why i350 doesn't panic but i340 does:- here's what i350 looks like: pcib19: <ACPI PCI-PCI bridge> at device 23.0 on pci0 pcib19: >>> MSI-X BLACKLISTED! Setting PCIB_DISABLE_MSIX flag. pcib0: allocated type 3 (0xfd100000-0xfd2fffff) for rid 20 of pcib19 pcib19: attempting to allocate 1 MSI vectors (1 supported) msi: routing MSI IRQ 275 to local APIC 0 vector 74 pcib19: using IRQ 275 for MSI pcib19: [GIANT-LOCKED] pcib19: HotPlug command: 0000 -> 102b pcib19: domain 0 pcib19: secondary bus 19 pcib19: subordinate bus 19 pcib19: memory decode 0xfd100000-0xfd2fffff pcib19: special decode ISA pcib19: could not get PCI interrupt routing table for \_SB_.PCI0.PE60 - AE_NOT_FOUND pci5: <ACPI PCI bus> on pcib19 pcib19: allocated bus range (19-19) for rid 0 of pci5 pci5: domain=0, physical bus=19 found-> vendor=0x8086, dev=0x1521, revid=0x01 domain=0, bus=19, slot=0, func=0 class=02-00-00, hdrtype=0x00, mfdev=0 cmdreg=0x0003, statreg=0x0010, cachelnsz=16 (dwords) lattimer=0x40 (1920 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=a, irq=9 powerspec 3 supports D0 D3 current D0 MSI supports 1 message, 64 bit, vector masks MSI-X supports 10 messages in map 0x1c map[10]: type Memory, range 32, base rxfd100000, size 20, enabled pcib19: allocated memory range (0xfd100000-0xfd1fffff) for rid 10 of pci0:19:0:0 map[1c]: type Memory, range 32, base rxfd2fc000, size 14, enabled pcib19: allocated memory range (0xfd2fc000-0xfd2fffff) for rid 1c of pci0:19:0:0 pcib0: matched entry for 0.23.INTA pcib0: slot 23 INTA hardwired to IRQ 16 pcib19: slot 0 INTA is routed to irq 16 igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k-pX3> mem 0xfd100000-0xfd1fffff,0xfd2fc000-0xfd2fffff irq 16 at device 0.0 on pci5 igb0: X_identify_hardware igb0: X_allocate_pci_resources igb0: got 10 from pci_msix_count() igb0: got -47202304 BAR() igb0: X_pci_alloc_msix pci5: got to pci_alloc_msix_method pci5: Having a real go here... 
igb0: attempting to allocate 3 MSI-X vectors (10 supported) igb0: attempting to pcib_alloc_msix igb0: >>> PCIB_DISABLE_MSIX flag is set! pci5: pci_alloc_msix_method: Failied to allocate on i == 0 (REASON: 6) pcib19: was the failed parent igb0: err: 6, msgs: 3, want: 3 igb0: well, that didn't work, X_pci_release_msi igb0: attempting to allocate 1 MSI vectors (1 supported) msi: routing MSI IRQ 276 to local APIC 0 vector 75 igb0: using IRQ 276 for MSI igb0: Using an MSI interrupt igb0: X_setup_init_funcs igb0: X_get_bus_info igb0: X_validate_descriptors igb0: X_allocate_queues igb0: X_allocate_stats igb0: X_multicast_array_memory igb0: X_hardware_specifics igb0: X_reset_hw igb0: X_check_eeprom igb0: X_setup_interface igb0: bpf attached igb0: Ethernet address: a0:36:9f:31:a3:90 igb0: X_igb_reset igb0: X_configure_interrupts igb0: attempting to igb_allocate_legacy() igb0: That went better than expected... igb0: X_attaching_netmap igb0: netmap queues/slots: TX 1/1024, RX 1/1024 igb0: X_VOILA!!! cf. i340: pcib19: <ACPI PCI-PCI bridge> at device 23.0 on pci0 pcib19: >>> MSI-X BLACKLISTED! Setting PCIB_DISABLE_MSIX flag. 
pcib0: allocated type 3 (0xe7a00000-0xe7afffff) for rid 24 of pcib19 pcib19: attempting to allocate 1 MSI vectors (1 supported) msi: routing MSI IRQ 275 to local APIC 0 vector 74 pcib19: using IRQ 275 for MSI pcib19: [GIANT-LOCKED] pcib19: HotPlug command: 0000 -> 102b pcib19: domain 0 pcib19: secondary bus 19 pcib19: subordinate bus 19 pcib19: prefetched decode 0xe7a00000-0xe7afffff pcib19: special decode ISA pcib19: could not get PCI interrupt routing table for \_SB_.PCI0.PE60 - AE_NOT_FOUND pci5: <ACPI PCI bus> on pcib19 pcib19: allocated bus range (19-19) for rid 0 of pci5 pci5: domain=0, physical bus=19 found-> vendor=0x8086, dev=0x150e, revid=0x01 domain=0, bus=19, slot=0, func=0 class=02-00-00, hdrtype=0x00, mfdev=0 cmdreg=0x0003, statreg=0x0010, cachelnsz=16 (dwords) lattimer=0x40 (1920 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) intpin=b, irq=7 powerspec 3 supports D0 D3 current D0 MSI supports 1 message, 64 bit, vector masks MSI-X supports 10 messages in map 0x20 map[10]: type Prefetchable Memory, range 64, base rxe7a00000, size 19, enabled pcib19: allocated prefetch range (0xe7a00000-0xe7a7ffff) for rid 10 of pci0:19:0:0 map[20]: type Prefetchable Memory, range 64, base rxe7afc000, size 14, enabled pcib19: allocated prefetch range (0xe7afc000-0xe7afffff) for rid 20 of pci0:19:0:0 pcib0: matched entry for 0.23.INTB pcib0: slot 23 INTB hardwired to IRQ 17 pcib19: slot 0 INTB is routed to irq 17 igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k-pX3> mem 0xe7a00000-0xe7a7ffff,0xe7afc000-0xe7afffff irq 17 at device 0.0 on pci5 igb0: X_identify_hardware igb0: X_allocate_pci_resources igb0: got 10 from pci_msix_count() igb0: got 0 BAR() igb0: X_pci_alloc_msix pci5: got to pci_alloc_msix_method pci5: Having a real go here... igb0: attempting to allocate 3 MSI-X vectors (10 supported) igb0: attempting to pcib_alloc_msix igb0: >>> PCIB_DISABLE_MSIX flag is set! 
pci5: pci_alloc_msix_method: Failied to allocate on i == 0 (REASON: 6) pcib19: was the failed parent igb0: err: 6, msgs: 3, want: 3 igb0: well, that didn't work, X_pci_release_msi panic: resource_list_release: can't find resource cpuid = 0 KDB: stack backtrace: #0 0xffffffff80b24367 at kdb_backtrace+0x67 #1 0xffffffff80ad96d2 at vpanic+0x182 #2 0xffffffff80ad9543 at panic+0x43 #3 0xffffffff80b16fcf at resource_list_release+0x1bf #4 0xffffffff8054a7f0 at igb_attach+0x890 #5 0xffffffff80b16360 at device_attach+0x420 #6 0xffffffff80b1754d at bus_generic_attach+0x2d #7 0xffffffff80711385 at pci_attach+0xd5 #8 0xffffffff80b16360 at device_attach+0x420 #9 0xffffffff80b1754d at bus_generic_attach+0x2d #10 0xffffffff803c0261 at acpi_pcib_pci_attach+0xa1 #11 0xffffffff80b16360 at device_attach+0x420 #12 0xffffffff80b1754d at bus_generic_attach+0x2d #13 0xffffffff80711385 at pci_attach+0xd5 #14 0xffffffff80b16360 at device_attach+0x420 #15 0xffffffff80b1754d at bus_generic_attach+0x2d #16 0xffffffff803bf99a at acpi_pcib_acpi_attach+0x3ba #17 0xffffffff80b16360 at device_attach+0x420 So despite Linux seemingly supporting MSI-X and FreeBSD blacklisting them in this instance, it would appear that whatever is happening with i350, allows i350 to gracefully back-out from attempting to use MSI-X and use MSI instead, whereas i340 panics.
(In reply to igor from comment #8) Thank you for your useful reports, but I cannot yet find why it panics. I think PCIB_ALLOC_MSIX() is pcib_alloc_msix() at pci_pci.c:2687. It checks the MSI-X capability of the parent PCI-PCI bridge chip. At pci.c:264, VMware's PCI chipsets are listed as PCI_QUIRK_DISABLE_MSIX devices, and the igb device driver will attempt to use MSI-X if this line is removed. I don't know what happens if MSI-X is enabled under VMware.
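For readers following along, the quirk path described above can be sketched roughly as below. This is a simplified stand-in, not the code in pci.c or pci_pci.c: every `toy_`/`TOY_` name is hypothetical, and the device ID in the table is a placeholder rather than a real VMware ID. The only details taken from the thread are that a blacklisted bridge gets a "disable MSI-X" flag at attach time and that the later allocation attempt then fails with ENXIO.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_QUIRK_DISABLE_MSIX   1     /* stand-in for PCI_QUIRK_DISABLE_MSIX */
#define TOY_BRIDGE_DISABLE_MSIX  0x1u  /* stand-in for PCIB_DISABLE_MSIX */

struct toy_quirk {
	uint32_t devid;
	int      type;
};

/* Placeholder entry; NOT a real device ID. */
static const struct toy_quirk toy_quirks[] = {
	{ 0xdead0001u, TOY_QUIRK_DISABLE_MSIX },
};

struct toy_bridge {
	uint32_t     devid;
	unsigned int flags;
};

/* At attach time, a blacklisted bridge gets the disable flag. */
static void
toy_bridge_attach(struct toy_bridge *b)
{
	size_t i;

	b->flags = 0;
	for (i = 0; i < sizeof(toy_quirks) / sizeof(toy_quirks[0]); i++) {
		if (toy_quirks[i].devid == b->devid &&
		    toy_quirks[i].type == TOY_QUIRK_DISABLE_MSIX)
			b->flags |= TOY_BRIDGE_DISABLE_MSIX;
	}
}

/* A child's MSI-X request is refused if the parent bridge is flagged. */
static int
toy_bridge_alloc_msix(const struct toy_bridge *b)
{
	if (b->flags & TOY_BRIDGE_DISABLE_MSIX)
		return (ENXIO);
	return (0);
}
```

With this shape, the driver never sees why MSI-X was refused; it only gets an error back and has to unwind whatever it allocated beforehand.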
(In reply to igor from comment #8) Hello. Please try a patch. It seems the rid used at _alloc_ and the rid used at _release_ are different.

Index: if_igb.c
===================================================================
--- if_igb.c	(revision 315684)
+++ if_igb.c	(working copy)
@@ -2901,7 +2901,7 @@
 msi:
 	if (adapter->msix_mem != NULL) {
 		bus_release_resource(dev, SYS_RES_MEMORY,
-		    PCIR_BAR(IGB_MSIX_BAR), adapter->msix_mem);
+		    adapter->memrid, adapter->msix_mem);
 		adapter->msix_mem = NULL;
 	}
 	msgs = 1;
Unrelated, but I don't know how widespread this is: the use of pci_read_config(...) in igb_setup_msix(...) is wrong. The return type of the function is `uint32_t' but its return value is assigned to an `int' variable. I know it doesn't cause any problem in *this* case, but it's just plain wrong and bad coding.

Anyway, should the return from pci_release_msi(dev) be checked? igb_setup_msix() drops the return value, but in this case the call returns ENODEV.

I've traced the problem in the if_igb.c code, and it would've been obvious from the calls igb_setup_msix() makes: if_igb.c is releasing a resource that doesn't exist! igb_setup_msix() allocates resources with:

adapter->msix_mem = bus_alloc_resource_any(dev, SYS_RES_MEMORY, &adapter->memrid, RF_ACTIVE);

but releases with:

bus_release_resource(dev, SYS_RES_MEMORY, PCIR_BAR(IGB_MSIX_BAR), adapter->msix_mem);

This is wishful thinking that adapter->memrid == PCIR_BAR(IGB_MSIX_BAR). Another problem is that just a few lines above the call to bus_alloc_resource_any(...), adapter->memrid is set from PCIR_BAR(IGB_MSIX_BAR), with a comment of:

/*
** Some new devices, as with ixgbe, now may
** use a different BAR, so we need to keep
** track of which is used.
*/

Yet nothing but wishful thinking is "keeping track", as the value of memrid is subsequently overwritten by bus_alloc_resource_any(...) a few lines down. Secondly, I have no idea why there's a precaution about seemingly post-ixgbe(4) drivers inside igb(4).

Anyway, the wishful thinking falls on its face with i340:

igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k-pX5> mem 0xe7a00000-0xe7a7ffff,0xe7afc000-0xe7afffff irq 17 at device 0.0 on pci5
igb0: X_identify_hardware
igb0: X_allocate_pci_resources
igb0: got 10 from pci_msix_count()
igb0: got 0 BAR()
igb0: allocated msix_mem with rid = 32
igb0: X_pci_alloc_msix
pci5: got to pci_alloc_msix_method
pci5: Having a real go here...
igb0: attempting to allocate 3 MSI-X vectors (10 supported)
igb0: attempting to pcib_alloc_msix
igb0: >>> PCIB_DISABLE_MSIX flag is set!
pci5: pci_alloc_msix_method: Failied to allocate on i == 0 (REASON: 6)
pcib19: was the failed parent
igb0: err: 6, msgs: 3, want: 3
igb0: well, that didn't work, X_pci_release_msi
pci5: into the belly of the beast!..
pci5: nothing to release
igb0: done with pci_release_msi(), result: 19
igb0: Releasing MSI-X memory with bus_release_resource() for rid = 28...
igb0: got resource at 0 for resource type 3 with rid 28 in 0xfffff800060df808 resource list
panic: resource_list_release: can't find resource

This is what's going on here:

igb0: allocated msix_mem with rid = 32
...
igb0: Releasing MSI-X memory with bus_release_resource() for rid = 28...

Patently, in this case adapter->memrid != PCIR_BAR(IGB_MSIX_BAR). With i350, we have:

igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k-pX5> mem 0xfd100000-0xfd1fffff,0xfd2fc000-0xfd2fffff irq 16 at device 0.0 on pci5
igb0: X_identify_hardware
igb0: X_allocate_pci_resources
igb0: got 10 from pci_msix_count()
igb0: got -47202304 BAR()
igb0: allocated msix_mem with rid = 28
igb0: X_pci_alloc_msix
pci5: got to pci_alloc_msix_method
pci5: Having a real go here...
igb0: attempting to allocate 3 MSI-X vectors (10 supported)
igb0: attempting to pcib_alloc_msix
igb0: >>> PCIB_DISABLE_MSIX flag is set!
pci5: pci_alloc_msix_method: Failied to allocate on i == 0 (REASON: 6)
pcib19: was the failed parent
igb0: err: 6, msgs: 3, want: 3
igb0: well, that didn't work, X_pci_release_msi
pci5: into the belly of the beast!..
pci5: nothing to release
igb0: done with pci_release_msi(), result: 19
igb0: Releasing MSI-X memory with bus_release_resource() for rid = 28...
igb0: got resource at 0xfffff80006035880 for resource type 3 with rid 28 in 0xfffff800060df808 resource list
igb0: attempting to allocate MSI...

So here the adapter->memrid == PCIR_BAR(IGB_MSIX_BAR) assumption holds, but that's an *assumption*, and it fails badly with the i340! Replacing

bus_release_resource(dev, SYS_RES_MEMORY, PCIR_BAR(IGB_MSIX_BAR), adapter->msix_mem);

with

bus_release_resource(dev, SYS_RES_MEMORY, adapter->memrid, adapter->msix_mem);

inside igb_setup_msix(...) seems to make the problem disappear:

igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k-pX6> mem 0xe7a00000-0xe7a7ffff,0xe7afc000-0xe7afffff irq 17 at device 0.0 on pci5
igb0: X_identify_hardware
igb0: X_allocate_pci_resources
igb0: got 10 from pci_msix_count()
igb0: got 0 BAR()
igb0: allocated msix_mem with rid = 32
igb0: X_pci_alloc_msix
pci5: got to pci_alloc_msix_method
pci5: Having a real go here...
igb0: attempting to allocate 3 MSI-X vectors (10 supported)
igb0: attempting to pcib_alloc_msix
igb0: >>> PCIB_DISABLE_MSIX flag is set!
pci5: pci_alloc_msix_method: Failied to allocate on i == 0 (REASON: 6)
pcib19: was the failed parent
igb0: err: 6, msgs: 3, want: 3
igb0: well, that didn't work, X_pci_release_msi
pci5: into the belly of the beast!..
pci5: nothing to release
igb0: done with pci_release_msi(), result: 19
igb0: Releasing MSI-X memory with bus_release_resource() for rid = 28...
igb0: got resource at 0xfffff80006035880 for resource type 3 with rid 32 in 0xfffff800060df808 resource list
igb0: attempting to allocate MSI...
igb0: attempting to allocate 1 MSI vectors (1 supported)
msi: routing MSI IRQ 276 to local APIC 0 vector 75
igb0: using IRQ 276 for MSI
igb0: Using an MSI interrupt
igb0: X_setup_init_funcs
igb0: X_get_bus_info
igb0: X_validate_descriptors
igb0: X_allocate_queues
igb0: X_allocate_stats
igb0: X_multicast_array_memory
igb0: X_hardware_specifics
igb0: X_reset_hw
igb0: X_check_eeprom
igb0: X_setup_interface
igb0: bpf attached
igb0: Ethernet address: 90:e2:ba:4e:c3:6d
igb0: X_igb_reset
igb0: X_configure_interrupts
igb0: attempting to igb_allocate_legacy()
igb0: That went better than expected...
igb0: X_attaching_netmap
igb0: netmap queues/slots: TX 1/1024, RX 1/1024
igb0: X_VOILA!!!

*HOWEVER*, I do not have the design specification for if_igb (presuming it wasn't written off the top of someone's head), so I don't know whether that change is the *correct* fix. I have no idea how widely this kind of "wishful thinking" has spread within the drivers, but at the very least an audit of e1000 ought to be carried out, given that the assumption demonstrably doesn't hold.
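As a side note on the `uint32_t'-into-`int' remark, the "got -47202304 BAR()" trace line from the i350 run is consistent with the BAR value 0xfd2fc000 being printed through a signed 32-bit variable. The sketch below only illustrates that arithmetic; the helper names are hypothetical and are not part of the driver. The out-of-range conversion is implementation-defined in C and wraps on the usual two's-complement targets.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: what a 32-bit register value looks like once it
 * has been stored in a signed 32-bit variable. For values above
 * INT32_MAX the conversion is implementation-defined; on common
 * two's-complement platforms it wraps to a negative number.
 */
static int32_t
bar_as_signed(uint32_t bar)
{
	return ((int32_t)bar);
}

/*
 * The driver only compares the value against zero, which gives the
 * same answer for the signed and the unsigned representation.
 */
static int
bar_is_unused(uint32_t bar)
{
	return (bar == 0);
}
```

This is why the mismatch is harmless for the `bar == 0` test while still producing a misleading number in a `%d` trace.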
(In reply to Kaho Toshikazu from comment #10) Indeed, we zeroed in on the same issue! So is that the correct fix per spec? And, subsequently, should

/*
** Some new devices, as with ixgbe, now may
** use a different BAR, so we need to keep
** track of which is used.
*/
adapter->memrid = PCIR_BAR(IGB_MSIX_BAR);

inside igb_setup_msix(...) be thrown away to avoid the confusion, or does bus_alloc_resource_any(...) need a hint?
(In reply to igor from comment #11) Hello. Thank you for your test. I think the situation is like this.

bus_alloc_resource_any() calls resource_list_alloc() at subr_bus.c:3362, and bus_release_resource() calls resource_list_release() at subr_bus.c:3436. The rid (resource ID) can be set to any value by the device driver, but it must be the same value at allocation time and at release time. If the rid is wrong, the kernel cannot find the resource the driver asked for, and the kernel panics.

The difference between the i350 and the i340 (82580) is whether PCIR_BAR3 (PCI register: Base Address Register #3) or BAR4 is used. "adapter->memrid = PCIR_BAR(IGB_MSIX_BAR)" sets the rid to BAR3. If BAR3 is unused, BAR4 is used instead via "adapter->memrid += 4", and the rid changes to BAR4 as well.

If the device driver ends up using MSI-X, the resources are not released and no problem occurs. In a VMware environment, however, the FreeBSD PCI bus code refuses to use MSI-X because of the PCI quirk mechanism. The igb driver allocates the resources first and then releases them because of the quirk, and at that point it uses the wrong rid if BAR4 is in use.

--
pci_release_msi(dev) releases all resources related to MSI and MSI-X. It may be called when some resources are allocated and others are not, so there is nothing to do with its return value, whatever it is.
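The pairing rule above can be illustrated with a toy resource list. This is only a sketch under simplifying assumptions: the `toy_` types and functions are hypothetical and much simpler than resource_list_alloc()/resource_list_release() in subr_bus.c, and the failing lookup returns -1 where the real kernel panics.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 8

struct toy_entry {
	int rid;
	int busy;
};

struct toy_list {
	struct toy_entry e[TOY_MAX];
	size_t           n;
};

/* Record an allocation under the given rid. Returns 0, or -1 if full. */
static int
toy_alloc(struct toy_list *l, int rid)
{
	if (l->n == TOY_MAX)
		return (-1);
	l->e[l->n].rid = rid;
	l->e[l->n].busy = 1;
	l->n++;
	return (0);
}

/*
 * Release by rid. Returns 0 on success, or -1 when no busy entry with
 * that rid exists (the case where the real kernel panics with
 * "resource_list_release: can't find resource").
 */
static int
toy_release(struct toy_list *l, int rid)
{
	size_t i;

	for (i = 0; i < l->n; i++) {
		if (l->e[i].rid == rid && l->e[i].busy) {
			l->e[i].busy = 0;
			return (0);
		}
	}
	return (-1);
}
```

Allocating under rid 32 and releasing under rid 28, as in the i340 trace, hits the "not found" branch; releasing under the rid that was actually used succeeds.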
(In reply to Kaho Toshikazu from comment #13) As far as MSI-X under VMware goes, I found this: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203874 and the logic behind the blacklisting seems rather meretricious: the hypervisor got blamed early on, nobody looked at the correctness of the driver code (as happened early on in this PR ;-)), and nobody knew any better. I commented out the blacklist entries and there was no apparent problem with MSI-X under ESXi.

Going back to the fall-back mode of igb(4), it seems that you submitted the patch while I was writing my reply and experimenting, so your understanding matches my experiments. I suspect the assignment adapter->memrid = PCIR_BAR(IGB_MSIX_BAR) that precedes bus_alloc_resource_any(...) is superfluous, as the latter overwrites adapter->memrid regardless of what PCIR_BAR(...) returns earlier.
(In reply to igor from comment #14) Hello,

> I suspect the assignment adapter->memrid = PCIR_BAR(IGB_MSIX_BAR) that precedes
> bus_alloc_resource_any(...) is superfluous as the latter overwrites
> adapter->memrid regardless of what PCIR_BAR(...) returns earlier.

if_igb.c
2823: adapter->memrid = PCIR_BAR(IGB_MSIX_BAR);
2824: bar = pci_read_config(dev, adapter->memrid, 4);
2825: if (bar == 0) /* use next bar */
2826:         adapter->memrid += 4;
2827: adapter->msix_mem = bus_alloc_resource_any(dev,
2828:     SYS_RES_MEMORY, &adapter->memrid, RF_ACTIVE);

2902: if (adapter->msix_mem != NULL) {
2903:         bus_release_resource(dev, SYS_RES_MEMORY,
2904:             adapter->memrid, adapter->msix_mem);
2905:         adapter->msix_mem = NULL;
2906: }

The variable memrid holds the address of the PCI register used for MSI-X.

2823: A normal chip uses "PCIR_BAR(IGB_MSIX_BAR)", meaning BAR3, and memrid stores it.
2824: Reads the BAR3 register through memrid.
2825-2826: If BAR3 contains 0, BAR3 is not in use and memrid is changed to point at BAR4. Some chips use BAR4 instead of BAR3. If BAR3 is in use, nothing is done.
2827-2828: Allocates the memory indicated by memrid, which is BAR3 or BAR4. msix_mem stores the result.
2902: NULL means free or not in use.
2903: The resource is released here. The original code always uses BAR3 regardless of which BAR is in use, and that causes the panic when BAR4 is in use.
2905: Mark as unused.

In dmesg:
map[10]: BAR0
map[1c]: BAR3 -- i350
map[20]: BAR4 -- i340
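The rid values seen in the traces (28 and 32) follow directly from the BAR register offsets. The sketch below reproduces that arithmetic outside the kernel; the three macros mirror the definitions the driver relies on (copied here so the snippet stands alone), while `msix_bar_rid()` is a hypothetical helper written for illustration, not a function in if_igb.c.

```c
#include <assert.h>
#include <stdint.h>

/* BAR0 lives at config offset 0x10; each BAR is 4 bytes wide. */
#define PCIR_BARS     0x10
#define PCIR_BAR(x)   (PCIR_BARS + (x) * 4)
#define IGB_MSIX_BAR  3

/*
 * Hypothetical helper mirroring lines 2823-2826 above: start from BAR3
 * and move on to BAR4 when BAR3 reads back as zero.
 */
static int
msix_bar_rid(uint32_t bar3_value)
{
	int rid = PCIR_BAR(IGB_MSIX_BAR);  /* 0x1c == 28 */

	if (bar3_value == 0)               /* BAR3 unused: use next BAR */
		rid += 4;                  /* 0x20 == 32 */
	return (rid);
}
```

For the i350, BAR3 reads 0xfd2fc000 and the rid stays at 28 (map[1c]); for the i340, BAR3 reads 0 and the rid becomes 32 (map[20]), which is exactly the value the original release call failed to use.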
(In reply to Kaho Toshikazu from comment #15) Ah, I see! I thought it was bus_alloc_resource_any(...) that was writing to memrid on allocation, why else would it need a pointer to a non-const int, right?! ;-) Thanks for clearing it up. We're good to go, the patch that changes rid passed to bus_release_resource(...) does the trick.
(In reply to igor from comment #16) Which patch are you guys referencing here?
(In reply to Sean Bruno from comment #17) The patch in comment #10 is the one that fixes the panic. The "correcting resource management" patch looks like it fixes an unrelated release of something that was never allocated ;-)
Created attachment 181305 [details] msix resource management

(In reply to Sean Bruno from comment #17) Hello. Can you direct-commit the patch from comment #10 to 11-stable and close this bug? The quirk for VMware is another story: bug #203874.

The bug seems to have been introduced at r256200, which updated the driver to version 2.4.0. 12-current is not affected by this bug: iflib handles the MSI-X resources there, and it does so correctly. The em driver in 11-stable has similar code, but its assumption that the MSI-X BAR is always BAR3 is correct at the moment. The patch attached to this comment contains similar changes, which add a variable to "struct adapter", so some memory will be consumed by an unnecessary variable that holds a constant value.

The first patch, which is unrelated to the panic, changes an alloc/release pair. I thought the panic was caused by leftover state from some failure that looked valid although nothing had been allocated, but that thought was incorrect.
A commit references this bug: Author: sbruno Date: Wed Apr 5 19:46:25 UTC 2017 New revision: 316540 URL: https://svnweb.freebsd.org/changeset/base/316540 Log: Direct commit of fixes to stable/11, resolving PCI passthrough and initialization issues when trying to passthrough a i340 (igb) to VMware. While here, cleanup some bits of em(4) to DTRT as well. PR: 218113 Submitted by: Kaho Toshikazu <kaho@elam.kais.kyoto-u.ac.jp> Changes: stable/11/sys/dev/e1000/if_em.c stable/11/sys/dev/e1000/if_em.h stable/11/sys/dev/e1000/if_igb.c
Should this be put into stable/10 ?
(In reply to Sean Bruno from comment #21) Yes, I think stable/10 has the same problem. The same patch applies with some offsets.
A commit references this bug: Author: sbruno Date: Thu Apr 6 19:13:40 UTC 2017 New revision: 316588 URL: https://svnweb.freebsd.org/changeset/base/316588 Log: Direct commit of fixes to stable/10, resolving PCI passthrough and initialization issues when trying to passthrough a i340 (igb) to VMware. While here, cleanup some bits of em(4) to DTRT as well. PR: 218113 Submitted by: Kaho Toshikazu <kaho@elam.kais.kyoto-u.ac.jp> Changes: stable/10/sys/dev/e1000/if_em.c stable/10/sys/dev/e1000/if_em.h stable/10/sys/dev/e1000/if_igb.c