Computer: Dell PowerEdge T300 Server
NICs: Myricom 10G-PCIE-8AL-C
OS: FreeBSD 7.0/7.1 Release

After I rebuild the kernel (to include the NIC driver "mxge") and reboot, a kernel panic occurs and the computer automatically reboots. I contacted Myricom's engineers, and they suggested disabling message signaled interrupts (MSI) in /boot/loader.conf. With these settings the NICs now work:

hw.pci.enable_msix=0
hw.pci.enable_msi=0

However, MSI is very important for high-speed data transmission, because a huge number of packets generates many interrupts. Since Fedora 9 works well on the same computer, this may be a bug in FreeBSD.

Below is the screen output when the kernel panic occurs:

......
p4tcc3: <CPU Frequency Thermal Control> on cpu3
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci3: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 3.0 on pci0
pci4: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 4.0 on pci0
pci5: <ACPI PCI bus> on pcib3
mxge0: <Myri10G-PCIE-8A> mem 0xd8000000-0xd8ffffff, 0xdfa00000-0xdfafffff irq 16 at device 0.0 on pci5
panic: nexus_add_irq: failed
.....

Best Regards,
Xiuchao Wu (wuxiuchao@gmail.com)
How many MSI messages is mxge0 attempting to add, and how many other devices in your system are using MSI? A full copy of a verbose dmesg would be most helpful; 'pciconf -lc' might also help answer the first question.

-- John Baldwin
State Changed From-To: open->feedback Note that submitter feedback was requested.
State Changed From-To: feedback->closed Feedback timeout.
For posterity, in case we have to revisit this in the future: I ended up resolving this PR in r189404. I've included some boot -v output from the affected machine below, along with some additional notes:

MADT: Found IO APIC ID 4, Interrupt 0 at 0xfec00000
ioapic0: Changing APIC ID to 4
ioapic0: Routing external 8259A's -> intpin 0
MADT: Found IO APIC ID 5, Interrupt 256 at 0xfec10000
ioapic1: Changing APIC ID to 5
ioapic1: WARNING: intbase 256 != expected base r24
MADT: Found IO APIC ID 6, Interrupt 64 at 0xfec10000
ioapic2: Changing APIC ID to 6
ioapic2: WARNING: intbase 64 != expected base r280
lapic: Routing NMI -> LINT1
lapic: LINT1 trigger: edge
lapic: LINT1 polarity: high
MADT: Interrupt override: source 0, irq 2
ioapic0: Routing IRQ 0 -> intpin 2
MADT: Interrupt override: source 9, irq 9
ioapic0: intpin 9 trigger: level
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 0-23 on motherboard
ioapic2 <Version 2.0> irqs 64-87 on motherboard

Here the error was that ioapic1 and ioapic2 were actually the same I/O APIC (note the identical memory-mapped base address, 0xfec10000), but ioapic1 used an IRQ base of 256. That caused the nexus to reserve IRQ values meant for MSI, resulting in the panic.
Here is the MADT table from this machine:

/*
  APIC: Length=138, Revision=1, Checksum=201,
    OEMID=DELL, OEM Table ID=PE_SC3, OEM Revision=0x1,
    Creator ID=DELL, Creator Revision=0x1
    Local APIC ADDR=0xfee00000
    Flags={PC-AT}

    Type=Local APIC
    ACPI CPU=1
    Flags={ENABLED}
    APIC ID=0

    Type=Local APIC
    ACPI CPU=2
    Flags={ENABLED}
    APIC ID=1

    Type=Local APIC
    ACPI CPU=3
    Flags={ENABLED}
    APIC ID=2

    Type=Local APIC
    ACPI CPU=4
    Flags={ENABLED}
    APIC ID=3

    Type=Local NMI
    ACPI CPU=ALL
    LINT Pin=1
    Flags={Polarity=active-hi, Trigger=edge}

    Type=INT Override
    BUS=0
    IRQ=0
    INTR=2
    Flags={Polarity=conforming, Trigger=conforming}

    Type=INT Override
    BUS=0
    IRQ=9
    INTR=9
    Flags={Polarity=active-hi, Trigger=level}

    Type=IO APIC
    APIC ID=4
    INT BASE=0
    ADDR=0x00000000fec00000

    Type=IO APIC
    APIC ID=5
    INT BASE=256
    ADDR=0x00000000fec10000

    Type=IO APIC
    APIC ID=6
    INT BASE=64
    ADDR=0x00000000fec10000
 */
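The collision can be illustrated with a small, self-contained C model. This is a sketch, not FreeBSD code: the struct, the function names, and the MSI range size (128) are hypothetical. Only the MSI base of 256 comes from the commit log quoted later in this PR, and the entries are copied from the MADT dump above, with 24 pins per I/O APIC as shown by "irqs 0-23" in the boot -v output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Under the old static layout, MSI IRQs began at a fixed value of 256. */
#define STATIC_MSI_BASE  256u
/* Hypothetical size of the static MSI range, for illustration only. */
#define STATIC_MSI_COUNT 128u

/* Hypothetical model of one I/O APIC entry from the MADT. */
struct ioapic_entry {
	unsigned apic_id;
	unsigned int_base;   /* "INT BASE" field */
	unsigned npins;      /* interrupt pins on this I/O APIC */
	unsigned long addr;  /* memory-mapped register address */
};

/* The I/O APIC entries reported by the PowerEdge T300 firmware. */
static const struct ioapic_entry t300_ioapics[] = {
	{ 4,   0, 24, 0xfec00000UL },
	{ 5, 256, 24, 0xfec10000UL },  /* bogus duplicate of APIC ID 6 */
	{ 6,  64, 24, 0xfec10000UL },
};

/*
 * True if the IRQs [int_base, int_base + npins) claimed by this I/O APIC
 * intersect the static MSI range [256, 256 + STATIC_MSI_COUNT).
 */
static bool
overlaps_static_msi(const struct ioapic_entry *io)
{
	unsigned first = io->int_base;
	unsigned end = io->int_base + io->npins;  /* exclusive */

	return first < STATIC_MSI_BASE + STATIC_MSI_COUNT &&
	    end > STATIC_MSI_BASE;
}

/* Count entries whose pins would be reserved on top of MSI IRQs. */
static size_t
count_msi_collisions(const struct ioapic_entry *io, size_t n)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if (overlaps_static_msi(&io[i]))
			hits++;
	return hits;
}
```

With these values only APIC ID 5 (IRQs 256-279) lands in the MSI range, which matches the diagnosis above: its pins get reserved by the nexus before mxge0 tries to allocate MSI IRQs.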
A commit references this bug:

Author: jhb
Date: Tue Aug 28 21:09:21 UTC 2018
New revision: 338360
URL: https://svnweb.freebsd.org/changeset/base/338360

Log:
  Dynamically allocate IRQ ranges on x86.

  Previously, x86 used static ranges of IRQ values for different types of I/O interrupts. Interrupt pins on I/O APICs and 8259A PICs used IRQ values from 0 to 254. MSI interrupts used a compile-time-defined range starting at 256, and Xen event channels used a compile-time-defined range after MSI. Some recent systems have more than 255 I/O APIC interrupt pins which resulted in those IRQ values overflowing into the MSI range triggering an assertion failure.

  Replace statically assigned ranges with dynamic ranges. Do a single pass computing the sizes of the IRQ ranges (PICs, MSI, Xen) to determine the total number of IRQs required. Allocate the interrupt source and interrupt count arrays dynamically once this pass has completed. To minimize runtime complexity these arrays are only sized once during bootup.

  The PIC range is determined by the PICs present in the system. The MSI and Xen ranges continue to use a fixed size, though this does make it possible to turn the MSI range size into a tunable in the future.

  As a result, various places are updated to use dynamic limits instead of constants. In addition, the vmstat(8) utility has been taught to understand that some kernels may treat 'intrcnt' and 'intrnames' as pointers rather than arrays when extracting interrupt stats from a crashdump. This is determined by the presence (vs absence) of a global 'nintrcnt' symbol.

  This change reverts r189404 which worked around a buggy BIOS which enumerated an I/O APIC twice (using the same memory mapped address for both entries but using an IRQ base of 256 for one entry and a valid IRQ base for the second entry).
  Making the "base" of MSI IRQ values dynamic avoids the panic that r189404 worked around, and there may now be valid I/O APICs with an IRQ base above 256 which this workaround would incorrectly skip. If in the future the issue reported in PR 130483 reoccurs, we will have to add a pass over the I/O APIC entries in the MADT to detect duplicates using the memory mapped address and use some strategy to choose the "correct" one.

  While here, reserve room in intrcnts for the Hyper-V counters.

  PR:  229429, 130483
  Reviewed by:  kib, royger, cem
  Tested by:  royger (Xen), kib (DMAR)
  Approved by:  re (gjb)
  MFC after:  2 weeks
  Differential Revision:  https://reviews.freebsd.org/D16861

Changes:
  head/sys/sys/interrupt.h
  head/sys/x86/acpica/madt.c
  head/sys/x86/include/apicvar.h
  head/sys/x86/include/intr_machdep.h
  head/sys/x86/iommu/intel_intrmap.c
  head/sys/x86/isa/atpic.c
  head/sys/x86/x86/intr_machdep.c
  head/sys/x86/x86/io_apic.c
  head/sys/x86/x86/local_apic.c
  head/sys/x86/x86/msi.c
  head/sys/x86/x86/nexus.c
  head/sys/x86/xen/xen_intr.c
  head/sys/x86/xen/xen_msi.c
  head/sys/x86/xen/xen_nexus.c
  head/usr.bin/vmstat/vmstat.c
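The single sizing pass described in this log can be modeled roughly as follows. This is a simplified sketch, not the code from r338360: the struct and function names are hypothetical, the MSI and Xen range sizes are passed in as parameters (the log only says they remain fixed-size), and the 8259A PICs are left out. The idea it shows is that MSI numbering now starts right after the highest I/O APIC pin, so an I/O APIC with a base of 256 can no longer land on top of MSI IRQs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of an I/O APIC's IRQ range from the MADT. */
struct ioapic_range {
	unsigned int_base;
	unsigned npins;
};

/* Hypothetical summary of the dynamically computed IRQ layout. */
struct irq_layout {
	unsigned num_io_irqs;  /* PIC range is [0, num_io_irqs) */
	unsigned first_msi;    /* MSI range starts right after the PICs */
	unsigned first_xen;    /* Xen event channels follow MSI */
	unsigned num_irqs;     /* total IRQs; sizes the per-IRQ arrays */
};

/* One pass over the PICs, then stack the fixed-size MSI and Xen ranges. */
static struct irq_layout
compute_irq_layout(const struct ioapic_range *io, size_t n,
    unsigned msi_count, unsigned xen_count)
{
	struct irq_layout l = { 0, 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		unsigned end = io[i].int_base + io[i].npins;

		if (end > l.num_io_irqs)
			l.num_io_irqs = end;
	}
	l.first_msi = l.num_io_irqs;
	l.first_xen = l.first_msi + msi_count;
	l.num_irqs = l.first_xen + xen_count;
	return l;
}

/* Allocate one zeroed interrupt counter per IRQ, once, after sizing. */
static unsigned long *
alloc_intrcnt(const struct irq_layout *l)
{
	return calloc(l->num_irqs, sizeof(unsigned long));
}
```

With the T300's MADT entries (bases 0, 256 and 64, 24 pins each) and an assumed MSI range of 128, this model places the first MSI IRQ at 280 instead of 256, so the bogus entry's pins and the MSI range no longer overlap.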
A commit references this bug:

Author: jhb
Date: Thu Nov 1 18:34:29 UTC 2018
New revision: 340016
URL: https://svnweb.freebsd.org/changeset/base/340016

Log:
  MFC 338360,338415,338624,338630,338631,338725:
  Dynamic x86 IRQ layout.

  338360:
  Dynamically allocate IRQ ranges on x86.

  Previously, x86 used static ranges of IRQ values for different types of I/O interrupts. Interrupt pins on I/O APICs and 8259A PICs used IRQ values from 0 to 254. MSI interrupts used a compile-time-defined range starting at 256, and Xen event channels used a compile-time-defined range after MSI. Some recent systems have more than 255 I/O APIC interrupt pins which resulted in those IRQ values overflowing into the MSI range triggering an assertion failure.

  Replace statically assigned ranges with dynamic ranges. Do a single pass computing the sizes of the IRQ ranges (PICs, MSI, Xen) to determine the total number of IRQs required. Allocate the interrupt source and interrupt count arrays dynamically once this pass has completed. To minimize runtime complexity these arrays are only sized once during bootup.

  The PIC range is determined by the PICs present in the system. The MSI and Xen ranges continue to use a fixed size, though this does make it possible to turn the MSI range size into a tunable in the future.

  As a result, various places are updated to use dynamic limits instead of constants. In addition, the vmstat(8) utility has been taught to understand that some kernels may treat 'intrcnt' and 'intrnames' as pointers rather than arrays when extracting interrupt stats from a crashdump. This is determined by the presence (vs absence) of a global 'nintrcnt' symbol.

  This change reverts r189404 which worked around a buggy BIOS which enumerated an I/O APIC twice (using the same memory mapped address for both entries but using an IRQ base of 256 for one entry and a valid IRQ base for the second entry).
  Making the "base" of MSI IRQ values dynamic avoids the panic that r189404 worked around, and there may now be valid I/O APICs with an IRQ base above 256 which this workaround would incorrectly skip. If in the future the issue reported in PR 130483 reoccurs, we will have to add a pass over the I/O APIC entries in the MADT to detect duplicates using the memory mapped address and use some strategy to choose the "correct" one.

  While here, reserve room in intrcnts for the Hyper-V counters.

  338415:
  Fix build of x86 UP kernels after dynamic IRQ changes in r338360.

  338624:
  msi: remove the check that interrupt sources have been added

  When running as a specific type of Xen guest the hypervisor won't provide any emulated IO-APICs or legacy PICs at all, thus hitting the following assert in the MSI code:

      panic: Assertion num_io_irqs > 0 failed at /usr/src/sys/x86/x86/msi.c:334
      cpuid = 0
      time = 1
      KDB: stack backtrace:
      db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff826ffa70
      vpanic() at vpanic+0x1a3/frame 0xffffffff826ffad0
      panic() at panic+0x43/frame 0xffffffff826ffb30
      msi_init() at msi_init+0xed/frame 0xffffffff826ffb40
      apic_setup_io() at apic_setup_io+0x72/frame 0xffffffff826ffb50
      mi_startup() at mi_startup+0x118/frame 0xffffffff826ffb70
      start_kernel() at start_kernel+0x10

  Fix this by removing the assert in the MSI code, since it's possible to get to the MSI initialization without having registered any other interrupt sources.

  338630:
  lapic: skip setting intrcnt if lapic is not present

  Instead of panicking. Legacy PVH mode doesn't provide a lapic, and since native_lapic_intrcnt is called unconditionally this would cause the assert to trigger. Change the assert into a continue in order to take into account the possibility of systems without a lapic.

  338631:
  xen: legacy PVH fixes for the new interrupt count

  Register interrupts using the PIC pic_register_sources method instead of doing it in apic_setup_io.
  This is now required, since the internal interrupt structures are not yet setup when calling apic_setup_io.

  338725:
  Fix a regression in r338360 when booting an x86 machine without APIC.

  The atpic_register_sources callback tries to avoid registering interrupt sources that would collide with an I/O APIC. However, the previous implementation was failing to register IRQs 8-15 since the slave PIC saw valid IRQs from the master and assumed an I/O APIC was present. To fix, go back to registering all 8259A interrupt sources in one loop when the master's register_sources method is invoked.

  PR:  229429, 130483, 231291

Changes:
  _U  stable/11/
  stable/11/sys/sys/interrupt.h
  stable/11/sys/x86/acpica/madt.c
  stable/11/sys/x86/include/apicvar.h
  stable/11/sys/x86/include/intr_machdep.h
  stable/11/sys/x86/iommu/intel_intrmap.c
  stable/11/sys/x86/isa/atpic.c
  stable/11/sys/x86/x86/intr_machdep.c
  stable/11/sys/x86/x86/io_apic.c
  stable/11/sys/x86/x86/local_apic.c
  stable/11/sys/x86/x86/msi.c
  stable/11/sys/x86/x86/nexus.c
  stable/11/sys/x86/xen/pvcpu_enum.c
  stable/11/sys/x86/xen/xen_intr.c
  stable/11/sys/x86/xen/xen_msi.c
  stable/11/sys/x86/xen/xen_nexus.c
  stable/11/sys/xen/xen_intr.h
  stable/11/usr.bin/vmstat/vmstat.c
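Both commit logs note that if the problem from PR 130483 recurs, a pass over the MADT I/O APIC entries could detect duplicates by their memory-mapped address. A minimal C sketch of such a pass is shown here. The names are hypothetical, and because the logs leave the selection strategy open, the policy used (keep the entry with the lowest IRQ base) is only an assumption that happens to pick the valid entry on the PowerEdge T300.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an MADT I/O APIC entry. */
struct madt_ioapic {
	unsigned apic_id;
	unsigned int_base;   /* "INT BASE" field */
	unsigned long addr;  /* memory-mapped register address */
	bool skip;           /* set when another entry for this addr wins */
};

/*
 * Mark duplicate entries that share a memory-mapped address.
 * Assumed policy: among duplicates, keep the one with the lowest
 * int_base (ties keep the earlier entry). Returns how many were skipped.
 */
static size_t
mark_duplicate_ioapics(struct madt_ioapic *e, size_t n)
{
	size_t skipped = 0;

	for (size_t i = 0; i < n; i++) {
		if (e[i].skip)
			continue;
		for (size_t j = i + 1; j < n; j++) {
			if (e[j].skip || e[j].addr != e[i].addr)
				continue;
			size_t loser = (e[j].int_base < e[i].int_base) ? i : j;

			e[loser].skip = true;
			skipped++;
			if (loser == i)
				break;  /* e[i] lost; the outer loop reaches e[j] later */
		}
	}
	return skipped;
}
```

Applied to the three I/O APIC entries in the MADT dump earlier in this PR, this marks APIC ID 5 (INT BASE=256) as the duplicate and keeps APIC IDs 4 and 6.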