Bug 229429

Summary: FIRST_MSI_INT is too low for large core count systems, panic on boot
Product: Base System Reporter: kevin
Component: kernAssignee: John Baldwin <jhb>
Status: Closed FIXED    
Severity: Affects Some People CC: cem, emaste, feld, jhb, rgrimes, suryakiran3516
Priority: ---    
Version: CURRENT   
Hardware: amd64   
OS: Any   
Bug Depends on:    
Bug Blocks: 228911    

Description kevin 2018-06-30 14:47:46 UTC
As mentioned on the freebsd-current list:

https://lists.freebsd.org/pipermail/freebsd-current/2018-April/069203.html

FIRST_MSI_INT is too low for some large systems. Without raising this constant, the kernel panics on boot unless MSI is disabled.

The original reporter on the mailing list reported success raising it to 512, my system required raising it higher (768 worked). John Baldwin suggested making this a dynamically calculated value.
Comment 1 John Baldwin freebsd_committer freebsd_triage 2018-08-17 12:35:36 UTC
Add this to the 12.0 list.  I have a WIP patch not yet ready for review but that I hope to test shortly.
Comment 2 John Baldwin freebsd_committer freebsd_triage 2018-08-19 14:47:52 UTC
I have booted a patch that should fix this.  I will post it for review once I get it cleaned up and push a few prereqs in, but you can use the 'x86_dynamic_irqs' branch of github.com/bsdjhb/freebsd.git for early testing.
Comment 3 John Baldwin freebsd_committer freebsd_triage 2018-08-23 07:45:35 UTC
See https://reviews.freebsd.org/D16861
Comment 4 commit-hook freebsd_committer freebsd_triage 2018-08-28 21:09:56 UTC
A commit references this bug:

Author: jhb
Date: Tue Aug 28 21:09:21 UTC 2018
New revision: 338360
URL: https://svnweb.freebsd.org/changeset/base/338360

Log:
  Dynamically allocate IRQ ranges on x86.

  Previously, x86 used static ranges of IRQ values for different types
  of I/O interrupts.  Interrupt pins on I/O APICs and 8259A PICs used
  IRQ values from 0 to 254.  MSI interrupts used a compile-time-defined
  range starting at 256, and Xen event channels used a
  compile-time-defined range after MSI.  Some recent systems have more
  than 255 I/O APIC interrupt pins which resulted in those IRQ values
  overflowing into the MSI range triggering an assertion failure.

  Replace statically assigned ranges with dynamic ranges.  Do a single
  pass computing the sizes of the IRQ ranges (PICs, MSI, Xen) to
  determine the total number of IRQs required.  Allocate the interrupt
  source and interrupt count arrays dynamically once this pass has
  completed.  To minimize runtime complexity these arrays are only sized
  once during bootup.  The PIC range is determined by the PICs present
  in the system.  The MSI and Xen ranges continue to use a fixed size,
  though this does make it possible to turn the MSI range size into a
  tunable in the future.

  As a result, various places are updated to use dynamic limits instead
  of constants.  In addition, the vmstat(8) utility has been taught to
  understand that some kernels may treat 'intrcnt' and 'intrnames' as
  pointers rather than arrays when extracting interrupt stats from a
  crashdump.  This is determined by the presence (vs absence) of a
  global 'nintrcnt' symbol.

  This change reverts r189404 which worked around a buggy BIOS which
  enumerated an I/O APIC twice (using the same memory mapped address for
  both entries but using an IRQ base of 256 for one entry and a valid
  IRQ base for the second entry).  Making the "base" of MSI IRQ values
  dynamic avoids the panic that r189404 worked around, and there may now
  be valid I/O APICs with an IRQ base above 256 which this workaround
  would incorrectly skip.

  If in the future the issue reported in PR 130483 reoccurs, we will
  have to add a pass over the I/O APIC entries in the MADT to detect
  duplicates using the memory mapped address and use some strategy to
  choose the "correct" one.

  While here, reserve room in intrcnts for the Hyper-V counters.

  PR:		229429, 130483
  Reviewed by:	kib, royger, cem
  Tested by:	royger (Xen), kib (DMAR)
  Approved by:	re (gjb)
  MFC after:	2 weeks
  Differential Revision:	https://reviews.freebsd.org/D16861

Changes:
  head/sys/sys/interrupt.h
  head/sys/x86/acpica/madt.c
  head/sys/x86/include/apicvar.h
  head/sys/x86/include/intr_machdep.h
  head/sys/x86/iommu/intel_intrmap.c
  head/sys/x86/isa/atpic.c
  head/sys/x86/x86/intr_machdep.c
  head/sys/x86/x86/io_apic.c
  head/sys/x86/x86/local_apic.c
  head/sys/x86/x86/msi.c
  head/sys/x86/x86/nexus.c
  head/sys/x86/xen/xen_intr.c
  head/sys/x86/xen/xen_msi.c
  head/sys/x86/xen/xen_nexus.c
  head/usr.bin/vmstat/vmstat.c
Comment 5 John Baldwin freebsd_committer freebsd_triage 2018-08-28 21:13:35 UTC
No longer a blocker for 12.0.
Comment 6 commit-hook freebsd_committer freebsd_triage 2018-11-01 18:35:25 UTC
A commit references this bug:

Author: jhb
Date: Thu Nov  1 18:34:29 UTC 2018
New revision: 340016
URL: https://svnweb.freebsd.org/changeset/base/340016

Log:
  MFC 338360,338415,338624,338630,338631,338725: Dynamic x86 IRQ layout.

  338360:
  Dynamically allocate IRQ ranges on x86.

  Previously, x86 used static ranges of IRQ values for different types
  of I/O interrupts.  Interrupt pins on I/O APICs and 8259A PICs used
  IRQ values from 0 to 254.  MSI interrupts used a compile-time-defined
  range starting at 256, and Xen event channels used a
  compile-time-defined range after MSI.  Some recent systems have more
  than 255 I/O APIC interrupt pins which resulted in those IRQ values
  overflowing into the MSI range triggering an assertion failure.

  Replace statically assigned ranges with dynamic ranges.  Do a single
  pass computing the sizes of the IRQ ranges (PICs, MSI, Xen) to
  determine the total number of IRQs required.  Allocate the interrupt
  source and interrupt count arrays dynamically once this pass has
  completed.  To minimize runtime complexity these arrays are only sized
  once during bootup.  The PIC range is determined by the PICs present
  in the system.  The MSI and Xen ranges continue to use a fixed size,
  though this does make it possible to turn the MSI range size into a
  tunable in the future.

  As a result, various places are updated to use dynamic limits instead
  of constants.  In addition, the vmstat(8) utility has been taught to
  understand that some kernels may treat 'intrcnt' and 'intrnames' as
  pointers rather than arrays when extracting interrupt stats from a
  crashdump.  This is determined by the presence (vs absence) of a
  global 'nintrcnt' symbol.

  This change reverts r189404 which worked around a buggy BIOS which
  enumerated an I/O APIC twice (using the same memory mapped address for
  both entries but using an IRQ base of 256 for one entry and a valid
  IRQ base for the second entry).  Making the "base" of MSI IRQ values
  dynamic avoids the panic that r189404 worked around, and there may now
  be valid I/O APICs with an IRQ base above 256 which this workaround
  would incorrectly skip.

  If in the future the issue reported in PR 130483 reoccurs, we will
  have to add a pass over the I/O APIC entries in the MADT to detect
  duplicates using the memory mapped address and use some strategy to
  choose the "correct" one.

  While here, reserve room in intrcnts for the Hyper-V counters.

  338415:
  Fix build of x86 UP kernels after dynamic IRQ changes in r338360.

  338624:
  msi: remove the check that interrupt sources have been added

  When running as a specific type of Xen guest the hypervisor won't
  provide any emulated IO-APICs or legacy PICs at all, thus hitting the
  following assert in the MSI code:

  panic: Assertion num_io_irqs > 0 failed at /usr/src/sys/x86/x86/msi.c:334
  cpuid = 0
  time = 1
  KDB: stack backtrace:
  db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff826ffa70
  vpanic() at vpanic+0x1a3/frame 0xffffffff826ffad0
  panic() at panic+0x43/frame 0xffffffff826ffb30
  msi_init() at msi_init+0xed/frame 0xffffffff826ffb40
  apic_setup_io() at apic_setup_io+0x72/frame 0xffffffff826ffb50
  mi_startup() at mi_startup+0x118/frame 0xffffffff826ffb70
  start_kernel() at start_kernel+0x10

  Fix this by removing the assert in the MSI code, since it's possible
  to get to the MSI initialization without having registered any other
  interrupt sources.

  338630:
  lapic: skip setting intrcnt if lapic is not present

  Instead of panicking. Legacy PVH mode doesn't provide a lapic, and
  since native_lapic_intrcnt is called unconditionally this would cause
  the assert to trigger. Change the assert into a continue in order to
  take into account the possibility of systems without a lapic.

  338631:
  xen: legacy PVH fixes for the new interrupt count

  Register interrupts using the PIC pic_register_sources method instead
  of doing it in apic_setup_io. This is now required, since the internal
  interrupt structures are not yet setup when calling apic_setup_io.

  338725:
  Fix a regression in r338360 when booting an x86 machine without APIC.

  The atpic_register_sources callback tries to avoid registering interrupt
  sources that would collide with an I/O APIC.  However, the previous
  implementation was failing to register IRQs 8-15 since the slave PIC
  saw valid IRQs from the master and assumed an I/O APIC was present.  To
  fix, go back to registering all 8259A interrupt sources in one loop when
  the master's register_sources method is invoked.

  PR:             229429, 130483, 231291

Changes:
_U  stable/11/
  stable/11/sys/sys/interrupt.h
  stable/11/sys/x86/acpica/madt.c
  stable/11/sys/x86/include/apicvar.h
  stable/11/sys/x86/include/intr_machdep.h
  stable/11/sys/x86/iommu/intel_intrmap.c
  stable/11/sys/x86/isa/atpic.c
  stable/11/sys/x86/x86/intr_machdep.c
  stable/11/sys/x86/x86/io_apic.c
  stable/11/sys/x86/x86/local_apic.c
  stable/11/sys/x86/x86/msi.c
  stable/11/sys/x86/x86/nexus.c
  stable/11/sys/x86/xen/pvcpu_enum.c
  stable/11/sys/x86/xen/xen_intr.c
  stable/11/sys/x86/xen/xen_msi.c
  stable/11/sys/x86/xen/xen_nexus.c
  stable/11/sys/xen/xen_intr.h
  stable/11/usr.bin/vmstat/vmstat.c
Comment 7 John Baldwin freebsd_committer freebsd_triage 2019-02-13 21:29:06 UTC
*** Bug 226574 has been marked as a duplicate of this bug. ***