Bug 236838 - Commit r340224 prevents loading the kernel for some Intel Xeon hardware
Summary: Commit r340224 prevents loading the kernel for some Intel Xeon hardware
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-03-27 22:09 UTC by longwitz
Modified: 2019-08-28 09:30 UTC
CC List: 3 users

See Also:


Attachments
Output of pciconf -lvcb (9.03 KB, text/plain)
2019-04-03 16:33 UTC, longwitz
Output serial console of full verbose boot (50.77 KB, text/plain)
2019-04-04 17:27 UTC, longwitz
Do not use memory-mapped config space access for PCIe on older chipsets until pmap is ready to create the mapping. (1.28 KB, patch)
2019-04-04 18:22 UTC, Konstantin Belousov
Do not use memory-mapped config space access for PCIe on older chipsets until pmap is ready to create the mapping. (1.28 KB, patch)
2019-04-04 18:23 UTC, Konstantin Belousov

Description longwitz 2019-03-27 22:09:34 UTC
I run the GENERIC kernel of 12.0-STABLE (FreeBSD 12.0-STABLE #1 r345004M) on different hardware. On one type of machine I must revert commit r340224; otherwise the kernel hangs at boot without printing any message. This hardware is:

CPU: Intel(R) Xeon(TM) CPU 3.60GHz (3591.07-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0xf43  Family=0xf  Model=0x4  Stepping=3
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x659d<SSE3,DTES64,MON,DS_CPL,EST,TM2,CNXT-ID,CX16,xTPR>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  TSC: P-state invariant
real memory  = 8589934592 (8192 MB)
avail memory = 8287981568 (7904 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: <A M I  OEMAPIC >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 1 core(s) x 2 hardware threads

From pciconf -lv:
vgapci0@pci0:5:12:0:    class=0x030000 card=0x10798086 chip=0x47521002 rev=0x27 hdr=0x00
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Rage 3 [Rage XL PCI]'
    class      = display
    subclass   = VGA

I use kern.vty=sc, but the same problem occurs with vt. The problem is also independent of the loader (Forth or Lua).
Comment 1 Ed Maste freebsd_committer freebsd_triage 2019-04-01 13:39:44 UTC
The reverted commit r340224 is:
MFC r339979: Add pci_early function to detect Intel stolen memory.
Comment 2 Konstantin Belousov freebsd_committer freebsd_triage 2019-04-02 15:33:20 UTC
Can you show the output of 'pciconf -lvcb' ?
Comment 3 longwitz 2019-04-03 16:33:52 UTC
Created attachment 203352 [details]
Output of pciconf -lvcb
Comment 4 Konstantin Belousov freebsd_committer freebsd_triage 2019-04-03 16:54:01 UTC
Are you running i386 or amd64? Select the right file, sys/i386/pci/pci_cfgreg.c or sys/amd64/pci/pci_cfgreg.c, find the pci_cfgregopen() function, and remove the 'case 0x3590:' line. Does it help?

Also please show the verbose dmesg from the successful boot.
Comment 5 longwitz 2019-04-04 17:27:25 UTC
Created attachment 203384 [details]
Output serial console of full verbose boot

I use amd64; commit r340224 (r339979) only affects amd64. Removing the 'case 0x3590:' line helps: the kernel now boots fine. With the patch

--- pci_cfgreg.c.orig   2018-11-26 16:43:04.706033000 +0100
+++ pci_cfgreg.c        2019-04-04 11:30:51.847357000 +0200
@@ -90,13 +90,15 @@
         * This also implies that it can do PCIe extended config cycles.
         */

+       printf("pci_cfgregopen called\n");
        /* Check for supported chipsets */
        vid = pci_cfgregread(0, 0, 0, PCIR_VENDOR, 2);
+       printf("pci_cfgregopen: vid=%x\n", vid);
        did = pci_cfgregread(0, 0, 0, PCIR_DEVICE, 2);
+       printf("pci_cfgregopen: did=%x\n", did);
        switch (vid) {
        case 0x8086:
                switch (did) {
-               case 0x3590:
                case 0x3592:
                        /* Intel 7520 or 7320 */
                        pciebar = pci_cfgregread(0, 0, 0, 0xce, 2) << 16;
@@ -112,6 +114,7 @@
                }
        }

+       printf("pci_cfgregopen returns\n");
        return (1);
 }

together with a printf("Calling pci_early_quirks()") in machdep.c and debug.late_console=0 set in loader.conf, I got the attached output on the serial console.
Comment 6 Konstantin Belousov freebsd_committer freebsd_triage 2019-04-04 18:21:04 UTC
(In reply to longwitz from comment #5)
I think it is not the read of register 0xce that causes the hang, but the need to map a very large (255MB) region by taking it from virtual_avail. You can verify this by keeping your debugging printfs but restoring the case line.

Your dmesg shows the
"PCIe: Memory Mapped configuration base @..."
line, so the memory-mapped config access method works; this is what I was looking
for when I asked for the dmesg.

Please try the attached patch; if my understanding is right, it should be
the proper fix.
Comment 7 Konstantin Belousov freebsd_committer freebsd_triage 2019-04-04 18:22:36 UTC
Created attachment 203387 [details]
Do not use memory-mapped config space access for PCIe on older chipsets until pmap is ready to create the mapping.
Comment 8 Konstantin Belousov freebsd_committer freebsd_triage 2019-04-04 18:23:51 UTC
Created attachment 203388 [details]
Do not use memory-mapped config space access for PCIe on older chipsets until pmap is ready to create the mapping.
Comment 9 longwitz 2019-04-05 19:52:05 UTC
I can confirm that the patch based on the pmap_initialized variable works for my older hardware with vid=0x8086 and did=0x3590. Without the patch, the hang was in pcie_cfgregopen().

On my other servers with vid=0x8086 and did=0x25d8 (E5420), the pmap_initialized check also triggers; I suppose this is OK.
Comment 10 Konstantin Belousov freebsd_committer freebsd_triage 2019-04-05 20:20:46 UTC
I put up a review so that more eyes can look at this patch.

https://reviews.freebsd.org/D19833
Comment 11 commit-hook freebsd_committer freebsd_triage 2019-04-09 18:07:47 UTC
A commit references this bug:

Author: kib
Date: Tue Apr  9 18:07:18 UTC 2019
New revision: 346062
URL: https://svnweb.freebsd.org/changeset/base/346062

Log:
  pci_cfgreg.c: Use io port config access for early boot time.

  Some early PCIe chipsets are explicitly listed in the white-list to
  enable use of the MMIO config space accesses, perhaps because ACPI
  tables were not a reliable source of the base MCFG address at that time.
  For those chipsets, MCFG base was read from the known chipset MCFGbase
  config register.

  During the very early stage of boot, when access to the PCI config space
  is performed (see e.g. pci_early_quirks.c), we cannot map 255MB of
  registers because the method used with pre-boot pmap overflows initial
  kernel page tables.

  Move fallback to read MCFGbase to the attachment method of the
  x86/legacy device, which removes code duplication and results in the
  use of io accesses until MCFG is parsed or legacy attach is called.

  For amd64, pre-initialize cfgmech with CFGMECH_1 (right now we
  dynamically assign CFGMECH_1 to it anyway), and remove checks for
  CFGMECH_NONE.

  There is a mention in the Intel documentation for the corresponding
  chipsets that the OS must use either the io port or the MMIO access
  method, but we already break this rule by reading the MCFGbase
  register, so one more access seems to be innocent.

  Reported by:	longwitz@incore.de
  PR:	236838
  Reviewed by:	avg (other version), jhb
  Sponsored by:	The FreeBSD Foundation
  MFC after:	1 week
  Differential revision:	https://reviews.freebsd.org/D19833

Changes:
  head/sys/amd64/pci/pci_cfgreg.c
  head/sys/i386/pci/pci_cfgreg.c
  head/sys/x86/include/pci_cfgreg.h
  head/sys/x86/x86/legacy.c
Comment 12 commit-hook freebsd_committer freebsd_triage 2019-04-16 17:16:35 UTC
A commit references this bug:

Author: kib
Date: Tue Apr 16 17:16:19 UTC 2019
New revision: 346284
URL: https://svnweb.freebsd.org/changeset/base/346284

Log:
  MFC r346062:
  pci_cfgreg.c: Use io port config access for early boot time.

  PR:	236838

Changes:
_U  stable/12/
  stable/12/sys/amd64/pci/pci_cfgreg.c
  stable/12/sys/i386/pci/pci_cfgreg.c
  stable/12/sys/x86/include/pci_cfgreg.h
  stable/12/sys/x86/x86/legacy.c