Bug 232288 - i386 boot time panic of head on amd64 machine (PCI BIOS search, 0x49435024)
Status: Closed Works As Intended
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: i386 Any
Importance: --- Affects Only Me
Assignee: John Baldwin
Reported: 2018-10-15 14:53 UTC by Bjoern A. Zeeb
Modified: 2018-10-22 18:06 UTC

Description Bjoern A. Zeeb freebsd_committer freebsd_triage 2018-10-15 14:53:12 UTC
Hi,

I cross-compiled i386 from amd64 (r339354) and booted it on rabbit4 in the netperf cluster.

GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
---<<BOOT>>---
MP Configuration Table version 1.4 found at 0x4fd540
Table 'FACP' at 0x7df408f0
Table 'APIC' at 0x7df409e8
APIC: Found table at 0x7df409e8
APIC: Using the MADT enumerator.
Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-ALPHA9 r339354 GENERIC i386
FreeBSD clang version 6.0.1 (tags/RELEASE_601/final 335540) (based on LLVM 6.0.1)
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): resolution 640x480
Preloaded elf kernel "/boot/kernel/kernel" at 0x23dd000.
Table 'FACP' at 0x7df408f0
FACP: Found table at 0x7df408f0
Calibrating TSC clock ... TSC clock: 3500078580 Hz
CPU: Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz (3500.08-MHz 686-class CPU)
  Origin="GenuineIntel"  Id=0x306e4  Family=0x6  Model=0x3e  Stepping=4
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7fbee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2c100000<NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: Basic Features=0xda0400<SMM,INS/OUTS,TRUE>
        Pin-Based Controls=0xff<ExtINT,NMI,VNMI,PreTmr,PostIntr>
        Primary Processor Controls=0xfff9fffe<INTWIN,TSCOff,HLT,INVLPG,MWAIT,RDPMC,RDTSC,CR3-LD,CR3-ST,CR8-LD,CR8-ST,TPR,NMIWIN,MOV-DR,IO,IOmap,MTF,MSRmap,MONITOR,PAUSE>
        Secondary Processor Controls=0xfff<APIC,EPT,DT,RDTSCP,x2APIC,VPID,WBINVD,UG,APIC-reg,VID,PAUSE-loop,RDRAND>
        Exit Controls=0xda0400<PAT-LD,EFER-SV,PTMR-SV>
        Entry Controls=0xda0400
        EPT Features=0x6134141<XO,PW4,UC,WB,2M,1G,INVEPT,single,all>
        VPID Features=0xf01<INVVPID,individual,single,all,single-globals>
  TSC: P-state invariant, performance statistics
Data TLB: 2 MByte or 4 MByte pages, 4-way set associative, 32 entries and a separate array with 1 GByte pages, 4-way set associative, 4 entries
Data TLB: 4 KB pages, 4-way set associative, 64 entries
Instruction TLB: 2M/4M pages, fully associative, 8 entries
Instruction TLB: 4KByte pages, 4-way set associative, 64 entries
64-Byte prefetching
Shared 2nd-Level TLB: 4 KByte pages, 4-way associative, 512 entries
L2 cache: 256 kbytes, 8-way associative, 64 bytes/line
real memory  = 34368126976 (32776 MB)
Physical memory chunk(s):
0x0000000000001000 - 0x0000000000099fff, 626688 bytes (153 pages)
0x0000000000100000 - 0x00000000007fffff, 7340032 bytes (1792 pages)
0x0000000002429000 - 0x000000007bb33fff, 2037428224 bytes (497419 pages)
avail memory = 2034909184 (1940 MB)
Table 'FACP' at 0x7df408f0
Table 'APIC' at 0x7df409e8
Table 'FPDT' at 0x7df40ab0
Table 'HPET' at 0x7df40af8
Table 'PRAD' at 0x7df40b30
Table 'SPMI' at 0x7df40bf0
Table 'SSDT' at 0x7df40c30
Table 'EINJ' at 0x7e008718
Table 'ERST' at 0x7e008848
Table 'HEST' at 0x7e008a78
Table 'BERT' at 0x7e008b20
Table 'DMAR' at 0x7e008b50
DMAR: Found table at 0x7e008b50
MADT: Found CPU APIC ID 2 ACPI ID 0: enabled
SMP: Added CPU 2 (AP)
MADT: Found CPU APIC ID 4 ACPI ID 2: enabled
SMP: Added CPU 4 (AP)
MADT: Found CPU APIC ID 6 ACPI ID 4: enabled
SMP: Added CPU 6 (AP)
MADT: Found CPU APIC ID 8 ACPI ID 6: enabled
SMP: Added CPU 8 (AP)
MADT: Found CPU APIC ID 3 ACPI ID 1: enabled
SMP: Added CPU 3 (AP)
MADT: Found CPU APIC ID 5 ACPI ID 3: enabled
SMP: Added CPU 5 (AP)
MADT: Found CPU APIC ID 7 ACPI ID 5: enabled
SMP: Added CPU 7 (AP)
MADT: Found CPU APIC ID 9 ACPI ID 7: enabled
SMP: Added CPU 9 (AP)
Event timer "LAPIC" quality 600
ACPI APIC Table: < >
Package ID shift: 5
L3 cache ID shift: 5
L2 cache ID shift: 1
L1 cache ID shift: 1
Core ID shift: 1
INTR: Adding local APIC 4 as a target
INTR: Adding local APIC 6 as a target
INTR: Adding local APIC 8 as a target
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 hardware threads
Package HW ID = 0
        Core HW ID = 1
                CPU0 (BSP): APIC ID: 2
                CPU1 (AP/HT): APIC ID: 3
        Core HW ID = 2
                CPU2 (AP): APIC ID: 4
                CPU3 (AP/HT): APIC ID: 5
        Core HW ID = 3
                CPU4 (AP): APIC ID: 6
                CPU5 (AP/HT): APIC ID: 7
        Core HW ID = 4
                CPU6 (AP): APIC ID: 8
                CPU7 (AP/HT): APIC ID: 9
APIC: CPU 0 has ACPI ID 0
APIC: CPU 1 has ACPI ID 1
APIC: CPU 2 has ACPI ID 2
APIC: CPU 3 has ACPI ID 3
APIC: CPU 4 has ACPI ID 4
APIC: CPU 5 has ACPI ID 5
APIC: CPU 6 has ACPI ID 6
APIC: CPU 7 has ACPI ID 7
Pentium Pro MTRR support enabled
bios32: Found BIOS32 Service Directory header at 0x4e8500
bios32: Entry = 0xe8510 (4e8510)  Rev = 0  Len = 1
stray irq1


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 02
fault virtual address   = 0x49435024
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0x4e8510
stack pointer           = 0x28:0x2423b68
frame pointer           = 0x28:0x2423ba0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 0 ()
[ thread pid 0 tid 0 ]
Stopped at      0x4e8510


db> show pcpu
cpuid        = 0
dynamic pcpu = 0x6192c0
curthread    = 0x2133820: pid 0 tid 0 ""
curpcb       = 0x2423c00
fpcurthread  = none
idlethread   = none
APIC ID      = 2
currentldt   = 0x50
trampstk     = 0xffc07ff0
kesp0        = 0x2423bf0
common_tssp  = 0xffc014c0
curvnet      = 0
spin locks held:

db> show thread
Thread 0 at 0x2133820:
 proc (pid 0): 0x21334a8
 stack: 0x2420000-0x2423fff
 flags: 0x4  pflags: 0
 state: INACTIVE
 priority: 0
<..>
Comment 1 Bjoern A. Zeeb freebsd_committer freebsd_triage 2018-10-17 09:08:13 UTC
Is there anything we can (automagically) do to prevent this panic?


This seems to come from:

sys/i386/i386/bios.c:bios32_init()

        if (bootverbose) {
            printf("bios32: Found BIOS32 Service Directory header at %p\n", sdh);
            printf("bios32: Entry = 0x%x (%x)  Rev = %d  Len = %d\n",
                   sdh->entry, bios32_SDCI, sdh->revision, sdh->len);
        }

        /* Allow user override of PCI BIOS search */
        if (((p = kern_getenv("machdep.bios.pci")) == NULL) || strcmp(p, "disable")) {

            /* See if there's a PCI BIOS entrypoint here */
            PCIbios.ident.id = 0x49435024;  /* PCI systems should have this */

            ^^^^^^^

            if (!bios32_SDlookup(&PCIbios) && bootverbose)
                printf("pcibios: PCI BIOS entry at 0x%x+0x%x\n", PCIbios.base, PCIbios.entry);
        }
        if (p != NULL)
            freeenv(p);
    } else {
        printf("bios32: Bad BIOS32 Service Directory\n");
    }


Setting machdep.bios.pci=disable in the loader allows rabbit4 to boot.

(some information from kenv on the system):

smbios.bios.reldate="07/05/2013"
smbios.bios.vendor="American Megatrends Inc."
smbios.bios.version="3.00"
smbios.chassis.maker="Supermicro"
smbios.memory.enabled="33562624"
smbios.planar.product="X9SRW-F"
smbios.planar.serial="ZM148S031878"
smbios.planar.version="1.02"
smbios.socket.enabled="1"
smbios.socket.populated="1"
smbios.system.maker="iXsystems"
smbios.system.product="1204S"
smbios.system.serial="A1-35883"
smbios.system.uuid="00000000-0000-0000-0000-0cc47a407c78"
smbios.version="2.7"
Comment 2 John Baldwin freebsd_committer freebsd_triage 2018-10-17 22:30:23 UTC
The panic was not in the C code, but in the BIOS code it called.  The page fault information doesn't make much sense though.  The 0xe8510 is a physical address of the BIOS function in question.  Can you do something like 'dd bs=1 if=/dev/mem iseek=0xe8510 count=32 | ndisasm -U' (have to install devel/nasm) to get the disassembly of the instruction that faulted?  It seems like the first instruction faulted which seems odd.
Comment 3 Bjoern A. Zeeb freebsd_committer freebsd_triage 2018-10-17 22:51:46 UTC
(In reply to John Baldwin from comment #2)

I guessed that "ndisasm -u -" (lowercase u, and - for stdin) is what you were asking for.

root@rabbit4:~ # dd bs=1 if=/dev/mem iseek=0xe8510 count=32 | ndisasm -u -
32+0 records in
32+0 records out
32 bytes transferred in 0.001126 secs (28414 bytes/sec)
00000000  FF00              inc dword [eax]
00000002  0000              add [eax],al
00000004  0000              add [eax],al
00000006  0000              add [eax],al
00000008  0000              add [eax],al
0000000A  0000              add [eax],al
0000000C  0000              add [eax],al
0000000E  0000              add [eax],al
00000010  3D24504349        cmp eax,0x49435024
00000015  B080              mov al,0x80
00000017  752D              jnz 0x46
00000019  B081              mov al,0x81
0000001B  0ADB              or bl,bl
0000001D  7527              jnz 0x46
0000001F  E8                db 0xe8
Comment 4 Bjoern A. Zeeb freebsd_committer freebsd_triage 2018-10-17 22:56:19 UTC
In anticipation ... I've booted into the panic again:

APIC: CPU 7 has ACPI ID 7
Pentium Pro MTRR support enabled
bios32: Found BIOS32 Service Directory header at 0x4e8500
bios32: Entry = 0xe8510 (4e8510)  Rev = 0  Len = 1
stray irq1


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 02
fault virtual address   = 0x49435024
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0x4e8510
stack pointer           = 0x28:0x2423b68
frame pointer           = 0x28:0x2423ba0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 0 ()
[ thread pid 0 tid 0 ]
Stopped at      0x4e8510
db> show reg
cs                0x20
ds                0x28
es                0x28
fs                 0x8
gs                0x3b
ss                0x28
eax         0x49435024
ecx                  0
edx                  0
ebx                  0
esp          0x2423b68
ebp          0x2423ba0
esi          0x2423bc4
edi          0x1129df7  counter_u64_alloc+0x27
eip           0x4e8510
efl            0x10002
0x4e8510
Comment 5 John Baldwin freebsd_committer freebsd_triage 2018-10-18 20:36:37 UTC
Hmm, the entry point seems wrong (off by 0x10).  Can you adjust the dd to read from the start of the structure (0xe8500) and extend the len by another 16 bytes and paste the same dd | ndisasm output?
Comment 6 Bjoern A. Zeeb freebsd_committer freebsd_triage 2018-10-18 21:52:27 UTC
(In reply to John Baldwin from comment #5)

root@rabbit4:~ # dd bs=1 if=/dev/mem iseek=0xe8500 count=48 | ndisasm -u -
48+0 records in
48+0 records out
48 bytes transferred in 0.001682 secs (28538 bytes/sec)
00000000  5F                pop edi
00000001  3332              xor esi,[edx]
00000003  5F                pop edi
00000004  10850E000001      adc [ebp+0x100000e],al
0000000A  3900              cmp [eax],eax
0000000C  0000              add [eax],al
0000000E  0000              add [eax],al
00000010  FF00              inc dword [eax]
00000012  0000              add [eax],al
00000014  0000              add [eax],al
00000016  0000              add [eax],al
00000018  0000              add [eax],al
0000001A  0000              add [eax],al
0000001C  0000              add [eax],al
0000001E  0000              add [eax],al
00000020  3D24504349        cmp eax,0x49435024
00000025  B080              mov al,0x80
00000027  752D              jnz 0x56
00000029  B081              mov al,0x81
0000002B  0ADB              or bl,bl
0000002D  7527              jnz 0x56
0000002F  E8                db 0xe8
Comment 7 John Baldwin freebsd_committer freebsd_triage 2018-10-22 16:36:23 UTC
So the table in the BIOS is just busted / incorrect in that it has the entry point at the wrong place (or the code at the wrong place).  There's not a lot we can do about that except that for 13+ we could perhaps require ACPI and retire PCI BIOS and PnP BIOS support code entirely on i386.
Comment 8 Bjoern A. Zeeb freebsd_committer freebsd_triage 2018-10-22 18:06:39 UTC
Thanks for looking into this, John.

Good to know it's the BIOS and not FreeBSD.

At least the magic addresses and the tunable are now documented in this PR; should anyone run into a similar problem by accident, they will hopefully be able to find this.

I'll keep machdep.bios.pci=disable set in loader.conf for the i386 installations I am testing.