Hi,

I cross-compiled i386 from amd64 (r339354) and booted it on rabbit4 in the netperf cluster:

GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
---<<BOOT>>---
MP Configuration Table version 1.4 found at 0x4fd540
Table 'FACP' at 0x7df408f0
Table 'APIC' at 0x7df409e8
APIC: Found table at 0x7df409e8
APIC: Using the MADT enumerator.
Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-ALPHA9 r339354 GENERIC i386
FreeBSD clang version 6.0.1 (tags/RELEASE_601/final 335540) (based on LLVM 6.0.1)
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): resolution 640x480
Preloaded elf kernel "/boot/kernel/kernel" at 0x23dd000.
Table 'FACP' at 0x7df408f0
FACP: Found table at 0x7df408f0
Calibrating TSC clock ...
TSC clock: 3500078580 Hz
CPU: Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz (3500.08-MHz 686-class CPU)
  Origin="GenuineIntel" Id=0x306e4 Family=0x6 Model=0x3e Stepping=4
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7fbee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2c100000<NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: Basic Features=0xda0400<SMM,INS/OUTS,TRUE>
        Pin-Based Controls=0xff<ExtINT,NMI,VNMI,PreTmr,PostIntr>
        Primary Processor Controls=0xfff9fffe<INTWIN,TSCOff,HLT,INVLPG,MWAIT,RDPMC,RDTSC,CR3-LD,CR3-ST,CR8-LD,CR8-ST,TPR,NMIWIN,MOV-DR,IO,IOmap,MTF,MSRmap,MONITOR,PAUSE>
        Secondary Processor Controls=0xfff<APIC,EPT,DT,RDTSCP,x2APIC,VPID,WBINVD,UG,APIC-reg,VID,PAUSE-loop,RDRAND>
        Exit Controls=0xda0400<PAT-LD,EFER-SV,PTMR-SV>
        Entry Controls=0xda0400
        EPT Features=0x6134141<XO,PW4,UC,WB,2M,1G,INVEPT,single,all>
        VPID Features=0xf01<INVVPID,individual,single,all,single-globals>
  TSC: P-state invariant, performance statistics
Data TLB: 2 MByte or 4 MByte pages, 4-way set associative, 32 entries and a separate array with 1 GByte pages, 4-way set associative, 4 entries
Data TLB: 4 KB pages, 4-way set associative, 64 entries
Instruction TLB: 2M/4M pages, fully associative, 8 entries
Instruction TLB: 4KByte pages, 4-way set associative, 64 entries
64-Byte prefetching
Shared 2nd-Level TLB: 4 KByte pages, 4-way associative, 512 entries
L2 cache: 256 kbytes, 8-way associative, 64 bytes/line
real memory = 34368126976 (32776 MB)
Physical memory chunk(s):
0x0000000000001000 - 0x0000000000099fff, 626688 bytes (153 pages)
0x0000000000100000 - 0x00000000007fffff, 7340032 bytes (1792 pages)
0x0000000002429000 - 0x000000007bb33fff, 2037428224 bytes (497419 pages)
avail memory = 2034909184 (1940 MB)
Table 'FACP' at 0x7df408f0
Table 'APIC' at 0x7df409e8
Table 'FPDT' at 0x7df40ab0
Table 'HPET' at 0x7df40af8
Table 'PRAD' at 0x7df40b30
Table 'SPMI' at 0x7df40bf0
Table 'SSDT' at 0x7df40c30
Table 'EINJ' at 0x7e008718
Table 'ERST' at 0x7e008848
Table 'HEST' at 0x7e008a78
Table 'BERT' at 0x7e008b20
Table 'DMAR' at 0x7e008b50
DMAR: Found table at 0x7e008b50
MADT: Found CPU APIC ID 2 ACPI ID 0: enabled
SMP: Added CPU 2 (AP)
MADT: Found CPU APIC ID 4 ACPI ID 2: enabled
SMP: Added CPU 4 (AP)
MADT: Found CPU APIC ID 6 ACPI ID 4: enabled
SMP: Added CPU 6 (AP)
MADT: Found CPU APIC ID 8 ACPI ID 6: enabled
SMP: Added CPU 8 (AP)
MADT: Found CPU APIC ID 3 ACPI ID 1: enabled
SMP: Added CPU 3 (AP)
MADT: Found CPU APIC ID 5 ACPI ID 3: enabled
SMP: Added CPU 5 (AP)
MADT: Found CPU APIC ID 7 ACPI ID 5: enabled
SMP: Added CPU 7 (AP)
MADT: Found CPU APIC ID 9 ACPI ID 7: enabled
SMP: Added CPU 9 (AP)
Event timer "LAPIC" quality 600
ACPI APIC Table: < >
Package ID shift: 5
L3 cache ID shift: 5
L2 cache ID shift: 1
L1 cache ID shift: 1
Core ID shift: 1
INTR: Adding local APIC 4 as a target
INTR: Adding local APIC 6 as a target
INTR: Adding local APIC 8 as a target
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 hardware threads
Package HW ID = 0
	Core HW ID = 1
		CPU0 (BSP): APIC ID: 2
		CPU1 (AP/HT): APIC ID: 3
	Core HW ID = 2
		CPU2 (AP): APIC ID: 4
		CPU3 (AP/HT): APIC ID: 5
	Core HW ID = 3
		CPU4 (AP): APIC ID: 6
		CPU5 (AP/HT): APIC ID: 7
	Core HW ID = 4
		CPU6 (AP): APIC ID: 8
		CPU7 (AP/HT): APIC ID: 9
APIC: CPU 0 has ACPI ID 0
APIC: CPU 1 has ACPI ID 1
APIC: CPU 2 has ACPI ID 2
APIC: CPU 3 has ACPI ID 3
APIC: CPU 4 has ACPI ID 4
APIC: CPU 5 has ACPI ID 5
APIC: CPU 6 has ACPI ID 6
APIC: CPU 7 has ACPI ID 7
Pentium Pro MTRR support enabled
bios32: Found BIOS32 Service Directory header at 0x4e8500
bios32: Entry = 0xe8510 (4e8510) Rev = 0 Len = 1
stray irq1

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 02
fault virtual address = 0x49435024
fault code = supervisor write, page not present
instruction pointer = 0x20:0x4e8510
stack pointer = 0x28:0x2423b68
frame pointer = 0x28:0x2423ba0
code segment = base rx0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, def32 1, gran 1
processor eflags = resume, IOPL = 0
current process = 0 ()
[ thread pid 0 tid 0 ]
Stopped at 0x4e8510
db> show pcpu
cpuid = 0
dynamic pcpu = 0x6192c0
curthread = 0x2133820: pid 0 tid 0 ""
curpcb = 0x2423c00
fpcurthread = none
idlethread = none
APIC ID = 2
currentldt = 0x50
trampstk = 0xffc07ff0
kesp0 = 0x2423bf0
common_tssp = 0xffc014c0
curvnet = 0
spin locks held:
db> show thread
Thread 0 at 0x2133820:
 proc (pid 0): 0x21334a8
 stack: 0x2420000-0x2423fff
 flags: 0x4 pflags: 0
 state: INACTIVE
 priority: 0
<..>
Is there anything we can (automagically) do to prevent this panic?

This seems to come from sys/i386/i386/bios.c:bios32_init():

        if (bootverbose) {
            printf("bios32: Found BIOS32 Service Directory header at %p\n", sdh);
            printf("bios32: Entry = 0x%x (%x) Rev = %d Len = %d\n",
                   sdh->entry, bios32_SDCI, sdh->revision, sdh->len);
        }

        /* Allow user override of PCI BIOS search */
        if (((p = kern_getenv("machdep.bios.pci")) == NULL) || strcmp(p, "disable")) {

            /* See if there's a PCI BIOS entrypoint here */
            PCIbios.ident.id = 0x49435024; /* PCI systems should have this */
                               ^^^^^^^^^^
            if (!bios32_SDlookup(&PCIbios) && bootverbose)
                printf("pcibios: PCI BIOS entry at 0x%x+0x%x\n", PCIbios.base, PCIbios.entry);
        }
        if (p != NULL)
            freeenv(p);
    } else {
        printf("bios32: Bad BIOS32 Service Directory\n");
    }

Setting machdep.bios.pci=disable in the loader allows rabbit4 to boot.

Some information from kenv on the system:

smbios.bios.reldate="07/05/2013"
smbios.bios.vendor="American Megatrends Inc."
smbios.bios.version="3.00"
smbios.chassis.maker="Supermicro"
smbios.memory.enabled="33562624"
smbios.planar.product="X9SRW-F"
smbios.planar.serial="ZM148S031878"
smbios.planar.version="1.02"
smbios.socket.enabled="1"
smbios.socket.populated="1"
smbios.system.maker="iXsystems"
smbios.system.product="1204S"
smbios.system.serial="A1-35883"
smbios.system.uuid="00000000-0000-0000-0000-0cc47a407c78"
smbios.version="2.7"
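The override check quoted above can be read as a small predicate. The following is a minimal user-space sketch, not kernel code: `pci_bios_lookup_enabled` is a hypothetical name, and its argument stands in for whatever kern_getenv("machdep.bios.pci") would return.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for the condition in bios32_init().
 * `value` plays the role of the result of kern_getenv("machdep.bios.pci").
 * Returns true when the PCI BIOS lookup would be attempted.
 */
bool
pci_bios_lookup_enabled(const char *value)
{
	/* Tunable not set: the lookup runs. */
	if (value == NULL)
		return (true);
	/* strcmp() is nonzero for anything other than exactly "disable". */
	return (strcmp(value, "disable") != 0);
}
```

Under this reading, only the exact string "disable" skips the lookup; any other value, including "disabled" or "0", still calls into the BIOS.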
The panic was not in the C code, but in the BIOS code it called. The page fault information doesn't make much sense though. The 0xe8510 is a physical address of the BIOS function in question. Can you do something like 'dd bs=1 if=/dev/mem iseek=0xe8510 count=32 | ndisasm -U' (have to install devel/nasm) to get the disassembly of the instruction that faulted? It seems like the first instruction faulted which seems odd.
(In reply to John Baldwin from comment #2)

I guessed that ndisasm -u - (lower-case u, and - for stdin) is what you were asking for.

root@rabbit4:~ # dd bs=1 if=/dev/mem iseek=0xe8510 count=32 | ndisasm -u -
32+0 records in
32+0 records out
32 bytes transferred in 0.001126 secs (28414 bytes/sec)
00000000  FF00              inc dword [eax]
00000002  0000              add [eax],al
00000004  0000              add [eax],al
00000006  0000              add [eax],al
00000008  0000              add [eax],al
0000000A  0000              add [eax],al
0000000C  0000              add [eax],al
0000000E  0000              add [eax],al
00000010  3D24504349        cmp eax,0x49435024
00000015  B080              mov al,0x80
00000017  752D              jnz 0x46
00000019  B081              mov al,0x81
0000001B  0ADB              or bl,bl
0000001D  7527              jnz 0x46
0000001F  E8                db 0xe8
In anticipation ... I've booted into the panic again:

APIC: CPU 7 has ACPI ID 7
Pentium Pro MTRR support enabled
bios32: Found BIOS32 Service Directory header at 0x4e8500
bios32: Entry = 0xe8510 (4e8510) Rev = 0 Len = 1
stray irq1

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 02
fault virtual address = 0x49435024
fault code = supervisor write, page not present
instruction pointer = 0x20:0x4e8510
stack pointer = 0x28:0x2423b68
frame pointer = 0x28:0x2423ba0
code segment = base rx0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, def32 1, gran 1
processor eflags = resume, IOPL = 0
current process = 0 ()
[ thread pid 0 tid 0 ]
Stopped at 0x4e8510
db> show reg
cs   0x20
ds   0x28
es   0x28
fs   0x8
gs   0x3b
ss   0x28
eax  0x49435024
ecx  0
edx  0
ebx  0
esp  0x2423b68
ebp  0x2423ba0
esi  0x2423bc4
edi  0x1129df7 counter_u64_alloc+0x27
eip  0x4e8510
efl  0x10002
0x4e8510
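Taken together with the ndisasm output for 0xe8510, the register dump appears to explain the fault address. The bytes at the reported entry point decode as inc dword [eax], and eax holds 0x49435024, the identifier that bios32_init() stores in PCIbios.ident.id. A write through that value would match "fault virtual address = 0x49435024" and "supervisor write". The sketch below only shows that the constant is the ASCII tag "$PCI" stored least significant byte first; the function name is illustrative and not part of the FreeBSD source.

```c
#include <stdint.h>

/*
 * Unpack a 32-bit BIOS32 service identifier into its four ASCII
 * characters, least significant byte first (x86 is little-endian).
 * `out` must have room for 5 bytes. The function name is illustrative.
 */
void
bios32_id_to_string(uint32_t id, char out[5])
{
	for (int i = 0; i < 4; i++)
		out[i] = (char)((id >> (8 * i)) & 0xff);
	out[4] = '\0';
}
```

Calling bios32_id_to_string(0x49435024, buf) fills buf with "$PCI", so the faulting address is the service identifier itself being used as a pointer, not a meaningful memory location.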
Hmm, the entry point seems wrong (off by 0x10). Can you adjust the dd to read from the start of the structure (0xe8500) and extend the len by another 16 bytes and paste the same dd | ndisasm output?
(In reply to John Baldwin from comment #5)

root@rabbit4:~ # dd bs=1 if=/dev/mem iseek=0xe8500 count=48 | ndisasm -u -
48+0 records in
48+0 records out
48 bytes transferred in 0.001682 secs (28538 bytes/sec)
00000000  5F                pop edi
00000001  3332              xor esi,[edx]
00000003  5F                pop edi
00000004  10850E000001      adc [ebp+0x100000e],al
0000000A  3900              cmp [eax],eax
0000000C  0000              add [eax],al
0000000E  0000              add [eax],al
00000010  FF00              inc dword [eax]
00000012  0000              add [eax],al
00000014  0000              add [eax],al
00000016  0000              add [eax],al
00000018  0000              add [eax],al
0000001A  0000              add [eax],al
0000001C  0000              add [eax],al
0000001E  0000              add [eax],al
00000020  3D24504349        cmp eax,0x49435024
00000025  B080              mov al,0x80
00000027  752D              jnz 0x56
00000029  B081              mov al,0x81
0000002B  0ADB              or bl,bl
0000002D  7527              jnz 0x56
0000002F  E8                db 0xe8
So the table in the BIOS is just busted / incorrect in that it has the entry point at the wrong place (or the code at the wrong place). There's not a lot we can do about that except that for 13+ we could perhaps require ACPI and retire PCI BIOS and PnP BIOS support code entirely on i386.
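The conclusion above can be cross-checked by reading the first 16 bytes of the 0xe8500 dump as data, not as instructions. The sketch below assumes the usual BIOS32 Service Directory layout (4-byte "_32_" signature, 32-bit little-endian entry address, revision, length in 16-byte paragraphs, checksum, 5 reserved bytes); the struct and function names are illustrative and are not the ones used in sys/i386/i386/bios.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decoded view of a BIOS32 Service Directory header (names are illustrative). */
struct sd_header {
	uint32_t entry;     /* physical address of the entry point */
	uint8_t  revision;
	uint8_t  len;       /* header length in 16-byte paragraphs */
};

/*
 * Parse a header from raw bytes. Returns true when the "_32_" signature
 * is present and all bytes of the header sum to zero modulo 256.
 */
bool
sd_header_parse(const uint8_t *buf, size_t buflen, struct sd_header *out)
{
	if (buflen < 16 || memcmp(buf, "_32_", 4) != 0)
		return (false);
	out->entry = (uint32_t)buf[4] | (uint32_t)buf[5] << 8 |
	    (uint32_t)buf[6] << 16 | (uint32_t)buf[7] << 24;
	out->revision = buf[8];
	out->len = buf[9];

	size_t total = (size_t)out->len * 16;
	if (total == 0 || total > buflen)
		return (false);
	uint8_t sum = 0;
	for (size_t i = 0; i < total; i++)
		sum = (uint8_t)(sum + buf[i]);
	return (sum == 0);
}

/* First 16 bytes of the dump taken at 0xe8500 in this report. */
const uint8_t rabbit4_header[16] = {
	0x5f, 0x33, 0x32, 0x5f, 0x10, 0x85, 0x0e, 0x00,
	0x00, 0x01, 0x39, 0x00, 0x00, 0x00, 0x00, 0x00,
};
```

Parsing rabbit4_header succeeds and yields entry 0xe8510, revision 0 and length 1, matching the "bios32: Entry = 0xe8510 ... Rev = 0 Len = 1" line in the boot log. The header is therefore self-consistent (signature and checksum both pass) and really does advertise 0xe8510, while the first recognizable instruction in the dump, cmp eax,0x49435024, sits at offset 0x20, that is 0xe8520.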
Thanks for looking into this, John. Good to know it's the BIOS and not FreeBSD. At least the magic addresses and the tunable are now documented in this PR, so anyone who runs into a similar problem will hopefully be able to find them. I'll keep machdep.bios.pci=disable set in loader.conf for the i386 installations I am testing.