Bug 225450

Summary: 11.1-* panics on AMD Opteron 2k due to EARLY_AP_STARTUP
Product: Base System
Component: kern
Status: Closed DUPLICATE
Severity: Affects Many People
Priority: ---
Version: 11.1-STABLE
Hardware: amd64
OS: Any
Reporter: Pablo Ruiz <pablo.ruiz>
Assignee: freebsd-bugs (Nobody) <bugs>
CC: cem, emaste, jhb, mail, pablo.ruiz
Keywords: patch
Attachments:
  verbose output of a kernel w/o EARLY_AP_STARTUP (flags: none)

Description Pablo Ruiz 2018-01-25 15:12:07 UTC
The EARLY_AP_STARTUP kernel option causes kernel panics on AMD Opteron 2xxxSE CPUs, including on Sun X4100, X4200, and similar systems.

Commenting out EARLY_AP_STARTUP when building the kernel allows the system to boot. Safe mode boot works too because it disables SMP.

An svn bisect was performed on 11-stable, and the addition of EARLY_AP_STARTUP at r318763 seems to be the culprit.

This has been reproduced with 11.1-RELEASE, 11.1-RELEASE-p1, 11.1-RELEASE-p2, 11.1-RELEASE-p4 and 11.1-RELEASE-p6. FreeBSD 10, by contrast, boots fine.

Example boot output with failing kernel (includes garbage):

/boot/kernel.old/kernel text=0x14972f8 data=0x1384c0+0x4c15e8 syms=[0x8+0x15e8b0+0x8+0x178422]
/boot/entropy size=0x1000
Booting...
Copyright (c) 1992-2017 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.1-RELEASE #0 r321309: Fri Jul 21 02:08:28 UTC 2017
    r...@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0)
VT(vga): resolution 640x480
CPU: Dual-Core AMD Opteron(tm) Processor 2218 (2593.16-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x40f12  Family=0xf  Model=0x41  Stepping=2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x1f<LAHF,CMP,SVM,ExtAPIC,CR8>
  SVM: NAsids=64
real memory  = 4563402752 (4352 MB)
avail memory = 4104478720 (3914 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: <SUN    X4200 M2>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)
random: unblocking device.
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20170303/tbfadt-748)
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe1Block: 128/64 (20170303/tbfadt-748)
ioapic1: Changing APIC ID to 16
ioapic2: Changing APIC ID to 17
ioapic3 <Version 1.1> irqs 24-47 on motherboard
ioapic0 <Version 1.1> irqs 0-23 on motherboard
ioapic1 <Version 1.1> irqs 48-54 on motherboard
ioapic2 <Version 1.1> irqs 56-62 on motherboard
SMP: AP CPU

SMP: AP CPU
k


F
 aF
aFat



F
 aF
kkernel trap 12 with interrupts disabþÿÿÿÿ�ÿÿÿÿÿÿÿÿkernel trap 
12 with interrupts disabled


Fatal trap 60276736: UNÿÿkernel trap 12 with interrupts disabled


iatal trap -2130508367: UNKNOWN whilpanierc: staeck ovlerflowt detrected; 
backt1race mawy bei corrup ted
 cpuid t= 1
KrDB: stack bupackttrace:
#0 0xffffffffs80aada97 at lkdb_backtrace+0x67
xffffffff80a6bb76 aFat vpanic+0x1l86
 #r2 0xfpffffff1f80a6b9e3 at  panipc+0x43g
#3e 0xffffaffff80a9b072u at __tstackw_chkh_lfail+0x12
 i#4 0xffffff kff80eab3f2b eat vprintf+0 x10b
000 atcp dmapbase+0x397c000
Upt =ime: 12s
Au;toma tic rebpoot inc 15 secondsi - press a key on 0the con
                                                             sole fto abort
-u-> Press a key on the iconsroule to reboot ,
-a-> or drswitech off   the =system xnow.
ke¡ÿÿÿÿÿÿÿÿkernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cð
c

---------------------------------------------------------------------------
Comment 1 John Baldwin freebsd_committer freebsd_triage 2018-01-26 19:39:38 UTC
Can you build a kernel with MAXCPU set to 2?  That will reduce the garbage by having only one AP start up, so the panic stack trace is cleaner.
Comment 2 Pablo Ruiz 2018-01-26 23:15:45 UTC
I've built a kernel with MAXCPU=2 and, funnily enough, it boots OK. It only launches CPU #1 (along with #0), but it works.

This system has two physical CPUs, each with two cores. I don't know whether this is related.

Here is the output:

-----------------------------------
OK boot kernel.test
/boot/kernel.test/kernel text=0x16f1f28 data=0xb66738+0x34cc58 syms=[0x8+0x1898e8+0x8+0x18a675]
Booting...
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2017 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.1-RELEASE-p6 #0 r313908+b1692e611a8(TEST): Fri Jan 26 22:52:47 UTC 2018
    root@pfsense-builder:/usr/src/kernel/tmp/obj/usr/src/kernel/tmp/FreeBSD-src/sys/test amd64
FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0)
VT(vga): resolution 640x480
CPU: Dual-Core AMD Opteron(tm) Processor 2222 SE (2992.11-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x40f13  Family=0xf  Model=0x41  Stepping=3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x1f<LAHF,CMP,SVM,ExtAPIC,CR8>
  SVM: NAsids=64
real memory  = 6442450944 (6144 MB)
avail memory = 6190903296 (5904 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: <SUN    X4200 M2>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)
FreeBSD/SMP Online: 1 package(s) x 2 core(s)
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20170303/tbfadt-748)
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe1Block: 128/64 (20170303/tbfadt-748)
ioapic1: Changing APIC ID to 16
ioapic2: Changing APIC ID to 17
ioapic3 <Version 1.1> irqs 24-47 on motherboard
ioapic0 <Version 1.1> irqs 0-23 on motherboard
ioapic1 <Version 1.1> irqs 48-54 on motherboard
ioapic2 <Version 1.1> irqs 56-62 on motherboard
SMP: AP CPU #1 Launched!
random: entropy device external interface
wlan: mac acl policy registered
kbd1 at kbdmux0
netmap: loaded module
module_register_init: MOD_LOAD (vesa, 0xffffffff81155cc0, 0) error 19
nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
padlock0: No ACE support.
acpi0: <SUN X4200 M2> on motherboard
acpi0: Power Button (fixed)
unknown: I/O range not supported
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed00fff on acpi0
Timecounter "HPET" frequency 25000000 Hz quality 950
hpet1: <High Precision Event Timer> iomem 0xfed10000-0xfed10fff on acpi0
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x2008-0x200b on acpi0
pcib0: <ACPI Host-PCI bridge> on acpi0
pcib0: _OSC returned error 0x10
pci0: <ACPI PCI bus> on pcib0
pci0: <memory> at device 0.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
ohci0: <nVidia nForce CK804 USB Controller> mem 0xfe3ff000-0xfe3fffff irq 20 at device 2.0 on pci0
usbus0 on ohci0
usbus0: 12Mbps Full Speed USB v1.0
ehci0: <NVIDIA nForce CK804 USB 2.0 controller> mem 0xfe3fec00-0xfe3fecff irq 21 at device 2.1 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
usbus1: 480Mbps High Speed USB v2.0
atapci0: <nVidia nForce CK804 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x9100-0x910f at device 6.0 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
pcib1: <ACPI PCI-PCI bridge> at device 9.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> port 0xa800-0xa8ff mem 0xfc000000-0xfcffffff,0xfdbff000-0xfdbfffff irq 16 at device 3.0 on pci1
vgapci0: Boot video device
nfe0: <NVIDIA nForce4 CK804 MCP9 Networking Adapter> port 0xdc00-0xdc07 mem 0xfe3fd000-0xfe3fdfff irq 22 at device 10.0 on pci0
miibus0: <MII bus> on nfe0
e1000phy0: <Marvell 88E1111 Gigabit PHY> PHY 1 on miibus0
e1000phy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseSX, 1000baseSX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
nfe0: Using defaults for TSO: 65518/35/2048
nfe0: Ethernet address: 00:14:4f:e5:3a:3c
pcib2: <ACPI PCI-PCI bridge> at device 11.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 12.0 on pci0
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 13.0 on pci0
pci4: <ACPI PCI bus> on pcib4
em0: <Intel(R) PRO/1000 Network Connection 7.6.1-k> port 0xbc00-0xbc1f mem 0xfdde0000-0xfddfffff,0xfddc0000-0xfdddffff irq 17 at device 0.0 on pci4
em0: Using an MSI interrupt
em0: Ethernet address: 00:15:17:c3:df:7c
em0: netmap queues/slots: TX 1/4096, RX 1/4096
em1: <Intel(R) PRO/1000 Network Connection 7.6.1-k> port 0xb800-0xb81f mem 0xfdd80000-0xfdd9ffff,0xfdd60000-0xfdd7ffff irq 16 at device 0.1 on pci4
em1: Using an MSI interrupt
em1: Ethernet address: 00:15:17:c3:df:7d
em1: netmap queues/slots: TX 1/4096, RX 1/4096
pcib5: <ACPI PCI-PCI bridge> at device 14.0 on pci0
pci5: <ACPI PCI bus> on pcib5
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0xcc00-0xcc1f mem 0xfe280000-0xfe2fffff,0xfe27c000-0xfe27ffff irq 18 at device 0.0 on pci5
ix0: Using MSIX interrupts with 3 vectors
ix0: Ethernet address: 00:1b:21:bd:63:8c
ix0: PCI Express Bus: Speed 2.5GT/s Width x8
ix0: Error 2 setting up SR-IOV
ix0: netmap queues/slots: TX 2/2048, RX 2/2048
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0xc800-0xc81f mem 0xfe100000-0xfe17ffff,0xfe278000-0xfe27bfff irq 17 at device 0.1 on pci5
ix1: Using MSIX interrupts with 3 vectors
ix1: Ethernet address: 00:1b:21:bd:63:8e
ix1: PCI Express Bus: Speed 2.5GT/s Width x8
ix1: Error 2 setting up SR-IOV
ix1: netmap queues/slots: TX 2/2048, RX 2/2048
pcib6: <ACPI Host-PCI bridge> on acpi0
pcib6: _OSC returned error 0x10
pci6: <ACPI PCI bus> on pcib6
pci6: <memory> at device 0.0 (no driver attached)
pci6: <memory> at device 1.0 (no driver attached)
nfe1: <NVIDIA nForce4 CK804 MCP9 Networking Adapter> port 0xfc00-0xfc07 mem 0xfeafe000-0xfeafefff irq 44 at device 10.0 on pci6
miibus1: <MII bus> on nfe1
e1000phy1: <Marvell 88E1111 Gigabit PHY> PHY 1 on miibus1
e1000phy1:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseSX, 1000baseSX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
nfe1: Using defaults for TSO: 65518/35/2048
nfe1: Ethernet address: 00:14:4f:e5:3a:3d
pcib7: <ACPI PCI-PCI bridge> at device 11.0 on pci6
pci7: <ACPI PCI bus> on pcib7
pcib8: <ACPI PCI-PCI bridge> at device 12.0 on pci6
pci8: <ACPI PCI bus> on pcib8
pcib9: <ACPI PCI-PCI bridge> at device 13.0 on pci6
pci9: <ACPI PCI bus> on pcib9
pcib10: <ACPI PCI-PCI bridge> at device 14.0 on pci6
pci10: <ACPI PCI bus> on pcib10
pcib11: <ACPI PCI-PCI bridge> at device 16.0 on pci6
pci11: <ACPI PCI bus> on pcib11
pcib12: <ACPI PCI-PCI bridge> at device 17.0 on pci6
pci12: <ACPI PCI bus> on pcib12
em2: <Intel(R) PRO/1000 Legacy Network Connection 1.1.0> port 0xec00-0xec3f mem 0xfe9e0000-0xfe9fffff irq 56 at device 1.0 on pci12
em2: Ethernet address: 00:14:4f:e5:3a:3e
em2: netmap queues/slots: TX 1/4096, RX 1/4096
em3: <Intel(R) PRO/1000 Legacy Network Connection 1.1.0> port 0xe800-0xe83f mem 0xfe9c0000-0xfe9dffff irq 57 at device 1.1 on pci12
em3: Ethernet address: 00:14:4f:e5:3a:3f
em3: netmap queues/slots: TX 1/4096, RX 1/4096
mpt0: <LSILogic SAS/SATA Adapter> port 0xe400-0xe4ff mem 0xfe9bc000-0xfe9bffff,0xfe9a0000-0xfe9affff irq 58 at device 2.0 on pci12
mpt0: MPI Version=1.5.16.0
mpt0: Capabilities: ( RAID-0 RAID-1E RAID-1 )
mpt0: 1 Active Volume (2 Max)
mpt0: 2 Hidden Drive Members (14 Max)
mpt0: mpt_read_cfg_page: Config Info Status 22
mpt0:vol0(mpt0:0:0): mpt_refresh_raid_vol: Failed to read RAID Vol Page(0)
mpt0:vol0(mpt0:0:0): Settings ( )
mpt0:vol0(mpt0:0:0): 0 Members:
mpt0:vol0(mpt0:0:0): RAID-0 - Optimal
(mpt0:0:4): Physical (mpt0:0:4:0), Pass-thru (mpt0:1:0:0)
(mpt0:0:4): Online
acpi_button0: (mpt0:0:3): <Power Button>Physical (mpt0:0:3:0), Pass-thru (mpt0:1:1:0)
(mpt0:0:3): Online
 on acpi0
(noperiph:mpt0:1:-1:ffffffff): rescan already queued
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (115200,n,8,1)
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc9fff,0xca000-0xcb7ff,0xcb800-0xcc7ff,0xcc800-0xcd7ff,0xcd800-0xce7ff,0xce800-0xcf7ff,0xd5800-0xd67ff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
ppc0: cannot reserve I/O port range
powernow0: <PowerNow! K8> on cpu0
powernow1: <PowerNow! K8> on cpu1
Timecounters tick every 1.000 msec
nvme cam probe device init
ugen0.1: <nVidia OHCI root HUB> at usbus0
ugen1.1: <nVidia EHCI root HUB> at usbus1
uhub0: <nVidia OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
uhub1: <nVidia EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
da0 at mpt0 bus 0 scbus2 target 2 lun 0
da0: <LSILOGIC Logical Volume 3000> Fixed Direct Access SCSI-2 device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 69618MB (142577664 512 byte sectors)
cd0 at ata0 bus 0 scbus0 target 0 lun 0
cd0: <TEAC DW-224SL-R 1.0B> Removable CD-ROM SCSI device
cd0: 33.300MB/s transfers (UDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present
Trying to mount root from ufs:/dev/ufsid/55953f58932667bb [rw]...
Configuring crash dumps...
No suitable dump device was found.
/dev/ufsid/55953f58932667bb: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufsid/55953f58932667bb: clean, 15583596 free (7260 frags, 1947042 blocks, 0.0% fragmentation)
Comment 3 Pablo Ruiz 2018-01-26 23:44:41 UTC
Hi,

I've tried MAXCPU=3 and MAXCPU=4.

With MAXCPU=3, the system boots ok too:

------------------------------------------------------------
OK boot kernel.test
/boot/kernel.test/kernel text=0x16f2208 data=0xb66730+0x34e428 syms=[0x8+0x1898e8+0x8+0x18a675]
Booting...
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2017 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.1-RELEASE-p6 #1 r313908+e3535340e0f(RELENG_2_4-EVI): Fri Jan 26 23:22:34 UTC 2018
    root@builder: /usr/src/kernel/tmp/obj/usr/src/kernel/tmp/FreeBSD-src/sys/test  amd64
FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0)
VT(vga): resolution 640x480
CPU: Dual-Core AMD Opteron(tm) Processor 2222 SE (2992.11-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x40f13  Family=0xf  Model=0x41  Stepping=3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x1f<LAHF,CMP,SVM,ExtAPIC,CR8>
  SVM: NAsids=64
real memory  = 6442450944 (6144 MB)
avail memory = 6190874624 (5904 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: <SUN    X4200 M2>
FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)
FreeBSD/SMP Online: Non-uniform topology
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20170303/tbfadt-748)
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe1Block: 128/64 (20170303/tbfadt-748)
ioapic1: Changing APIC ID to 16
ioapic2: Changing APIC ID to 17
ioapic3 <Version 1.1> irqs 24-47 on motherboard
ioapic0 <Version 1.1> irqs 0-23 on motherboard
ioapic1 <Version 1.1> irqs 48-54 on motherboard
ioapic2 <Version 1.1> irqs 56-62 on motherboard
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
random: entropy device external interface
wlan: mac acl policy registered
kbd1 at kbdmux0
netmap: loaded module
module_register_init: MOD_LOAD (vesa, 0xffffffff81155ef0, 0) error 19
nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
padlock0: No ACE support.
acpi0: <SUN X4200 M2> on motherboard
acpi0: Power Button (fixed)
unknown: I/O range not supported
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
...


While with MAXCPU=4 it fails:


---------------------------------------------------------
OK boot kernel.test
/boot/kernel.test/kernel text=0x16f3108 data=0xb66730+0x34fc98 syms=[0x8+0x1898e8+0x8+0x18a675]
Booting...
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2017 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.1-RELEASE-p6 #2 r313908+cac12d01a89(RELENG_2_4-EVI): Fri Jan 26 23:35:27 UTC 2018
    root@pfsense-builder: /usr/src/kernel/tmp/obj/usr/src/kernel/tmp/FreeBSD-src/sys/test amd64
FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0)
VT(vga): resolution 640x480
CPU: Dual-Core AMD Opteron(tm) Processor 2222 SE (2992.11-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x40f13  Family=0xf  Model=0x41  Stepping=3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x1f<LAHF,CMP,SVM,ExtAPIC,CR8>
  SVM: NAsids=64
real memory  = 6442450944 (6144 MB)
avail memory = 6190829568 (5904 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: <SUN    X4200 M2>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20170303/tbfadt-748)
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe1Block: 128/64 (20170303/tbfadt-748)
ioapic1: Changing APIC ID to 16
ioapic2: Changing APIC ID to 17
ioapic3 <Version 1.1> irqs 24-47 on motherboard
ioapic0 <Version 1.1> irqs 0-23 on motherboard
ioapic1 <Version 1.1> irqs 48-54 on motherboard
ioapic2 <Version 1.1> irqs 56-62 on motherboard
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched
Fa
Fa
Fa
Fa

...
Comment 4 John Baldwin freebsd_committer freebsd_triage 2018-01-27 00:30:01 UTC
Can you get boot -v output?
Comment 5 Pablo Ruiz 2018-01-27 15:56:46 UTC
Sure,

Here's a "boot -v" for a MAXCPU=4 kernel:

-------------------------
OK boot kernel.test -v
/boot/kernel.test/kernel text=0x16f3108 data=0xb66730+0x34fc98 syms=[0x8+0x1898e8+0x8+0x18a675]
Booting...
KDB: debugger backends: ddb
KDB: current backend: ddb
Table 'FACP' at 0xdbfb0290
Table 'APIC' at 0xdbfb0390
APIC: Found table at 0xdbfb0390
APIC: Using the MADT enumerator.
MADT: Found CPU APIC ID 0 ACPI ID 1: enabled
SMP: Added CPU 0 (AP)
MADT: Found CPU APIC ID 1 ACPI ID 2: enabled
SMP: Added CPU 1 (AP)
MADT: Found CPU APIC ID 2 ACPI ID 3: enabled
SMP: Added CPU 2 (AP)
MADT: Found CPU APIC ID 3 ACPI ID 4: enabled
SMP: Added CPU 3 (AP)
MADT: Found CPU APIC ID 132 ACPI ID 5: disabled
MADT: Found CPU APIC ID 133 ACPI ID 6: disabled
MADT: Found CPU APIC ID 134 ACPI ID 7: disabled
MADT: Found CPU APIC ID 135 ACPI ID 8: disabled
Copyright (c) 1992-2017 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.1-RELEASE-p6 #2 r313908+cac12d01a89(RELENG_2_4-EVI): Fri Jan 26 23:35:27 UTC 2018
    root@builder:/usr/src/kernel/tmp/obj/usr/src/FreeBSD-src/sys/test amd64
FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0)
Table 'FACP' at 0xdbfb0290
Table 'APIC' at 0xdbfb0390
Table 'SPCR' at 0xdbfb0440
Table 'SLIT' at 0xdbfb0490
Table 'SPMI' at 0xdbfb04c0
Table 'OEMB' at 0xdbfbe040
Table 'SRAT' at 0xdbfb7540
SRAT: Found table at 0xdbfb7540
SRAT: Found CPU APIC ID 0 domain 0: enabled
SRAT: Found CPU APIC ID 1 domain 0: enabled
SRAT: Found memory domain 0 addr 0x0 len 0xa0000: enabled
SRAT: Found memory domain 0 addr 0x100000 len 0xbff00000: enabled
SRAT: Found CPU APIC ID 2 domain 1: enabled
SRAT: Found CPU APIC ID 3 domain 1: enabled
SRAT: Found memory domain 1 addr 0xc0000000 len 0x1c000000: enabled
SRAT: Found memory domain 1 addr 0x100000000 len 0xa4000000: enabled
Table 'FACP' at 0xdbfb0290
Table 'APIC' at 0xdbfb0390
Table 'SPCR' at 0xdbfb0440
Table 'SLIT' at 0xdbfb0490
SLIT: Found table at 0xdbfb0490
SLIT.Localities: 2
0: 10 10
1: 10 10
PPIM 0: PA=0xa0000, VA=0xffffffff82e10000, size=0x10000, mode=0
VT(vga): resolution 640x480
Preloaded elf kernel "/boot/kernel.test/kernel" at 0xffffffff82cbe000.
Calibrating TSC clock ... TSC clock: 2992117590 Hz
CPU: Dual-Core AMD Opteron(tm) Processor 2222 SE (2992.12-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x40f13  Family=0xf  Model=0x41  Stepping=3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x1f<LAHF,CMP,SVM,ExtAPIC,CR8>
  SVM: Features=0x0
Revision=1, ASIDs=64
L1 2MB data TLB: 8 entries, fully associative
L1 2MB instruction TLB: 8 entries, fully associative
L1 4KB data TLB: 32 entries, fully associative
L1 4KB instruction TLB: 32 entries, fully associative
L1 data cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L1 instruction cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L2 2MB unified TLB: 0 entries, disabled/not present
L2 4KB data TLB: 512 entries, 4-way associative
L2 4KB instruction TLB: 512 entries, 4-way associative
L2 unified cache: 1024 kbytes, 64 bytes/line, 1 lines/tag, 16-way associative
real memory  = 6442450944 (6144 MB)
Physical memory chunk(s):
0x0000000000010000 - 0x0000000000097fff, 557056 bytes (136 pages)
0x0000000000100000 - 0x00000000001fffff, 1048576 bytes (256 pages)
0x0000000002d0d000 - 0x00000000d1422fff, 3463536640 bytes (845590 pages)
0x00000000dbfae000 - 0x00000000dbfaffff, 8192 bytes (2 pages)
0x0000000100000000 - 0x00000001a3fe7fff, 2751365120 bytes (671720 pages)
avail memory = 6190829568 (5904 MB)
Event timer "LAPIC" quality 100
LAPIC: ipi_wait() us multiplier 72 (r 4115237 tsc 2992117590)
ACPI APIC Table: <SUN    X4200 M2>
Package ID shift: 1
L2 cache ID shift: 0
L1 cache ID shift: 0
Core ID shift: 0
INTR: Adding local APIC 1 as a target
INTR: Adding local APIC 2 as a target
INTR: Adding local APIC 3 as a target
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)
Package HW ID = 0 (0)
	Core HW ID = 0 (0)
		CPU0 (BSP): APIC ID: 0 (0)
	Core HW ID = 1 (0x1)
		CPU1 (AP): APIC ID: 1 (0x1)
Package HW ID = 1 (0x1)
	Core HW ID = 2 (0x2)
		CPU2 (AP): APIC ID: 2 (0x2)
	Core HW ID = 3 (0x3)
		CPU3 (AP): APIC ID: 3 (0x3)
APIC: CPU 0 has ACPI ID 1
APIC: CPU 1 has ACPI ID 2
APIC: CPU 2 has ACPI ID 3
APIC: CPU 3 has ACPI ID 4
SRAT: CPU 0 has memory domain 0
SRAT: CPU 1 has memory domain 0
SRAT: CPU 2 has memory domain 1
SRAT: CPU 3 has memory domain 1
x86bios:  IVT 0x000000-0x0004ff at 0xfffff80000000000
x86bios: SSEG 0x090000-0x090fff at 0xfffffe001b990000
x86bios: EBDA 0x09b000-0x09ffff at 0xfffff8000009b000
x86bios:  ROM 0x0a0000-0x0fefff at 0xfffff800000a0000
Pentium Pro MTRR support enabled
ULE: setup cpu 0
ULE: setup cpu 1
ULE: setup cpu 2
ULE: setup cpu 3
ACPI: RSDP 0x00000000000F9C10 000024 (v02 SUN   )
ACPI: XSDT 0x00000000DBFB0100 000094 (v01 SUN    X4200 M2 00000081 MSFT 00000097)
ACPI: FACP 0x00000000DBFB0290 0000F4 (v03 SUN    X4200 M2 00000081 MSFT 00000097)
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20170303/tbfadt-748)
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe1Block: 128/64 (20170303/tbfadt-748)
ACPI: DSDT 0x00000000DBFB0510 00702A (v01 SUN    X4200 M1 00000081 INTL 20050624)
ACPI: FACS 0x00000000DBFBE000 000040
ACPI: FACS 0x00000000DBFBE000 000040
ACPI: APIC 0x00000000DBFB0390 0000B0 (v01 SUN    X4200 M2 00000081 MSFT 00000097)
ACPI: SPCR 0x00000000DBFB0440 000050 (v01 SUN    X4200 M2 00000081 MSFT 00000097)
ACPI: SLIT 0x00000000DBFB0490 000030 (v01 SUN    OEMSLIT  00000081 MSFT 00000097)
ACPI: SPMI 0x00000000DBFB04C0 000041 (v05 SUN    OEMSPMI  00000081 MSFT 00000097)
ACPI: OEMB 0x00000000DBFBE040 000063 (v01 SUN    X4200 M2 00000081 MSFT 00000097)
ACPI: SRAT 0x00000000DBFB7540 000110 (v01 AMD    HAMMER   00000001 AMD  00000001)
ACPI: HPET 0x00000000DBFB7650 000038 (v01 SUN    OEMHPET0 00000081 MSFT 00000097)
ACPI: IPET 0x00000000DBFB7690 000038 (v01 SUN    OEMHPET1 00000081 MSFT 00000097)
ACPI: EINJ 0x00000000DBFB76D0 000130 (v01 AMIER  AMI_EINJ 06000827 MSFT 00000097)
ACPI: BERT 0x00000000DBFB7860 000030 (v01 AMIER  AMI_BERT 06000827 MSFT 00000097)
ACPI: ERST 0x00000000DBFB7890 0001B0 (v01 AMIER  AMI_ERST 06000827 MSFT 00000097)
ACPI: HEST 0x00000000DBFB7A40 0000A8 (v01 AMIER  AMI_HEST 06000827 MSFT 00000097)
ACPI: SSDT 0x00000000DBFB7AF0 0005F8 (v01 A M I  POWERNOW 00000001 AMD  00000001)
MADT: Found IO APIC ID 15, Interrupt 0 at 0xfec00000
ioapic0: ver 0x11 maxredir 0x17
ioapic0: Routing external 8259A's -> intpin 0
MADT: Found IO APIC ID 16, Interrupt 48 at 0xfeafd000
ioapic1: Changing APIC ID to 16
ioapic1: WARNING: intbase 48 != expected base r24
ioapic1: ver 0x11 maxredir 0x06
MADT: Found IO APIC ID 17, Interrupt 56 at 0xfeafc000
ioapic2: Changing APIC ID to 17
ioapic2: WARNING: intbase 56 != expected base r55
ioapic2: ver 0x11 maxredir 0x06
MADT: Found IO APIC ID 14, Interrupt 24 at 0xfeaff000
ioapic3: WARNING: intbase 24 != expected base r63
ioapic3: ver 0x11 maxredir 0x17
MADT: Interrupt override: source 0, irq 2
ioapic0: Routing IRQ 0 -> intpin 2
MADT: Interrupt override: source 9, irq 9
ioapic0: intpin 9 trigger: level
ioapic3 <Version 1.1> irqs 24-47 on motherboard
ioapic0 <Version 1.1> irqs 0-23 on motherboard
ioapic1 <Version 1.1> irqs 48-54 on motherboard
ioapic2 <Version 1.1> irqs 56-62 on motherboard
cpu0 BSP:
     ID: 0x00000000   VER: 0x80050010 LDR: 0x00000000 DFR: 0xffffffff
  lint0: 0x00010700 lint1: 0x00000400 TPR: 0x00000000 SVR: 0x000001ff
  timer: 0x000100ef therm: 0x00010000 err: 0x000000f0 pmc: 0x00010400
   AMD ext features: 0x00010003
   AMD elvt0: 0x00010000
SMP: AP CPU #1 Launched!
cpu1 AP:

Fatal k
vaketkpl andic:ou blse tfaacuklr tno
 eerl riflpo w dr= t0exctedaf;f bfpac ktrfafcdfe wma10yc 4b8e0 4i
frorhspr r= upt i0netxdf
0s1b KdisD97B:8 e0nt0er:0
pbrniec

[ =
    0
0 0faaffadl ep t001 r1abp97 t0d0:  10a0g0
   5acupulid
              =wSto1p; iadl ep ictn kide rnel= m od0e1

pa n ipc: doubi dle  dau 3b;lenta
ptcip+u idix3bd = 03
 aUult vortume: 1sl q	adRdr$0
xkdsb_	w=h y0
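
A note on the topology lines in the verbose output above: "Package ID shift: 1" together with APIC IDs 0-3 is consistent with the reported layout of 2 packages x 2 cores, if the package index is taken as the APIC ID shifted right by that amount. The following is only a minimal userspace sketch of that arithmetic; the helper name package_of() is hypothetical and is not taken from the FreeBSD source.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a FreeBSD kernel function): derive a package
 * (socket) index from a local APIC ID by dropping the low bits that
 * distinguish the cores within one package.
 */
static uint32_t
package_of(uint32_t apic_id, unsigned int pkg_shift)
{
	return (apic_id >> pkg_shift);
}
```

With a shift of 1, APIC IDs 0 and 1 map to package 0 and APIC IDs 2 and 3 map to package 1, which matches both the "Package HW ID" grouping and the SRAT domain assignment shown in the log.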
Comment 6 Pablo Ruiz 2018-01-27 15:57:43 UTC
Created attachment 190117
verbose output of a kernel w/o EARLY_AP_STARTUP

For reference, attached is the verbose output of a kernel w/o EARLY_AP_STARTUP from this same system.
Comment 7 Pablo Ruiz 2018-02-01 02:50:19 UTC
Hi guys,

Do you need anything else from my side? Can I help with any additional diagnosis?

Best Regards
Comment 8 John Baldwin freebsd_committer freebsd_triage 2018-02-01 19:12:57 UTC
So it looks like the panic is a double fault.  Please try this hack patch to see if it cleans up the printfs:

Index: amd64/amd64/trap.c
===================================================================
--- amd64/amd64/trap.c  (revision 328557)
+++ amd64/amd64/trap.c  (working copy)
@@ -830,6 +830,11 @@
 void
 dblfault_handler(struct trapframe *frame)
 {
+
+       static int dblflt_lock = 0;
+
+       while (!atomic_cmpset_int(&dblflt_lock, 0, 1))
+               cpu_spinwait();
 #ifdef KDTRACE_HOOKS
        if (dtrace_doubletrap_func != NULL)
                (*dtrace_doubletrap_func)();

It won't fix the panic, but hopefully only one CPU will print out the messages so we can debug this further.
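
For readers unfamiliar with the pattern in the patch above, here is a minimal userspace sketch of the same idea using C11 atomics rather than the kernel's atomic_cmpset_int() and cpu_spinwait(). The function names claim_report_lock() and enter_report_path() are hypothetical and exist only for illustration. The point is that the 0 -> 1 swap can succeed only once, and since the lock is never released, every later caller spins forever and never reaches the print path.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Try once to claim the lock. The compare-and-swap from 0 to 1 succeeds
 * only for the first caller; every later caller sees 1 and gets false.
 */
static bool
claim_report_lock(atomic_int *lock)
{
	int expected = 0;

	return (atomic_compare_exchange_strong(lock, &expected, 1));
}

/*
 * Spin until the lock is claimed. Because nothing ever resets the lock
 * to 0, only the first caller returns; the rest loop here indefinitely.
 */
static void
enter_report_path(atomic_int *lock)
{
	while (!claim_report_lock(lock))
		;	/* the kernel patch calls cpu_spinwait() here */
}
```

On a fresh lock, the first call to claim_report_lock() returns true and a second call on the same lock returns false, which is what keeps a second CPU out of the reporting code.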
Comment 9 Pablo Ruiz 2018-02-04 01:27:03 UTC
Hi John,

I've tried your patch, but even after trying to boot quite a few times, I got no stacktrace. All I got was:

[...]
MADT: Found IO APIC ID 17, Interrupt 56 at 0xfeafc000
ioapic2: Changing APIC ID to 17
ioapic2: WARNING: intbase 56 != expected base r55
ioapic2: ver 0x11 maxredir 0x06
MADT: Found IO APIC ID 14, Interrupt 24 at 0xfeaff000
ioapic3: WARNING: intbase 24 != expected base r63
ioapic3: ver 0x11 maxredir 0x17
MADT: Interrupt override: source 0, irq 2
ioapic0: Routing IRQ 0 -> intpin 2
MADT: Interrupt override: source 9, irq 9
ioapic0: intpin 9 trigger: level
ioapic3 <Version 1.1> irqs 24-47 on motherboard
ioapic0 <Version 1.1> irqs 0-23 on motherboard
ioapic1 <Version 1.1> irqs 48-54 on motherboard
ioapic2 <Version 1.1> irqs 56-62 on motherboard
cpu0 BSP:
     ID: 0x00000000   VER: 0x80050010 LDR: 0x00000000 DFR: 0xffffffff
  lint0: 0x00010700 lint1: 0x00000400 TPR: 0x00000000 SVR: 0x000001ff
  timer: 0x000100ef therm: 0x00010000 err: 0x000000f0 pmc: 0x00010400
   AMD ext features: 0x00010003
   AMD elvt0: 0x00010000
SMP: AP CPU #1 Launched!
cpu1 AP:




Sometimes it gets to the point of printing 'Fa', or it just stops there without showing anything else. :(
Comment 10 Pablo Ruiz 2018-02-09 23:17:11 UTC
Hi,

Is there anything I can help with this weekend? :)
Comment 11 John Baldwin freebsd_committer freebsd_triage 2018-02-10 23:59:22 UTC
Hmm, I don't know why the previous simple lock didn't help.  One other thing to try is placing a 'while (1);' infinite loop in the init_secondary_tail() function in sys/x86/x86/mp_x86.c and moving it around in the function to narrow down where the APs trigger the double fault (which is a stack overflow).  If you put the while (1) before the smp_cpus++; line, the failure mode you should see, if the AP doesn't fault first, is a 'panic AP #x failed to start'.  If you put it after the smp_cpus++ line, you should at least no longer get the double fault panic, provided you haven't hit the double fault yet.

Another thought is that there might be a missing MFC in 11 related to one or more kthreads starting too early.  You could perhaps build a kernel with:

options KTR_COMPILE=KTR_PROC
options KTR_MASK=KTR_PROC
options KTR_VERBOSE

And see what messages are logged before the crash (to see if the APs are starting to run other kthreads besides the idle thread).
Comment 12 Pablo Ruiz 2018-02-11 01:47:59 UTC
Hi,

Here is the output from a kernel built with:

options KTR_COMPILE=KTR_PROC
options KTR_MASK=KTR_PROC
options KTR_VERBOSE

...

OK boot kernel.test -v
/boot/kernel.test/kernel text=0x16f3170 data=0xb66748+0x34fc98 syms=[0x8+0x189900+0x8+0x18a692]
Booting...
KDB: debugger backends: ddb
KDB: current backend: ddb
Table 'FACP' at 0xdbfb0290
Table 'APIC' at 0xdbfb0390
APIC: Found table at 0xdbfb0390
APIC: Using the MADT enumerator.
MADT: Found CPU APIC ID 0 ACPI ID 1: enabled
SMP: Added CPU 0 (AP)
MADT: Found CPU APIC ID 1 ACPI ID 2: enabled
SMP: Added CPU 1 (AP)
MADT: Found CPU APIC ID 2 ACPI ID 3: enabled
SMP: Added CPU 2 (AP)
MADT: Found CPU APIC ID 3 ACPI ID 4: enabled
SMP: Added CPU 3 (AP)
MADT: Found CPU APIC ID 132 ACPI ID 5: disabled
MADT: Found CPU APIC ID 133 ACPI ID 6: disabled
MADT: Found CPU APIC ID 134 ACPI ID 7: disabled
MADT: Found CPU APIC ID 135 ACPI ID 8: disabled
Copyright (c) 1992-2017 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.1-RELEASE-p6 #4 r313908+a3379502983(RELENG_2_4-EVI): Sun Feb 11 01:38:39 UTC 2018
    root@pfsense-builder:/usr/src/pfsense/tmp/obj/usr/src/pfsense/tmp/FreeBSD-src/sys/pfTest amd64
FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0)
Table 'FACP' at 0xdbfb0290
Table 'APIC' at 0xdbfb0390
Table 'SPCR' at 0xdbfb0440
Table 'SLIT' at 0xdbfb0490
Table 'SPMI' at 0xdbfb04c0
Table 'OEMB' at 0xdbfbe040
Table 'SRAT' at 0xdbfb7540
SRAT: Found table at 0xdbfb7540
SRAT: Found CPU APIC ID 0 domain 0: enabled
SRAT: Found CPU APIC ID 1 domain 0: enabled
SRAT: Found memory domain 0 addr 0x0 len 0xa0000: enabled
SRAT: Found memory domain 0 addr 0x100000 len 0xbff00000: enabled
SRAT: Found CPU APIC ID 2 domain 1: enabled
SRAT: Found CPU APIC ID 3 domain 1: enabled
SRAT: Found memory domain 1 addr 0xc0000000 len 0x1c000000: enabled
SRAT: Found memory domain 1 addr 0x100000000 len 0xa4000000: enabled
Table 'FACP' at 0xdbfb0290
Table 'APIC' at 0xdbfb0390
Table 'SPCR' at 0xdbfb0440
Table 'SLIT' at 0xdbfb0490
SLIT: Found table at 0xdbfb0490
SLIT.Localities: 2
0: 10 10
1: 10 10
PPIM 0: PA=0xa0000, VA=0xffffffff82e10000, size=0x10000, mode=0
VT(vga): resolution 640x480
Preloaded elf kernel "/boot/kernel.test/kernel" at 0xffffffff82cbe000.
Calibrating TSC clock ... TSC clock: 2992112507 Hz
CPU: Dual-Core AMD Opteron(tm) Processor 2222 SE (2992.11-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x40f13  Family=0xf  Model=0x41  Stepping=3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x1f<LAHF,CMP,SVM,ExtAPIC,CR8>
  SVM: Features=0x0
Revision=1, ASIDs=64
L1 2MB data TLB: 8 entries, fully associative
L1 2MB instruction TLB: 8 entries, fully associative
L1 4KB data TLB: 32 entries, fully associative
L1 4KB instruction TLB: 32 entries, fully associative
L1 data cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L1 instruction cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L2 2MB unified TLB: 0 entries, disabled/not present
L2 4KB data TLB: 512 entries, 4-way associative
L2 4KB instruction TLB: 512 entries, 4-way associative
L2 unified cache: 1024 kbytes, 64 bytes/line, 1 lines/tag, 16-way associative
real memory  = 6442450944 (6144 MB)
Physical memory chunk(s):
0x0000000000010000 - 0x0000000000097fff, 557056 bytes (136 pages)
0x0000000000100000 - 0x00000000001fffff, 1048576 bytes (256 pages)
0x0000000002d0d000 - 0x00000000d1422fff, 3463536640 bytes (845590 pages)
0x00000000dbfae000 - 0x00000000dbfaffff, 8192 bytes (2 pages)
0x0000000100000000 - 0x00000001a3fe7fff, 2751365120 bytes (671720 pages)
avail memory = 6190829568 (5904 MB)
Event timer "LAPIC" quality 100
LAPIC: ipi_wait() us multiplier 83 (r 3600371 tsc 2992112507)
ACPI APIC Table: <SUN    X4200 M2>
Package ID shift: 1
L2 cache ID shift: 0
L1 cache ID shift: 0
Core ID shift: 0
INTR: Adding local APIC 1 as a target
INTR: Adding local APIC 2 as a target
INTR: Adding local APIC 3 as a target
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)
Package HW ID = 0 (0)
	Core HW ID = 0 (0)
		CPU0 (BSP): APIC ID: 0 (0)
	Core HW ID = 1 (0x1)
		CPU1 (AP): APIC ID: 1 (0x1)
Package HW ID = 1 (0x1)
	Core HW ID = 2 (0x2)
		CPU2 (AP): APIC ID: 2 (0x2)
	Core HW ID = 3 (0x3)
		CPU3 (AP): APIC ID: 3 (0x3)
APIC: CPU 0 has ACPI ID 1
APIC: CPU 1 has ACPI ID 2
APIC: CPU 2 has ACPI ID 3
APIC: CPU 3 has ACPI ID 4
SRAT: CPU 0 has memory domain 0
SRAT: CPU 1 has memory domain 0
SRAT: CPU 2 has memory domain 1
SRAT: CPU 3 has memory domain 1
x86bios:  IVT 0x000000-0x0004ff at 0xfffff80000000000
x86bios: SSEG 0x090000-0x090fff at 0xfffffe001b990000
x86bios: EBDA 0x09b000-0x09ffff at 0xfffff8000009b000
x86bios:  ROM 0x0a0000-0x0fefff at 0xfffff800000a0000
Pentium Pro MTRR support enabled
ULE: setup cpu 0
ULE: setup cpu 1
ULE: setup cpu 2
ULE: setup cpu 3
ACPI: RSDP 0x00000000000F9C10 000024 (v02 SUN   )
ACPI: XSDT 0x00000000DBFB0100 000094 (v01 SUN    X4200 M2 00000081 MSFT 00000097)
ACPI: FACP 0x00000000DBFB0290 0000F4 (v03 SUN    X4200 M2 00000081 MSFT 00000097)
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20170303/tbfadt-748)
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe1Block: 128/64 (20170303/tbfadt-748)
ACPI: DSDT 0x00000000DBFB0510 00702A (v01 SUN    X4200 M1 00000081 INTL 20050624)
ACPI: FACS 0x00000000DBFBE000 000040
ACPI: FACS 0x00000000DBFBE000 000040
ACPI: APIC 0x00000000DBFB0390 0000B0 (v01 SUN    X4200 M2 00000081 MSFT 00000097)
ACPI: SPCR 0x00000000DBFB0440 000050 (v01 SUN    X4200 M2 00000081 MSFT 00000097)
ACPI: SLIT 0x00000000DBFB0490 000030 (v01 SUN    OEMSLIT  00000081 MSFT 00000097)
ACPI: SPMI 0x00000000DBFB04C0 000041 (v05 SUN    OEMSPMI  00000081 MSFT 00000097)
ACPI: OEMB 0x00000000DBFBE040 000063 (v01 SUN    X4200 M2 00000081 MSFT 00000097)
ACPI: SRAT 0x00000000DBFB7540 000110 (v01 AMD    HAMMER   00000001 AMD  00000001)
ACPI: HPET 0x00000000DBFB7650 000038 (v01 SUN    OEMHPET0 00000081 MSFT 00000097)
ACPI: IPET 0x00000000DBFB7690 000038 (v01 SUN    OEMHPET1 00000081 MSFT 00000097)
ACPI: EINJ 0x00000000DBFB76D0 000130 (v01 AMIER  AMI_EINJ 06000827 MSFT 00000097)
ACPI: BERT 0x00000000DBFB7860 000030 (v01 AMIER  AMI_BERT 06000827 MSFT 00000097)
ACPI: ERST 0x00000000DBFB7890 0001B0 (v01 AMIER  AMI_ERST 06000827 MSFT 00000097)
ACPI: HEST 0x00000000DBFB7A40 0000A8 (v01 AMIER  AMI_HEST 06000827 MSFT 00000097)
ACPI: SSDT 0x00000000DBFB7AF0 0005F8 (v01 A M I  POWERNOW 00000001 AMD  00000001)
MADT: Found IO APIC ID 15, Interrupt 0 at 0xfec00000
ioapic0: ver 0x11 maxredir 0x17
ioapic0: Routing external 8259A's -> intpin 0
MADT: Found IO APIC ID 16, Interrupt 48 at 0xfeafd000
ioapic1: Changing APIC ID to 16
ioapic1: WARNING: intbase 48 != expected base r24
ioapic1: ver 0x11 maxredir 0x06
MADT: Found IO APIC ID 17, Interrupt 56 at 0xfeafc000
ioapic2: Changing APIC ID to 17
ioapic2: WARNING: intbase 56 != expected base r55
ioapic2: ver 0x11 maxredir 0x06
MADT: Found IO APIC ID 14, Interrupt 24 at 0xfeaff000
ioapic3: WARNING: intbase 24 != expected base r63
ioapic3: ver 0x11 maxredir 0x17
MADT: Interrupt override: source 0, irq 2
ioapic0: Routing IRQ 0 -> intpin 2
MADT: Interrupt override: source 9, irq 9
ioapic0: intpin 9 trigger: level
ioapic3 <Version 1.1> irqs 24-47 on motherboard
ioapic0 <Version 1.1> irqs 0-23 on motherboard
ioapic1 <Version 1.1> irqs 48-54 on motherboard
ioapic2 <Version 1.1> irqs 56-62 on motherboard
cpu0 BSP:
     ID: 0x00000000   VER: 0x80050010 LDR: 0x00000000 DFR: 0xffffffff
  lint0: 0x00010700 lint1: 0x00000400 TPR: 0x00000000 SVR: 0x000001ff
  timer: 0x000100ef therm: 0x00010000 err: 0x000000f0 pmc: 0x00010400
   AMD ext features: 0x00010003
   AMD elvt0: 0x00010000
SMP: AP CPU #1 Launched!
cpu1 AP:
kk
Comment 13 Pablo Ruiz 2018-02-11 02:49:49 UTC
Hi again,

Adding the following patch:

diff --git a/sys/x86/x86/mp_x86.c b/sys/x86/x86/mp_x86.c
index 7cc02d663bf..3cca61ca72e 100644
--- a/sys/x86/x86/mp_x86.c
+++ b/sys/x86/x86/mp_x86.c
@@ -925,6 +925,7 @@ init_secondary_tail(void)

        CTR1(KTR_SMP, "SMP: AP CPU #%d Launched", cpuid);
        printf("SMP: AP CPU #%d Launched!\n", cpuid);
+while(1);

        /* Determine if we are a logical CPU. */
        if (cpu_info[PCPU_GET(apic_id)].cpu_hyperthread)

I get into db while crashing:

[...]
cpu0 BSP:
     ID: 0x00000000   VER: 0x80050010 LDR: 0x00000000 DFR: 0xffffffff
  lint0: 0x00010700 lint1: 0x00000400 TPR: 0x00000000 SVR: 0x000001ff
  timer: 0x000100ef therm: 0x00010000 err: 0x000000f0 pmc: 0x00010400
   AMD ext features: 0x00010003
   AMD elvt0: 0x00010000
SMP: AP CPU #1 Launched!
kkkerneel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address	= 0x0
fault code		= supervisor write data, page not present
instruction pointer	= 0x20:0xffffffff80bb739d
stack pointer	        = 0x28:0xfffffe001b9835b0
frame pointer	        = 0x28:0xfffffe001b983620
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= resume, IOPL = 0
current process		= 11 (idle: cpu2)
[ thread pid 11 tid 100005 ]
Stopped at      putchar+0x15d:  movb    $0,(%rax)
db> bt
Tracing pid 11 tid 100005 td 0xfffff8000332c000
putchar() at putchar+0x15d/frame 0xfffffe001b983620
db> show all procs
  pid  ppid  pgrp   uid   state   wmesg         wchan        cmd
   11     0     0     0  RL      (threaded)                  [idle]
100003                   CanRun                              [idle: cpu0]
100004                   CanRun                              [idle: cpu1]
100005                   CanRun                              [idle: cpu2]
100006                   CanRun                              [idle: cpu3]
    1     0     0     0  ?L                                  [kernel]
   10     0     0     0  RL                                  [audit]
    0     0     0     0  RLs     CPU 0                       [swapper]
db> show all pcpu
Current CPU: 2

cpuid        = 0
dynamic pcpu = 0x682000
curthread    = 0xffffffff82883640: pid 0 "swapper"
curpcb       = 0xffffffff82c0ecc0
fpcurthread  = none
idlethread   = 0xfffff8000332d000: tid 100003 "idle: cpu0"
curpmap      = 0xffffffff828af188
tssp         = 0xffffffff828ad510
commontssp   = 0xffffffff828ad510
rsp0         = 0xffffffff82c0ecc0
gs32p        = 0xffffffff828ad708
ldt          = 0xffffffff828ad748
tss          = 0xffffffff828ad738

cpuid        = 1
dynamic pcpu = 0xfffffe00993f1000
curthread    = 0xfffff8000332c580: pid 11 "idle: cpu1"
curpcb       = 0
fpcurthread  = none
idlethread   = 0xfffff8000332c580: tid 100004 "idle: cpu1"
curpmap      = 0xffffffff828af188
tssp         = 0xffffffff828ad578
commontssp   = 0xffffffff828ad578
rsp0         = 0x0
gs32p        = 0xffffffff828ad770
ldt          = 0xffffffff828ad7b0
tss          = 0xffffffff828ad7a0

cpuid        = 2
dynamic pcpu = 0xfffffe00993f9000
curthread    = 0xfffff8000332c000: pid 11 "idle: cpu2"
curpcb       = 0
fpcurthread  = none
idlethread   = 0xfffff8000332c000: tid 100005 "idle: cpu2"
curpmap      = 0xffffffff828af188
tssp         = 0xffffffff828ad5e0
commontssp   = 0xffffffff828ad5e0
rsp0         = 0x0
gs32p        = 0xffffffff828ad7d8
ldt          = 0xffffffff828ad818
tss          = 0xffffffff828ad808

cpuid        = 3
dynamic pcpu = 0xfffffe0099401000
curthread    = 0xfffff8000332b580: pid 11 "idle: cpu3"
curpcb       = 0
fpcurthread  = none
idlethread   = 0xfffff8000332b580: tid 100006 "idle: cpu3"
curpmap      = 0xffffffff828af188
tssp         = 0xffffffff828ad648
commontssp   = 0xffffffff828ad648
rsp0         = 0x0
gs32p        = 0xffffffff828ad840
ldt          = 0xffffffff828ad880
tss          = 0xffffffff828ad870
db> show all trace

Tracing command idle pid 11 tid 100003 td 0xfffff8000332d000
fork_trampoline() at fork_trampoline

Tracing command idle pid 11 tid 100004 td 0xfffff8000332c580
fork_trampoline() at fork_trampoline

Tracing command idle pid 11 tid 100005 td 0xfffff8000332c000
putchar() at putchar+0x15d/frame 0xfffffe001b983620

Tracing command idle pid 11 tid 100006 td 0xfffff8000332b580
fork_trampoline() at fork_trampoline

Tracing command kernel pid 1 tid 100002 td 0xfffff8000332d580
fork_trampoline() at fork_trampoline

Tracing command audit pid 10 tid 100001 td 0xfffff8000332e000
fork_trampoline() at fork_trampoline

Tracing command kernel pid 0 tid 100000 td 0xffffffff82883640
KDB: reentering
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe001b982cc0
kdb_reenter() at kdb_reenter+0x2f/frame 0xfffffe001b982cd0
trap() at trap+0x4d/frame 0xfffffe001b982e90
calltrap() at calltrap+0x8/frame 0xfffffe001b982e90
--- trap 0xc, rip = 0xffffffff80c351d0, rsp = 0xfffffe001b982f60, rbp = 0xfffffe001b982f60 ---
strcmp() at strcmp+0x10/frame 0xfffffe001b982f60
db_backtrace() at db_backtrace+0x17d/frame 0xfffffe001b982ff0
db_trace_thread() at db_trace_thread+0x3f/frame 0xfffffe001b983010
db_stack_trace_all() at db_stack_trace_all+0x6f/frame 0xfffffe001b9830b0
db_command() at db_command+0x2bf/frame 0xfffffe001b983180
db_command_loop() at db_command_loop+0x64/frame 0xfffffe001b983190
db_trap() at db_trap+0xef/frame 0xfffffe001b983220
kdb_trap() at kdb_trap+0x13e/frame 0xfffffe001b983270
trap_fatal() at trap_fatal+0x2e2/frame 0xfffffe001b9832c0
trap_pfault() at trap_pfault+0x49/frame 0xfffffe001b983320
trap() at trap+0x286/frame 0xfffffe001b9834e0
calltrap() at calltrap+0x8/frame 0xfffffe001b9834e0
--- trap 0xc, rip = 0xffffffff80bb739d, rsp = 0xfffffe001b9835b0, rbp = 0xfffffe001b983620 ---
putchar() at putchar+0x15d/frame 0xfffffe001b983620
db> show threads
  100003 (0xfffff8000332d000) (stack 0xfffffe001b99c000)  fork_trampoline() at fork_trampoline
  100004 (0xfffff8000332c580) (stack 0xfffffe001b9a1000)  fork_trampoline() at fork_trampoline
  100005 (0xfffff8000332c000) (stack 0xfffffe001b9a6000)  putchar() at putchar+0x15d/frame 0xfffffe001b983620
  100006 (0xfffff8000332b580) (stack 0xfffffe001b9ab000)  fork_trampoline() at fork_trampoline
  100002 (0xfffff8000332d580) (stack 0xfffffe001b997000)  fork_trampoline() at fork_trampoline
  100001 (0xfffff8000332e000) (stack 0xfffffe001b992000)  fork_trampoline() at fork_trampoline
  100000 (0xffffffff82883640) (stack 0xffffffff82c0b000)KDB: reentering
KDB: stack backtrace:
  db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe001b982cd0
kdb_reenter() at kdb_reenter+0x2f/frame 0xfffffe001b982ce0
trap() at trap+0x4d/frame 0xfffffe001b982ea0
calltrap() at calltrap+0x8/frame 0xfffffe001b982ea0
--- trap 0xc, rip = 0xffffffff80c351d0, rsp = 0xfffffe001b982f70, rbp = 0xfffffe001b982f70 ---
strcmp() at strcmp+0x10/frame 0xfffffe001b982f70
db_backtrace() at db_backtrace+0x17d/frame 0xfffffe001b983000
db_trace_thread() at db_trace_thread+0x3f/frame 0xfffffe001b983020
db_show_threads() at db_show_threads+0x83/frame 0xfffffe001b9830b0
db_command() at db_command+0x2bf/frame 0xfffffe001b983180
db_command_loop() at db_command_loop+0x64/frame 0xfffffe001b983190
db_trap() at db_trap+0xef/frame 0xfffffe001b983220
kdb_trap() at kdb_trap+0x13e/frame 0xfffffe001b983270
trap_fatal() at trap_fatal+0x2e2/frame 0xfffffe001b9832c0
trap_pfault() at trap_pfault+0x49/frame 0xfffffe001b983320
trap() at trap+0x286/frame 0xfffffe001b9834e0
calltrap() at calltrap+0x8/frame 0xfffffe001b9834e0
--- trap 0xc, rip = 0xffffffff80bb739d, rsp = 0xfffffe001b9835b0, rbp = 0xfffffe001b983620 ---
putchar() at putchar+0x15d/frame 0xfffffe001b983620
db> show ktr
--- End of trace buffer ---
db> show dpcpu_off
dpcpu_off[ 0] = 0x682000 (+ DPCPU_START = 0xffffffff82c0f000)
dpcpu_off[ 1] = 0xfffffe00993f1000 (+ DPCPU_START = 0xfffffe001b97e000)
dpcpu_off[ 2] = 0xfffffe00993f9000 (+ DPCPU_START = 0xfffffe001b986000)
dpcpu_off[ 3] = 0xfffffe0099401000 (+ DPCPU_START = 0xfffffe001b98e000)



Let me know if there is anything specific you want me to get from this point on.. :)
Comment 14 Pablo Ruiz 2018-02-11 03:01:36 UTC
I've tried moving the while loop a bit further down in mp_x86.c, and I got a db prompt too:

diff --git a/sys/x86/x86/mp_x86.c b/sys/x86/x86/mp_x86.c
index 3cca61ca72e..1c257d87e58 100644
--- a/sys/x86/x86/mp_x86.c
+++ b/sys/x86/x86/mp_x86.c
@@ -925,7 +925,6 @@ init_secondary_tail(void)

        CTR1(KTR_SMP, "SMP: AP CPU #%d Launched", cpuid);
        printf("SMP: AP CPU #%d Launched!\n", cpuid);
-while(1);

        /* Determine if we are a logical CPU. */
        if (cpu_info[PCPU_GET(apic_id)].cpu_hyperthread)
@@ -951,6 +950,7 @@ while(1);
        load_es(_udatasel);
        load_fs(_ufssel);
 #endif
+while(1);

        mtx_unlock_spin(&ap_boot_mtx);

This is the relevant boot output:

[...]
cpu0 BSP:
     ID: 0x00000000   VER: 0x80050010 LDR: 0x00000000 DFR: 0xffffffff
  lint0: 0x00010700 lint1: 0x00000400 TPR: 0x00000000 SVR: 0x000001ff
  timer: 0x000100ef therm: 0x00010000 err: 0x000000f0 pmc: 0x00010400
   AMD ext features: 0x00010003
   AMD elvt0: 0x00010000
SMP: AP CPU #1 Launched!
cpu1 AP:

Fa
kFernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address	= 0x441
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80ffb704
stack pointer	        = 0x28:0xfffffe001b979d00
frame pointer	        = 0x28:0xfffffe001b979d10
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= resume, IOPL = 0
current process		= 11 (idle: cpu2)
[ thread pid 11 tid 100005 ]
Stopped at      spinlock_exit+0x14:     movq    0x440(%rbx),%rax
db> bt
Tracing pid 11 tid 100005 td 0xfffff8000332c000
spinlock_exit() at spinlock_exit+0x14/frame 0xfffffe001b979d10
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xa5/frame 0xfffffe001b979d80
i8254_delay() at i8254_delay+0x143/frame 0xfffffe001b979dc0
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xa5/frame 0xfffffe001b979e30
i8254_delay() at i8254_delay+0x143/frame 0xfffffe001b979e70
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xa5/frame 0xfffffe001b979ee0
i8254_delay() at i8254_delay+0x143/frame 0xfffffe001b979f20
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xa5/frame 0xfffffe001b979f90
i8254_delay() at i8254_delay+0x143/frame 0xfffffe001b979fd0
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xa5/frame 0xfffffe001b97a040
i8254_delay() at i8254_delay+0x143/frame 0xfffffe001b97a080
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xa5/frame 0xfffffe001b97a0f0
i8254_delay() at i8254_delay+0x143/frame 0xfffffe001b97a130
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xa5/frame 0xfffffe001b97a1a0
i8254_delay() at i8254_delay+0x143/frame 0xfffffe001b97a1e0
[... repeats quite a lot ...]
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0x14a/frame 0xfffffe001b97cb90
i8254_delay() at i8254_delay+0x3a/frame 0xfffffe001b97cbd0
ns8250_putc() at ns8250_putc+0x2a/frame 0xfffffe001b97cc00
uart_cnputc() at uart_cnputc+0x47/frame 0xfffffe001b97cc20
cnputc() at cnputc+0x7d/frame 0xfffffe001b97cc50
cnputs() at cnputs+0x68/frame 0xfffffe001b97cc70
putchar() at putchar+0x14d/frame 0xfffffe001b97ccf0
kvprintf() at kvprintf+0x103d/frame 0xfffffe001b97cde0
vprintf() at vprintf+0x87/frame 0xfffffe001b97ceb0
printf() at printf+0x43/frame 0xfffffe001b97cf10
dblfault_handler() at dblfault_handler+0x26/frame 0xfffffe001b97cf30
Xdblfault() at Xdblfault+0xac/frame 0xfffffe001b97cf30
--- trap 0x17, rip = 0xffffffff81159ea5, rsp = 0xfffffe001b978000, rbp = 0xfffffe001b978030 ---
i8254_delay() at i8254_delay+0x35/frame 0xfffffe001b978030
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0x14a/frame 0xfffffe001b9780a0
i8254_delay() at i8254_delay+0x3a/frame 0xfffffe001b9780e0
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0x14a/frame 0xfffffe001b978150
[... more similar calls repeating omitted ...]
spinlock_exit() at spinlock_exit+0x14/frame 0xfffffe001b9785d0
[... more _mtx_lock_spin_cookie calls ...]
Comment 15 Pablo Ruiz 2018-02-11 03:11:26 UTC
And finally, if I move the while(1) to just after the mtx_unlock_spin call, like this:

diff --git a/sys/x86/x86/mp_x86.c b/sys/x86/x86/mp_x86.c
index 1c257d87e58..04975dd8a2e 100644
--- a/sys/x86/x86/mp_x86.c
+++ b/sys/x86/x86/mp_x86.c
@@ -950,9 +950,9 @@ init_secondary_tail(void)
        load_es(_udatasel);
        load_fs(_ufssel);
 #endif
-while(1);

        mtx_unlock_spin(&ap_boot_mtx);
+while(1);

        /* Wait until all the AP's are up. */
        while (atomic_load_acq_int(&smp_started) == 0)

We arrive at the original behaviour: no 'db' prompt, and a somewhat garbled crash:

[...]
cpu0 BSP:
     ID: 0x00000000   VER: 0x80050010 LDR: 0x00000000 DFR: 0xffffffff
  lint0: 0x00010700 lint1: 0x00000400 TPR: 0x00000000 SVR: 0x000001ff
  timer: 0x000100ef therm: 0x00010000 err: 0x000000f0 pmc: 0x00010400
   AMD ext features: 0x00010003
   AMD elvt0: 0x00010000
SMP: AP CPU #1 Launched!
cpu1 AP:

Fa
Fa

FFa
Fa
Fa
Fa
Fata

.................

I hope this helps.
Comment 16 John Baldwin freebsd_committer freebsd_triage 2018-02-12 17:44:23 UTC
(In reply to Pablo Ruiz from comment #14)
Thanks, this gives me what I was looking for.  We have infinite recursion because the spin lock code is calling DELAY() which is trying to grab a spin lock.
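
As a rough illustration of that recursion, here is a userspace model only. All model_* names are hypothetical and are not kernel functions, and the depth cap merely stands in for the finite kernel stack. The delay routine needs a spin lock, and the contended lock path waits by calling the delay routine again:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these are real kernel symbols. */
#define MODEL_MAX_DEPTH 64              /* stands in for the finite kernel stack */

static bool model_lock_held_elsewhere = true;   /* another CPU owns the lock */
static int model_max_depth_seen;                /* deepest nesting reached */

static int model_lock_spin(int depth);

/* Models a delay routine that must take a spin lock before reading the timer. */
static int
model_delay(int depth)
{
	return (model_lock_spin(depth + 1));
}

/* Models a spin lock slow path that calls the delay routine while contended. */
static int
model_lock_spin(int depth)
{
	assert(depth >= 0);
	if (depth > model_max_depth_seen)
		model_max_depth_seen = depth;
	if (depth >= MODEL_MAX_DEPTH)
		return (-1);    /* "stack exhausted"; the real kernel double-faults */
	if (!model_lock_held_elsewhere)
		return (0);     /* lock acquired, recursion never starts */
	return (model_delay(depth));    /* contended: waiting re-enters this function */
}
```

Calling model_lock_spin(0) while the lock is held elsewhere nests until the cap is reached and returns -1, which mirrors the alternating i8254_delay / _mtx_lock_spin_cookie frames in the backtrace. With the lock free it returns 0 immediately.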

One question is why isn't DELAY using the tsc?

Hmm, it seems 'tsc_is_invariant' isn't set.  Are these older AMD CPUs?

You can try this as a hack-workaround to verify it fixes the issue, but I need to think a bit more about what the right fix might be:

Index: x86/x86/delay.c
===================================================================
--- delay.c     (revision 329004)
+++ delay.c     (working copy)
@@ -72,7 +72,7 @@
                func = get_tsc;
                mask = ~0u;
        } else {
-               if (tc->tc_quality <= 0)
+               if (tc->tc_quality <= 0 || n == 1)
                        return (0);
                func = tc->tc_get_timecount;
                mask = tc->tc_counter_mask;
Comment 17 Pablo Ruiz 2018-02-13 00:31:40 UTC
Hi John,

The CPU model is 'AMD Opteron(tm) Processor 2222SE (Dual Core)'.

I'll try the patch you suggested and report back..
Comment 18 Pablo Ruiz 2018-02-13 00:42:23 UTC
I've tried the following patch:

diff --git a/sys/x86/x86/delay.c b/sys/x86/x86/delay.c
index 8cbe6012a96..00dfff48c19 100644
--- a/sys/x86/x86/delay.c
+++ b/sys/x86/x86/delay.c
@@ -70,7 +70,7 @@ delay_tc(int n)
                func = get_tsc;
                mask = ~0u;
        } else {
-               if (tc->tc_quality <= 0)
+               if (tc->tc_quality <= 0 || n == 1)
                        return (0);
                func = tc->tc_get_timecount;
                mask = tc->tc_counter_mask;
diff --git a/sys/x86/x86/mp_x86.c b/sys/x86/x86/mp_x86.c
index 04975dd8a2e..c2a90e9a7d4 100644
--- a/sys/x86/x86/mp_x86.c
+++ b/sys/x86/x86/mp_x86.c
@@ -952,7 +952,7 @@ init_secondary_tail(void)
 #endif

        mtx_unlock_spin(&ap_boot_mtx);
-while(1);
+//while(1);

        /* Wait until all the AP's are up. */
        while (atomic_load_acq_int(&smp_started) == 0)

.....

But I got to the 'db' prompt too:

cpu0 BSP:
     ID: 0x00000000   VER: 0x80050010 LDR: 0x00000000 DFR: 0xffffffff
  lint0: 0x00010700 lint1: 0x00000400 TPR: 0x00000000 SVR: 0x000001ff
  timer: 0x000100ef therm: 0x00010000 err: 0x000000f0 pmc: 0x00010400
   AMD ext features: 0x00010003
   AMD elvt0: 0x00010000
SMP: AP CPU #1 Launched!
cpu1 AP:


FFa
tFa

FFa

FFa
Fa
kFernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address	= 0x441
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80ffb704
stack pointer	        = 0x28:0xfffffe001b97ba90
frame pointer	        = 0x28:0xfffffe001b97baa0
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= resume, IOPL = 0
current process		= 11 (idle: cpu2)
[ thread pid 11 tid 100005 ]
Stopped at      spinlock_exit+0x14:     movq    0x440(%rbx),%rax
db> show thread
Thread 100005 at 0xfffff8000332c000:
 proc (pid 11): 0xfffff80003328000
 name: idle: cpu2
 stack: 0xfffffe001b9a6000-0xfffffe001b9a9fff
 flags: 0x40024  pflags: 0x200000
 state: CAN RUN
 priority: 255
 container lock: sched lock 0 (0xffffffff827ee880)
db> bt
Tracing pid 11 tid 100005 td 0xfffff8000332c000
spinlock_exit() at spinlock_exit+0x14/frame 0xfffffe001b97baa0
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xa5/frame 0xfffffe001b97bb10
i8254_delay() at i8254_delay+0x143/frame 0xfffffe001b97bb50
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xa5/frame 0xfffffe001b97bbc0
i8254_delay() at i8254_delay+0x143/frame 0xfffffe001b97bc00
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xa5/frame 0xfffffe001b97bc70
i8254_delay() at i8254_delay+0x143/frame 0xfffffe001b97bcb0
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xa5/frame 0xfffffe001b97bd20
i8254_delay() at i8254_delay+0x143/frame 0xfffffe001b97bd60
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xa5/frame 0xfffffe001b97bdd0
i8254_delay() at i8254_delay+0x143/frame 0xfffffe001b97be10
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xa5/frame 0xfffffe001b97be80
i8254_delay() at i8254_delay+0x143/frame 0xfffffe001b97bec0
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xa5/frame 0xfffffe001b97bf30
i8254_delay() at i8254_delay+0x143/frame 0xfffffe001b97bf70
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xa5/frame 0xfffffe001b97bfe0
i8254_delay() at i8254_delay+0x143/frame 0xfffffe001b97c020
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xa5/frame 0xfffffe001b97c090
....
Comment 19 Pablo Ruiz 2018-02-20 01:51:57 UTC
Hi,

Did you guys have a chance to take another look at this? May I help with any tests?

Regards
Pablo
Comment 20 John Baldwin freebsd_committer freebsd_triage 2018-02-26 21:50:18 UTC
Grrr, not sure why my patch didn't prevent it from recursing.  You could try '|| cold' instead of '|| n == 1' perhaps.  You could also try changing the 'DELAY(1)' in _mtx_lock_indefinite_check() in sys/kern/kern_mutex.c to be something like 'if (cold) cpu_spinwait(); else DELAY(1);' instead of the 'n == 1' hack.

Oh, I see why 'n == 1' didn't help.  The early_delay callback that is used when that 'n == 1' check fails is i8254_delay (set in amd64/amd64/machdep.c).
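
A minimal sketch of the 'if (cold) cpu_spinwait(); else DELAY(1);' idea, written as a userspace model rather than as a patch. The model_* names and counters are hypothetical stand-ins; only the branch structure reflects the suggestion above. While the system is still cold, the backoff uses a plain pause that takes no lock, so it cannot re-enter the lock code:

```c
#include <stdbool.h>

/* Hypothetical userspace model; the real code would live in kern_mutex.c. */
static bool model_cold = true;          /* stands in for the kernel's 'cold' flag */
static int model_pause_calls;           /* times the lock-free pause was used */
static int model_delay_calls;           /* times the timer-based delay was used */

/* Stand-in for cpu_spinwait(): a pause hint that needs no lock. */
static void
model_cpu_spinwait(void)
{
	model_pause_calls++;
}

/* Stand-in for DELAY(): in the failing case this needs a spin lock itself. */
static void
model_timer_delay(int usec)
{
	(void)usec;
	model_delay_calls++;
}

/* Backoff step for a contended spin lock, guarded by the cold flag. */
static void
model_lock_backoff(void)
{
	if (model_cold)
		model_cpu_spinwait();
	else
		model_timer_delay(1);
}
```

With model_cold set, model_lock_backoff() only increments model_pause_calls; once it is cleared, the timer-based path is used again.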
Comment 21 mail 2018-04-25 09:19:05 UTC
(In reply to John Baldwin from comment #20)

I've followed the steps below in an attempt to test your patch:

Index: x86/x86/delay.c
===================================================================
--- delay.c     (revision 329004)
+++ delay.c     (working copy)
@@ -72,7 +72,7 @@
                func = get_tsc;
                mask = ~0u;
        } else {
-               if (tc->tc_quality <= 0)
+               if (tc->tc_quality <= 0 || cold)
                        return (0);
                func = tc->tc_get_timecount;
                mask = tc->tc_counter_mask;

(I do not have any C skills at all, sorry if I've misinterpreted your last comment)

1 :
rm -rvf /usr/src ; mkdir /usr/src ; svnlite checkout https://svn.freebsd.org/base/stable/11/ /usr/src

2:

cd /usr/src/sys ; patch < /root/patch_c16.diff
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: x86/x86/delay.c
|===================================================================
|--- delay.c     (revision 329004)
|+++ delay.c     (working copy)
--------------------------
File to patch: x86/x86/delay.c
Patching file x86/x86/delay.c using Plan A...
Hunk #1 failed at 72.
1 out of 1 hunks failed--saving rejects to x86/x86/delay.c.rej
done

Which tree or src code should the patches mentioned in this PR be applied against exactly?

Regards,

Ruben
Comment 22 mail 2018-11-23 22:22:51 UTC
Hi,

Just to follow up: 12.0-RC1 boots perfectly on our Opteron CPUs. I upgraded from 11.0 to 12.0-RC1; I haven't tried 11.2.

Ruben

---<<BOOT>>---
Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-RC1 r340470 GENERIC amd64
FreeBSD clang version 6.0.1 (tags/RELEASE_601/final 335540) (based on LLVM 6.0.1)
VT(vga): resolution 640x480
CPU: Quad-Core AMD Opteron(tm) Processor 2356 (2300.14-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x100f23  Family=0x10  Model=0x2  Stepping=3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x7ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS>
  SVM: NP,NAsids=64
  TSC: P-state invariant
real memory  = 25769803776 (24576 MB)
avail memory = 25003937792 (23845 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: <SUN    X4x40   >
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s)
random: unblocking device.
ioapic0 <Version 1.1> irqs 0-23 on motherboard
ioapic1 <Version 1.1> irqs 24-47 on motherboard
Launching APs: 4 5 2 6 3 7 1
Timecounter "TSC-low" frequency 1150069463 Hz quality 800
random: entropy device external interface
kbd1 at kbdmux0
netmap: loaded module
[ath_hal] loaded
module_register_init: MOD_LOAD (vesa, 0xffffffff810f8690, 0) error 19
nexus0
vtvga0: <VT VGA driver> on motherboard
Comment 23 John Baldwin freebsd_committer freebsd_triage 2018-11-26 17:46:28 UTC
Sorry I wasn't able to track this down earlier.  I knew when looking at the other bug (228768) that I had tried to investigate this before but couldn't find this bug.

*** This bug has been marked as a duplicate of bug 228768 ***