Bug 121148 - [panic] Repeatable sysctl crash (Fatal Trap 12) with ACPI enabled
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: i386
Version: 7.0-PRERELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-i386 (Nobody)
Reported: 2008-02-27 16:50 UTC by Jim Pingle
Modified: 2019-01-07 05:18 UTC

Description Jim Pingle 2008-02-27 16:50:00 UTC
SuperMicro SuperServer 6022L-6 will not fully boot RELENG_7 unless I boot with ACPI disabled. RELENG_7_0 does not crash on the same hardware with the same config.

Crash is as follows:
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x2043455c
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc0742c86
stack pointer           = 0x28:0xe8cada0c
frame pointer           = 0x28:0xe8cada38
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 68 (sysctl)
trap number             = 12
panic: page fault
cpuid = 3
Uptime: 6s
Physical memory: 2035 MB
Dumping 65 MB: 50 34 18 2

The crash happens just after the "Entropy harvesting..." line, before
swap is started. As you can see in the crash output, the offending process
is sysctl.

I can boot to single user mode, but if I issue sysctl -a while there, it
also crashes. When sysctl -a is run in single user mode, the last three
lines before the crash are (transcribed by hand, no serial console available):

dev.pcib.3.%location: handle=\_SB_.PCI3
dev.pcib.3.%pnpinfo: _HID=PNP0A03 UID=3
dev.pcib.3.%parent: acpi0

With a working RELENG_7_0 the lines immediately following this are:

dev.pcib.4.%desc: ACPI Host-PCI bridge
dev.pcib.4.%driver: pcib
dev.pcib.4.%location: handle=\_SB_.PCI4
dev.pcib.4.%pnpinfo: _HID=PNP0A03 _UID=4
dev.pcib.4.%parent: acpi0

I tried a binary search of the source tree to narrow down the crash. I
found that one possible vector for the crash was introduced between
2007/12/19 20:00:00 (booted OK) and 2007/12/19 23:59:00 (crashed), which
left me with only a handful of files to test.
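
The date-based narrowing described above can be sketched as follows. This is a minimal Python illustration of bisection over checkout timestamps, not the procedure actually used for this report: boots_ok is a hypothetical callback standing in for "check out the tree as of this date, rebuild, and see whether the machine boots", and SIMULATED_BREAK is an arbitrary made-up moment so the example runs on its own.

```python
from datetime import datetime, timedelta

def bisect_dates(good, bad, boots_ok, resolution=timedelta(minutes=1)):
    """Shrink the (good, bad) window until it is no wider than `resolution`.

    `good` must be a timestamp known to boot and `bad` one known to crash.
    """
    while bad - good > resolution:
        mid = good + (bad - good) / 2
        if boots_ok(mid):
            good = mid      # breakage was introduced after mid
        else:
            bad = mid       # breakage was introduced at or before mid
    return good, bad

# Stand-in predicate: pretend the tree broke at an arbitrary, made-up moment.
SIMULATED_BREAK = datetime(2007, 12, 19, 21, 30)

def boots_ok(ts):
    # Hypothetical: a real version would check out, build and boot the tree.
    return ts < SIMULATED_BREAK

good, bad = bisect_dates(datetime(2007, 12, 1), datetime(2008, 1, 1), boots_ok)
print("last good:", good)
print("first bad:", bad)
```

Each probe costs a full checkout, build and boot, so the halving matters: a one-month window narrows to one minute in about 16 probes. As the rest of this report shows, the window found this way only says where the symptom first appears, not necessarily where the bug was introduced.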

By process of elimination, I found that if I backed some changes out in
src/sys/i386/i386/machdep.c, the crash stopped.

src/sys/i386/i386/machdep.c v1.658 2007/08/09 njl - Boots OK
src/sys/i386/i386/machdep.c v1.658.2.1 2007/12/19 rpaulo - Crashes

The confusing part (to me) is that my next step was to update all the
way to RELENG_7 as of yesterday, then back out those same changes, but
the crash still happened. So either I misidentified the cause of the
crash -- which is quite possible -- or it was reintroduced in some other
change (or both!). 

kgdb output from vmcore.0:
Unread portion of the kernel message buffer:
Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-PRERELEASE #0: Mon Feb 25 15:22:54 EST 2008
    root@test1.hpcisp.com:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) XEON(TM) CPU 2.00GHz (1999.94-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf24  Stepping = 4
  Features=0x3febfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM>
  Logical CPUs per core: 2
real memory  = 2147418112 (2047 MB)
avail memory = 2091872256 (1994 MB)
ACPI APIC Table: <RCC    GCHE    >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
ACPI Warning (tbfadt-0505): Optional field "Gpe1Block" has zero address or length:        0       0/8 [20070320]
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-15 on motherboard
ioapic1 <Version 1.1> irqs 16-31 on motherboard
ioapic2 <Version 1.1> irqs 32-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Feb 25 2008 15:20:56)
acpi0: <RCC GCHE> on motherboard
ACPI Warning (dswload-0794): Type override - [DEB_] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [MLIB] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [IO__] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [DATA] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SIO_] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SB__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [ICNT] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [ACPI] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [IORG] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SB__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SIO_] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [BIOS] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [CMOS] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [KBC_] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [OEM_] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, 7ff00000 (3) failed
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x508-0x50b on acpi0
cpu0: <ACPI CPU> on acpi0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
p4tcc1: <CPU Frequency Thermal Control> on cpu1
cpu2: <ACPI CPU> on acpi0
p4tcc2: <CPU Frequency Thermal Control> on cpu2
cpu3: <ACPI CPU> on acpi0
p4tcc3: <CPU Frequency Thermal Control> on cpu3
acpi_button0: <Sleep Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0xa800-0xa8ff mem 0xfd000000-0xfdffffff,0xfe5ff000-0xfe5fffff irq 18 at device 2.0 on pci0
fxp0: <Intel 82550 Pro/100 Ethernet> port 0xae80-0xaebf mem 0xfe5fc000-0xfe5fcfff,0xfe580000-0xfe59ffff irq 17 at device 4.0 on pci0
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> PHY 1 on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:30:48:20:a3:9e
fxp0: [ITHREAD]
fxp1: <Intel 82550 Pro/100 Ethernet> port 0xaf00-0xaf3f mem 0xfe5fd000-0xfe5fdfff,0xfe5a0000-0xfe5bffff irq 19 at device 5.0 on pci0
miibus1: <MII bus> on fxp1
inphy1: <i82555 10/100 media interface> PHY 1 on miibus1
inphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: Ethernet address: 00:30:48:20:a3:9f
fxp1: [ITHREAD]
isab0: <PCI-ISA bridge> at device 15.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <ServerWorks CSB5 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 15.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
ohci0: <OHCI (generic) USB controller> mem 0xfe5fe000-0xfe5fefff irq 10 at device 15.2 on pci0
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: <(0x1166) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 4 ports with 4 removable, self powered
pcib1: <ACPI Host-PCI bridge> on acpi0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI Host-PCI bridge> on acpi0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI Host-PCI bridge> on acpi0
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI Host-PCI bridge> on acpi0
pci4: <ACPI PCI bus> on pcib4
asr0: <Adaptec Caching SCSI RAID> mem 0xfeb00000-0xfebfffff,0xfb000000-0xfbffffff,0xf8000000-0xf9ffffff irq 29 at device 3.0 on pci4
asr0: [GIANT-LOCKED]
asr0: [ITHREAD]
asr0:   ADAPTEC 2005S FW Rev. 380E, 2 channel, 2000 CCBs, Protocol I2O
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model NetMouse/NetScroll Optical, device ID 0
fdc0: <floppy drive controller (FDE)> port 0x3f2-0x3f3,0x3f4-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcdfff,0xce000-0xcefff,0xcf000-0xcffff pnpid ORM0000 on isa0
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0: <Parallel port bus> on ppc0
ppbus0: [ITHREAD]
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
hptrr: no controller detected.
acd0: CDROM <MATSHITA CR-177/7T0D> at ata1-master UDMA33
da0 at asr0 bus 0 target 0 lun 0
da0: <ADAPTEC RAID-5 380E> Fixed Direct Access SCSI-2 device 
ses0 at asr0 bus 0 target 6 lun 0
ses0: <SUPER GEM318 0> Fixed Processor SCSI-2 device 
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/da0s1a
<118>Loading configuration files.
<118>kernel dumps on /dev/da0s1b
<118>Entropy harvesting:
<118> interrupts
<118> ethernet
<118> point_to_point


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x2043455c
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc0742c86
stack pointer           = 0x28:0xe8cada0c
frame pointer           = 0x28:0xe8cada38
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 68 (sysctl)
trap number             = 12
panic: page fault
cpuid = 3
Uptime: 6s
Physical memory: 2035 MB
Dumping 65 MB: 50 34 18 2

#0  doadump () at pcpu.h:195
195     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xc073a688 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc073a941 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3  0xc0a19dc0 in trap_fatal (frame=0xe8cad9cc, eva=541279580) at /usr/src/sys/i386/i386/trap.c:899
#4  0xc0a1a030 in trap_pfault (frame=0xe8cad9cc, usermode=0, eva=541279580) at /usr/src/sys/i386/i386/trap.c:812
#5  0xc0a1a9ad in trap (frame=0xe8cad9cc) at /usr/src/sys/i386/i386/trap.c:490
#6  0xc0a01cab in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc0742c86 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:630
#8  0xc0742d46 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:618
#9  0xc0742d83 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:630
#10 0xc0742d83 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:630
#11 0xc0742de6 in sysctl_sysctl_next (oidp=0xc0b4c940, arg1=0xe8cadc1c, arg2=4, req=0xe8cadba4)
    at /usr/src/sys/kern/kern_sysctl.c:651
#12 0xc07436f2 in sysctl_root (oidp=Variable "oidp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:1306
#13 0xc074382e in userland_sysctl (td=0xc5574210, name=0xe8cadc14, namelen=6, old=0xbfbfe4e8, oldlenp=0xbfbfe598, 
    inkernel=0, new=0x0, newlen=0, retval=0xe8cadc10, flags=0) at /usr/src/sys/kern/kern_sysctl.c:1401
#14 0xc0744462 in __sysctl (td=0xc5574210, uap=0xe8cadcfc) at /usr/src/sys/kern/kern_sysctl.c:1336
#15 0xc0a1a378 in syscall (frame=0xe8cadd38) at /usr/src/sys/i386/i386/trap.c:1035
#16 0xc0a01d10 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:196
#17 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

This is a testing machine that is only being used to evaluate 7.0 for
use on similar hardware. I can take whatever debugging steps are
needed; just let me know what information is necessary to help resolve
the issue.

I tried posting this information to the -STABLE list, but received no replies.

System is running with the most current BIOS available from the OEM. RAM
tested OK with memtest86+ left running for a day or so.

Fix: 

Workaround is to run with ACPI disabled, but that is not desired.

One part of the crash was possibly introduced with rev v1.658.2.1 of
src/sys/i386/i386/machdep.c, but I am unable to repeat that fix on
recent RELENG_7 sources.
How-To-Repeat: Attempt to boot with a RELENG_7 world/kernel on a SuperMicro SuperServer
6022L-6 with ACPI enabled.

Alternately, boot to single user mode and issue "sysctl -a". Crashes every
time in the exact same place.
Comment 1 Gavin Atkinson freebsd_committer freebsd_triage 2008-02-28 15:34:11 UTC
State Changed
From-To: open->feedback

To submitter:  Firstly, it looks to me like the commit that you narrowed 
the panic down to is not actually responsible for the problem you are 
seeing - my suspicion is that it actually just moves the layout of memory 
around enough to avoid seeing the problem. 

From single user mode, can you determine which of the following panic: 

sysctl dev.pcib.3 
sysctl dev.pcib.4 

(I'm guessing it's the latter, but it's worth checking). 

Secondly, I wonder if you could test setting debug.acpi.disabled="ec" from 
the loader, and see if that makes any difference?  I notice that the 
"fault virtual address" is 0x2043455c, or " CE\" when read as ASCII, but 
this may be a coincidence... 
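
The arithmetic behind that observation can be checked mechanically. A minimal Python sketch, using only the two numbers already shown in this report (the fault virtual address from the trap output and the decimal eva argument from the kgdb backtrace); the helper name as_text is made up for illustration:

```python
FAULT_ADDR = 0x2043455C   # "fault virtual address" from the trap output
EVA_DECIMAL = 541279580   # eva= argument shown in the kgdb backtrace

def as_text(value, byteorder):
    """Render a 32-bit value as 4 ASCII characters ('.' for non-printables)."""
    raw = value.to_bytes(4, byteorder)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

# The decimal eva in the backtrace is the same address as the hex one.
print(hex(EVA_DECIMAL))

# Most-significant byte first, as the hex number is written.
print(repr(as_text(FAULT_ADDR, "big")))

# Byte order as the value would sit in i386 (little-endian) memory.
print(repr(as_text(FAULT_ADDR, "little")))
```

Read most-significant byte first, the address spells space, 'C', 'E', backslash; in little-endian memory order the same bytes read backslash, 'E', 'C', space. All four bytes being printable ASCII is consistent with string data having been used as a pointer, though nothing in this report confirms that.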

Lastly, are you able to recompile the kernel with debugging support 
(options KDB and DDB), and also add printf's to 
/usr/src/sys/kern/kern_sysctl.c at lines 618 and 630 (between the setting 
of lsp and calling sysctl_sysctl_next_ls()) to show the value of the various 
variables?  Something like this line should work: 

printf("lsp=%p, oidp=%p, oidpp=%p\n", lsp, oidp, oidpp); 

If you can still recreate the panic with these printf's and the debugger 
compiled in, hopefully we can get more information out of your system as to 
exactly what is happening. 



Comment 2 Gavin Atkinson freebsd_committer freebsd_triage 2008-02-28 15:34:11 UTC
Responsible Changed
From-To: freebsd-i386->gavin

Track
Comment 3 jim 2008-02-28 21:39:50 UTC
 > my suspicion is that it actually just moves the layout of memory
 > around enough to avoid seeing the problem.

I have no doubt that you are correct in that. The code changes in that 
file seemed very unrelated to anything near the crash, but I thought it 
was worth mentioning anyhow.

> From single user mode, can you determine which of the following panic:
> sysctl dev.pcib.3

Crashes at the end. I had a debug kernel built already, I was also able 
to add the printfs with no problem. Here is the last bit of output:

dev.pcib.3.%parent: acpi0
lsp=0xc0bcf314 oidp=0xc0b678e0 iodpp=0xe8c76b4c
lsp=0xc525c7d0 oidp=0xc5262040 iodpp=0xe8c76b4c
lsp=0xc52d9700 oidp=0xc52ed4c0 iodpp=0xe8c76b4c
lsp=0xc52ee140 oidp=0xc52ed140 iodpp=0xe8c76b4c
[crashes here]

kdb says the crash happened at:
sysctl_sysctl_next_ls+0x32   movl 0x8(%esi),%eax
 > print %esi
c074b9ba
 > print %eax
c074b9ba

> sysctl dev.pcib.4

Yields: unknown oid 'dev.pcib.4'

 > I wonder if you could test setting debug.acpi.disabled="ec"
 > from the loader

This made it get farther along in the boot sequence, but it then crashed 
in devd, also a fatal trap 12, with a virtual fault address of 0x108. I 
can get the whole copy of the crash output if you'd like.

> If you can still recreate the panic with these printf's and the debugger
> compiled in, hopefully we can get more information out of your system as to
> exactly what is happening.

It still crashes in the same place, perfectly repeatable, and as far as 
I can tell the addresses are the same each time. Let me know how you'd 
like me to proceed.

Thanks for the quick response, and all the help/ideas.

Jim
Comment 4 Mark Linimon freebsd_committer freebsd_triage 2008-03-02 06:13:04 UTC
State Changed
From-To: feedback->open

Note that feedback was received.
Comment 5 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:58:48 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 6 Warner Losh freebsd_committer freebsd_triage 2019-01-07 05:18:04 UTC
Given the huge amount of change in the ACPI code, I'm going to close this as OBE. The info here is about useless in tracking things down with the latest code. If this problem persists on newer versions of FreeBSD (11 or 12), please file a new bug with updated info.