Bug 217422 - Fatal trap 12: page fault while in kernel mode during heavy IO
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs mailing list
Reported: 2017-02-28 17:07 UTC by Cameron
Modified: 2017-02-28 17:07 UTC

Description Cameron 2017-02-28 17:07:56 UTC
I get 2-4 crashes a week, each followed by an automatic soft reboot. Every night at 1 am PST, a cron job backs up another system using rsync over ssh.

Last night's crash occurred at ~3:09 am, and the one before that occurred around 3:24 am.

I've noticed that the "periodic daily" cron job, which I believe causes some IO load of its own, is set to start at 3:01 am.

The crashes all occur during this heavy backup via rsync. I went through all the bug reports I could find, and I don't *think* this is a duplicate.
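
To make the timing concrete, here is a minimal Python sketch that computes how long after each cron job the two crashes happened. The clock times are the ones given above (the crash times are approximate); the variable and function names are made up for illustration.

```python
from datetime import datetime

# Clock times from this report. No dates are needed, because only the
# offsets within a single night matter here.
FMT = "%H:%M"
backup_start = datetime.strptime("01:00", FMT)    # nightly rsync cron job
periodic_start = datetime.strptime("03:01", FMT)  # "periodic daily" cron job
crashes = [datetime.strptime(t, FMT) for t in ("03:09", "03:24")]  # approximate


def minutes_between(earlier, later):
    """Whole minutes from earlier to later (same night assumed)."""
    return int((later - earlier).total_seconds() // 60)


# One (minutes since backup start, minutes since periodic start) pair per crash.
offsets = [
    (minutes_between(backup_start, c), minutes_between(periodic_start, c))
    for c in crashes
]

for crash, (since_backup, since_periodic) in zip(crashes, offsets):
    print(f"{crash:%H:%M}: {since_backup} min after backup start, "
          f"{since_periodic} min after periodic daily start")
```

Both crashes come more than two hours into the backup but within half an hour of the periodic daily start, which is why I suspect the combined load rather than rsync alone.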

Note: this runs on physical hardware with an 8-core Intel Avoton CPU.

Motherboard: Supermicro A1SAi-2750F
Memory: 16G ECC

The root file system is ZFS on 2x Intel 730 240G SSDs (ada1 & ada2).

The backup drive is an 8TB Seagate ST8000NM0055-1RM112 spinning disk. It's less than a year old and is attached to a SATA2 port (ada0). The filesystem is ZFS.

(I recently recreated the filesystem so I could create a swap partition for kernel crash dumps.)

zpool status
  pool: storage
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon Feb 27 13:01:41 2017
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          ada0p2    ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 0h2m with 0 errors on Sun Feb 26 01:54:37 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        zroot                                           ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/3f64a6eb-0faa-11e4-8b78-002590f1cfc0  ONLINE       0     0     0
            gptid/2eb24e92-1555-11e4-9076-002590f1cfc0  ONLINE       0     0     0

errors: No known data errors

I'm using a custom kernel with very few changes (I'll switch to GENERIC to see if it makes a difference). Here's the diff:
diff -u GENERIC VASTEEL
--- GENERIC     2016-09-05 10:40:05.944395438 -0700
+++ VASTEEL     2016-09-05 10:40:22.326390926 -0700
@@ -357,3 +357,18 @@
 
 # The crypto framework is required by IPSEC
 device         crypto                  # Required by IPSEC
+
+# Enable disk quota.
+options QUOTA
+
+device         pf
+device         pflog
+device         pfsync
+
+options                ALTQ
+options                ALTQ_CBQ        # Class Bases Queuing (CBQ)
+options                ALTQ_RED        # Random Early Detection (RED)
+options                ALTQ_RIO        # RED In/Out
+options                ALTQ_HFSC       # Hierarchical Packet Scheduler (HFSC)
+options                ALTQ_PRIQ       # Priority Queuing (PRIQ)
+options                ALTQ_NOPCC      # Required for SMP build

kldstat 
Id Refs Address            Size     Name
 1   25 0xffffffff80200000 20058c0  kernel
 2    1 0xffffffff82207000 30b650   zfs.ko
 3    2 0xffffffff82513000 adb0     opensolaris.ko
 4    1 0xffffffff8251e000 4c60     coretemp.ko
 5    1 0xffffffff82621000 587b     fdescfs.ko
 6    1 0xffffffff82627000 3710     ums.ko
 7    1 0xffffffff8262b000 abf1     linprocfs.ko
 8    1 0xffffffff82636000 7b18     linux_common.ko

Kernel crash dump:

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 7; apic id = 0e
fault virtual address   = 0x8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80b7b5d0
stack pointer           = 0x28:0xfffffe04669a87c0
frame pointer           = 0x28:0xfffffe04669a8800
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 10403 (rsync)
trap number             = 12
panic: page fault
cpuid = 7
KDB: stack backtrace:
#0 0xffffffff80b2dcb7 at kdb_backtrace+0x67
#1 0xffffffff80ae2302 at vpanic+0x182
#2 0xffffffff80ae2173 at panic+0x43
#3 0xffffffff80ff2c71 at trap_fatal+0x351
#4 0xffffffff80ff2e63 at trap_pfault+0x1e3
#5 0xffffffff80ff240d at trap+0x26d
#6 0xffffffff80fd5441 at calltrap+0x8
#7 0xffffffff80b7a398 at sbdestroy+0x18
#8 0xffffffff80b7cd9a at sofree+0x22a
#9 0xffffffff80b7d516 at soclose+0x516
#10 0xffffffff80a7ad0d at _fdrop+0x1d
#11 0xffffffff80a7e90d at closef+0x2ed
#12 0xffffffff80a7e35d at fdescfree_fds+0x7d
#13 0xffffffff80a7dee9 at fdescfree+0x6b9
#14 0xffffffff80a9011e at exit1+0x75e
#15 0xffffffff80a8f9bd at sys_sys_exit+0xd
#16 0xffffffff80ff35e3 at amd64_syscall+0x4e3
#17 0xffffffff80fd572b at Xfast_syscall+0xfb
Uptime: 13h43m7s
Dumping 2913 out of 16321 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/coretemp.ko...Reading symbols from /usr/lib/debug//boot/kernel/coretemp.ko.debug...done.
done.
Loaded symbols for /boot/kernel/coretemp.ko
Reading symbols from /boot/kernel/fdescfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/fdescfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/fdescfs.ko
Reading symbols from /boot/kernel/ums.ko...Reading symbols from /usr/lib/debug//boot/kernel/ums.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ums.ko
Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/linprocfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/linprocfs.ko
Reading symbols from /boot/kernel/linux_common.ko...Reading symbols from /usr/lib/debug//boot/kernel/linux_common.ko.debug...done.
done.
Loaded symbols for /boot/kernel/linux_common.ko
Reading symbols from /boot/kernel/snp.ko...Reading symbols from /usr/lib/debug//boot/kernel/snp.ko.debug...done.
done.
Loaded symbols for /boot/kernel/snp.ko
#0  doadump (textdump=<value optimized out>) at pcpu.h:221
221             __asm("movq %%gs:%1,%0" : "=r" (td
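
Since this happens several times a week, it may help to compare backtraces across crashes. The following is a minimal Python sketch, not part of the dump itself, that parses KDB backtrace lines in the format shown above and reduces them to a short "signature" of the first frames below the trap handling code. The names `parse_backtrace` and `signature`, and the list of frames to skip, are my own assumptions for illustration.

```python
import re

# Matches lines such as "#7 0xffffffff80b7a398 at sbdestroy+0x18".
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-f]+)\s+at\s+(\w+)\+(0x[0-9a-f]+)$")

# Frames that only belong to panic/trap handling (assumed list, taken from
# the top of the backtrace in this report).
TRAP_FRAMES = ("kdb_backtrace", "vpanic", "panic", "trap_fatal",
               "trap_pfault", "trap", "calltrap")


def parse_backtrace(text):
    """Return a list of (frame number, address, symbol, offset) tuples."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            idx, addr, sym, off = m.groups()
            frames.append((int(idx), int(addr, 16), sym, int(off, 16)))
    return frames


def signature(frames, depth=4):
    """First `depth` symbols that are not panic/trap handling frames."""
    return [sym for _, _, sym, _ in frames if sym not in TRAP_FRAMES][:depth]


# A few frames copied from the backtrace in this report.
SAMPLE = """\
#5 0xffffffff80ff240d at trap+0x26d
#6 0xffffffff80fd5441 at calltrap+0x8
#7 0xffffffff80b7a398 at sbdestroy+0x18
#8 0xffffffff80b7cd9a at sofree+0x22a
#9 0xffffffff80b7d516 at soclose+0x516
#10 0xffffffff80a7ad0d at _fdrop+0x1d
"""

frames = parse_backtrace(SAMPLE)
print(signature(frames))
```

For this crash the signature is the socket teardown path (sbdestroy, sofree, soclose) reached while rsync was exiting. If later crashes produce the same signature, that would suggest a single recurring fault rather than several unrelated ones.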


dmesg:
Copyright (c) 1992-2016 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.0-RELEASE-p8 #16 r314186: Thu Feb 23 16:40:33 PST 2017
    root@vasteel.neo-zeon.de:/usr/obj/usr/src/sys/VASTEEL amd64
FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on LLVM 3.8.0)
VT(vga): resolution 640x480
CPU: Intel(R) Atom(TM) CPU  C2750  @ 2.40GHz (2400.07-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x406d8  Family=0x6  Model=0x4d  Stepping=8
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x43d8e3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,AESNI,RDRAND>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x101<LAHF,Prefetch>
  Structured Extended Features=0x2282<TSCADJ,SMEP,ERMS,NFPUSG>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 17179869184 (16384 MB)
avail memory = 16515948544 (15750 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <INTEL  TIANO   >
WARNING: L1 data cache covers less APIC IDs than a core
0 < 1
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 8 core(s)
random: unblocking device.
ioapic0 <Version 2.0> irqs 0-23 on motherboard
random: entropy device external interface
kbd1 at kbdmux0
netmap: loaded module
module_register_init: MOD_LOAD (vesa, 0xffffffff8106e9a0, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
acpi0: <ALASKA A M I > on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
cpu4: <ACPI CPU> on acpi0
cpu5: <ACPI CPU> on acpi0
cpu6: <ACPI CPU> on acpi0
cpu7: <ACPI CPU> on acpi0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 350
Event timer "HPET1" frequency 14318180 Hz quality 340
Event timer "HPET2" frequency 14318180 Hz quality 340
atrtc0: <AT realtime clock> port 0x70-0x77 irq 8 on acpi0
atrtc0: Warning: Couldn't map I/O.
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pcib0: _OSC returned error 0x10
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> mem 0xdf2c0000-0xdf2dffff irq 16 at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci1
pci2: <ACPI PCI bus> on pcib2
vgapci0: <VGA-compatible display> port 0xd000-0xd07f mem 0xde000000-0xdeffffff,0xdf000000-0xdf01ffff irq 16 at device 0.0 on pci2
vgapci0: Boot video device
pcib3: <ACPI PCI-PCI bridge> mem 0xdf2a0000-0xdf2bffff irq 16 at device 2.0 on pci0
pci3: <ACPI PCI bus> on pcib3
xhci0: <XHCI (generic) USB 3.0 controller> mem 0xdf100000-0xdf101fff irq 17 at device 0.0 on pci3
xhci0: 64 bytes context size, 32-bit DMA
xhci0: Unable to map MSI-X table 
usbus0 on xhci0
pcib4: <ACPI PCI-PCI bridge> mem 0xdf280000-0xdf29ffff irq 20 at device 3.0 on pci0
pci4: <ACPI PCI bus> on pcib4
pci0: <base peripheral, IOMMU> at device 15.0 (no driver attached)
igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xe0c0-0xe0df mem 0xdf260000-0xdf27ffff,0xdf2ec000-0xdf2effff irq 20 at device 20.0 on pci0
igb0: Using MSIX interrupts with 9 vectors
igb0: Ethernet address: 00:25:90:f1:cf:c0
igb0: Bound queue 0 to cpu 0
igb0: Bound queue 1 to cpu 1
igb0: Bound queue 2 to cpu 2
igb0: Bound queue 3 to cpu 3
igb0: Bound queue 4 to cpu 4
igb0: Bound queue 5 to cpu 5
igb0: Bound queue 6 to cpu 6
igb0: Bound queue 7 to cpu 7
igb0: netmap queues/slots: TX 8/1024, RX 8/1024
igb1: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xe0a0-0xe0bf mem 0xdf240000-0xdf25ffff,0xdf2e8000-0xdf2ebfff irq 21 at device 20.1 on pci0
igb1: Using MSIX interrupts with 9 vectors
igb1: Ethernet address: 00:25:90:f1:cf:c1
igb1: Bound queue 0 to cpu 0
igb1: Bound queue 1 to cpu 1
igb1: Bound queue 2 to cpu 2
igb1: Bound queue 3 to cpu 3
igb1: Bound queue 4 to cpu 4
igb1: Bound queue 5 to cpu 5
igb1: Bound queue 6 to cpu 6
igb1: Bound queue 7 to cpu 7
igb1: netmap queues/slots: TX 8/1024, RX 8/1024
igb2: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xe080-0xe09f mem 0xdf220000-0xdf23ffff,0xdf2e4000-0xdf2e7fff irq 22 at device 20.2 on pci0
igb2: Using MSIX interrupts with 9 vectors
igb2: Ethernet address: 00:25:90:f1:cf:c2
igb2: Bound queue 0 to cpu 0
igb2: Bound queue 1 to cpu 1
igb2: Bound queue 2 to cpu 2
igb2: Bound queue 3 to cpu 3
igb2: Bound queue 4 to cpu 4
igb2: Bound queue 5 to cpu 5
igb2: Bound queue 6 to cpu 6
igb2: Bound queue 7 to cpu 7
igb2: netmap queues/slots: TX 8/1024, RX 8/1024
igb3: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xe060-0xe07f mem 0xdf200000-0xdf21ffff,0xdf2e0000-0xdf2e3fff irq 23 at device 20.3 on pci0
igb3: Using MSIX interrupts with 9 vectors
igb3: Ethernet address: 00:25:90:f1:cf:c3
igb3: Bound queue 0 to cpu 0
igb3: Bound queue 1 to cpu 1
igb3: Bound queue 2 to cpu 2
igb3: Bound queue 3 to cpu 3
igb3: Bound queue 4 to cpu 4
igb3: Bound queue 5 to cpu 5
igb3: Bound queue 6 to cpu 6
igb3: Bound queue 7 to cpu 7
igb3: netmap queues/slots: TX 8/1024, RX 8/1024
ehci0: <Intel Avoton USB 2.0 controller> mem 0xdf2f3000-0xdf2f33ff irq 23 at device 22.0 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
ahci0: <Intel Avoton AHCI SATA controller> port 0xe150-0xe157,0xe140-0xe143,0xe130-0xe137,0xe120-0xe123,0xe040-0xe05f mem 0xdf2f2000-0xdf2f27ff irq 19 at device 23.0 on pci0
ahci0: AHCI v1.30 with 4 3Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahci1: <Intel Avoton AHCI SATA controller> port 0xe110-0xe117,0xe100-0xe103,0xe0f0-0xe0f7,0xe0e0-0xe0e3,0xe020-0xe03f mem 0xdf2f1000-0xdf2f17ff irq 19 at device 24.0 on pci0
ahci1: AHCI v1.30 with 2 6Gbps ports, Port Multiplier not supported
ahcich4: <AHCI channel> at channel 0 on ahci1
ahcich5: <AHCI channel> at channel 1 on ahci1
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc8fff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
ppc0: cannot reserve I/O port range
coretemp0: <CPU On-Die Thermal Sensors> on cpu0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
coretemp1: <CPU On-Die Thermal Sensors> on cpu1
est1: <Enhanced SpeedStep Frequency Control> on cpu1
coretemp2: <CPU On-Die Thermal Sensors> on cpu2
est2: <Enhanced SpeedStep Frequency Control> on cpu2
coretemp3: <CPU On-Die Thermal Sensors> on cpu3
est3: <Enhanced SpeedStep Frequency Control> on cpu3
coretemp4: <CPU On-Die Thermal Sensors> on cpu4
est4: <Enhanced SpeedStep Frequency Control> on cpu4
coretemp5: <CPU On-Die Thermal Sensors> on cpu5
est5: <Enhanced SpeedStep Frequency Control> on cpu5
coretemp6: <CPU On-Die Thermal Sensors> on cpu6
est6: <Enhanced SpeedStep Frequency Control> on cpu6
coretemp7: <CPU On-Die Thermal Sensors> on cpu7
est7: <Enhanced SpeedStep Frequency Control> on cpu7
usbus0: 5.0Gbps Super Speed USB v3.0
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 1.000 msec
nvme cam probe device init
usbus1: 480Mbps High Speed USB v2.0
ugen0.1: <0x1912> at usbus0
uhub0: <0x1912 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
ugen1.1: <Intel> at usbus1
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
ada0 at ahcich1 bus 0 scbus1 target 0 lun 0
ada0: <ST8000NM0055-1RM112 SN02> ACS-3 ATA SATA 3.x device
ada0: Serial Number ZA11E7R9
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 7630885MB (15628053168 512 byte sectors)
ada1 at ahcich4 bus 0 scbus4 target 0 lun 0
ada1: <INTEL SSDSC2BP240G4 L2010410> ATA8-ACS SATA 3.x device
ada1: Serial Number BTJR408202C3240AGN
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada1: Command Queueing enabled
ada1: 228936MB (468862128 512 byte sectors)
ada2 at ahcich5 bus 0 scbus5 target 0 lun 0
ada2: <INTEL SSDSC2BP240G4 L2010410> ATA8-ACS SATA 3.x device
ada2: Serial Number BTJR40820CQN240AGN
ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada2: Command Queueing enabled
ada2: 228936MB (468862128 512 byte sectors)
SMP: AP CPU #6 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #7 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #1 Launched!
Timecounter "TSC-low" frequency 1200035112 Hz quality 1000
Trying to mount root from zfs:zroot/ROOT/default []...
Root mount waiting for: usbus1 usbus0
uhub0: 8 ports with 8 removable, self powered
Root mount waiting for: usbus1
Root mount waiting for: usbus1
uhub1: 8 ports with 8 removable, self powered
Root mount waiting for: usbus1
ugen1.2: <vendor 0x8087> at usbus1
uhub2: <vendor 0x8087 product 0x07db, class 9/0, rev 2.00/0.02, addr 2> on usbus1
Root mount waiting for: usbus1
uhub2: 4 ports with 4 removable, self powered
ugen1.3: <vendor 0x0000> at usbus1
uhub3: <vendor 0x0000 product 0x0001, class 9/0, rev 2.00/0.00, addr 3> on usbus1
Root mount waiting for: usbus1
uhub3: 4 ports with 3 removable, self powered
ugen1.4: <vendor 0x0557> at usbus1
ukbd0: <vendor 0x0557 product 0x2419, class 0/0, rev 1.10/1.00, addr 4> on usbus1
kbd2 at ukbd0
igb0: link state changed to UP
ums0: <vendor 0x0557 product 0x2419, class 0/0, rev 1.10/1.00, addr 4> on usbus1
ums0: 3 buttons and [Z] coordinates ID=0
pflog0: promiscuous mode enabled
igb1: link state changed to UP