Bug 261233 - Frequent kernel panics during disk access after upgrading from 12.2-RELEASE-p7 to 12.2-p12 or 12.3 on i386
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.2-RELEASE
Hardware: i386 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
Reported: 2022-01-15 17:49 UTC by Yann C.
Modified: 2022-02-11 16:11 UTC
CC List: 2 users



Attachments
coredump1 (373 bytes, text/plain)
2022-01-18 10:07 UTC, Leonid Nevecherya
coredump2 (373 bytes, text/plain)
2022-01-18 10:07 UTC, Leonid Nevecherya
coredump3 (376 bytes, text/plain)
2022-01-18 10:07 UTC, Leonid Nevecherya
coredump4 (409 bytes, text/plain)
2022-01-18 10:09 UTC, Leonid Nevecherya
coredump5 (429 bytes, text/plain)
2022-01-18 10:09 UTC, Leonid Nevecherya
coredump6 (409 bytes, text/plain)
2022-01-18 10:10 UTC, Leonid Nevecherya

Description Yann C. 2022-01-15 17:49:39 UTC
I updated my server 2 days ago using freebsd-update, from a 12.2-RELEASE-p7 kernel to p12, and since then it has panicked around 20 times in 24h as soon as there is filesystem activity. gmirror repairs seemed to have no impact. Since I reverted to the old p7 kernel, it has been stable again for 24h. I also tried upgrading to 12.3, but it crashed just like 12.2-p12 right after the kernel upgrade step, before I even got a chance to upgrade the binaries.

I managed to snap a picture of the panic stacktrace right before it rebooted after one of the crashes (see transcript below).

I read through the changes between p7 and p12, and I think this is related to the pmap changes made in this commit (1), since the stack trace starts in PTDpde.

I'll admit that, while I've been running FreeBSD servers since (I think) 2.7, I'm not familiar with the kernel code, but this reeks of a race condition under load.

The machine is a Core i5 2400 (see details below). And yes, I know I could be running the 64-bit version, but I never needed more than 4GB of RAM, and it has been running the 32-bit version of the OS flawlessly for years.

Sorry if this is a duplicate but I searched and could not find anything related.

(1) https://github.com/freebsd/freebsd-src/commit/a165b4591e48cd2adce8215fca73147c016e6cea#diff-b34ee41e14f87fb2b18fdf77337237f336830ae88aac2a02e1c32aa45e43b4de

panic: vm_fault: fault on nofault entry, addr: 0
cpuid = 1
time = 1642161900
KDB: stack backtrace:
#0 0x10327ae at kdb_backtrace+0x4e
#1 0xfed128 at vpanic+0x118
#2 0xfed004 at panic+0x14
#3 0x12e5733 at vm_fault+0x2613
#4 0x12e3832 at vm_fault_trap+0x42
#5 0x154c0f5 at trap_pfault+0x115
#6 0x154b71f at trap+0x36f
#7 0xffc0319d at PTDpde+0x41a5
#8 0x18bbaa3 at _umtx_op_nwake_private+0x93
#9 0x154c7b9 at syscall+0x3e9
#10 0xffc033e7 at PTDpde+0x43ef
Uptime: 3m20s
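The PTDpde reasoning above depends on how each frame is resolved. In a line such as `#7 0xffc0319d at PTDpde+0x41a5`, the printed address should equal the symbol's start plus the offset, so every frame naming the same symbol should imply the same start address. The sketch below is a hypothetical helper (`symbol_bases` is not a FreeBSD tool) that does this subtraction on pasted trace text; a frame copied from a photo with a wrong sign or a dropped digit shows up as a second, different base for the same symbol. It assumes the `#N 0xADDR at symbol+0xOFF` layout seen in this report and silently skips lines that do not match it.

```python
from __future__ import annotations

import re

# Matches backtrace frames of the form "#7 0xffc0319d at PTDpde+0x41a5".
# Lines that do not fit this shape (e.g. a garbled "0×3e9") are skipped.
FRAME_RE = re.compile(
    r"#(\d+)\s+0x([0-9a-fA-F]+)\s+at\s+(\S+?)([+-])0x([0-9a-fA-F]+)"
)


def symbol_bases(trace: str) -> dict[str, set[int]]:
    """Map each symbol to the base address(es) implied by address minus offset."""
    bases: dict[str, set[int]] = {}
    for m in FRAME_RE.finditer(trace):
        addr = int(m.group(2), 16)
        offset = int(m.group(5), 16)
        if m.group(4) == "-":
            offset = -offset
        bases.setdefault(m.group(3), set()).add(addr - offset)
    return bases


if __name__ == "__main__":
    trace = """\
#1 0xfed128 at vpanic+0x118
#7 0xffc0319d at PTDpde+0x41a5
#10 0xffc033e7 at PTDpde+0x43ef
"""
    for sym, found in symbol_bases(trace).items():
        print(sym, sorted(hex(b) for b in found))
```

With the two PTDpde frames written as `+0x41a5` and `+0x43ef`, both imply a start of 0xffbfeff8. Keep in mind that a backtrace typically names the nearest preceding symbol, so a large offset from a data symbol like PTDpde may point into code that has no symbol of its own rather than into PTDpde itself.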


CPU: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz (3093.04-MHz 686-class CPU)
  Origin="GenuineIntel"  Id=0x206a7  Family=0x6  Model=0x2a  Stepping=7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x1fbae3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX>
  AMD Features=0x28100000<NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
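The panic message above also carries a `time =` value and an `Uptime:` line, which together date both the crash and the boot before it. A minimal sketch, assuming `time =` is seconds since the Unix epoch and that uptime uses the `3m20s` / `1d2h4m47s` style seen in these reports; `parse_uptime` and `panic_and_boot` are hypothetical helpers written for illustration:

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta, timezone

# Uptime strings as seen in these reports: "3m20s", "23h37m45s", "1d2h4m47s".
UPTIME_RE = re.compile(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")


def parse_uptime(text: str) -> timedelta:
    """Convert an uptime string such as '1d2h4m47s' into a timedelta."""
    m = UPTIME_RE.fullmatch(text.strip())
    if m is None or not any(m.groups()):
        raise ValueError(f"unrecognized uptime: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def panic_and_boot(epoch: int, uptime: str) -> tuple[datetime, datetime]:
    """Return (panic time, approximate boot time), both in UTC."""
    panic_at = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return panic_at, panic_at - parse_uptime(uptime)


if __name__ == "__main__":
    panic_at, booted = panic_and_boot(1642161900, "3m20s")
    print(panic_at.isoformat(), booted.isoformat())
```

For this panic (`time = 1642161900`, `Uptime: 3m20s`) it prints `2022-01-14T12:05:00+00:00 2022-01-14T12:01:40+00:00`, i.e. a crash a few minutes after boot on the day before the report was filed.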
Comment 1 Leonid Nevecherya 2022-01-18 10:07:05 UTC
Created attachment 231117
coredump1
Comment 2 Leonid Nevecherya 2022-01-18 10:07:26 UTC
Created attachment 231118
coredump2
Comment 3 Leonid Nevecherya 2022-01-18 10:07:40 UTC
Created attachment 231119
coredump3
Comment 4 Leonid Nevecherya 2022-01-18 10:09:41 UTC
Created attachment 231120
coredump4
Comment 5 Leonid Nevecherya 2022-01-18 10:09:59 UTC
Created attachment 231121
coredump5
Comment 6 Leonid Nevecherya 2022-01-18 10:10:15 UTC
Created attachment 231122
coredump6
Comment 7 Leonid Nevecherya 2022-01-18 10:10:32 UTC
I upgraded one server from AMD64 12.3-RELEASE to AMD64 12.3-RELEASE-p1. This server is stable.
I also upgraded two servers from i386 12.2-RELEASE-p11 to i386 12.3-RELEASE-p1. These servers have crashed several times.
Comment 8 Richard Frewin 2022-01-20 11:06:52 UTC
Just a +1 for this.

Upgraded from kernel 12.2-RELEASE-p7 to 12.2-RELEASE-p12 with freebsd-update on several systems, both amd64 and i386. Some of the i386 systems now panic under disk load (backups!). These are otherwise very lightly loaded web servers or firewalls.

Is this related to Bug 261338?

Example panics from three systems (cut-and-paste from dmesg) below:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0x1b2de880
frame pointer           = 0x28:0x1b2de8bc
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 836 (bacula-fd)
trap number             = 12
panic: page fault
cpuid = 0
time = 1642636089
KDB: stack backtrace:
#0 0x10327ae at kdb_backtrace+0x4e
#1 0xfed128 at vpanic+0x118
#2 0xfed004 at panic+0x14
#3 0x154bfd5 at trap_fatal+0x335
#4 0x154c013 at trap_pfault+0x33
#5 0x154b71f at trap+0x36f
#6 0xffc0319d at PTDpde+0x41a5
#7 0x1534905 at copyout+0xa5
#8 0x154d0f9 at uiomove_fromphys+0x159
#9 0x12c3483 at ffs_read+0x3d3
#10 0x157b77d at VOP_READ_APV+0x5d
#11 0x10b980a at vn_read+0x18a
#12 0x10b95da at vn_io_fault_doio+0x3a
#13 0x10b7232 at vn_io_fault1+0x162
#14 0x10b548b at vn_io_fault+0x1cb
#15 0x104c580 at dofileread+0x70
#16 0x104c1d8 at sys_read+0x78
#17 0x154c7b9 at syscall+0x3e9
Uptime: 23h37m45s
Physical memory: 2009 MB


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0x1afd6880
frame pointer           = 0x28:0x1afd68bc
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2134 (bacula-fd)
trap number             = 12
panic: page fault
cpuid = 0
time = 1642547014
KDB: stack backtrace:
#0 0x10327ae at kdb_backtrace+0x4e
#1 0xfed128 at vpanic+0x118
#2 0xfed004 at panic+0x14
#3 0x154bfd5 at trap_fatal+0x335
#4 0x154c013 at trap_pfault+0x33
#5 0x154b71f at trap+0x36f
#6 0xffc0319d at PTDpde+0x41a5
#7 0x1534905 at copyout+0xa5
#8 0x154d0f9 at uiomove_fromphys+0x159
#9 0x12c3483 at ffs_read+0x3d3
#10 0x157b77d at VOP_READ_APV+0x5d
#11 0x10b980a at vn_read+0x18a
#12 0x10b95da at vn_io_fault_doio+0x3a
#13 0x10b7232 at vn_io_fault1+0x162
#14 0x10b548b at vn_io_fault+0x1cb
#15 0x104c580 at dofileread+0x70
#16 0x104c1d8 at sys_read+0x78
#17 0x154c7b9 at syscall+0x3e9
Uptime: 1d2h4m47s


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0x1df3a8d4
frame pointer           = 0x28:0x1df3a90c
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 932 (bacula-fd)
trap number             = 12
panic: page fault
cpuid = 0
time = 1642636048
KDB: stack backtrace:
#0 0x10327ae at kdb_backtrace+0x4e
#1 0xfed128 at vpanic+0x118
#2 0xfed004 at panic+0x14
#3 0x154bfd5 at trap_fatal+0x335
#4 0x154c013 at trap_pfault+0x33
#5 0x154b71f at trap+0x36f
#6 0xffc0319d at PTDpde+0x41a5
#7 0x12c33ce at ffs_read+0x31e
#8 0x157b77d at VOP_READ_APV+0x5d
#9 0x10b980a at vn_read+0x18a
#10 0x10b95da at vn_io_fault_doio+0x3a
#11 0x10b7232 at vn_io_fault1+0x162
#12 0x10b548b at vn_io_fault+0x1cb
#13 0x104c580 at dofileread+0x70
#14 0x104c1d8 at sys_read+0x78
#15 0x154c7b9 at syscall+0x3e9
#16 0xffc033e7 at PTDpde+0x43ef
Uptime: 23h37m42s
Physical memory: 3026 MB
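All three dumps come from `bacula-fd` and show the page fault being taken during a read(2) that passes through `sys_read`, `vn_read` and `ffs_read`. To compare traces like these without eyeballing them, a sketch such as the following lists the function names that appear in every pasted trace, in the order of the first one. The `frames` and `common_frames` helpers are hypothetical, and the sketch assumes the same `#N 0xADDR at symbol+0xOFF` frame format used above.

```python
from __future__ import annotations

import re

# Captures the function name in frames such as "#9 0x12c3483 at ffs_read+0x3d3".
SYM_RE = re.compile(r"#\d+\s+0x[0-9a-fA-F]+\s+at\s+([A-Za-z_][A-Za-z0-9_]*)")


def frames(trace: str) -> list[str]:
    """Function names from one backtrace, innermost frame first."""
    return SYM_RE.findall(trace)


def common_frames(traces: list[str]) -> list[str]:
    """Names present in every trace (non-empty list), in first-trace order, no repeats."""
    per_trace = [frames(t) for t in traces]
    shared = set(per_trace[0]).intersection(*per_trace[1:])
    result: list[str] = []
    for name in per_trace[0]:
        if name in shared and name not in result:
            result.append(name)
    return result


if __name__ == "__main__":
    # Abbreviated from the first and third dumps above.
    first = """\
#5 0x154b71f at trap+0x36f
#7 0x1534905 at copyout+0xa5
#9 0x12c3483 at ffs_read+0x3d3
#16 0x104c1d8 at sys_read+0x78
"""
    third = """\
#5 0x154b71f at trap+0x36f
#7 0x12c33ce at ffs_read+0x31e
#14 0x104c1d8 at sys_read+0x78
"""
    print(common_frames([first, third]))
```

On these abbreviated traces it prints `['trap', 'ffs_read', 'sys_read']`; `copyout` drops out because the third trace goes from `ffs_read` straight to the trap frames.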
Comment 9 Yann C. 2022-01-20 18:54:25 UTC
Yes, Richard, it seems the other ticket is closely related to this one. I was not able to propose a kernel patch, but I will follow the other ticket.

Thanks for pointing it out.

Yann
Comment 10 Yann C. 2022-02-11 16:11:05 UTC
Updating to the latest 12.3 with the patch included in Bug 261338 fixes the issue. I'm going to close this ticket.