Hi. I get random kernel panics on different servers after upgrading to FreeBSD 14.

server1

Fatal trap 12: page fault while in kernel mode
cpuid = 8; apic id = 08
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff82a80571
stack pointer = 0x28:0xfffffe018c3728c0
frame pointer = 0x28:0xfffffe018c372940
code segment = base rx0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 60 (txg_thread_enter)
rdi: 0000000000000000 rsi: fffff801b4ff25f0 rdx: 0000000000000040
rcx: 0000000000000040 r8: 0000000000000000 r9: fffff80176965c00
rax: fffff80176965c00 rbx: fffff801b4ff25f0 rbp: fffffe018c372940
r10: 0000000000000000 r11: 0000000000000000 r12: fffff80024489000
r13: fffff80176965c00 r14: fffff801b4ff26d0 r15: fffff801b4ff2680
trap number = 12
panic: page fault
cpuid = 8
time = 1719965340
KDB: stack backtrace:
#0 0xffffffff80b9009d at kdb_backtrace+0x5d
#1 0xffffffff80b431a2 at vpanic+0x132
#2 0xffffffff80b43063 at panic+0x43
#3 0xffffffff8100c85c at trap_fatal+0x40c
#4 0xffffffff8100c8af at trap_pfault+0x4f
#5 0xffffffff80fe3ac8 at calltrap+0x8
#6 0xffffffff82a972e9 at dbuf_dirty+0x269
#7 0xffffffff82aa4e19 at dmu_write+0x119
#8 0xffffffff82b335be at space_map_write+0x15e
#9 0xffffffff82b00ee6 at metaslab_flush+0x236
#10 0xffffffff82b2aa18 at spa_flush_metaslabs+0x1c8
#11 0xffffffff82b1e391 at spa_sync+0xce1
#12 0xffffffff82b353ab at txg_sync_thread+0x26b
#13 0xffffffff80afdb7f at fork_exit+0x7f
#14 0xffffffff80fe4b2e at fork_trampoline+0xe
Uptime: 1d15h17m18s

---------------------------------------------------------------------------

server2

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x30
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80ffaec0
stack pointer = 0x28:0xfffffe013d903bc0
frame pointer = 0x28:0xfffffe013d903bf0
code segment = base rx0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 89561 (sshd)
rdi: 0000000000000002 rsi: 0000000000000008 rdx: 0000000000000198
rcx: 000000fffffffe00 r8: 000007fffffff000 r9: fffff801269c7398
rax: fffff803d874fd00 rbx: 0000000000000000 rbp: fffffe013d903bf0
r10: fffff803de201500 r11: fffff80000000000 r12: 00003f98edea1000
r13: 0000000000000000 r14: fffff803d874fd00 r15: fffffe013d903c48
trap number = 12
panic: page fault
cpuid = 3
time = 1718738075
KDB: stack backtrace:
#0 0xffffffff80b9009d at kdb_backtrace+0x5d
#1 0xffffffff80b431a2 at vpanic+0x132
#2 0xffffffff80b43063 at panic+0x43
#3 0xffffffff8100c85c at trap_fatal+0x40c
#4 0xffffffff8100c8af at trap_pfault+0x4f
#5 0xffffffff80fe3ac8 at calltrap+0x8
#6 0xffffffff80ffa796 at pmap_copy+0x546
#7 0xffffffff80ebe604 at vmspace_fork+0xc84
#8 0xffffffff80afbcdf at fork1+0x54f
#9 0xffffffff80afb764 at sys_fork+0x54
#10 0xffffffff8100d119 at amd64_syscall+0x109
#11 0xffffffff80fe43db at fast_syscall_common+0xf8
Uptime: 1d8h1m23s

----------------------------------------------------------------

server3

kgdb /boot/kernel/kernel /var/crash/vmcore.0
GNU gdb (GDB) 14.1 [GDB v14.1 for FreeBSD]
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd14.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0xfffffa5541417730
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80ff156c
stack pointer = 0x28:0xfffffe0104b0aa70
frame pointer = 0x28:0xfffffe0104b0aaf0
code segment = base rx0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 8 (dom0)
rdi: fffff801c0000000 rsi: fffffa5541417730 rdx: 0000000000000042
rcx: 0000000000000730 r8: 000000000000005e r9: fffffa5541417000
rax: 0000000000000020 rbx: 0000000000000000 rbp: fffffe0104b0aaf0
r10: 000007fffffff000 r11: fffff80000000000 r12: fffff80015987130
r13: fffff80015987148 r14: fffff8081efb2508 r15: fffff80012fd5000
trap number = 12
panic: page fault
cpuid = 3
time = 1720520306
KDB: stack backtrace:
#0 0xffffffff80b7fbfd at kdb_backtrace+0x5d
#1 0xffffffff80b32961 at vpanic+0x131
#2 0xffffffff80b32823 at panic+0x43
#3 0xffffffff80fff91b at trap_fatal+0x40b
#4 0xffffffff80fff966 at trap_pfault+0x46
#5 0xffffffff80fd6a48 at calltrap+0x8
#6 0xffffffff80ec5e9b at vm_pageout_worker+0xb4b
#7 0xffffffff80ec5307 at vm_pageout+0x1d7
#8 0xffffffff80aecd1f at fork_exit+0x7f
#9 0xffffffff80fd7aae at fork_trampoline+0xe
Uptime: 1d20h43m11s
Dumping 2393 out of 32640 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/accf_http.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_http.ko.debug...
Reading symbols from /boot/kernel/accf_data.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_data.ko.debug...
Reading symbols from /boot/kernel/accf_dns.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_dns.ko.debug...
Reading symbols from /boot/kernel/acpi_wmi.ko...
Reading symbols from /usr/lib/debug//boot/kernel/acpi_wmi.ko.debug...
Reading symbols from /boot/kernel/ipfw.ko...
Reading symbols from /usr/lib/debug//boot/kernel/ipfw.ko.debug...
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
warning: Source file is more recent than executable.
57 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:405
#2 0xffffffff80b324f7 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:523
#3 0xffffffff80b329ce in vpanic (fmt=0xffffffff8115edb8 "%s", ap=ap@entry=0xfffffe0104b0a8d0) at /usr/src/sys/kern/kern_shutdown.c:967
#4 0xffffffff80b32823 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:891
#5 0xffffffff80fff91b in trap_fatal (frame=0xfffffe0104b0a9b0, eva=18446737842806814512) at /usr/src/sys/amd64/amd64/trap.c:952
#6 0xffffffff80fff966 in trap_pfault (frame=<unavailable>, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:760
#7 <signal handler called>
#8 pmap_ts_referenced (m=m@entry=0xfffffe002fdf7950) at /usr/src/sys/amd64/amd64/pmap.c:9108
#9 0xffffffff80ec5e9b in vm_pageout_scan_active (vmd=0xffffffff81c03c80 <vm_dom>, page_shortage=-21422016) at /usr/src/sys/vm/vm_pageout.c:1274
#10 vm_pageout_worker (arg=arg@entry=0x0) at /usr/src/sys/vm/vm_pageout.c:2173
#11 0xffffffff80ec5307 in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:2395
#12 0xffffffff80aecd1f in fork_exit (callout=0xffffffff80ec5130 <vm_pageout>, arg=0x0, frame=0xfffffe0104b0af40) at /usr/src/sys/kern/kern_fork.c:1164
#13 <signal handler called>
(kgdb) list *0xffffffff80ff156c
0xffffffff80ff156c is in pmap_ts_referenced (/usr/src/sys/amd64/amd64/pmap.c:9108).
warning: Source file is more recent than executable.
9103 pde = pmap_pde(pmap, pv->pv_va);
9104 KASSERT((*pde & PG_PS) == 0,
9105 ("pmap_ts_referenced: found a 2mpage in page %p's pv list",
9106 m));
9107 pte = pmap_pde_to_pte(pde, pv->pv_va);
9108 if ((*pte & (PG_M | PG_RW)) == (PG_M | PG_RW))
9109 vm_page_dirty(m);
9110 if ((*pte & PG_A) != 0) {
9111 if (safe_to_clear_referenced(pmap, *pte)) {
9112 atomic_clear_long(pte, PG_A);

Could you help with that?
Do the panics go away if you downgrade? These panics are all over the place, so they might be the result of faulty RAM. A fault address of 0xfffffa5541417730 is particularly strange on amd64.
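One way to see why that fault address looks odd is to check where it falls in the kernel address space. The sketch below is only an illustration: it assumes the default FreeBSD/amd64 direct-map window (0xfffff80000000000 up to 0xfffffc0000000000, 4-level paging), which is not stated anywhere in this thread and should be checked against sys/amd64/include/vmparam.h for the running kernel. The helper names are hypothetical. Under that assumption, the address decodes to a physical offset far beyond the 32640 MB reported in the server3 dump.

```python
# Assumed layout constants (NOT taken from this thread): the default
# FreeBSD/amd64 direct map (DMAP) window with 4-level paging.
DMAP_MIN = 0xFFFFF80000000000
DMAP_MAX = 0xFFFFFC0000000000


def is_canonical(va: int) -> bool:
    """True if bits 63..47 are all equal (48-bit canonical address)."""
    top = va >> 47
    return top == 0 or top == 0x1FFFF


def dmap_to_phys(va: int):
    """Return the physical address a DMAP pointer would map, else None."""
    if DMAP_MIN <= va < DMAP_MAX:
        return va - DMAP_MIN
    return None


FAULT_VA = 0xFFFFFA5541417730      # fault virtual address from server3
RAM_BYTES = 32640 * 1024 * 1024    # "Dumping 2393 out of 32640 MB"

phys = dmap_to_phys(FAULT_VA)
print("canonical:", is_canonical(FAULT_VA))
print("physical offset: %#x (%.2f TiB)" % (phys, phys / 2**40))
print("beyond installed RAM:", phys > RAM_BYTES)
```

If the assumed window is right, the pointer is a well-formed direct-map address for memory the machine does not have, which would be consistent with it having been built from a bad value. The sketch cannot say where that value came from.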
This happens on dozens of servers, so I don't think that much memory went bad right after the update.
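With panics coming from many machines, it can help to group them by where the trap was taken. The following is a minimal sketch, not an established tool: the function name and the shortened sample strings are made up for illustration, with frames copied from the server1 to server3 traces above. It pulls out the frame printed right after calltrap in a "KDB: stack backtrace". Note that in these dumps that frame is a caller's return address, not necessarily the function containing the instruction pointer, so kgdb is still needed for the exact location.

```python
import re
from collections import Counter

# Matches KDB frames such as "#6 0xffffffff82a972e9 at dbuf_dirty+0x269".
FRAME_RE = re.compile(r"#\d+ 0x[0-9a-f]+ at (\w+)\+0x[0-9a-f]+")


def first_frame_after_trap(panic_text: str):
    """Return the function name printed right after calltrap, or None."""
    frames = FRAME_RE.findall(panic_text)
    if "calltrap" in frames:
        idx = frames.index("calltrap")
        if idx + 1 < len(frames):
            return frames[idx + 1]
    return None


# Shortened excerpts of the three traces posted above (illustrative only).
SAMPLES = {
    "server1": "#5 0xffffffff80fe3ac8 at calltrap+0x8 "
               "#6 0xffffffff82a972e9 at dbuf_dirty+0x269 "
               "#7 0xffffffff82aa4e19 at dmu_write+0x119",
    "server2": "#5 0xffffffff80fe3ac8 at calltrap+0x8 "
               "#6 0xffffffff80ffa796 at pmap_copy+0x546",
    "server3": "#5 0xffffffff80fd6a48 at calltrap+0x8 "
               "#6 0xffffffff80ec5e9b at vm_pageout_worker+0xb4b",
}

summary = Counter(first_frame_after_trap(t) for t in SAMPLES.values())
for name, count in summary.items():
    print(name, count)
```

The regular expression does not depend on line breaks, so it should work on both the raw console output and a flattened paste.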
Yet another panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80ff0066
stack pointer = 0x28:0xfffffe00fd374540
frame pointer = 0x28:0xfffffe00fd374590
code segment = base rx0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 16 (dom1)
rdi: 0000000000000000 rsi: 0000000000000000 rdx: 0000000000000001
rcx: 0000000000000002 r8: 0000000000000f80 r9: fffff80000000000
rax: 0000000000000000 rbx: fffffe00147055c0 rbp: fffffe00fd374590
r10: fffff80009bc8740 r11: fffff80000000000 r12: 0000000000000042
r13: fffff803993a3d50 r14: fffff802a4f68a48 r15: fffff803993a3d38
trap number = 12
panic: page fault
cpuid = 2
time = 1720527313
KDB: stack backtrace:
#0 0xffffffff80b7fbfd at kdb_backtrace+0x5d
#1 0xffffffff80b32961 at vpanic+0x131
#2 0xffffffff80b32823 at panic+0x43
#3 0xffffffff80fff91b at trap_fatal+0x40b
#4 0xffffffff80fff966 at trap_pfault+0x46
#5 0xffffffff80fd6a48 at calltrap+0x8
#6 0xffffffff80ec3a94 at vm_page_test_dirty+0x14
#7 0xffffffff80ec7718 at vm_pageout_scan_inactive+0x498
#8 0xffffffff80ec58c4 at vm_pageout_worker+0x574
#9 0xffffffff80aecd1f at fork_exit+0x7f
#10 0xffffffff80fd7aae at fork_trampoline+0xe
Uptime: 1d22h4m50s
Dumping 3207 out of 16256 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/accf_dns.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_dns.ko.debug...
Reading symbols from /boot/kernel/accf_http.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_http.ko.debug...
Reading symbols from /boot/kernel/accf_data.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_data.ko.debug...
Reading symbols from /boot/kernel/zfs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...
Reading symbols from /boot/kernel/acpi_wmi.ko...
Reading symbols from /usr/lib/debug//boot/kernel/acpi_wmi.ko.debug...
Reading symbols from /boot/kernel/ipfw.ko...
Reading symbols from /usr/lib/debug//boot/kernel/ipfw.ko.debug...
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
warning: Source file is more recent than executable.
57 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) list *0xffffffff80ff0066
0xffffffff80ff0066 is in pmap_page_test_mappings (/usr/src/sys/amd64/amd64/pmap.c:8734).
warning: Source file is more recent than executable.
8729 mask |= PG_RW | PG_M;
8730 }
8731 if (accessed) {
8732 PG_A = pmap_accessed_bit(pmap);
8733 PG_V = pmap_valid_bit(pmap);
8734 mask |= PG_V | PG_A;
8735 }
8736 rv = (*pte & mask) == mask;
8737 PMAP_UNLOCK(pmap);
8738 if (rv)
(kgdb) bt
#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:405
#2 0xffffffff80b324f7 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:523
#3 0xffffffff80b329ce in vpanic (fmt=0xffffffff8115edb8 "%s", ap=ap@entry=0xfffffe00fd3743a0) at /usr/src/sys/kern/kern_shutdown.c:967
#4 0xffffffff80b32823 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:891
#5 0xffffffff80fff91b in trap_fatal (frame=0xfffffe00fd374480, eva=0) at /usr/src/sys/amd64/amd64/trap.c:952
#6 0xffffffff80fff966 in trap_pfault (frame=<unavailable>, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:760
#7 <signal handler called>
#8 0xffffffff80ff0066 in pmap_page_test_mappings (m=0xfffffe00147055c0, accessed=accessed@entry=0, modified=modified@entry=1) at /usr/src/sys/amd64/amd64/pmap.c:8734
#9 0xffffffff80fefda8 in pmap_is_modified (m=0x0) at /usr/src/sys/amd64/amd64/pmap.c:8798
#10 0xffffffff80ec3a94 in vm_page_test_dirty (m=0x0, m@entry=0xfffffe00147055c0) at /usr/src/sys/vm/vm_page.c:5516
#11 0xffffffff80ec7718 in vm_pageout_scan_inactive (vmd=vmd@entry=0xffffffff81c04300 <vm_dom+1664>, page_shortage=1856) at /usr/src/sys/vm/vm_pageout.c:1583
#12 0xffffffff80ec58c4 in vm_pageout_inactive_dispatch (vmd=0xffffffff81c04300 <vm_dom+1664>, shortage=1980) at /usr/src/sys/vm/vm_pageout.c:1673
#13 vm_pageout_inactive (vmd=0xffffffff81c04300 <vm_dom+1664>, shortage=<optimized out>, addl_shortage=<optimized out>) at /usr/src/sys/vm/vm_pageout.c:1722
#14 vm_pageout_worker (arg=arg@entry=0x1) at /usr/src/sys/vm/vm_pageout.c:2162
#15 0xffffffff80aecd1f in fork_exit (callout=0xffffffff80ec5350 <vm_pageout_worker>, arg=0x1, frame=0xfffffe00fd374f40) at /usr/src/sys/kern/kern_fork.c:1164
#16 <signal handler called>
(kgdb)
Same server, but fresh dump:

Fatal trap 12: page fault while in kernel mode
cpuid = 10; apic id = 10
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80ff0066
stack pointer = 0x28:0xfffffe00fd3e9540
frame pointer = 0x28:0xfffffe00fd3e9590
code segment = base rx0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 17 (dom1)
rdi: 0000000000000000 rsi: 0000000000000000 rdx: 0000000000000001
rcx: 0000000000000002 r8: 0000000000000f10 r9: fffff80000000000
rax: 0000000000000000 rbx: fffffe0011734f20 rbp: fffffe00fd3e9590
r10: fffff80009b49740 r11: fffff80000000000 r12: 0000000000000042
r13: fffff802876023b0 r14: fffff803e06e0508 r15: fffff80287602398
trap number = 12
panic: page fault
cpuid = 10
time = 1722494601
KDB: stack backtrace:
#0 0xffffffff80b7fbfd at kdb_backtrace+0x5d
#1 0xffffffff80b32961 at vpanic+0x131
#2 0xffffffff80b32823 at panic+0x43
#3 0xffffffff80fff91b at trap_fatal+0x40b
#4 0xffffffff80fff966 at trap_pfault+0x46
#5 0xffffffff80fd6a48 at calltrap+0x8
#6 0xffffffff80ec3a94 at vm_page_test_dirty+0x14
#7 0xffffffff80ec7718 at vm_pageout_scan_inactive+0x498
#8 0xffffffff80ec58c4 at vm_pageout_worker+0x574
#9 0xffffffff80aecd1f at fork_exit+0x7f
#10 0xffffffff80fd7aae at fork_trampoline+0xe
Uptime: 22d18h25m39s
Dumping 3549 out of 16250 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/accf_dns.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_dns.ko.debug...
Reading symbols from /boot/kernel/accf_data.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_data.ko.debug...
Reading symbols from /boot/kernel/zfs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...
Reading symbols from /boot/kernel/accf_http.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_http.ko.debug...
Reading symbols from /boot/kernel/acpi_wmi.ko...
Reading symbols from /usr/lib/debug//boot/kernel/acpi_wmi.ko.debug...
Reading symbols from /boot/kernel/ipfw.ko...
Reading symbols from /usr/lib/debug//boot/kernel/ipfw.ko.debug...
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
warning: Source file is more recent than executable.
57 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) list *0xffffffff80ff0066
0xffffffff80ff0066 is in pmap_page_test_mappings (/usr/src/sys/amd64/amd64/pmap.c:8734).
warning: Source file is more recent than executable.
8729 mask |= PG_RW | PG_M;
8730 }
8731 if (accessed) {
8732 PG_A = pmap_accessed_bit(pmap);
8733 PG_V = pmap_valid_bit(pmap);
8734 mask |= PG_V | PG_A;
8735 }
8736 rv = (*pte & mask) == mask;
8737 PMAP_UNLOCK(pmap);
8738 if (rv)
(kgdb) bt
#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:405
#2 0xffffffff80b324f7 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:523
#3 0xffffffff80b329ce in vpanic (fmt=0xffffffff8115edb8 "%s", ap=ap@entry=0xfffffe00fd3e93a0) at /usr/src/sys/kern/kern_shutdown.c:967
#4 0xffffffff80b32823 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:891
#5 0xffffffff80fff91b in trap_fatal (frame=0xfffffe00fd3e9480, eva=0) at /usr/src/sys/amd64/amd64/trap.c:952
#6 0xffffffff80fff966 in trap_pfault (frame=<unavailable>, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:760
#7 <signal handler called>
#8 0xffffffff80ff0066 in pmap_page_test_mappings (m=0xfffffe0011734f20, accessed=accessed@entry=0, modified=modified@entry=1) at /usr/src/sys/amd64/amd64/pmap.c:8734
#9 0xffffffff80fefda8 in pmap_is_modified (m=0x0) at /usr/src/sys/amd64/amd64/pmap.c:8798
#10 0xffffffff80ec3a94 in vm_page_test_dirty (m=0x0, m@entry=0xfffffe0011734f20) at /usr/src/sys/vm/vm_page.c:5516
#11 0xffffffff80ec7718 in vm_pageout_scan_inactive (vmd=vmd@entry=0xffffffff81c04300 <vm_dom+1664>, page_shortage=250) at /usr/src/sys/vm/vm_pageout.c:1583
#12 0xffffffff80ec58c4 in vm_pageout_inactive_dispatch (vmd=0xffffffff81c04300 <vm_dom+1664>, shortage=818) at /usr/src/sys/vm/vm_pageout.c:1673
#13 vm_pageout_inactive (vmd=0xffffffff81c04300 <vm_dom+1664>, shortage=<optimized out>, addl_shortage=<optimized out>) at /usr/src/sys/vm/vm_pageout.c:1722
#14 vm_pageout_worker (arg=arg@entry=0x1) at /usr/src/sys/vm/vm_pageout.c:2162
#15 0xffffffff80aecd1f in fork_exit (callout=0xffffffff80ec5350 <vm_pageout_worker>, arg=0x1, frame=0xfffffe00fd3e9f40) at /usr/src/sys/kern/kern_fork.c:1164
#16 <signal handler called>
#17 0xd4368bd7293a5080 in ?? ()
Backtrace stopped: Cannot access memory at address 0x11a719707dac43cc
(In reply to Kirill from comment #4) Are you able to test with a debug kernel built from the main branch?
Downgrading to FreeBSD 13.3 solves the panic problem. I left a couple of servers for experimentation and will try them with a debug kernel later.
This seems to be a general software memory corruption issue. Sometimes files on ZFS get corrupted, but after rebooting the server everything is fine.
(In reply to Kirill from comment #7) Did you have a chance to try a debug kernel? That will hopefully make it easier to see what's happening.