Bug 256816

Summary: 13.0-RELEASE-p2 ufs2+sujournal. Fatal trap 12: page fault while in kernel mode
Product: Base System
Component: kern
Status: Open ---
Severity: Affects Some People
Priority: ---
Version: 13.0-RELEASE
Hardware: amd64
OS: Any
Reporter: Igor Valkov <viaprog>
Assignee: freebsd-fs (Nobody) <fs>
CC: chris, crest, grahamperrin, kwiat3k, mckusick
Keywords: crash

Description Igor Valkov 2021-06-24 18:22:40 UTC
Fatal trap 12: page fault while in kernel mode
cpuid = 13; apic id = 0d
fault virtual address   = 0x38
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff8089486a
stack pointer           = 0x28:0xfffffe024d7d65c0
frame pointer           = 0x28:0xfffffe024d7d6630
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 4676 (mv)
trap number             = 12
panic: page fault
cpuid = 13
time = 1624484382
KDB: stack backtrace:
#0 0xffffffff8067e6a5 at kdb_backtrace+0x65
#1 0xffffffff806314a1 at vpanic+0x181
#2 0xffffffff80631313 at panic+0x43
#3 0xffffffff80975007 at trap_fatal+0x387
#4 0xffffffff8097505f at trap_pfault+0x4f
#5 0xffffffff809746bd at trap+0x27d
#6 0xffffffff8094d838 at calltrap+0x8
#7 0xffffffff808b25ec at ufs_dirrewrite+0x14c
#8 0xffffffff808ba48b at ufs_rename+0x138b
#9 0xffffffff809a2da7 at VOP_RENAME_APV+0x27
#10 0xffffffff8071ece8 at kern_renameat+0x3a8
#11 0xffffffff8097590c at amd64_syscall+0x10c
#12 0xffffffff8094e15e at fast_syscall_common+0xf8
Uptime: 6h57m43s
Dumping 4661 out of 130941 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80631096 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:486
#3  0xffffffff80631510 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:919
#4  0xffffffff80631313 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:843
#5  0xffffffff80975007 in trap_fatal (frame=0xfffffe024d7d6500, eva=56) at /usr/src/sys/amd64/amd64/trap.c:915
#6  0xffffffff8097505f in trap_pfault (frame=frame@entry=0xfffffe024d7d6500, usermode=false, signo=<optimized out>, signo@entry=0x0, ucode=<optimized out>, ucode@entry=0x0)
    at /usr/src/sys/amd64/amd64/trap.c:732
#7  0xffffffff809746bd in trap (frame=0xfffffe024d7d6500) at /usr/src/sys/amd64/amd64/trap.c:398
#8  <signal handler called>
#9  0xffffffff8089486a in softdep_setup_directory_change (bp=<optimized out>, dp=dp@entry=0xfffff801079a0aa0, ip=ip@entry=0xfffff80367c0cbe0, newinum=newinum@entry=18446744071772970742, 
    isrmdir=<optimized out>, isrmdir@entry=0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:9798
#10 0xffffffff808b25ec in ufs_dirrewrite (dp=dp@entry=0xfffff801079a0aa0, oip=oip@entry=0xfffff80367c0cbe0, newinum=18446744071772970742, newinum@entry=2358386422, 
    newtype=newtype@entry=4, isrmdir=isrmdir@entry=0) at /usr/src/sys/ufs/ufs/ufs_lookup.c:1311
#11 0xffffffff808ba48b in ufs_rename (ap=<optimized out>) at /usr/src/sys/ufs/ufs/ufs_vnops.c:1650
#12 0xffffffff809a2da7 in VOP_RENAME_APV (vop=0xffffffff80cdbcd8 <ffs_vnodeops2>, a=a@entry=0xfffffe024d7d6a40) at vnode_if.c:1678
#13 0xffffffff8071ece8 in VOP_RENAME (fdvp=0xfffff810b76dc000, fvp=<optimized out>, fcnp=<optimized out>, tdvp=<optimized out>, tvp=<optimized out>, tcnp=<optimized out>)
    at ./vnode_if.h:863
#14 kern_renameat (td=0xfffffe00e21cae00, oldfd=-100, old=0x7fffffffeb18 <error: Cannot access memory at address 0x7fffffffeb18>, newfd=-100, 
    new=0x7fffffffeb59 <error: Cannot access memory at address 0x7fffffffeb59>, pathseg=<optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:3690
#15 0xffffffff8097590c in syscallenter (td=0xfffffe00e21cae00) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:189
#16 amd64_syscall (td=0xfffffe00e21cae00, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1156
#17 <signal handler called>
#18 0x0000001ffff8012a in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffd388
Comment 1 Graham Perrin freebsd_committer freebsd_triage 2021-06-25 06:58:55 UTC
(In reply to Igor A. Valkov from comment #0)

> …
> #7 0xffffffff808b25ec at ufs_dirrewrite+0x14c
> #8 0xffffffff808ba48b at ufs_rename+0x138b
> …
> Uptime: 6h57m43s
> …

Please, can you recall what was occurring around that time?
Comment 2 Igor Valkov 2021-06-26 06:28:45 UTC
(In reply to Graham Perrin from comment #1)

The panic happened when I moved one file immediately after extracting an archive of many small files, like this:

tar xfz many-small-and-medium-files.tar.gz && mv many-small-and-medium-files/one_file.txt .

The partition is 60 TB, UFS with soft updates journaling (ufs+sujournal).
Comment 3 Kirk McKusick freebsd_committer freebsd_triage 2021-06-28 19:12:03 UTC
This is a new panic that we have not seen before.
To be able to debug it, we will need some way to reproduce it.
So, if you can come up with a script that triggers it, please let us know.

It would also be helpful to know if journaling is what triggers the bug,
so if you can reproduce it, try disabling journaling (tunefs -j disable /fs)
to see if that solves the problem. That would help us narrow down the
search space for the bug.
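
For anyone following along, the journaling test suggested above can be sketched roughly as follows. This is a minimal sketch, not an official procedure: it assumes /fs (the placeholder from the comment above) has an fstab entry, and that the file system can be taken offline, since tunefs cannot change a file system that is mounted read-write. The helper names run and disable_suj are hypothetical. Commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: /fs is a placeholder mount point, assumed to be listed in /etc/fstab.
# By default each command is printed, not executed; set RUN=1 (as root) to execute.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# disable_suj MOUNTPOINT: turn off soft updates journaling on one file system.
disable_suj() {
    fs=$1
    run umount "$fs" || return 1             # take the file system offline first
    run tunefs -j disable "$fs" || return 1  # disable soft updates journaling
    run tunefs -p "$fs" || return 1          # print current settings to confirm
    run mount "$fs"                          # remount using the fstab entry
}
```

Calling `disable_suj /fs` without RUN=1 prints the four commands so they can be reviewed before anything is changed.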
Comment 4 Igor Valkov 2022-02-04 22:01:18 UTC
With soft updates journaling disabled (tunefs -j disable), the system has worked fine for six months.

As I understand it, the problem arises when moving files that have not yet been committed to disk and are still being processed by soft updates.

It happened to me when I unpacked an archive containing several tens of thousands of small files and immediately tried to move them elsewhere.
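
A reproduction attempt along these lines might look like the sketch below. It is untested against the actual panic and is only an illustration of the extract-then-move pattern described in this report. The function names, file names, and counts are all hypothetical; the target directory is assumed to be on a UFS file system with soft updates journaling enabled.

```shell
#!/bin/sh
# Hypothetical reproduction sketch; names and counts are assumptions, not from the report.

# make_archive DIR NFILES: build DIR/many.tar.gz containing NFILES small files.
make_archive() {
    dir=$1; nfiles=$2
    mkdir -p "$dir/src/many" || return 1
    i=0
    while [ "$i" -lt "$nfiles" ]; do
        printf 'file %s\n' "$i" > "$dir/src/many/f$i.txt" || return 1
        i=$((i + 1))
    done
    tar -czf "$dir/many.tar.gz" -C "$dir/src" many
}

# extract_and_move DIR ROUNDS: extract the archive, then rename one file right
# away, while the freshly created entries may still be pending in soft updates
# (the reporter's hypothesis). Repeats ROUNDS times.
extract_and_move() {
    dir=$1; rounds=$2
    r=0
    while [ "$r" -lt "$rounds" ]; do
        rm -rf "$dir/work"
        mkdir -p "$dir/work" || return 1
        ( cd "$dir/work" && tar -xzf ../many.tar.gz && mv many/f0.txt . ) || return 1
        r=$((r + 1))
    done
}
```

For example, `make_archive /fs/scratch 50000` followed by `extract_and_move /fs/scratch 100` would repeat the pattern 100 times on a scratch directory (the path is a placeholder). Running it on a test machine is advisable, since a successful reproduction panics the kernel.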