On a recent -CURRENT, running the installworld target on arm64 (RPi4b) panics the system after a few seconds with the following, still incomplete, stack trace:

getblk
ffs_balloc_ufs2
ffs_write
VOP_WRITE_APV
vn_write
vn_io_fault_doio
vn_io_fault1
vn_io_fault
dofilewrite
kern_writev
sys_write
do_el0_sync
handle_el0_sync

I was unable to extract a crash dump because there is no swap partition and netdump isn't supported by the genet0 driver. The GENERIC kernel from the 2nd of July works without problems, and a custom kernel with some extra TCP options from the 14th of July also worked flawlessly. So I assume that a bug was introduced somewhere between the 14th and the 24th of July.
Created attachment 216745: stacktrace arm64 UFS panic
After some fiddling with ddb, I was able to get the panic message:

getnewbuf_empty: Locked buf 0xffff0000406b1390 on free queue.
Created attachment 216747: second, somewhat different stacktrace on arm64 with heavy UFS writes
hmm, I just hit this as well while running GELI tests on an arm64 platform. It should be a recent regression.
(In reply to Mark Johnston from comment #5) This happened on an NFS root, no UFS involved. So presumably this is a bug in the buffer cache or lockmgr.
I think the problem is in r363415. It converted some lockmgr code to use atomic_fcmpset instead of atomic_cmpset. The former can fail spuriously on LL/SC platforms, so a tryxlock operation can fail even when the buf is unlocked. lockmgr should take care to retry if fcmpset fails but returns the "expected" value.
A commit references this bug:

Author: mjg
Date: Fri Jul 24 17:28:24 UTC 2020
New revision: 363480
URL: https://svnweb.freebsd.org/changeset/base/363480

Log:
  lockmgr: add missing 'continue' to account for spuriously failed fcmpset

  PR: 248245
  Reported by: gbe
  Noted by: markj
  Fixes by: r363415 ("lockmgr: add adaptive spinning")

Changes:
  head/sys/kern/kern_lock.c
Please try on r363480. I think it's high time we get a debug version of the routine for amd64 which fails at random. I'll hack it up later.
(In reply to Mark Johnston from comment #6) This could indeed be NFS related. I have the following build setup for the RPi4: a writable NFS share, /tank/nfs_public, is exported from a FreeBSD 12-STABLE VM and mounted on the RPi4. The share has the following subdirectories, which are symlinked on the RPi4 for /usr/src:

/tank/nfs_public/tiny/src
/tank/nfs_public/tiny/obj

The obj directory is redirected to the NFS share via MAKEOBJDIRPREFIX. This is mostly done to save disk space and writes on the RPi4 SD card.
(In reply to commit-hook from comment #8) I have a build running and will report back on whether this revision solves the issue.
(In reply to commit-hook from comment #8) After your last commit I was able to successfully build and install a more recent kernel and world via NFS on the RPi4b. Thank you! :)