Bug 268333 - reboot: clearing /tmp: kernel panic: ZFS ; VERIFY3(0 == zap_add_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx)) failed (0 == 97)
Summary: reboot: clearing /tmp: kernel panic: ZFS ; VERIFY3(0 == zap_add_int(zfsvfs->z...
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
URL: https://codeberg.org/FreeBSD/freebsd-...
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2022-12-12 15:05 UTC by martin
Modified: 2023-02-08 02:53 UTC (History)
2 users

See Also:


Attachments

Description martin 2022-12-12 15:05:50 UTC
FreeBSD 13.1, ZFS on geli, amd64: 

# freebsd-version -kru
13.1-RELEASE-p3
13.1-RELEASE-p3
13.1-RELEASE-p5

System panics on

<118>Clearing /tmp.
panic: VERIFY3(0 == zap_add_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx)) failed (0 == 97)

With the stack trace:

cpuid = 2
time = 1670833309
KDB: stack backtrace:
#0 0xffffffff80c694a5 at kdb_backtrace+0x65
#1 0xffffffff80c1bb5f at vpanic+0x17f
#2 0xffffffff82177f2a at spl_panic+0x3a
#3 0xffffffff8218b923 at zfs_link_destroy+0x433
#4 0xffffffff82194a81 at zfs_rmdir_+0x1a1
#5 0xffffffff8117d9a7 at VOP_RMDIR_APV+0x27
#6 0xffffffff80d0ac33 at kern_frmdirat+0x343
#7 0xffffffff810b06ec at amd64_syscall+0x10c
#8 0xffffffff81087ecb at fast_syscall_common+0xf8
Uptime: 11s


__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
=> 0xffffffff80c1b91e <doadump+46>:	65 48 8b 04 25 00 00 00 00	mov    rax,QWORD PTR gs:0x0
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80c1b75c in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487
#3  0xffffffff80c1bbce in vpanic (fmt=0xffffffff823dd5c6 "VERIFY3(0 == zap_add_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx)) failed (0 == %lld)\n", ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
#4  0xffffffff82177f2a in spl_panic (file=<optimized out>, func=<optimized out>, line=<unavailable>, fmt=<unavailable>) at /usr/src/sys/contrib/openzfs/module/os/freebsd/spl/spl_misc.c:107
#5  0xffffffff8218b923 in zfs_unlinked_add (zp=0xfffff80041109b10, tx=0xfffff80021e99600) at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_dir.c:278
#6  zfs_link_destroy (dzp=dzp@entry=0xfffff8000ac03760, name=<optimized out>, name@entry=0xfffff800212d5c07 ".font-unix", zp=zp@entry=0xfffff80041109b10, tx=tx@entry=0xfffff80021e99600, flag=flag@entry=2, unlinkedp=unlinkedp@entry=0x0) at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_dir.c:785
#7  0xffffffff82194a81 in zfs_rmdir_ (dvp=0xfffff8000aa40d58, vp=0xfffff800414d7988, name=0xfffff800212d5c07 ".font-unix", cr=<optimized out>) at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:1611
#8  0xffffffff8117d9a7 in VOP_RMDIR_APV (vop=0xffffffff82457340 <zfs_vnodeops>, a=a@entry=0xfffffe00c452bca8) at vnode_if.c:1804
#9  0xffffffff80d0ac33 in VOP_RMDIR (dvp=<unavailable>, vp=<unavailable>, cnp=0xfffffe00c452bd48) at ./vnode_if.h:910
#10 kern_frmdirat (td=0xfffffe00c5f11560, dfd=-100, path=0x801831000 <error: Cannot access memory at address 0x801831000>, fd=<optimized out>, pathseg=UIO_USERSPACE, flag=<optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:3939
#11 0xffffffff810b06ec in syscallenter (td=0xfffffe00c5f11560) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:189
#12 amd64_syscall (td=0xfffffe00c5f11560, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1185
#13 <signal handler called>
Comment 1 martin 2022-12-12 19:42:52 UTC
It seems this bug is very similar to this KB:

https://support.oracle.com/knowledge/Sun%20Microsystems/2421977_1.html

(A My Oracle Support account is needed to read it.)
Comment 2 Graham Perrin freebsd_committer freebsd_triage 2022-12-13 02:11:46 UTC
> zap_add_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx

Google finds this, from OpenZFS on OS X in 2015, closed but inconclusive: 

<https://github.com/openzfsonosx/zfs/issues/313#issuecomment-102804769>

* a backtrace, but no address-to-symbol translation

<https://github.com/openzfsonosx/zfs/issues/313#issuecomment-102834366>

* two more panics, no backtraces.

While the matches are remarkable (I'll add the issue under See Also), I doubt that it helps to progress things here. 

----

martin, when was the pool last scrubbed (and free from error at the finish)? 

I assume that boot following the panic did succeed. Can you now scrub the affected pool? 

zdb(8) might be more useful, but a scrub should be simple enough for starters. 
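If it helps, the commands would be along these lines (the pool name here is a placeholder; substitute the affected pool):

```shell
# Start a scrub of the affected pool:
zpool scrub tank

# Watch progress, and check for errors once it completes:
zpool status -v tank
```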

<https://openzfs.github.io/openzfs-docs/man/8/zdb.8.html>

Is the data backed up?

Can you describe the hardware? The storage media in particular (HDD, SSHD, or solid-state, and so on).


(In reply to martin from comment #0)

> during boot

Given lines #0–#13 (below the nine-line stack backtrace), I'll tentatively edit the summary line here. 

How exactly did you perform the restart? Through the GUI of a desktop environment, or at the command line? 

/usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_dir.c:278 in this case (kernel 13.1-RELEASE-p3) is <https://cgit.freebsd.org/src/tree/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_dir.c?h=releng%2F13.1#n278> (in the midst of <https://github.com/freebsd/freebsd-src/blob/releng/13.1/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_dir.c#L256-L281> | <https://codeberg.org/FreeBSD/freebsd-src/src/branch/releng/13.1/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_dir.c#L256-L281>).
Comment 3 Graham Perrin freebsd_committer freebsd_triage 2022-12-13 02:21:04 UTC
Suggested attention to hardware (comment #2) is partly based on <https://codeberg.org/FreeBSD/freebsd-src/src/branch/releng/13.1/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_dir.c#L266-L267>; the (remote) possibility of assertion due to an I/O error.
Comment 4 Graham Perrin freebsd_committer freebsd_triage 2022-12-13 03:12:10 UTC
> … assertion due to an I/O error.

Sorry, that was poorly paraphrased. The original lines (L266-L267): 

>>  * Therefore it is remotely possible for some of the assertions
>>  * regarding the unlinked set below to fail due to i/o error.  On a


Re: comment #2, <https://github.com/openzfsonosx/zfs/issues/313#issuecomment-1347666550> today describes the 2015 change to OpenZFS on OS X: 

> Looks like we never figured out how it can happen, and simply made it 
> a soft-panic: …

– with reference to a commit.
Comment 5 martin 2022-12-13 10:16:47 UTC
Yeah, I googled that, and even the Oracle docs. But…

> Looks like we never figured out how it can happen, and simply made it 
> a soft-panic: …

I'd assume people want to hunt this bug down, hence the opened PR.

The pool was scrubbed; it didn't help. As it was the rpool/tmp dataset that had this corruption, I was able to disable it (set its mountpoint to none) and the system booted up. 

According to the docs I found, if a scrub can't fix the issue, the whole pool should be restored from backup, i.e. don't trust the pool while a dataset is in this state.
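For the record, the workaround was along these lines (dataset name from my pool; the restore step is one option, not something I've run yet):

```shell
# Stop the corrupted dataset from mounting so boot can proceed:
zfs set mountpoint=none rpool/tmp

# One option later: replace it with a fresh dataset restored from backup.
# zfs destroy rpool/tmp
# zfs create -o mountpoint=/tmp rpool/tmp
```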