260884 – [zfs] Panic in zfs_onexit_destroy [fix available]

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	13.0-RELEASE
Hardware:	Any Any

Importance:	--- Affects Only Me
Assignee:	freebsd-bugs (Nobody)

URL:	https://www.freebsd.org/security/advi...
Keywords:

Depends on:
Blocks:

Reported:	2022-01-02 17:25 UTC by Michael Gmelin
Modified:	2022-06-08 08:05 UTC
CC List:	6 users

See Also:

Description Michael Gmelin freebsd_committer

2022-01-02 17:25:37 UTC

I see this problem on multiple hosts running a couple of ZFS clone based jails (orchestrated by nomad/pot). As pot calls `zfs list` once per second per running jail, this adds up to 10-30 calls to `zfs list` per second per node. After a few days, all hosts consistently crash with a panic, which seems to happen while calling `zfs`. This looks a lot like this bug reported in TrueNAS: https://jira.ixsystems.com/browse/NAS-108891

It seems like the underlying locking problem was already fixed in OpenZFS upstream, but FreeBSD 13.0-RELEASE is using an older version. As far as I can see, it would be very easy to apply the fix below as an errata update and create 13.0-RELEASE-p6 from it: https://github.com/openzfs/zfs/commit/f845b2dd1c60

You can find more context about my use case here: https://github.com/pizzamig/pot/issues/195

Crashinfo output:

```
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80bffeca
stack pointer           = 0x28:0xfffffe01e0bd5820
frame pointer           = 0x28:0xfffffe01e0bd5830
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 91596 (zfs)
trap number             = 12
panic: page fault
cpuid = 3
time = 1641116990
KDB: stack backtrace:
#0 0xffffffff80c40295 at kdb_backtrace+0x65
#1 0xffffffff80bf5d91 at vpanic+0x181
#2 0xffffffff80bf5b63 at panic+0x43
#3 0xffffffff810878f7 at trap_fatal+0x387
#4 0xffffffff81087966 at trap_pfault+0x66
#5 0xffffffff81086f8b at trap+0x2ab
#6 0xffffffff8105b808 at calltrap+0x8
#7 0xffffffff822cabb0 at zfs_onexit_destroy+0x20
#8 0xffffffff82146768 at zfsdev_close+0x58
#9 0xffffffff80a98347 at devfs_destroy_cdevpriv+0x97
#10 0xffffffff80a9bf64 at devfs_close_f+0x64
#11 0xffffffff80b98d2b at _fdrop+0x1b
#12 0xffffffff80b9c5e9 at closef+0x1d9
#13 0xffffffff80ba0697 at closefp_impl+0x77
#15 0xffffffff8105c12e at fast_syscall_common+0xf8
Uptime: 3d16h29m24s
Dumping 7555 out of 65271 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80bf59bb in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:486
#3  0xffffffff80bf5e00 in vpanic (fmt=<optimized out>, ap=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:919
#4  0xffffffff80bf5b63 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:843
#5  0xffffffff810878f7 in trap_fatal (frame=0xfffffe01e0bd5760, eva=24)
    at /usr/src/sys/amd64/amd64/trap.c:915
#6  0xffffffff81087966 in trap_pfault (frame=frame@entry=0xfffffe01e0bd5760, 
    usermode=false, signo=<optimized out>, signo@entry=0x0, 
    ucode=<optimized out>, ucode@entry=0x0)
    at /usr/src/sys/amd64/amd64/trap.c:732
#7  0xffffffff81086f8b in trap (frame=0xfffffe01e0bd5760)
    at /usr/src/sys/amd64/amd64/trap.c:398
#8  <signal handler called>
#9  _sx_xlock (sx=0x0, opts=opts@entry=0, 
    file=0xffffffff8239be7a "/usr/src/sys/contrib/openzfs/module/zfs/zfs_onexit.c", line=line@entry=89) at /usr/src/sys/kern/kern_sx.c:325
#10 0xffffffff822cabb0 in zfs_onexit_destroy (zo=0x0)
    at /usr/src/sys/contrib/openzfs/module/zfs/zfs_onexit.c:89
#11 0xffffffff82146768 in zfsdev_close (data=0xfffff8000822c700)
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/kmod_core.c:197
#12 0xffffffff80a98347 in devfs_destroy_cdevpriv (p=0xfffff8051eff9b40)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:197
#13 0xffffffff80a9bf64 in devfs_fpdrop (fp=0xfffff807882306e0)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:211
#14 devfs_close_f (fp=0xfffff807882306e0, td=<optimized out>)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:787
#15 0xffffffff80b98d2b in fo_close (fp=0xfffff807882306e0, 
    td=0xfffffe01e6a02300) at /usr/src/sys/sys/file.h:377
#16 _fdrop (fp=fp@entry=0xfffff807882306e0, td=td@entry=0xfffffe01e6a02300)
    at /usr/src/sys/kern/kern_descrip.c:3510
#17 0xffffffff80b9c5e9 in closef (fp=fp@entry=0xfffff807882306e0, 
    td=td@entry=0xfffffe01e6a02300) at /usr/src/sys/kern/kern_descrip.c:2828
#18 0xffffffff80ba0697 in closefp_impl (fdp=0xfffffe01ef4134f0, fd=5, 
    fp=0xfffff807882306e0, td=0xfffffe01e6a02300, audit=true)
    at /usr/src/sys/kern/kern_descrip.c:1271
#19 0xffffffff8108827e in syscallenter (td=<optimized out>)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:189
#20 amd64_syscall (td=0xfffffe01e6a02300, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1156
#21 <signal handler called>
#22 0x00000008007bb40a in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffe9c8
(kgdb) 
```

Comment 1 Michael Gmelin freebsd_committer

2022-01-02 20:41:56 UTC

I found a way to reproduce the panic within seconds:

```
$ cat >crashme.c<<EOF
#include <unistd.h>
#include <sys/stdtypes.h>
#include <libzfs_core.h>

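int main(int argc, char** argv)
{
  /* Four forks leave 16 processes running the loop concurrently. */
  fork(); fork(); fork(); fork();
  for (int i=0; i<1000000; ++i) {
    /* Each init/fini pair opens and closes /dev/zfs. */
    libzfs_core_init();
    lzc_exists(argc >= 2 ? argv[1] : "zroot");
    libzfs_core_fini();
  }
  return 0;
}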
EOF

$ cc \
  -I/usr/src/sys/contrib/openzfs/include \
  -I/usr/src/sys/contrib/openzfs/lib/libspl/include \
  -lzfs_core -lzfs -o crashme crashme.c

$ ./crashme zroot
```

This doesn't require root privileges.

Applying the patch mentioned above fixes the problem:

```
# cd /usr/src/sys/contrib/openzfs
# fetch -o - \
  https://github.com/openzfs/zfs/commit/f845b2dd1c60.diff | patch -p1
# cd /usr/src
# make -j8 kernel
# reboot
...
$ ./crashme zroot && echo "I'm ok"
I'm ok
$ 
```

Given that this can be triggered by two unfortunately timed `zfs list` calls and that it actually happens in practice (like in my example, where I would see my hosts crash every few hours/days), I would like us to import this fix to release/13.0 and create an errata notice.

Comment 2 Ed Maste freebsd_committer

2022-01-02 20:55:15 UTC

Is the fix already in head / stable/13?
If not that's the first step to getting this into an errata.
If it is, could you add a reference to the associated commit hash(es)?

Comment 3 Michael Gmelin freebsd_committer

2022-01-02 21:18:31 UTC

(In reply to Ed Maste from comment #2)

The fix was imported into stable/13 back in March as part of a larger commit:

zfs: merge OpenZFS master-9305ff2ed
https://cgit.freebsd.org/src/commit/?h=stable/13&id=9db44a8e

This pulls in a lot of changes but lists them separately; the one in question is: #11720 FreeBSD: Clean up zfsdev_close to match Linux
(which refers to https://github.com/openzfs/zfs/pull/11720). This is the patch I tested above (https://patch-diff.githubusercontent.com/raw/openzfs/zfs/pull/11720.diff == https://github.com/openzfs/zfs/commit/f845b2dd.diff).

To only get these changes from our source tree, look at:

A. https://cgit.freebsd.org/src/commit/sys/contrib/openzfs/module/os/freebsd/zfs/kmod_core.c?id=9db44a8e
B. https://cgit.freebsd.org/src/commit/sys/contrib/openzfs/include/sys/zfs_ioctl.h?id=9db44a8e

Pulling everything from 9db44a8e would basically be an OpenZFS update, but from what I can tell, just pulling in A. and B. from above is enough to correct the problem at hand and should work ok in isolation (a second pair of eyes won't hurt).

Comment 4 Michael Gmelin freebsd_committer

2022-01-03 16:34:28 UTC

(In reply to Ed Maste from comment #2)

Cherry-picking the change could be done this way:

    cd $(git rev-parse --show-toplevel)
    git pull
    git checkout releng/13.0
    git pull
    git cherry-pick -n -m1 -X theirs -X subtree=sys/contrib/openzfs 9db44a8e
    git reset HEAD
    git add sys/contrib/openzfs/module/os/freebsd/zfs/kmod_core.c
    git add sys/contrib/openzfs/include/sys/zfs_ioctl.h
    git checkout .
    git clean -fd
    git status
    git diff --staged
    # inspect changes
    git commit

Comment 5 Michael Gmelin freebsd_committer

2022-01-11 10:27:52 UTC

(In reply to Ed Maste from comment #2)

Hi Ed,

Is there anything else I should do or provide at this point?

Comment 6 Mark Johnston freebsd_committer

2022-01-31 15:33:18 UTC

(In reply to Michael Gmelin from comment #5)
Sorry for the delay.  I've queued this up for our next EN release.

Comment 7 Michael Gmelin freebsd_committer

2022-06-08 08:03:41 UTC

This was addressed as part of FreeBSD-EN-22:12.zfs[0]

[0]https://www.freebsd.org/security/advisories/FreeBSD-EN-22:12.zfs.asc

Comment 8 Michael Gmelin freebsd_committer

2022-06-08 08:05:06 UTC

(In reply to Michael Gmelin from comment #7)

> This was addressed as part of FreeBSD-EN-22:12.zfs[0]

Which is part of 13.0-RELEASE-p8