| Summary: | Page fault in zfsctl_snapdir_getattr | | |
|---|---|---|---|
| Product: | Base System | Reporter: | Alan Somers <asomers> |
| Component: | kern | Assignee: | Alan Somers <asomers> |
| Status: | Closed FIXED | | |
| Severity: | Affects Many People | CC: | avg |
| Priority: | --- | | |
| Version: | 12.1-STABLE | | |
| Hardware: | Any | | |
| OS: | Any | | |
Just a note that in this case VOP_GETATTR seems to be called from VOP_VPTOCNP. Alan, I think that in zfsctl_snapdir_getattr the dmu_objset_ds(zfsvfs->z_os) call should be moved from the initialization section to the section protected by ZFS_ENTER.
Good guess, Andriy. Next question: do you have any bright ideas about how to reproduce the bug faster?

(In reply to Alan Somers from comment #3)
Since it happens because of a race, I do not see any way other than exercising the race over and over again. Maybe this will work: stat(1) .zfs/snapshot in a loop while another loop does zfs rollback of the same filesystem.

A commit references this bug:

Author: asomers
Date: Thu Jul 2 13:17:32 UTC 2020
New revision: 362891
URL: https://svnweb.freebsd.org/changeset/base/362891

Log:
  Fix page fault in zfsctl_snapdir_getattr

  Must acquire the z_teardown_lock before accessing the zfsvfs_t object. I can't reproduce this panic on demand, but this looks like the correct solution.

  PR: 247668
  Reviewed by: avg
  MFC after: 2 weeks
  Sponsored by: Axcient
  Differential Revision: https://reviews.freebsd.org/D25543

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c

A commit references this bug:

Author: asomers
Date: Fri Jul 24 17:45:06 UTC 2020
New revision: 363484
URL: https://svnweb.freebsd.org/changeset/base/363484

Log:
  MFC r362891:

  Fix page fault in zfsctl_snapdir_getattr

  Must acquire the z_teardown_lock before accessing the zfsvfs_t object. I can't reproduce this panic on demand, but this looks like the correct solution.

  PR: 247668
  Reviewed by: avg
  Sponsored by: Axcient
  Differential Revision: https://reviews.freebsd.org/D25543

Changes:
  _U stable/12/
  stable/12/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c

A commit references this bug:

Author: asomers
Date: Fri Jul 24 17:56:18 UTC 2020
New revision: 363485
URL: https://svnweb.freebsd.org/changeset/base/363485

Log:
  MFC r362891:

  Fix page fault in zfsctl_snapdir_getattr

  Must acquire the z_teardown_lock before accessing the zfsvfs_t object. I can't reproduce this panic on demand, but this looks like the correct solution.

  PR: 247668
  Reviewed by: avg
  Sponsored by: Axcient
  Differential Revision: https://reviews.freebsd.org/D25543

Changes:
  _U stable/11/
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
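The reproduction idea from the comments (stat .zfs/snapshot in a loop while another loop rolls the same filesystem back) could be sketched on the stat side as below. This is a hedged sketch only: `stat_loop` is a hypothetical helper name, the thread never confirmed that this approach triggers the panic, and the rollback side would have to be driven separately, for example by repeatedly running zfs rollback against a snapshot of the same filesystem.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Call stat(2) on path the given number of times and return how many
 * of those calls failed. Each call asks the filesystem for the
 * attributes of the target, which is the operation the panic was
 * observed in.
 */
unsigned long
stat_loop(const char *path, unsigned long iterations)
{
	struct stat sb;
	unsigned long failures = 0;
	unsigned long i;

	for (i = 0; i < iterations; i++) {
		if (stat(path, &sb) != 0)
			failures++;
	}
	return (failures);
}
```

A caller would pass the path of the .zfs/snapshot directory under the filesystem's mountpoint and a large iteration count. Failures are counted rather than treated as fatal because errors are presumably expected while a rollback is in progress.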
On a very heavily loaded server I observed the following kernel-mode page fault. The offending process was a "procstat -af", which did VOP_GETATTR on every open file descriptor on the whole system, including the .zfs/snapshot directories. On one of those, it called dsl_dataset_phys, which tried to dereference a null pointer. There were also 5 "zfs destroy" processes, and dozens of "zfs list" and "zfs recv" running concurrently.

I suspect that zfsctl_snapdir_getattr is missing some lock when it checks dsl_dataset_phys, while trying to calculate the directory's nlink attribute. But it's not clear what lock it ought to hold. It's worth noting that ZoL doesn't have this problem because it doesn't even try to calculate nlink; instead it always returns "2".

Sadly, I haven't been able to reproduce the issue on any non-production machine. The server in question is running 12-STABLE at svn r346022.

#1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:371
#2 0xffffffff80bbe655 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#3 0xffffffff80bbea96 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:880
#4 0xffffffff80bbe8b3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:807
#5 0xffffffff81090310 in trap_fatal (frame=0xfffffe04b95c08a0, eva=24) at /usr/src/sys/amd64/amd64/trap.c:925
#6 0xffffffff8109035f in trap_pfault (frame=0xfffffe04b95c08a0, usermode=<optimized out>, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:743
#7 0xffffffff8108f9b8 in trap (frame=0xfffffe04b95c08a0) at /usr/src/sys/amd64/amd64/trap.c:407
#8 <signal handler called>
#9 0xffffffff825f4cbc in dsl_dataset_phys (ds=0xfffff86821e72e10) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h:257
#10 zfsctl_snapdir_getattr (ap=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c:1133
#11 0xffffffff81211315 in VOP_GETATTR_APV (vop=0xffffffff826be060 <zfsctl_ops_snapdir>, a=0xfffffe04b95c0a98) at vnode_if.c:733
#12 0xffffffff80c7bd29 in VOP_GETATTR (vp=0x1, vap=<optimized out>, cred=0xfffff88e58a45700) at ./vnode_if.h:309
#13 vop_stdvptocnp (ap=<optimized out>) at /usr/src/sys/kern/vfs_default.c:743
#14 0xffffffff8121495b in VOP_VPTOCNP_APV (vop=0xffffffff81b281b8 <default_vnodeops>, a=0xfffffe04b95c0d90) at vnode_if.c:3718
#15 0xffffffff80c78304 in VOP_VPTOCNP (vp=0x0, vpp=<optimized out>, cred=0xfffff88e58a45700, buf=0xfffff86ed5d7d400 "", buflen=0xfffffe04b95c0e34) at ./vnode_if.h:1599
#16 vn_vptocnp (vp=0xfffffe04b95c0e28, cred=<optimized out>, buf=<optimized out>, buflen=<optimized out>) at /usr/src/sys/kern/vfs_cache.c:2296
#17 0xffffffff80c77db7 in vn_fullpath1 (td=0xfffff865848d7000, vp=0xfffff80e4a8a53c0, rdir=0xfffff860440f0b40, buf=0xfffff86ed5d7d400 "", retbuf=0xfffffe04b95c0fa8, buflen=1023) at /usr/src/sys/kern/vfs_cache.c:2392
#18 0xffffffff80c780f8 in vn_fullpath (td=0xfffff865848d7000, vn=0xfffff80e4a8a53c0, retbuf=0xfffff865848d75a0, freebuf=0xfffffe04b95c0fb0) at /usr/src/sys/kern/vfs_cache.c:2221
#19 0xffffffff80ca0635 in vn_fill_kinfo_vnode (vp=0xfffff80e4a8a53c0, kif=0xfffff831bcf5e818) at /usr/src/sys/kern/vfs_vnops.c:2352
#20 0xffffffff80c9d3f6 in vn_fill_kinfo (fp=<optimized out>, kif=0xfffff831bcf5e818, fdp=<optimized out>) at /usr/src/sys/kern/vfs_vnops.c:2318
#21 0xffffffff80b6ca25 in fo_fill_kinfo (fp=<optimized out>, kif=<optimized out>, fdp=<optimized out>) at /usr/src/sys/sys/file.h:407
#22 export_file_to_kinfo (fp=<optimized out>, fd=<optimized out>, rightsp=<optimized out>, kif=<optimized out>, fdp=0xfffff86618252450, flags=1) at /usr/src/sys/kern/kern_descrip.c:3494
#23 export_file_to_sb (fp=0xfffff8210a788460, fd=4, rightsp=<optimized out>, efbuf=<optimized out>) at /usr/src/sys/kern/kern_descrip.c:3560
#24 kern_proc_filedesc_out (p=<optimized out>, sb=<optimized out>, maxlen=<optimized out>, flags=-1124734960) at /usr/src/sys/kern/kern_descrip.c:3667
#25 0xffffffff80b6dbbd in sysctl_kern_proc_filedesc (oidp=<optimized out>, arg1=0xfffffe04b95c12bc, arg2=<optimized out>, req=<optimized out>) at /usr/src/sys/kern/kern_descrip.c:3701
#26 0xffffffff80bcd639 in sysctl_root_handler_locked (oid=0xffffffff81b0a760 <sysctl___kern_proc_filedesc>, arg1=0xfffffe04b95c12bc, arg2=1, req=0xfffffe04b95c11f0, tracker=0xfffffe04b95c1168) at /usr/src/sys/kern/kern_sysctl.c:166
#27 0xffffffff80bcccf9 in sysctl_root (oidp=<optimized out>, arg1=0xfffffe04b95c12bc, arg2=1, req=0xfffffe04b95c11f0) at /usr/src/sys/kern/kern_sysctl.c:2062
#28 0xffffffff80bcd368 in userland_sysctl (td=0xfffff865848d7000, name=0xfffffe04b95c12b0, namelen=4, old=<optimized out>, oldlenp=<optimized out>, inkernel=<optimized out>, new=0x0, newlen=0, retval=0xfffffe04b95c1318, flags=0) at /usr/src/sys/kern/kern_sysctl.c:2157
#29 0xffffffff80bcd1af in sys___sysctl (td=0xfffff865848d7000, uap=0xfffff865848d73c0) at /usr/src/sys/kern/kern_sysctl.c:2092
#30 0xffffffff81090e87 in syscallenter (td=0xfffff865848d7000) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#31 amd64_syscall (td=0xfffff865848d7000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1168
#32 <signal handler called>
#33 0x000000080045789a in ?? ()