Summary: | [zfs] [panic] ZFS kernel panic, solaris assert on dsl_prop_unregister | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | James TD Smith <ahktenzero+freebsd> | ||||||
Component: | kern | Assignee: | Andriy Gapon <avg> | ||||||
Status: | Closed FIXED | ||||||||
Severity: | Affects Only Me | CC: | avg, spork | ||||||
Priority: | Normal | ||||||||
Version: | Unspecified | ||||||||
Hardware: | Any | ||||||||
OS: | Any | ||||||||
Attachments: |
|
Description
James TD Smith
2013-06-28 16:20:00 UTC
Responsible Changed From-To: freebsd-bugs->freebsd-zfs-devel Over to maintainer(s). Do you have mountd running ? I have a similiar configuration and the same problem with FreeBSD dsssrvt4.incore 8.4-STABLE FreeBSD 8.4-STABLE #0 r253040:amd64. and found the panic is caused by mountd. This daemon starts working on every successfull mount (sends signal 1 to mountd) and tries to delete all existing exports for all local file systems using the flag MNT_DELEXPORT. mountd calls nmount() and sometimes I see messages like mountd[56874]: can't delete exports for /home/poudriere/data/build/j8sp32-PT2/02/new_packages: Invalid argument and once a week I have a panic: panic: solaris assert: dsl_prop_unregister(ds, "aclinherit", acl_inherit_changed_cb, zfsvfs) == 0, file: /usr/src/sys/module s/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c, line: 1279 cpuid = 1 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a kdb_backtrace() at kdb_backtrace+0x37 panic() at panic+0x1ce assfail() at assfail+0x1a zfs_unregister_callbacks() at zfs_unregister_callbacks+0x1c8 zfsvfs_teardown() at zfsvfs_teardown+0x179 zfs_suspend_fs() at zfs_suspend_fs+0x1b zfs_ioc_rollback() at zfs_ioc_rollback+0x24 zfsdev_ioctl() at zfsdev_ioctl+0x57f devfs_ioctl_f() at devfs_ioctl_f+0x7b kern_ioctl() at kern_ioctl+0x102 ioctl() at ioctl+0xfd amd64_syscall() at amd64_syscall+0x1f4(kgdb) where(kgdb) where Xfast_syscall() at Xfast_syscall+0xfc --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x180fd0c1c, rsp = 0x7fffffffce48, rbp = 0x7fffffffcf50 --- KDB: enter: panic This process panics on dsl_prop_unregister of "aclinherit" during a rollback: 39024 38985 81834 0 R+ CPU 0 initial thread (kgdb) where #9 0xffffffff804328fb in kdb_enter (why=0xffffffff8069543a "panic", msg=0xa <Address 0xa out of bounds>) at cpufunc.h:63 #10 0xffffffff803ff367 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:616 #11 0xffffffff80d5712a in assfail (a=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at /usr/src/sys/modules/opensolaris/../../cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81 #12 0xffffffff80cbd0b8 in zfs_unregister_callbacks (zfsvfs=0xffffff01f5664000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1278 #13 0xffffffff80cbd319 in zfsvfs_teardown (zfsvfs=0xffffff01f5664000, unmounting=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1919 #14 0xffffffff80cbd42b in zfs_suspend_fs (zfsvfs=0xffffff01f5664000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:2225 #15 0xffffffff80cb0364 in zfs_ioc_rollback (zc=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:3565 #16 0xffffffff80cb4abf in zfsdev_ioctl (dev=<value optimized out>, zcmd=<value optimized out>, arg=0xffffff00720a81a0 "\003", flag=3,td=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:5936 #17 0xffffffff8037703b in devfs_ioctl_f (fp=0xffffff0008cce0f0, com=3222821401, data=<value optimized out>, cred=<value optimized out>, td=0xffffff02439d5000) at /usr/src/sys/fs/devfs/devfs_vnops.c:695 #18 0xffffffff80444902 in kern_ioctl (td=<value optimized out>, fd=<value optimized out>, com=3222821401, data=0xffffff00720a81a0 "\003") at file.h:277 #19 0xffffffff80444b3d in ioctl (td=0xffffff02439d5000, uap=0xffffff84859e1bb0) at /usr/src/sys/kern/sys_generic.c:679 #20 0xffffffff805db824 in amd64_syscall (td=0xffffff02439d5000, traced=0) at subr_syscall.c:114 #21 0xffffffff805c2e9c in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:387 This process has just done dsl_prop_register "aclinherit" and sleeps: 56874 1 56874 0 Ss zio->io_ 0xffffff02f41a8e48 mountd (kgdb) thread 304 [Switching to thread 304 (Thread 100357)]#0 sched_switch (td=0xffffff0013717000, newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932 1932 cpuid = PCPU_GET(cpuid); (kgdb) where #0 sched_switch (td=0xffffff0013717000, newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932 #1 0xffffffff80407d16 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:466 #2 0xffffffff8043e5c2 in sleepq_wait (wchan=0xffffff02c000ee48, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:613 #3 0xffffffff803b9971 in _cv_wait (cvp=0xffffff02c000ee48, lock=0xffffff02c000ee28) at /usr/src/sys/kern/kern_condvar.c:139 #4 0xffffffff80c9e9f1 in zio_wait (zio=0xffffff02c000eb10) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1332 #5 0xffffffff80c35588 in dbuf_read (db=0xffffff01f7dbca80, zio=0x0, flags=10) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:639 #6 0xffffffff80c3bcc0 in dmu_buf_hold (os=<value optimized out>, object=<value optimized out>, offset=<value optimized out>, tag=0x0, dbp=0xffffff8484d1b150, flags=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:151 #7 0xffffffff80c8f4f8 in zap_lockdir (os=0xffffff0013274000, obj=2558, tx=0x0, lti=RW_READER, fatreader=1, adding=0,zapp=0xffffff8484d1b1b0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:467 #8 0xffffffff80c90b15 in zap_lookup_norm (os=<value optimized out>, zapobj=<value optimized out>, name=0xffffffff80d31204 "aclinherit", integer_size=8, num_integers=1, buf=0xffffff8484d1b340, mt=MT_EXACT, realname=0x0, rn_len=0, ncp=0x0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:788 #9 0xffffffff80c90cbe in zap_lookup (os=<value optimized out>, zapobj=<value optimized out>, name=<value optimized out>, integer_size=<value optimized out>, num_integers=<value optimized out>, buf=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:773 9 0xffffffff80c90cbe in zap_lookup (os=<value optimized out>, zapobj=<value optimized out>, name=<value optimized out>, integer_size=<value optimized out>, num_integers=<value optimized out>, buf=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:773 #10 0xffffffff80c5e6d0 in dsl_prop_get_dd (dd=0xffffff03d070c800, propname=0xffffffff80d31204 "aclinherit", intsz=8, numints=1, buf=0xffffff8484d1b340, setpoint=0x0, snapshot=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c:107 #11 0xffffffff80c5e90b in dsl_prop_get_ds (ds=0xffffff0126912c00, propname=0xffffffff80d31204 "aclinherit", intsz=8, numints=1, buf=0xffffff8484d1b340, setpoint=0x0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c:216 #12 0xffffffff80c5f496 in dsl_prop_register (ds=0xffffff0126912c00, propname=0xffffffff80d31204 "aclinherit", callback=0xffffffff80cbca40 <acl_inherit_changed_cb>, cbarg=0xffffff01f5664000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c:239 #13 0xffffffff80cbdd6b in zfs_register_callbacks (vfsp=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:518 #14 0xffffffff80cc0048 in zfs_mount (vfsp=0xffffff03e49398d0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1706 #15 0xffffffff8048d0a7 in vfs_donmount (td=<value optimized out>, fsflags=268505104, fsoptions=0xffffff03b8026c00) at /usr/src/sys/kern/vfs_mount.c:1011 #16 0xffffffff8048dc53 in nmount (td=0xffffff0013717000, uap=0xffffff8484d1bbb0) at /usr/src/sys/kern/vfs_mount.c:447 #17 0xffffffff805db824 in amd64_syscall (td=0xffffff0013717000, traced=0) at subr_syscall.c:114 #18 0xffffffff805c2e9c in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:387 (kgdb) f 15 #15 0xffffffff8048d0a7 in vfs_donmount (td=<value optimized out>, fsflags=268505104, fsoptions=0xffffff03b8026c00) at /usr/src/sys/kern/vfs_mount.c:1011 1011 error = VFS_MOUNT(mp); (kgdb) info local optlist = (struct vfsoptlist *) 0xffffff008edc20b0 opt = <value optimized out> tmp_opt = <value optimized out> fstype = 0xffffff008eda9050 "zfs" fspath = 0xffffff03011760c0 "/home/poudriere/data/build/j8sp32-PT2/01" errmsg = 0xffffff03abfae800 "" error = <value optimized out> fstypelen = 4 fspathlen = 41 errmsg_len = 255 errmsg_pos = 5 has_rw = 0 has_noro = 0 I can give more information from ddb output or kernel dump. -- Andreas Longwitz On 2013-07-25 23:52:54(+0200), Andreas Longwitz wrote: > Do you have mountd running ? Yes. > I have a similiar configuration and the same problem with > > FreeBSD dsssrvt4.incore 8.4-STABLE FreeBSD 8.4-STABLE #0 r253040:amd64. > > and found the panic is caused by mountd. This daemon starts working on > every successfull mount (sends signal 1 to mountd) and tries to delete > all existing exports for all local file systems using the flag > MNT_DELEXPORT. mountd calls nmount() and sometimes I see messages like > > mountd[56874]: can't delete exports for > /home/poudriere/data/build/j8sp32-PT2/02/new_packages: Invalid argument > > and once a week I have a panic: I think I also had errors from mountd around the time of the panics but I haven't had one for a while (mainly due to reducing the frequency of poudriere runs) so I can't be sure. I will try disabling mountd and see if that eliminates the problem, all my NFS clients are using NFSv4 so I don't need it running. James -- }-<James TD Smith>-<email/ahktenzero@mohorovi.cc>-{ In the meantime I did some more analysis of the problem and can explain why the panic happens. Two threads are involved in the problem, one runs the zfs command and wants to do a rollback using an ioctl() on /dev/zfs, the other runs mountd and tries to do an "unmount exports" with "export.ex_flags = MNT_DELEXPORT". Both threads are working on the same dataset ds=0xffffff0126912c00. The rollback thread wants to unregister his aclinherit property and panics, because this property does not exist in the list of properties anymore: (kgdb) f 12 #12 0xffffffff80cbd0b8 in zfs_unregister_callbacks (zfsvfs=0xffffff01f5664000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1278 1278 VERIFY(dsl_prop_unregister(ds, "aclinherit", (kgdb) l 1273 zfsvfs) == 0); 1274 1275 VERIFY(dsl_prop_unregister(ds, "aclmode", acl_mode_changed_cb, 1276 zfsvfs) == 0); 1277 1278 VERIFY(dsl_prop_unregister(ds, "aclinherit", 1279 acl_inherit_changed_cb, zfsvfs) == 0); 1280 1281 VERIFY(dsl_prop_unregister(ds, "vscan", 1282 vscan_changed_cb, zfsvfs) == 0); The mountd thread wants to register some properties, but he first unregisters everything (see comment in the source): (kgdb) f 14 #14 0xffffffff80cc0048 in zfs_mount (vfsp=0xffffff03e49398d0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1706 1706 error = zfs_register_callbacks(vfsp); (kgdb) list 1700 * When doing a remount, we simply refresh our temporary properties 1701 * according to those options set in the current VFS options. 1702 */ 1703 if (vfsp->vfs_flag & MS_REMOUNT) { 1704 /* refresh mount options */ 1705 zfs_unregister_callbacks(vfsp->vfs_data); 1706 error = zfs_register_callbacks(vfsp); 1707 goto out; 1708 } 1709 There is a little time gap between line 1705 and 1706 where the property aclinherit is not registered. If another thread tries to unregister this property during this gap he will panic. I don't know how to fix this proper. Simple to remove the VERIFY on dsl_prop_unregister() is easy, but I hope that one of the ZFS gurus will have a look at this and we will get a better solution. -- Andreas Longwitz Please see the following suggestion: http://article.gmane.org/gmane.os.illumos.zfs/2833 Will you be able to work on a patch that implements the idea? -- Andriy Gapon Andriy Gapon wrote
> Please see the following suggestion:
> http://article.gmane.org/gmane.os.illumos.zfs/2833
> Will you be able to work on a patch that implements the idea?
Sorry, but that is out of my scope.
Andreas Longwitz
Created attachment 157329 [details]
console screenshot of panic
Created attachment 157799 [details] proposed patch for testing commit 7fb40408aea1be27984cc25dd80e707c9dbc2f05 Author: Andriy Gapon <avg@icyb.net.ua> If anyone is still running into the issue, could you please test the attached patch? zfs_mount(MS_REMOUNT): protect zfs_(un)register_callbacks calls We now take z_teardown_lock as a writer to ensure that there is no I/O while the filesystem state is in a flux. Also, zfs_suspend_fs() -> zfsvfs_teardown() call zfs_unregister_callbacks() and zfs_resume_fs() -> zfsvfs_setup() call zfs_unregister_callbacks(). Previously there was no synchronization between those calls and the calls in the re-mounting case. That could lead to concurrent execution and a crash. PR: 180060 Suggested by: mahrens See review D2865 A commit references this bug: Author: avg Date: Thu Jul 2 08:32:03 UTC 2015 New revision: 285021 URL: https://svnweb.freebsd.org/changeset/base/285021 Log: zfs_mount(MS_REMOUNT): protect zfs_(un)register_callbacks calls We now take z_teardown_lock as a writer to ensure that there is no I/O while the filesystem state is in a flux. Also, zfs_suspend_fs() -> zfsvfs_teardown() call zfs_unregister_callbacks() and zfs_resume_fs() -> zfsvfs_setup() call zfs_unregister_callbacks(). Previously there was no synchronization between those calls and the calls in the re-mounting case. That could lead to concurrent execution and a crash. PR: 180060 Differential Revision: https://reviews.freebsd.org/D2865 Suggested by: mahrens Reviewed by: delphij, pho, mahrens, will MFC after: 13 days Sponsored by: ClusterHQ Changes: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c Should be fixed now. |