| Summary: | [zfs] [panic] ZFS kernel panic, solaris assert on dsl_prop_unregister | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Base System | Reporter: | James TD Smith <ahktenzero+freebsd> | ||||||
| Component: | kern | Assignee: | Andriy Gapon <avg> | ||||||
| Status: | Closed FIXED | ||||||||
| Severity: | Affects Only Me | CC: | avg, spork | ||||||
| Priority: | Normal | ||||||||
| Version: | Unspecified | ||||||||
| Hardware: | Any | ||||||||
| OS: | Any | ||||||||
| Attachments: |
|
||||||||
|
Description
James TD Smith
2013-06-28 16:20:00 UTC
Responsible Changed From-To: freebsd-bugs->freebsd-zfs-devel Over to maintainer(s). Do you have mountd running ?
I have a similiar configuration and the same problem with
FreeBSD dsssrvt4.incore 8.4-STABLE FreeBSD 8.4-STABLE #0 r253040:amd64.
and found the panic is caused by mountd. This daemon starts working on
every successfull mount (sends signal 1 to mountd) and tries to delete
all existing exports for all local file systems using the flag
MNT_DELEXPORT. mountd calls nmount() and sometimes I see messages like
mountd[56874]: can't delete exports for
/home/poudriere/data/build/j8sp32-PT2/02/new_packages: Invalid argument
and once a week I have a panic:
panic: solaris assert: dsl_prop_unregister(ds, "aclinherit",
acl_inherit_changed_cb, zfsvfs) == 0, file: /usr/src/sys/module
s/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c,
line: 1279
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1ce
assfail() at assfail+0x1a
zfs_unregister_callbacks() at zfs_unregister_callbacks+0x1c8
zfsvfs_teardown() at zfsvfs_teardown+0x179
zfs_suspend_fs() at zfs_suspend_fs+0x1b
zfs_ioc_rollback() at zfs_ioc_rollback+0x24
zfsdev_ioctl() at zfsdev_ioctl+0x57f
devfs_ioctl_f() at devfs_ioctl_f+0x7b
kern_ioctl() at kern_ioctl+0x102
ioctl() at ioctl+0xfd
amd64_syscall() at amd64_syscall+0x1f4(kgdb) where(kgdb) where
Xfast_syscall() at Xfast_syscall+0xfc
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x180fd0c1c, rsp =
0x7fffffffce48, rbp = 0x7fffffffcf50 ---
KDB: enter: panic
This process panics on dsl_prop_unregister of "aclinherit" during a
rollback:
39024 38985 81834 0 R+ CPU 0 initial thread
(kgdb) where
#9 0xffffffff804328fb in kdb_enter (why=0xffffffff8069543a "panic",
msg=0xa <Address 0xa out of bounds>) at cpufunc.h:63
#10 0xffffffff803ff367 in panic (fmt=<value optimized out>) at
/usr/src/sys/kern/kern_shutdown.c:616
#11 0xffffffff80d5712a in assfail (a=<value optimized out>, f=<value
optimized out>, l=<value optimized out>) at
/usr/src/sys/modules/opensolaris/../../cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81
#12 0xffffffff80cbd0b8 in zfs_unregister_callbacks
(zfsvfs=0xffffff01f5664000) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1278
#13 0xffffffff80cbd319 in zfsvfs_teardown (zfsvfs=0xffffff01f5664000,
unmounting=0) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1919
#14 0xffffffff80cbd42b in zfs_suspend_fs (zfsvfs=0xffffff01f5664000) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:2225
#15 0xffffffff80cb0364 in zfs_ioc_rollback (zc=<value optimized out>) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:3565
#16 0xffffffff80cb4abf in zfsdev_ioctl (dev=<value optimized out>,
zcmd=<value optimized out>, arg=0xffffff00720a81a0 "\003",
flag=3,td=<value optimized out>) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:5936
#17 0xffffffff8037703b in devfs_ioctl_f (fp=0xffffff0008cce0f0,
com=3222821401, data=<value optimized out>,
cred=<value optimized out>, td=0xffffff02439d5000) at
/usr/src/sys/fs/devfs/devfs_vnops.c:695
#18 0xffffffff80444902 in kern_ioctl (td=<value optimized out>,
fd=<value optimized out>, com=3222821401,
data=0xffffff00720a81a0 "\003") at file.h:277
#19 0xffffffff80444b3d in ioctl (td=0xffffff02439d5000,
uap=0xffffff84859e1bb0) at /usr/src/sys/kern/sys_generic.c:679
#20 0xffffffff805db824 in amd64_syscall (td=0xffffff02439d5000,
traced=0) at subr_syscall.c:114
#21 0xffffffff805c2e9c in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:387
This process has just done dsl_prop_register "aclinherit" and sleeps:
56874 1 56874 0 Ss zio->io_ 0xffffff02f41a8e48 mountd
(kgdb) thread 304
[Switching to thread 304 (Thread 100357)]#0 sched_switch
(td=0xffffff0013717000, newtd=<value optimized out>,
flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932
1932 cpuid = PCPU_GET(cpuid);
(kgdb) where
#0 sched_switch (td=0xffffff0013717000, newtd=<value optimized out>,
flags=<value optimized out>)
at /usr/src/sys/kern/sched_ule.c:1932
#1 0xffffffff80407d16 in mi_switch (flags=260, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:466
#2 0xffffffff8043e5c2 in sleepq_wait (wchan=0xffffff02c000ee48, pri=0)
at /usr/src/sys/kern/subr_sleepqueue.c:613
#3 0xffffffff803b9971 in _cv_wait (cvp=0xffffff02c000ee48,
lock=0xffffff02c000ee28) at /usr/src/sys/kern/kern_condvar.c:139
#4 0xffffffff80c9e9f1 in zio_wait (zio=0xffffff02c000eb10) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1332
#5 0xffffffff80c35588 in dbuf_read (db=0xffffff01f7dbca80, zio=0x0,
flags=10) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:639
#6 0xffffffff80c3bcc0 in dmu_buf_hold (os=<value optimized out>,
object=<value optimized out>, offset=<value optimized out>,
tag=0x0, dbp=0xffffff8484d1b150, flags=<value optimized out>) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:151
#7 0xffffffff80c8f4f8 in zap_lockdir (os=0xffffff0013274000, obj=2558,
tx=0x0, lti=RW_READER, fatreader=1, adding=0,zapp=0xffffff8484d1b1b0) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:467
#8 0xffffffff80c90b15 in zap_lookup_norm (os=<value optimized out>,
zapobj=<value optimized out>, name=0xffffffff80d31204 "aclinherit",
integer_size=8, num_integers=1, buf=0xffffff8484d1b340, mt=MT_EXACT,
realname=0x0, rn_len=0, ncp=0x0) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:788
#9 0xffffffff80c90cbe in zap_lookup (os=<value optimized out>,
zapobj=<value optimized out>, name=<value optimized out>,
integer_size=<value optimized out>, num_integers=<value optimized
out>, buf=<value optimized out>) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:773
9 0xffffffff80c90cbe in zap_lookup (os=<value optimized out>,
zapobj=<value optimized out>, name=<value optimized out>,
integer_size=<value optimized out>, num_integers=<value optimized
out>, buf=<value optimized out>)
at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:773
#10 0xffffffff80c5e6d0 in dsl_prop_get_dd (dd=0xffffff03d070c800,
propname=0xffffffff80d31204 "aclinherit", intsz=8, numints=1,
buf=0xffffff8484d1b340, setpoint=0x0, snapshot=0) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c:107
#11 0xffffffff80c5e90b in dsl_prop_get_ds (ds=0xffffff0126912c00,
propname=0xffffffff80d31204 "aclinherit", intsz=8, numints=1,
buf=0xffffff8484d1b340, setpoint=0x0) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c:216
#12 0xffffffff80c5f496 in dsl_prop_register (ds=0xffffff0126912c00,
propname=0xffffffff80d31204 "aclinherit", callback=0xffffffff80cbca40
<acl_inherit_changed_cb>, cbarg=0xffffff01f5664000) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_prop.c:239
#13 0xffffffff80cbdd6b in zfs_register_callbacks (vfsp=<value optimized
out>) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:518
#14 0xffffffff80cc0048 in zfs_mount (vfsp=0xffffff03e49398d0) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1706
#15 0xffffffff8048d0a7 in vfs_donmount (td=<value optimized out>,
fsflags=268505104, fsoptions=0xffffff03b8026c00)
at /usr/src/sys/kern/vfs_mount.c:1011
#16 0xffffffff8048dc53 in nmount (td=0xffffff0013717000,
uap=0xffffff8484d1bbb0) at /usr/src/sys/kern/vfs_mount.c:447
#17 0xffffffff805db824 in amd64_syscall (td=0xffffff0013717000,
traced=0) at subr_syscall.c:114
#18 0xffffffff805c2e9c in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:387
(kgdb) f 15
#15 0xffffffff8048d0a7 in vfs_donmount (td=<value optimized out>,
fsflags=268505104, fsoptions=0xffffff03b8026c00)
at /usr/src/sys/kern/vfs_mount.c:1011
1011 error = VFS_MOUNT(mp);
(kgdb) info local
optlist = (struct vfsoptlist *) 0xffffff008edc20b0
opt = <value optimized out>
tmp_opt = <value optimized out>
fstype = 0xffffff008eda9050 "zfs"
fspath = 0xffffff03011760c0 "/home/poudriere/data/build/j8sp32-PT2/01"
errmsg = 0xffffff03abfae800 ""
error = <value optimized out>
fstypelen = 4
fspathlen = 41
errmsg_len = 255
errmsg_pos = 5
has_rw = 0
has_noro = 0
I can give more information from ddb output or kernel dump.
--
Andreas Longwitz
On 2013-07-25 23:52:54(+0200), Andreas Longwitz wrote: > Do you have mountd running ? Yes. > I have a similiar configuration and the same problem with > > FreeBSD dsssrvt4.incore 8.4-STABLE FreeBSD 8.4-STABLE #0 r253040:amd64. > > and found the panic is caused by mountd. This daemon starts working on > every successfull mount (sends signal 1 to mountd) and tries to delete > all existing exports for all local file systems using the flag > MNT_DELEXPORT. mountd calls nmount() and sometimes I see messages like > > mountd[56874]: can't delete exports for > /home/poudriere/data/build/j8sp32-PT2/02/new_packages: Invalid argument > > and once a week I have a panic: I think I also had errors from mountd around the time of the panics but I haven't had one for a while (mainly due to reducing the frequency of poudriere runs) so I can't be sure. I will try disabling mountd and see if that eliminates the problem, all my NFS clients are using NFSv4 so I don't need it running. James -- }-<James TD Smith>-<email/ahktenzero@mohorovi.cc>-{ In the meantime I did some more analysis of the problem and can explain
why the panic happens. Two threads are involved in the problem, one runs
the zfs command and wants to do a rollback using an ioctl() on /dev/zfs,
the other runs mountd and tries to do an "unmount exports" with
"export.ex_flags = MNT_DELEXPORT". Both threads are working on the same
dataset ds=0xffffff0126912c00.
The rollback thread wants to unregister his aclinherit property and
panics, because this property does not exist in the list of properties
anymore:
(kgdb) f 12
#12 0xffffffff80cbd0b8 in zfs_unregister_callbacks
(zfsvfs=0xffffff01f5664000) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1278
1278 VERIFY(dsl_prop_unregister(ds, "aclinherit",
(kgdb) l
1273 zfsvfs) == 0);
1274
1275 VERIFY(dsl_prop_unregister(ds, "aclmode", acl_mode_changed_cb,
1276 zfsvfs) == 0);
1277
1278 VERIFY(dsl_prop_unregister(ds, "aclinherit",
1279 acl_inherit_changed_cb, zfsvfs) == 0);
1280
1281 VERIFY(dsl_prop_unregister(ds, "vscan",
1282 vscan_changed_cb, zfsvfs) == 0);
The mountd thread wants to register some properties, but he first
unregisters everything (see comment in the source):
(kgdb) f 14
#14 0xffffffff80cc0048 in zfs_mount (vfsp=0xffffff03e49398d0) at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1706
1706 error = zfs_register_callbacks(vfsp);
(kgdb) list
1700 * When doing a remount, we simply refresh our temporary properties
1701 * according to those options set in the current VFS options.
1702 */
1703 if (vfsp->vfs_flag & MS_REMOUNT) {
1704 /* refresh mount options */
1705 zfs_unregister_callbacks(vfsp->vfs_data);
1706 error = zfs_register_callbacks(vfsp);
1707 goto out;
1708 }
1709
There is a little time gap between line 1705 and 1706 where the property
aclinherit is not registered. If another thread tries to unregister this
property during this gap he will panic.
I don't know how to fix this proper. Simple to remove the VERIFY on
dsl_prop_unregister() is easy, but I hope that one of the ZFS gurus will
have a look at this and we will get a better solution.
--
Andreas Longwitz
Please see the following suggestion: http://article.gmane.org/gmane.os.illumos.zfs/2833 Will you be able to work on a patch that implements the idea? -- Andriy Gapon Andriy Gapon wrote
> Please see the following suggestion:
> http://article.gmane.org/gmane.os.illumos.zfs/2833
> Will you be able to work on a patch that implements the idea?
Sorry, but that is out of my scope.
Andreas Longwitz
Created attachment 157329 [details]
console screenshot of panic
Created attachment 157799 [details] proposed patch for testing commit 7fb40408aea1be27984cc25dd80e707c9dbc2f05 Author: Andriy Gapon <avg@icyb.net.ua> If anyone is still running into the issue, could you please test the attached patch? zfs_mount(MS_REMOUNT): protect zfs_(un)register_callbacks calls We now take z_teardown_lock as a writer to ensure that there is no I/O while the filesystem state is in a flux. Also, zfs_suspend_fs() -> zfsvfs_teardown() call zfs_unregister_callbacks() and zfs_resume_fs() -> zfsvfs_setup() call zfs_unregister_callbacks(). Previously there was no synchronization between those calls and the calls in the re-mounting case. That could lead to concurrent execution and a crash. PR: 180060 Suggested by: mahrens See review D2865 A commit references this bug: Author: avg Date: Thu Jul 2 08:32:03 UTC 2015 New revision: 285021 URL: https://svnweb.freebsd.org/changeset/base/285021 Log: zfs_mount(MS_REMOUNT): protect zfs_(un)register_callbacks calls We now take z_teardown_lock as a writer to ensure that there is no I/O while the filesystem state is in a flux. Also, zfs_suspend_fs() -> zfsvfs_teardown() call zfs_unregister_callbacks() and zfs_resume_fs() -> zfsvfs_setup() call zfs_unregister_callbacks(). Previously there was no synchronization between those calls and the calls in the re-mounting case. That could lead to concurrent execution and a crash. PR: 180060 Differential Revision: https://reviews.freebsd.org/D2865 Suggested by: mahrens Reviewed by: delphij, pho, mahrens, will MFC after: 13 days Sponsored by: ClusterHQ Changes: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c Should be fixed now. |