In pre-production testing of FreeBSD 8.0 with ZFS/raidz I found a 100% reproducible panic in ZFS. The problem was first seen after two hours of benchmarks/dbench testing on 8.0-RELEASE. It was then reproduced on 8.0-STABLE csup'ed on Feb 28, 2010.

FreeBSD 8.0-RELEASE #0: Sun Feb 28 15:40:09 UTC 2010
    bakhtin@tarzan-new.private.flydrag.ru:/zfs/obj/usr/src.old/sys/DEBUG
CPU: Intel(R) Pentium(R) CPU E5200 @ 2.50GHz (2536.15-MHz K8-class CPU)
real memory = 8589934592 (8192 MB)

tarzan-new# zpool status
  pool: zfs
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zfs         ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad16    ONLINE       0     0     0

errors: No known data errors
tarzan-new#

tarzan-new# zdb -vvv zfs
    version=13
    name='zfs'
    state=0
    txg=342
    pool_guid=14801748754090954299
    hostid=4266611921
    hostname='tarzan-new.private.flydrag.ru'
    vdev_tree
        type='root'
        id=0
        guid=14801748754090954299
        children[0]
            type='raidz'
            id=0
            guid=11719031541734505632
            nparity=1
            metaslab_array=23
            metaslab_shift=31
            ashift=9
            asize=240064659456
            is_log=0
            children[0]
                type='disk'
                id=0
                guid=541462146913312867
                path='/dev/ad10'
                whole_disk=0
            children[1]
                type='disk'
                id=1
                guid=14783639361535716946
                path='/dev/ad12'
                whole_disk=0
            children[2]
                type='disk'
                id=2
                guid=8087457233125113893
                path='/dev/ad16'
                whole_disk=0
tarzan-new#

Crash data:

  60    405488    53.18 MB/sec  execute 400 sec
  60    405664    53.18 MB/sec  execute 401 sec
panic: existing znode 0xffffff0103514468 for dbuf 0xffffff00d494ea80
cpuid = 1
KDB: enter: panic
[thread pid 1113 tid 100202 ]
Stopped at kdb_enter+0x3d: movq $0,0x69a270(%rip)
db:0:kdb.enter.panic> bt
Tracing pid 1113 tid 100202 td 0xffffff00196703a0
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
zfs_znode_dmu_init() at zfs_znode_dmu_init+0xb5
zfs_znode_alloc() at zfs_znode_alloc+0xa0
zfs_mknode() at zfs_mknode+0x204
zfs_freebsd_create() at zfs_freebsd_create+0x594
VOP_CREATE_APV() at VOP_CREATE_APV+0xb3
vn_open_cred() at vn_open_cred+0x473
kern_openat() at kern_openat+0x179
syscall() at syscall+0x118
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (5, FreeBSD ELF64, open), rip = 0x80073075c, rsp = 0x7fffffffdd98, rbp = 0x800a05100 ---

Console logs: http://flydrag.dyndns.org:9090/freebsd/zfs-panic/1/console.txt
crashinfo: http://flydrag.dyndns.org:9090/freebsd/zfs-panic/1/core.txt
vmcore: http://flydrag.dyndns.org:9090/freebsd/zfs-panic/1/vmcore.1.gz

How-To-Repeat:

1. Install FreeBSD/amd64 8.0-RELEASE (not tested on i386); optionally csup to STABLE.
2. Create a zpool (tested on raidz1).
3. Install /usr/ports/benchmarks/dbench.
4. Run dbench -t 10000 -D /zfs/bench 60
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Responsible Changed
From-To: freebsd-fs->pjd

I'll look into this one.
Pawel,

Sorry for the long delay; it took a lot of time to test this patch. It seems that the problem is fixed by your patch. I ran the following tests:

1. Reproduced the znode problem again. Unfortunately it took a little more than 10000 seconds this time:

  80   3751510    56.69 MB/sec  execute 11724 sec
panic: existing znode 0xffffff0114206000 for dbuf 0xffffff013685d540

2. With your patch I ran two tests:

   a. dbench -t 86400 -D /zfs/bench/ 80
   b. dbench -t 259200 -D /zfs/bench/ 80

   Both tests completed successfully:

  80  13090087    53.99 MB/sec  cleanup 86405 sec
Throughput 53.993 MB/sec 80 procs

  80 -17242124    50.79 MB/sec  cleanup 259210 sec
Throughput 50.7938 MB/sec 80 procs

3. After that I switched back to the unpatched kernel and reproduced the problem again:

  80   2701171    49.69 MB/sec  execute 4767 sec
panic: existing znode 0xffffff015c7aa2f0 for dbuf 0xffffff0114edac40

Alex Bakhtin
Author: pjd
Date: Wed Apr 28 18:29:48 2010
New Revision: 207334
URL: http://svn.freebsd.org/changeset/base/207334

Log:
  Backport fix for 'zfs_znode_dmu_init: existing znode for dbuf' panic from
  OpenSolaris.

  PR:		kern/144402
  Reported by:	Alex Bakhtin <alex.bakhtin@gmail.com>
  Tested by:	Alex Bakhtin <alex.bakhtin@gmail.com>
  Obtained from:	OpenSolaris, Bug ID 6895088
  MFC after:	3 days

Modified:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
==============================================================================
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	Wed Apr 28 18:29:44 2010	(r207333)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	Wed Apr 28 18:29:48 2010	(r207334)
@@ -704,6 +704,8 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, d
 			    DMU_OT_ZNODE, sizeof (znode_phys_t) + bonuslen, tx);
 		}
 	}
+
+	ZFS_OBJ_HOLD_ENTER(zfsvfs, obj);
 	VERIFY(0 == dmu_bonus_hold(zfsvfs->z_os, obj, NULL, &db));
 	dmu_buf_will_dirty(db, tx);
 
@@ -765,9 +767,7 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, d
 
 	pzp->zp_mode = MAKEIMODE(vap->va_type, vap->va_mode);
 	if (!(flag & IS_ROOT_NODE)) {
-		ZFS_OBJ_HOLD_ENTER(zfsvfs, obj);
 		*zpp = zfs_znode_alloc(zfsvfs, db, 0);
-		ZFS_OBJ_HOLD_EXIT(zfsvfs, obj);
 	} else {
 		/*
 		 * If we are creating the root node, the "parent" we
@@ -776,6 +776,7 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, d
 		*zpp = dzp;
 	}
 	zfs_perm_init(*zpp, dzp, flag, vap, tx, cr, setaclp, fuidp);
+	ZFS_OBJ_HOLD_EXIT(zfsvfs, obj);
 	if (!(flag & IS_ROOT_NODE)) {
 		vnode_t *vp;
 
@@ -939,19 +940,31 @@ again:
 
 	/*
 	 * Not found create new znode/vnode
+	 * but only if file exists.
+	 *
+	 * There is a small window where zfs_vget() could
+	 * find this object while a file create is still in
+	 * progress.  Since a gen number can never be zero
+	 * we will check that to determine if its an allocated
+	 * file.
 	 */
-	zp = zfs_znode_alloc(zfsvfs, db, doi.doi_data_block_size);
-
-	vp = ZTOV(zp);
-	vp->v_vflag |= VV_FORCEINSMQ;
-	err = insmntque(vp, zfsvfs->z_vfs);
-	vp->v_vflag &= ~VV_FORCEINSMQ;
-	KASSERT(err == 0, ("insmntque() failed: error %d", err));
-	VOP_UNLOCK(vp, 0);
 
+	if (((znode_phys_t *)db->db_data)->zp_gen != 0) {
+		zp = zfs_znode_alloc(zfsvfs, db, doi.doi_data_block_size);
+		*zpp = zp;
+		vp = ZTOV(zp);
+		vp->v_vflag |= VV_FORCEINSMQ;
+		err = insmntque(vp, zfsvfs->z_vfs);
+		vp->v_vflag &= ~VV_FORCEINSMQ;
+		KASSERT(err == 0, ("insmntque() failed: error %d", err));
+		VOP_UNLOCK(vp, 0);
+		err = 0;
+	} else {
+		dmu_buf_rele(db, NULL);
+		err = ENOENT;
+	}
 	ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
-	*zpp = zp;
-	return (0);
+	return (err);
 }
 
 int
@@ -1440,6 +1453,7 @@ zfs_create_fs(objset_t *os, cred_t *cr, 
 	uint64_t norm = 0;
 	nvpair_t *elem;
 	int error;
+	int i;
 	znode_t *rootzp = NULL;
 	vnode_t vnode;
 	vattr_t vattr;
@@ -1537,6 +1551,9 @@ zfs_create_fs(objset_t *os, cred_t *cr, 
 	list_create(&zfsvfs.z_all_znodes, sizeof (znode_t),
 	    offsetof(znode_t, z_link_node));
 
+	for (i = 0; i != ZFS_OBJ_MTX_SZ; i++)
+		mutex_init(&zfsvfs.z_hold_mtx[i], NULL, MUTEX_DEFAULT, NULL);
+
 	ASSERT(!POINTER_IS_VALID(rootzp->z_zfsvfs));
 	rootzp->z_zfsvfs = &zfsvfs;
 	zfs_mknode(rootzp, &vattr, tx, cr, IS_ROOT_NODE, &zp, 0, NULL, NULL);
@@ -1547,6 +1564,8 @@ zfs_create_fs(objset_t *os, cred_t *cr, 
 
 	dmu_buf_rele(rootzp->z_dbuf, NULL);
 	rootzp->z_dbuf = NULL;
+	for (i = 0; i != ZFS_OBJ_MTX_SZ; i++)
+		mutex_destroy(&zfsvfs.z_hold_mtx[i]);
 	mutex_destroy(&zfsvfs.z_znodes_lock);
 	rootzp->z_vnode = NULL;
 	kmem_cache_free(znode_cache, rootzp);
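The locking rule behind r207334 can be sketched in user space. The sketch below is only an illustration under stated assumptions: every sk_/SK_ name is hypothetical, pthread mutexes stand in for the kernel's per-object hold mutexes, and the table size of 64 is an assumed value. It shows the two halves of the fix as described in the patch comment: creation keeps the hold from before the object is touched until it is fully set up, and lookup treats a zero generation number as "file does not exist yet".

```c
/*
 * User-space sketch of the locking rule described in the patch comment.
 * Every sk_/SK_ name is hypothetical; this is not ZFS code.
 */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define	SK_OBJ_MTX_SZ	64	/* assumed power-of-two lock table size */
#define	SK_NOBJ		128	/* number of simulated objects */

struct sk_obj {
	uint64_t gen;		/* stays 0 until creation has filled it in */
	int	 has_node;	/* 1 once an in-core node is attached */
};

static struct sk_obj	sk_objs[SK_NOBJ];
static pthread_mutex_t	sk_hold_mtx[SK_OBJ_MTX_SZ];

/* Set up the lock table and clear all objects; call once, single-threaded. */
void
sk_init(void)
{
	int i, rc;

	for (i = 0; i != SK_OBJ_MTX_SZ; i++) {
		rc = pthread_mutex_init(&sk_hold_mtx[i], NULL);
		assert(rc == 0);
		(void)rc;
	}
	memset(sk_objs, 0, sizeof (sk_objs));
}

/* Map an object number to its hold mutex. */
static pthread_mutex_t *
sk_hold(uint64_t obj)
{
	return (&sk_hold_mtx[obj & (SK_OBJ_MTX_SZ - 1)]);
}

/*
 * Create: the hold is taken before the object is touched and dropped only
 * after the generation is set and the node is attached, so a concurrent
 * lookup never sees a half-built object.
 */
int
sk_create(uint64_t obj, uint64_t gen)
{
	int err = 0;

	if (obj >= SK_NOBJ || gen == 0)
		return (EINVAL);
	pthread_mutex_lock(sk_hold(obj));
	if (sk_objs[obj].gen != 0) {
		err = EEXIST;
	} else {
		sk_objs[obj].gen = gen;
		sk_objs[obj].has_node = 1;
	}
	pthread_mutex_unlock(sk_hold(obj));
	return (err);
}

/* Drop the in-core node while leaving the object itself allocated. */
void
sk_reclaim(uint64_t obj)
{
	if (obj >= SK_NOBJ)
		return;
	pthread_mutex_lock(sk_hold(obj));
	sk_objs[obj].has_node = 0;
	pthread_mutex_unlock(sk_hold(obj));
}

/*
 * Lookup: reuse an attached node, attach one for a fully created object,
 * and report ENOENT for an object whose generation is still zero.
 */
int
sk_lookup(uint64_t obj)
{
	int err = 0;

	if (obj >= SK_NOBJ)
		return (EINVAL);
	pthread_mutex_lock(sk_hold(obj));
	if (sk_objs[obj].has_node == 0) {
		if (sk_objs[obj].gen != 0)
			sk_objs[obj].has_node = 1;
		else
			err = ENOENT;
	}
	pthread_mutex_unlock(sk_hold(obj));
	return (err);
}
```

Calling sk_init() once and then sk_lookup() on an object that has not been through sk_create() returns ENOENT, which mirrors the new error path in the diff. After sk_create() succeeds, sk_lookup() returns 0 whether or not sk_reclaim() has dropped the in-core node in between.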
Author: pjd
Date: Sat May 1 19:00:33 2010
New Revision: 207477
URL: http://svn.freebsd.org/changeset/base/207477

Log:
  MFC r207068,r207334:

  r207068:

  Allow to modify directory's content even if the ZFS_NOUNLINK (SF_NOUNLINK,
  sunlnk) flag is set. We only deny directory's removal or rename.

  PR:		kern/143343
  Reported by:	marck

  r207334:

  Backport fix for 'zfs_znode_dmu_init: existing znode for dbuf' panic from
  OpenSolaris.

  PR:		kern/144402
  Reported by:	Alex Bakhtin <alex.bakhtin@gmail.com>
  Tested by:	Alex Bakhtin <alex.bakhtin@gmail.com>
  Obtained from:	OpenSolaris, Bug ID 6895088

Modified:
  stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c
  stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
Directory Properties:
  stable/8/sys/   (props changed)
  stable/8/sys/amd64/include/xen/   (props changed)
  stable/8/sys/cddl/contrib/opensolaris/   (props changed)
  stable/8/sys/contrib/dev/acpica/   (props changed)
  stable/8/sys/contrib/pf/   (props changed)
  stable/8/sys/dev/xen/xenpci/   (props changed)
  stable/8/sys/geom/sched/   (props changed)

Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c
==============================================================================
--- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c	Sat May 1 18:56:45 2010	(r207476)
+++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c	Sat May 1 19:00:33 2010	(r207477)
@@ -2235,11 +2235,24 @@ zfs_zaccess_common(znode_t *zp, uint32_t
 		return (EPERM);
 	}
 
+#ifdef sun
 	if ((v4_mode & (ACE_DELETE | ACE_DELETE_CHILD)) &&
 	    (zp->z_phys->zp_flags & ZFS_NOUNLINK)) {
 		*check_privs = B_FALSE;
 		return (EPERM);
 	}
+#else
+	/*
+	 * In FreeBSD we allow to modify directory's content is ZFS_NOUNLINK
+	 * (sunlnk) is set. We just don't allow directory removal, which is
+	 * handled in zfs_zaccess_delete().
+	 */
+	if ((v4_mode & ACE_DELETE) &&
+	    (zp->z_phys->zp_flags & ZFS_NOUNLINK)) {
+		*check_privs = B_FALSE;
+		return (EPERM);
+	}
+#endif
 
 	if (((v4_mode & (ACE_READ_DATA|ACE_EXECUTE)) &&
 	    (zp->z_phys->zp_flags & ZFS_AV_QUARANTINED))) {

Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
==============================================================================
--- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	Sat May 1 18:56:45 2010	(r207476)
+++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	Sat May 1 19:00:33 2010	(r207477)
@@ -704,6 +704,8 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, d
 			    DMU_OT_ZNODE, sizeof (znode_phys_t) + bonuslen, tx);
 		}
 	}
+
+	ZFS_OBJ_HOLD_ENTER(zfsvfs, obj);
 	VERIFY(0 == dmu_bonus_hold(zfsvfs->z_os, obj, NULL, &db));
 	dmu_buf_will_dirty(db, tx);
 
@@ -765,9 +767,7 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, d
 
 	pzp->zp_mode = MAKEIMODE(vap->va_type, vap->va_mode);
 	if (!(flag & IS_ROOT_NODE)) {
-		ZFS_OBJ_HOLD_ENTER(zfsvfs, obj);
 		*zpp = zfs_znode_alloc(zfsvfs, db, 0);
-		ZFS_OBJ_HOLD_EXIT(zfsvfs, obj);
 	} else {
 		/*
 		 * If we are creating the root node, the "parent" we
@@ -776,6 +776,7 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, d
 		*zpp = dzp;
 	}
 	zfs_perm_init(*zpp, dzp, flag, vap, tx, cr, setaclp, fuidp);
+	ZFS_OBJ_HOLD_EXIT(zfsvfs, obj);
 	if (!(flag & IS_ROOT_NODE)) {
 		vnode_t *vp;
 
@@ -939,19 +940,31 @@ again:
 
 	/*
 	 * Not found create new znode/vnode
+	 * but only if file exists.
+	 *
+	 * There is a small window where zfs_vget() could
+	 * find this object while a file create is still in
+	 * progress.  Since a gen number can never be zero
+	 * we will check that to determine if its an allocated
+	 * file.
 	 */
-	zp = zfs_znode_alloc(zfsvfs, db, doi.doi_data_block_size);
-
-	vp = ZTOV(zp);
-	vp->v_vflag |= VV_FORCEINSMQ;
-	err = insmntque(vp, zfsvfs->z_vfs);
-	vp->v_vflag &= ~VV_FORCEINSMQ;
-	KASSERT(err == 0, ("insmntque() failed: error %d", err));
-	VOP_UNLOCK(vp, 0);
 
+	if (((znode_phys_t *)db->db_data)->zp_gen != 0) {
+		zp = zfs_znode_alloc(zfsvfs, db, doi.doi_data_block_size);
+		*zpp = zp;
+		vp = ZTOV(zp);
+		vp->v_vflag |= VV_FORCEINSMQ;
+		err = insmntque(vp, zfsvfs->z_vfs);
+		vp->v_vflag &= ~VV_FORCEINSMQ;
+		KASSERT(err == 0, ("insmntque() failed: error %d", err));
+		VOP_UNLOCK(vp, 0);
+		err = 0;
+	} else {
+		dmu_buf_rele(db, NULL);
+		err = ENOENT;
+	}
 	ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
-	*zpp = zp;
-	return (0);
+	return (err);
 }
 
 int
@@ -1440,6 +1453,7 @@ zfs_create_fs(objset_t *os, cred_t *cr, 
 	uint64_t norm = 0;
 	nvpair_t *elem;
 	int error;
+	int i;
 	znode_t *rootzp = NULL;
 	vnode_t vnode;
 	vattr_t vattr;
@@ -1537,6 +1551,9 @@ zfs_create_fs(objset_t *os, cred_t *cr, 
 	list_create(&zfsvfs.z_all_znodes, sizeof (znode_t),
 	    offsetof(znode_t, z_link_node));
 
+	for (i = 0; i != ZFS_OBJ_MTX_SZ; i++)
+		mutex_init(&zfsvfs.z_hold_mtx[i], NULL, MUTEX_DEFAULT, NULL);
+
 	ASSERT(!POINTER_IS_VALID(rootzp->z_zfsvfs));
 	rootzp->z_zfsvfs = &zfsvfs;
 	zfs_mknode(rootzp, &vattr, tx, cr, IS_ROOT_NODE, &zp, 0, NULL, NULL);
@@ -1547,6 +1564,8 @@ zfs_create_fs(objset_t *os, cred_t *cr, 
 
 	dmu_buf_rele(rootzp->z_dbuf, NULL);
 	rootzp->z_dbuf = NULL;
+	for (i = 0; i != ZFS_OBJ_MTX_SZ; i++)
+		mutex_destroy(&zfsvfs.z_hold_mtx[i]);
 	mutex_destroy(&zfsvfs.z_znodes_lock);
 	rootzp->z_vnode = NULL;
 	kmem_cache_free(znode_cache, rootzp);
State Changed
From-To: patched->closed

Fix MFCed to stable/8. Thanks!
State Changed
From-To: open->feedback

Unfortunately I was unable to reproduce the panic using the proposed test, but this bug seems to be already fixed in OpenSolaris (bugid: 6895088). I back-ported the fix to FreeBSD; could you try it and see if it helps?

http://people.freebsd.org/~pjd/patches/zfs_znode.c.3.patch
State Changed
From-To: feedback->patched

Thanks a lot for the report and testing. I just committed the fix to HEAD and will MFC it within a few days.