Bug 144402 - [zfs] [panic] panic at zfs_znode_dmu_init: existing znode for dbuf
Summary: [zfs] [panic] panic at zfs_znode_dmu_init: existing znode for dbuf
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 8.0-STABLE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Pawel Jakub Dawidek
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-03-01 22:20 UTC by Alex.Bakhtin
Modified: 2010-05-01 20:24 UTC

Description Alex.Bakhtin 2010-03-01 22:20:05 UTC
In pre-production testing of FreeBSD 8.0 with ZFS/raidz I found a 100% reproducible panic in ZFS. The problem was first seen after two hours of benchmarks/dbench testing on 8.0-RELEASE. It was then reproduced on 8.0-STABLE csup'ed on Feb 28, 2010.

FreeBSD 8.0-RELEASE #0: Sun Feb 28 15:40:09 UTC 2010
    bakhtin@tarzan-new.private.flydrag.ru:/zfs/obj/usr/src.old/sys/DEBUG

CPU: Intel(R) Pentium(R)  CPU       E5200  @ 2.50GHz (2536.15-MHz K8-class CPU)
  real memory  = 8589934592 (8192 MB)

tarzan-new# zpool status
  pool: zfs
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zfs         ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad16    ONLINE       0     0     0

errors: No known data errors
tarzan-new#

tarzan-new# zdb -vvv
zfs
    version=13
    name='zfs'
    state=0
    txg=342
    pool_guid=14801748754090954299
    hostid=4266611921
    hostname='tarzan-new.private.flydrag.ru'
    vdev_tree
        type='root'
        id=0
        guid=14801748754090954299
        children[0]
                type='raidz'
                id=0
                guid=11719031541734505632
                nparity=1
                metaslab_array=23
                metaslab_shift=31
                ashift=9
                asize=240064659456
                is_log=0
                children[0]
                        type='disk'
                        id=0
                        guid=541462146913312867
                        path='/dev/ad10'
                        whole_disk=0
                children[1]
                        type='disk'
                        id=1
                        guid=14783639361535716946
                        path='/dev/ad12'
                        whole_disk=0
                children[2]
                        type='disk'
                        id=2
                        guid=8087457233125113893
                        path='/dev/ad16'
                        whole_disk=0
tarzan-new#


Crash data:

  60    405488    53.18 MB/sec  execute 400 sec
  60    405664    53.18 MB/sec  execute 401 sec
panic: existing znode 0xffffff0103514468 for dbuf 0xffffff00d494ea80
cpuid = 1
KDB: enter: panic
[thread pid 1113 tid 100202 ]
Stopped at      kdb_enter+0x3d: movq    $0,0x69a270(%rip)
db:0:kdb.enter.panic>  bt
Tracing pid 1113 tid 100202 td 0xffffff00196703a0
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
zfs_znode_dmu_init() at zfs_znode_dmu_init+0xb5
zfs_znode_alloc() at zfs_znode_alloc+0xa0
zfs_mknode() at zfs_mknode+0x204
zfs_freebsd_create() at zfs_freebsd_create+0x594
VOP_CREATE_APV() at VOP_CREATE_APV+0xb3
vn_open_cred() at vn_open_cred+0x473
kern_openat() at kern_openat+0x179
syscall() at syscall+0x118
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (5, FreeBSD ELF64, open), rip = 0x80073075c, rsp = 0x7fffffffdd98, rbp = 0x800a05100 ---


Console logs:
http://flydrag.dyndns.org:9090/freebsd/zfs-panic/1/console.txt

crashinfo:
http://flydrag.dyndns.org:9090/freebsd/zfs-panic/1/core.txt

vmcore:
http://flydrag.dyndns.org:9090/freebsd/zfs-panic/1/vmcore.1.gz

How-To-Repeat: 
Install FreeBSD/amd64 (not tested on i386) 8.0-RELEASE, csup (optionally) to STABLE.

Create zpool (tested on raidz1).

Install /usr/ports/benchmarks/dbench

Run dbench -t 10000 -D /zfs/bench 60
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2010-03-02 00:40:49 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 2 Pawel Jakub Dawidek freebsd_committer freebsd_triage 2010-03-19 22:51:58 UTC
Responsible Changed
From-To: freebsd-fs->pjd

I'll look into this one.
Comment 3 Alex.Bakhtin 2010-04-28 06:34:45 UTC
Pawel,

     Sorry for the long delay; it took a lot of time to test this patch.
It seems that the problem was fixed by your patch.

    I did the following testing:

1. Reproduced the problem with znode again. Unfortunately it took a
little more than 10000 seconds this time:

  80   3751510    56.69 MB/sec  execute 11724 sec
panic: existing znode 0xffffff0114206000 for dbuf 0xffffff013685d540

2. With your patch I ran two tests:
    a. dbench -t 86400 -D /zfs/bench/ 80
    b. dbench -t 259200 -D /zfs/bench/ 80

    Both tests completed successfully:

  80  13090087    53.99 MB/sec  cleanup 86405 sec

Throughput 53.993 MB/sec 80 procs

  80  -17242124    50.79 MB/sec  cleanup 259210 sec

Throughput 50.7938 MB/sec 80 procs

3. After that I switched back to the unpatched kernel and reproduced the
problem again:

  80   2701171    49.69 MB/sec  execute 4767 sec
panic: existing znode 0xffffff015c7aa2f0 for dbuf 0xffffff0114edac40

Alex Bakhtin
Comment 4 dfilter service freebsd_committer freebsd_triage 2010-04-28 19:29:58 UTC
Author: pjd
Date: Wed Apr 28 18:29:48 2010
New Revision: 207334
URL: http://svn.freebsd.org/changeset/base/207334

Log:
  Backport fix for 'zfs_znode_dmu_init: existing znode for dbuf' panic from OpenSolaris.
  
  PR:		kern/144402
  Reported by:	Alex Bakhtin <alex.bakhtin@gmail.com>
  Tested by:	Alex Bakhtin <alex.bakhtin@gmail.com>
  Obtained from:	OpenSolaris, Bug ID 6895088
  MFC after:	3 days

Modified:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
==============================================================================
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	Wed Apr 28 18:29:44 2010	(r207333)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	Wed Apr 28 18:29:48 2010	(r207334)
@@ -704,6 +704,8 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, d
 			    DMU_OT_ZNODE, sizeof (znode_phys_t) + bonuslen, tx);
 		}
 	}
+
+	ZFS_OBJ_HOLD_ENTER(zfsvfs, obj);
 	VERIFY(0 == dmu_bonus_hold(zfsvfs->z_os, obj, NULL, &db));
 	dmu_buf_will_dirty(db, tx);
 
@@ -765,9 +767,7 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, d
 
 	pzp->zp_mode = MAKEIMODE(vap->va_type, vap->va_mode);
 	if (!(flag & IS_ROOT_NODE)) {
-		ZFS_OBJ_HOLD_ENTER(zfsvfs, obj);
 		*zpp = zfs_znode_alloc(zfsvfs, db, 0);
-		ZFS_OBJ_HOLD_EXIT(zfsvfs, obj);
 	} else {
 		/*
 		 * If we are creating the root node, the "parent" we
@@ -776,6 +776,7 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, d
 		*zpp = dzp;
 	}
 	zfs_perm_init(*zpp, dzp, flag, vap, tx, cr, setaclp, fuidp);
+	ZFS_OBJ_HOLD_EXIT(zfsvfs, obj);
 	if (!(flag & IS_ROOT_NODE)) {
 		vnode_t *vp;
 
@@ -939,19 +940,31 @@ again:
 
 	/*
 	 * Not found create new znode/vnode
+	 * but only if file exists.
+	 *
+	 * There is a small window where zfs_vget() could
+	 * find this object while a file create is still in
+	 * progress.  Since a gen number can never be zero
+	 * we will check that to determine if its an allocated
+	 * file.
 	 */
-	zp = zfs_znode_alloc(zfsvfs, db, doi.doi_data_block_size);
-
-	vp = ZTOV(zp);
-	vp->v_vflag |= VV_FORCEINSMQ;
-	err = insmntque(vp, zfsvfs->z_vfs);
-	vp->v_vflag &= ~VV_FORCEINSMQ;
-	KASSERT(err == 0, ("insmntque() failed: error %d", err));
-	VOP_UNLOCK(vp, 0);
 
+	if (((znode_phys_t *)db->db_data)->zp_gen != 0) {
+		zp = zfs_znode_alloc(zfsvfs, db, doi.doi_data_block_size);
+		*zpp = zp;
+		vp = ZTOV(zp);
+		vp->v_vflag |= VV_FORCEINSMQ;
+		err = insmntque(vp, zfsvfs->z_vfs);
+		vp->v_vflag &= ~VV_FORCEINSMQ;
+		KASSERT(err == 0, ("insmntque() failed: error %d", err));
+		VOP_UNLOCK(vp, 0);
+		err = 0;
+	} else {
+		dmu_buf_rele(db, NULL);
+		err = ENOENT;
+	}
 	ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
-	*zpp = zp;
-	return (0);
+	return (err);
 }
 
 int
@@ -1440,6 +1453,7 @@ zfs_create_fs(objset_t *os, cred_t *cr, 
 	uint64_t	norm = 0;
 	nvpair_t	*elem;
 	int		error;
+	int		i;
 	znode_t		*rootzp = NULL;
 	vnode_t		vnode;
 	vattr_t		vattr;
@@ -1537,6 +1551,9 @@ zfs_create_fs(objset_t *os, cred_t *cr, 
 	list_create(&zfsvfs.z_all_znodes, sizeof (znode_t),
 	    offsetof(znode_t, z_link_node));
 
+	for (i = 0; i != ZFS_OBJ_MTX_SZ; i++)
+		mutex_init(&zfsvfs.z_hold_mtx[i], NULL, MUTEX_DEFAULT, NULL);
+
 	ASSERT(!POINTER_IS_VALID(rootzp->z_zfsvfs));
 	rootzp->z_zfsvfs = &zfsvfs;
 	zfs_mknode(rootzp, &vattr, tx, cr, IS_ROOT_NODE, &zp, 0, NULL, NULL);
@@ -1547,6 +1564,8 @@ zfs_create_fs(objset_t *os, cred_t *cr, 
 
 	dmu_buf_rele(rootzp->z_dbuf, NULL);
 	rootzp->z_dbuf = NULL;
+	for (i = 0; i != ZFS_OBJ_MTX_SZ; i++)
+		mutex_destroy(&zfsvfs.z_hold_mtx[i]);
 	mutex_destroy(&zfsvfs.z_znodes_lock);
 	rootzp->z_vnode = NULL;
 	kmem_cache_free(znode_cache, rootzp);
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
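
The race that r207334 closes can be modelled in user space. The sketch below is a simplified, hypothetical model and not the kernel code: `model_dbuf`, `model_znode`, `model_attach()`, `model_create()`, `model_lookup()` and `model_lookup_unfixed()` are names invented for illustration, and a single pthread mutex stands in for one `z_hold_mtx` slot. It assumes, as the comment added by the patch says, that a generation number of zero marks an object whose create has not finished.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's dbuf and znode; not the real types. */
struct model_znode;
struct model_dbuf {
	uint64_t gen;              /* models zp_gen; 0 = create still in progress */
	struct model_znode *user;  /* znode attached to this dbuf, if any */
};
struct model_znode {
	struct model_dbuf *db;
};

/* One mutex models a single per-object hold (one z_hold_mtx slot). */
static pthread_mutex_t obj_hold = PTHREAD_MUTEX_INITIALIZER;

/* Models the check in zfs_znode_dmu_init(): a second attach is the panic case. */
static int
model_attach(struct model_dbuf *db, struct model_znode *zp)
{
	if (db->user != NULL)
		return (EEXIST);   /* the kernel panics here instead of returning */
	db->user = zp;
	zp->db = db;
	return (0);
}

/* Create path as patched: the hold covers the whole initialisation. */
static int
model_create(struct model_dbuf *db, struct model_znode *zp, uint64_t gen)
{
	int err;

	assert(gen != 0);          /* a valid generation number is never zero */
	pthread_mutex_lock(&obj_hold);
	db->gen = gen;
	err = model_attach(db, zp);
	pthread_mutex_unlock(&obj_hold);
	return (err);
}

/* Lookup path as patched: refuse objects whose create has not finished. */
static int
model_lookup(struct model_dbuf *db, struct model_znode *fresh,
    struct model_znode **out)
{
	int err = 0;

	pthread_mutex_lock(&obj_hold);
	if (db->user != NULL) {
		*out = db->user;               /* reuse the existing znode */
	} else if (db->gen != 0) {
		err = model_attach(db, fresh); /* object exists, no znode yet */
		if (err == 0)
			*out = fresh;
	} else {
		err = ENOENT;                  /* create in progress: not visible */
	}
	pthread_mutex_unlock(&obj_hold);
	return (err);
}

/* Lookup path before the patch: attaches a znode without checking gen. */
static int
model_lookup_unfixed(struct model_dbuf *db, struct model_znode *fresh)
{
	int err;

	pthread_mutex_lock(&obj_hold);
	err = (db->user != NULL) ? 0 : model_attach(db, fresh);
	pthread_mutex_unlock(&obj_hold);
	return (err);
}
```

In this model, a patched lookup that arrives while `gen` is still zero gets `ENOENT`, and a lookup after `model_create()` returns the creator's znode. With `model_lookup_unfixed()`, the lookup attaches its own znode first and the creator's later `model_attach()` takes the `EEXIST` branch, which corresponds to the "existing znode ... for dbuf" panic in the backtrace.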
Comment 5 dfilter service freebsd_committer freebsd_triage 2010-05-01 20:00:48 UTC
Author: pjd
Date: Sat May  1 19:00:33 2010
New Revision: 207477
URL: http://svn.freebsd.org/changeset/base/207477

Log:
  MFC r207068,r207334:
  
  r207068:
  
  Allow to modify directory's content even if the ZFS_NOUNLINK (SF_NOUNLINK,
  sunlnk) flag is set. We only deny directory's removal or rename.
  
  PR:		kern/143343
  Reported by:	marck
  
  r207334:
  
  Backport fix for 'zfs_znode_dmu_init: existing znode for dbuf' panic from OpenSolaris.
  
  PR:		kern/144402
  Reported by:	Alex Bakhtin <alex.bakhtin@gmail.com>
  Tested by:	Alex Bakhtin <alex.bakhtin@gmail.com>
  Obtained from:	OpenSolaris, Bug ID 6895088

Modified:
  stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c
  stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
Directory Properties:
  stable/8/sys/   (props changed)
  stable/8/sys/amd64/include/xen/   (props changed)
  stable/8/sys/cddl/contrib/opensolaris/   (props changed)
  stable/8/sys/contrib/dev/acpica/   (props changed)
  stable/8/sys/contrib/pf/   (props changed)
  stable/8/sys/dev/xen/xenpci/   (props changed)
  stable/8/sys/geom/sched/   (props changed)

Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c
==============================================================================
--- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c	Sat May  1 18:56:45 2010	(r207476)
+++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c	Sat May  1 19:00:33 2010	(r207477)
@@ -2235,11 +2235,24 @@ zfs_zaccess_common(znode_t *zp, uint32_t
 		return (EPERM);
 	}
 
+#ifdef sun
 	if ((v4_mode & (ACE_DELETE | ACE_DELETE_CHILD)) &&
 	    (zp->z_phys->zp_flags & ZFS_NOUNLINK)) {
 		*check_privs = B_FALSE;
 		return (EPERM);
 	}
+#else
+	/*
+	 * In FreeBSD we allow to modify directory's content is ZFS_NOUNLINK
+	 * (sunlnk) is set. We just don't allow directory removal, which is
+	 * handled in zfs_zaccess_delete().
+	 */
+	if ((v4_mode & ACE_DELETE) &&
+	    (zp->z_phys->zp_flags & ZFS_NOUNLINK)) {
+		*check_privs = B_FALSE;
+		return (EPERM);
+	}
+#endif
 
 	if (((v4_mode & (ACE_READ_DATA|ACE_EXECUTE)) &&
 	    (zp->z_phys->zp_flags & ZFS_AV_QUARANTINED))) {

Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
==============================================================================
--- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	Sat May  1 18:56:45 2010	(r207476)
+++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	Sat May  1 19:00:33 2010	(r207477)
@@ -704,6 +704,8 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, d
 			    DMU_OT_ZNODE, sizeof (znode_phys_t) + bonuslen, tx);
 		}
 	}
+
+	ZFS_OBJ_HOLD_ENTER(zfsvfs, obj);
 	VERIFY(0 == dmu_bonus_hold(zfsvfs->z_os, obj, NULL, &db));
 	dmu_buf_will_dirty(db, tx);
 
@@ -765,9 +767,7 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, d
 
 	pzp->zp_mode = MAKEIMODE(vap->va_type, vap->va_mode);
 	if (!(flag & IS_ROOT_NODE)) {
-		ZFS_OBJ_HOLD_ENTER(zfsvfs, obj);
 		*zpp = zfs_znode_alloc(zfsvfs, db, 0);
-		ZFS_OBJ_HOLD_EXIT(zfsvfs, obj);
 	} else {
 		/*
 		 * If we are creating the root node, the "parent" we
@@ -776,6 +776,7 @@ zfs_mknode(znode_t *dzp, vattr_t *vap, d
 		*zpp = dzp;
 	}
 	zfs_perm_init(*zpp, dzp, flag, vap, tx, cr, setaclp, fuidp);
+	ZFS_OBJ_HOLD_EXIT(zfsvfs, obj);
 	if (!(flag & IS_ROOT_NODE)) {
 		vnode_t *vp;
 
@@ -939,19 +940,31 @@ again:
 
 	/*
 	 * Not found create new znode/vnode
+	 * but only if file exists.
+	 *
+	 * There is a small window where zfs_vget() could
+	 * find this object while a file create is still in
+	 * progress.  Since a gen number can never be zero
+	 * we will check that to determine if its an allocated
+	 * file.
 	 */
-	zp = zfs_znode_alloc(zfsvfs, db, doi.doi_data_block_size);
-
-	vp = ZTOV(zp);
-	vp->v_vflag |= VV_FORCEINSMQ;
-	err = insmntque(vp, zfsvfs->z_vfs);
-	vp->v_vflag &= ~VV_FORCEINSMQ;
-	KASSERT(err == 0, ("insmntque() failed: error %d", err));
-	VOP_UNLOCK(vp, 0);
 
+	if (((znode_phys_t *)db->db_data)->zp_gen != 0) {
+		zp = zfs_znode_alloc(zfsvfs, db, doi.doi_data_block_size);
+		*zpp = zp;
+		vp = ZTOV(zp);
+		vp->v_vflag |= VV_FORCEINSMQ;
+		err = insmntque(vp, zfsvfs->z_vfs);
+		vp->v_vflag &= ~VV_FORCEINSMQ;
+		KASSERT(err == 0, ("insmntque() failed: error %d", err));
+		VOP_UNLOCK(vp, 0);
+		err = 0;
+	} else {
+		dmu_buf_rele(db, NULL);
+		err = ENOENT;
+	}
 	ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
-	*zpp = zp;
-	return (0);
+	return (err);
 }
 
 int
@@ -1440,6 +1453,7 @@ zfs_create_fs(objset_t *os, cred_t *cr, 
 	uint64_t	norm = 0;
 	nvpair_t	*elem;
 	int		error;
+	int		i;
 	znode_t		*rootzp = NULL;
 	vnode_t		vnode;
 	vattr_t		vattr;
@@ -1537,6 +1551,9 @@ zfs_create_fs(objset_t *os, cred_t *cr, 
 	list_create(&zfsvfs.z_all_znodes, sizeof (znode_t),
 	    offsetof(znode_t, z_link_node));
 
+	for (i = 0; i != ZFS_OBJ_MTX_SZ; i++)
+		mutex_init(&zfsvfs.z_hold_mtx[i], NULL, MUTEX_DEFAULT, NULL);
+
 	ASSERT(!POINTER_IS_VALID(rootzp->z_zfsvfs));
 	rootzp->z_zfsvfs = &zfsvfs;
 	zfs_mknode(rootzp, &vattr, tx, cr, IS_ROOT_NODE, &zp, 0, NULL, NULL);
@@ -1547,6 +1564,8 @@ zfs_create_fs(objset_t *os, cred_t *cr, 
 
 	dmu_buf_rele(rootzp->z_dbuf, NULL);
 	rootzp->z_dbuf = NULL;
+	for (i = 0; i != ZFS_OBJ_MTX_SZ; i++)
+		mutex_destroy(&zfsvfs.z_hold_mtx[i]);
 	mutex_destroy(&zfsvfs.z_znodes_lock);
 	rootzp->z_vnode = NULL;
 	kmem_cache_free(znode_cache, rootzp);
Comment 6 Pawel Jakub Dawidek freebsd_committer freebsd_triage 2010-05-01 20:24:16 UTC
State Changed
From-To: patched->closed

Fix MFCed to stable/8. Thanks!
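
The zfs_create_fs() hunks in the committed diff follow from the locking change: zfs_mknode() now takes the object hold for the root node as well, so the temporary zfsvfs used there needs its hold mutexes initialised and destroyed. The sketch below is a user-space illustration under stated assumptions, not the kernel implementation: the `model_*` names are hypothetical, pthread mutexes replace kernel mutexes, and the table size of 64 with a mask-based hash is an assumption about how `ZFS_OBJ_MTX_SZ` and `ZFS_OBJ_HOLD_ENTER` pick a slot.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MODEL_OBJ_MTX_SZ	64	/* assumed size; must be a power of two */
#define MODEL_OBJ_HASH(obj)	((obj) & (MODEL_OBJ_MTX_SZ - 1))

/* Hypothetical stand-in for the part of zfsvfs_t that holds z_hold_mtx[]. */
struct model_zfsvfs {
	pthread_mutex_t hold_mtx[MODEL_OBJ_MTX_SZ];
};

/* Initialise every slot before any hold can be taken. */
static void
model_hold_init(struct model_zfsvfs *z)
{
	int i, rc;

	for (i = 0; i != MODEL_OBJ_MTX_SZ; i++) {
		rc = pthread_mutex_init(&z->hold_mtx[i], NULL);
		assert(rc == 0);
		(void)rc;
	}
}

/* Destroy every slot once no hold is outstanding. */
static void
model_hold_fini(struct model_zfsvfs *z)
{
	int i;

	for (i = 0; i != MODEL_OBJ_MTX_SZ; i++)
		pthread_mutex_destroy(&z->hold_mtx[i]);
}

/* Lock the slot that the object number hashes to; return the slot index. */
static uint64_t
model_hold_enter(struct model_zfsvfs *z, uint64_t obj)
{
	uint64_t slot = MODEL_OBJ_HASH(obj);

	pthread_mutex_lock(&z->hold_mtx[slot]);
	return (slot);
}

static void
model_hold_exit(struct model_zfsvfs *z, uint64_t obj)
{
	pthread_mutex_unlock(&z->hold_mtx[MODEL_OBJ_HASH(obj)]);
}

/* Models zfs_create_fs(): a stack-local structure used for one root create. */
static uint64_t
model_create_fs(uint64_t root_obj)
{
	struct model_zfsvfs tmp;
	uint64_t slot;

	model_hold_init(&tmp);
	slot = model_hold_enter(&tmp, root_obj);
	/* the root znode would be initialised here, under the hold */
	model_hold_exit(&tmp, root_obj);
	model_hold_fini(&tmp);
	return (slot);
}
```

Because the hold is selected by hashing the object number, unrelated objects can share a slot (object 67 and object 3 both map to slot 3 in this model); the mutex serialises work per slot, not strictly per object.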
Comment 7 Pawel Jakub Dawidek freebsd_committer freebsd_triage 2014-06-01 07:15:47 UTC
State Changed
From-To: open->feedback

Unfortunately I was unable to reproduce the panic using the proposed test, 
but this bug seems to be already fixed in OpenSolaris (bugid: 6895088). 
I back-ported the fix to FreeBSD, could you try it and see if it helps? 

http://people.freebsd.org/~pjd/patches/zfs_znode.c.3.patch
Comment 8 Pawel Jakub Dawidek freebsd_committer freebsd_triage 2014-06-01 07:15:47 UTC
State Changed
From-To: feedback->patched

Thanks a lot for the report and testing. I just committed the fix to HEAD 
and will MFC it within a few days.