Bug 102335 - [devfs] sx vnode deadlock
Summary: [devfs] sx vnode deadlock
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 6.1-RELEASE
Hardware: Any Any
: Normal Affects Only Me
Assignee: Konstantin Belousov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-08-21 02:20 UTC by Alex Unleashed
Modified: 2006-10-31 03:09 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Unleashed 2006-08-21 02:20:14 UTC
I've been able to reproduce both in 6.1-RELEASE (including -p3) and 6-STABLE
(snapshot from August 16th 2006 ~01:00 GMT) a deadlock in devfs code which
leaves the system unable to access the disk. I've come up with some
interesting debugging info, and it looks to me like there are vnode problems
while a sx lock is being held.

My take at it is that the deadlock occurs when a process gets a lock on a
vnode (tagged "devfs") and another process xlocks an sx lock ("devfsmount").
For some reason the one holding the sx lock wants to get the lock on the
vnode through devfs_allocv(), and the other process wants to get the sx
lock through devfs_lookup(). From this point on, pretty much anything
wanting to touch the filesystem waits forever on devfs_root() for another
vnode flagged as VV_ROOT and locked by the process holding the sx lock.

Patching the devfs code with fixes from -CURRENT didn't work out.

Here's some info from a DDB session after the deadlock had just occurred:

db> ps
  pid   proc     uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
17072 c4145624   0  17070  1071 0004002 [SLPQ piperd     0xc39904c8][SLP] gawk-3.1.5
17071 c4149c48   0  17070  1071 0000002 [SLPQ devfs      0xc41435c8][SLP] sh
17070 c4145830   0  17068  1071 0000002 [SLPQ wait       0xc4145830][SLP] sh
17069 c414920c   0  17067  1071 0004002 [SLPQ devfsmount 0xc396d540][SLP] xgcc
<<snip>>

db> trace 17069
Tracing pid 17069 tid 100071 td 0xc4146600
sched_switch(c4146600,0,1,c09545dc,c4146600) at sched_switch+0x2ea
mi_switch(1,0,c0955f54,1ba,c091b594) at mi_switch+0x425
sleepq_switch(c396d540,0,c0955f54,20c,dad529f0) at sleepq_switch+0x110
sleepq_wait(c396d540,c0a36da0,c0918064,1,c09c1b6c) at sleepq_wait+0x65
cv_wait(c396d540,c0a36da0,c094f9cd,23f,dad52ad4) at cv_wait+0x1e6
_sx_xlock(c396d510,c094f9c4,23f,c09b8d80,dad52ad4) at _sx_xlock+0x99
devfs_lookup(dad52ad4,c092ddcb,37,c096da3c,c4143570) at devfs_lookup+0x36
VOP_LOOKUP_APV(c09b8d80,dad52ad4,c4146600,c4146600,0) at VOP_LOOKUP_APV+0xa4
lookup(dad52b80,0,c09204ab,b6,c070487d) at lookup+0x4ed
namei(dad52b80,c091c646,275,c09550c4,c4146600) at namei+0x43c
kern_lstat(c4146600,807a800,0,dad52c54,c091b47f) at kern_lstat+0x4f
lstat(c4146600,dad52d04,3c4,c096ba20,c4146600) at lstat+0x2f
syscall(3b,3b,3b,28232cb2,bfbfa2c1) at syscall+0x31f
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (190, FreeBSD ELF32, lstat), eip = 0x282251bb, esp = 0xbfbf9e7c, ebp = 0xbfbfab38 ---

db> trace 17071
Tracing pid 17071 tid 100076 td 0xc394cd80
sched_switch(c394cd80,0,1,c09545dc,c394cd80) at sched_switch+0x2ea
mi_switch(1,0,c0955f54,1ba,c091b594) at mi_switch+0x425
sleepq_switch(c41435c8,0,c0955f54,20c,d9bb6768) at sleepq_switch+0x110
sleepq_wait(c41435c8,0,c091bebc,c8,c394cd80) at sleepq_wait+0x65
msleep(c41435c8,c0a3743c,50,c0917eb4,0) at msleep+0x342
acquire(d9bb6840,80,60000,b1,0) at acquire+0x11c
lockmgr(c41435c8,2002,c4143638,c394cd80,d9bb6888) at lockmgr+0x576
vop_stdlock(d9bb68b0,c08f40c3,42af,c091d728,1) at vop_stdlock+0x32
VOP_LOCK_APV(c09b8d80,d9bb68b0,c4143638,8,c0920936) at VOP_LOCK_APV+0xa4
vn_lock(c4143570,2002,c394cd80,7a7,2) at vn_lock+0xd2
vget(c4143570,2,c394cd80,c3972005,c396c460) at vget+0xf0
devfs_allocv(c396c400,c3889800,d9bb6b94,c394cd80,d9bb6988) at devfs_allocv+0x43
devfs_lookupx(d9bb6ad4,c094f9c4,23f,c09b8d80,d9bb6ad4) at devfs_lookupx+0x5f5
devfs_lookup(d9bb6ad4,c092ddcb,37,c096dac3,c3971000) at devfs_lookupx+0x3e
VOP_LOOKUP_APV(c09b8d80,d9bb6ad4,c394cd80,c394cd80,0) at VOP_LOOKUP_APV+0xa4
lookup(d9bb6b80,0,c09204ab,b6,c0a82d70) at lookup+0x4ed
namei(d9bb6b80,c091c646,275,c09550c4,c394cd80) at namei+0x43c
kern_lstat(c394cd80,806d800,0,d9bb6c54,c091b47f) at kern_lstat+0x4f
lstat(c394cd80,d9bb6d04,3c4,c096ba20,c394cd80) at lstat+0x2f
syscall(3b,3b,3b,281b2cb2,bfbf9731) at syscall+0x31f
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (190, FreeBSD ELF32, lstat), eip = 0x281a51bb, esp = 0xbfbf92ec, ebp = 0xbfbf9fa8 ---

db> show alllocks
Process 17071 (sh) thread 0xc394cd80 (100076)
exclusive sx devfsmount r = 0 (0xc396d510) locked @ fs/devfs/devfs_vnops:575
Process 34 (irq1: atkbd0) thread 0xc3687a80 (100031)
exclusive sleep mutex Giant r = 0 (0xc0a37ac0) locked @ kern/kern_intr.c:683

db> show lock 0xc396d510
 class: sx
 name: devfsmount
 state: XLOCK: 0xc394cd80 (tid 100076, pid 17071, "sh")
 waiters: 0 shared, 1 exclusive

db> show lockedvnods
Locked vnodes

0xc3971000: tag devfs, type VDIR
    usecount 3, writecount 0, refcount 3 mountedhere 0
    flags (VV_ROOT)
     lock type devfs: EXCL (count 1) by thread 0xc394cd80 (pid 17071)
#0 0xc06903a1 at lockmgr+0x621
#1 0xc06f98db at vop_stdlock+0x32
#2 0xc08e18fb at VOP_LOCK_APV+0xa4
#3 0xc0712460 at vn_lock+0xd2
#4 0xc07040ff at vget+0xf0
#5 0xc0645702 at devfs_allocv+0x43
#6 0xc064550e at devfs_root+0x2c
#7 0xc06fc023 at lookup+0x829
#8 0xc06fb521 at namei+0x43c
#9 0xc070ba21 at kern_lstat+0x4f
#10 0xc070b9ad at lstat+0x2f
#11 0xc08cbb89 at syscall+0x31f
#12 0xc08b890f at Xint0x80_syscall+0x1f

0xc4143570: tag devfs, type VDIR
    usecount 1, writecount 0, refcount 2 mountedhere 0
    flags ()
     lock type devfs: EXCL (count 1) by thread 0xc4146600 (pid 17069) with 1 pending
#0 0xc06903a1 at lockmgr+0x621
#1 0xc06f98db at vop_stdlock+0x32
#2 0xc08e18fb at VOP_LOCK_APV+0xa4
#3 0xc0712460 at vn_lock+0xd2
#4 0xc07040ff at vget+0xf0
#5 0xc0645702 at devfs_allocv+0x43
#6 0xc064680e at devfs_lookupx+0x5f5
#7 0xc064685b at devfs_lookup+0x3e
#8 0xc08dea25 at VOP_LOOKUP_APV+0xa4
#9 0xc06fbce7 at lookup+0x4ed
#10 0xc06fb521 at namei+0x43c
#11 0xc070ba21 at kern_lstat+0x4f
#12 0xc070b9ad at lstat+0x2f
#13 0xc08cbb89 at syscall+0x31f
#14 0xc08b890f at Xint0x80_syscall+0x1f

After this the system still works, but it's unable to perform operations
on the filesystem, like launching a new process or saving to disk. When
this happens, the new process calls at some point a syscall (like open or
chflags, for example) which goes through devfs_root() (namei -> lookup ->
devfs_root -> devfs_allocv) and the vnode with the VV_ROOT flag gets its
refcount and pending status incremented, having the process waiting there
forever.

How-To-Repeat: This bug is cumbersome to reproduce. I'm mostly available to try whatever
fixes or commands or debugging code people want me to try, but I'm explaining
here how to reproduce it in case I'm MIA.

I ported a path sandbox application to FreeBSD from Gentoo Linux which
triggered this bug randomly. This means it can trigger it only 10 seconds
after starting running programs inside the sandbox or it can be quite a lot
of hours compiling software inside it without triggering the bug (both have
happened). However I've seen this enough times at different points to know
that the path sandbox makes it much easier to trigger it (just think that
the sandbox-wrapped syscalls multiply the real number of fs-related system
calls performed overall, mainly to check access permissions). Compiling
lots of packages is probably the best way to get it to trigger the bug.
Building kernel and world can also help.

Note that this doesn't mean you can't get the same results by some other
means. Just having some specific tests stressing the system long enough
should also work.

You can reproduce it by running Gentoo/FreeBSD, which btw *doesn't* modify
FreeBSD sources, and unmasking and installing latest sandbox version, OR
you can do so under a vanilla FreeBSD install. If the latter, you can install
and run the path sandbox by following these steps: 

- Download http://unleashed.kicks-ass.org/gentoo/sandbox/sandbox-1.2.20alpha2.tar.bz2 --or download it from Gentoo mirrors-- and make sure you installed bash.
- Make /bin/bash point to the real bash executable, if not already there.
- Extract the tarball and run "./configure --host=i386-unknown-freebsd6.1 --build=i386-unknown-freebsd6.1 --prefix=/usr/local/sandbox --sysconfdir=/usr/local/sandbox/etc --datadir=/usr/local/sandbox/usr/share --mandir=/usr/local/sandbox/usr/share/man --infodir=/usr/local/sandbox/usr/share/info --localstatedir=/usr/local/sandbox/var/lib --libdir=/usr/local/sandbox/usr/lib &&  gmake && gmake install && mkdir /var/log/sandbox" in the new directory. Make sure you have created the directory structure specified in the config options you want to use.
- Run /usr/local/sandbox/bin/sandbox. The bash prompt will change, starting with [s], so you know the sandbox is active. You can optionally play running the path sandbox using some environment variables specifying a space-separated list of directories where it should allow/disallow accesses (SANDBOX_DENY, SANDBOX_READ, SANDBOX_WRITE, SANDBOX_PREDICT) and activating debug info (SANDBOX_DEBUG=1).

From here on doesn't matter where you're trying to reproduce this: your best
bet is to start compiling software compulsively until the system's disk
apparently freezes and no new program can be launched (you can have "top"
running permanently and look for processes stalled in "devfs" and "devfsm"
waitchannels whenever you think the bug has been triggered).
Comment 1 Alex Unleashed 2006-09-04 11:32:17 UTC
Konstantin Belousov created the following patch against -CURRENT noting that
devfs from 6.X should be replaced by -CURRENT soon enough. This patch trades
the deadlock for a race at unmount time.

Index: devfs.h
===================================================================
RCS file: /usr/local/arch/ncvs/src/sys/fs/devfs/devfs.h,v
retrieving revision 1.29
diff -u -r1.29 devfs.h
--- devfs.h     12 Apr 2006 12:17:29 -0000      1.29
+++ devfs.h     1 Sep 2006 11:00:34 -0000
@@ -163,7 +163,7 @@
 void devfs_rules_apply(struct devfs_mount *dm, struct devfs_dirent *de);
 void devfs_rules_cleanup (struct devfs_mount *dm);
 int devfs_rules_ioctl(struct devfs_mount *dm, u_long cmd, caddr_t data,
struct thread *td);
-int devfs_allocv (struct devfs_dirent *de, struct mount *mp, struct vnode
**vpp, struct thread *td);
+int devfs_allocv (struct devfs_dirent *de, struct mount *mp, struct vnode
**vpp, int *dm_lock, struct thread *td);
 void devfs_delete(struct devfs_mount *dm, struct devfs_dirent *de);
 void devfs_populate (struct devfs_mount *dm);
 void devfs_cleanup (struct devfs_mount *dm);
Index: devfs_vfsops.c
===================================================================
RCS file: /usr/local/arch/ncvs/src/sys/fs/devfs/devfs_vfsops.c,v
retrieving revision 1.50
diff -u -r1.50 devfs_vfsops.c
--- devfs_vfsops.c      17 Jul 2006 09:07:01 -0000      1.50
+++ devfs_vfsops.c      1 Sep 2006 11:00:34 -0000
@@ -135,9 +135,11 @@
       int error;
       struct vnode *vp;
       struct devfs_mount *dmp;
+       int dm_lock;

       dmp = VFSTODEVFS(mp);
-       error = devfs_allocv(dmp->dm_rootdir, mp, &vp, td);
+       dm_lock = 0;
+       error = devfs_allocv(dmp->dm_rootdir, mp, &vp, &dm_lock, td);
       if (error)
               return (error);
       vp->v_vflag |= VV_ROOT;
Index: devfs_vnops.c
===================================================================
RCS file: /usr/local/arch/ncvs/src/sys/fs/devfs/devfs_vnops.c,v
retrieving revision 1.133
diff -u -r1.133 devfs_vnops.c
--- devfs_vnops.c       17 Jul 2006 09:07:01 -0000       1.133
+++ devfs_vnops.c       1 Sep 2006 11:00:34 -0000
@@ -126,20 +126,26 @@
 }

 int
-devfs_allocv(struct devfs_dirent *de, struct mount *mp, struct vnode **vpp,
struct thread *td)
+devfs_allocv(struct devfs_dirent *de, struct mount *mp, struct vnode **vpp,

+            int *dm_unlock, struct thread *td)
 {
       int error;
       struct vnode *vp;
       struct cdev *dev;
+       struct devfs_mount *dmp;

       KASSERT(td == curthread, ("devfs_allocv: td != curthread"));
+       dmp = VFSTODEVFS(mp);
 loop:
-
       mtx_lock(&devfs_de_interlock);
       vp = de->de_vnode;
       if (vp != NULL) {
               VI_LOCK(vp);
               mtx_unlock(&devfs_de_interlock);
+               if (*dm_unlock) {
+                       sx_xunlock(&dmp->dm_lock);
+                       *dm_unlock = 0;
+               }
               if (vget(vp, LK_EXCLUSIVE | LK_INTERLOCK, td))
                       goto loop;
               *vpp = vp;
@@ -182,6 +188,10 @@
       vp->v_data = de;
       de->de_vnode = vp;
       mtx_unlock(&devfs_de_interlock);
+       if (*dm_unlock) {
+               sx_xunlock(&dmp->dm_lock);
+               *dm_unlock = 0;
+       }
       vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
 #ifdef MAC
       mac_associate_vnode_devfs(mp, de, vp);
@@ -456,7 +466,7 @@
 }

 static int
-devfs_lookupx(struct vop_lookup_args *ap)
+devfs_lookupx(struct vop_lookup_args *ap, int *dm_unlock)
 {
       struct componentname *cnp;
       struct vnode *dvp, **vpp;
@@ -507,7 +517,7 @@
               de = TAILQ_FIRST(&dd->de_dlist);        /* "." */
               de = TAILQ_NEXT(de, de_list);           /* ".." */
               de = de->de_dir;
-               error = devfs_allocv(de, dvp->v_mount, vpp, td);
+               error = devfs_allocv(de, dvp->v_mount, vpp, dm_unlock, td);
               vn_lock(dvp, LK_EXCLUSIVE | LK_RETRY, td);
               return (error);
       }
@@ -564,7 +574,7 @@
                       return (0);
               }
       }
-       error = devfs_allocv(de, dvp->v_mount, vpp, td);
+       error = devfs_allocv(de, dvp->v_mount, vpp, dm_unlock, td);
       return (error);
 }

@@ -573,11 +583,14 @@
 {
       int j;
       struct devfs_mount *dmp;
+       int dm_unlock;

       dmp = VFSTODEVFS(ap->a_dvp->v_mount);
+       dm_unlock = 1;
       sx_xlock(&dmp->dm_lock);
-       j = devfs_lookupx(ap);
-       sx_xunlock(&dmp->dm_lock);
+       j = devfs_lookupx(ap, &dm_unlock);
+       if (dm_unlock == 1)
+               sx_xunlock(&dmp->dm_lock);
       return (j);
 }

@@ -589,6 +602,7 @@
       struct thread *td;
       struct devfs_dirent *dd, *de;
       struct devfs_mount *dmp;
+       int dm_unlock;
       int error;

       /*
@@ -600,6 +614,7 @@
       dvp = ap->a_dvp;
       dmp = VFSTODEVFS(dvp->v_mount);
       sx_xlock(&dmp->dm_lock);
+       dm_unlock = 1;

       cnp = ap->a_cnp;
       vpp = ap->a_vpp;
@@ -620,9 +635,10 @@
       if (de == NULL)
               goto notfound;
       de->de_flags &= ~DE_WHITEOUT;
-       error = devfs_allocv(de, dvp->v_mount, vpp, td);
+       error = devfs_allocv(de, dvp->v_mount, vpp, &dm_unlock, td);
 notfound:
-       sx_xunlock(&dmp->dm_lock);
+       if (dm_unlock == 1)
+               sx_xunlock(&dmp->dm_lock);
       return (error);
 }

@@ -1102,6 +1118,7 @@
       struct devfs_dirent *de;
       struct devfs_mount *dmp;
       struct thread *td;
+       int dm_unlock;

       td = ap->a_cnp->cn_thread;
       KASSERT(td == curthread, ("devfs_symlink: td != curthread"));
@@ -1120,12 +1137,14 @@
       de->de_symlink = malloc(i, M_DEVFS, M_WAITOK);
       bcopy(ap->a_target, de->de_symlink, i);
       sx_xlock(&dmp->dm_lock);
+       dm_unlock = 1;
 #ifdef MAC
       mac_create_devfs_symlink(ap->a_cnp->cn_cred, dmp->dm_mount, dd, de);
 #endif
       TAILQ_INSERT_TAIL(&dd->de_dlist, de, de_list);
-       devfs_allocv(de, ap->a_dvp->v_mount, ap->a_vpp, td);
-       sx_xunlock(&dmp->dm_lock);
+       devfs_allocv(de, ap->a_dvp->v_mount, ap->a_vpp, &dm_unlock, td);
+       if (dm_unlock == 1)
+               sx_xunlock(&dmp->dm_lock);
       return (0);
 }
Comment 2 Konstantin Belousov freebsd_committer freebsd_triage 2006-09-14 06:03:50 UTC
Responsible Changed
From-To: freebsd-bugs->kib

Take the bug because I already do the work needed there.
Comment 3 dfilter service freebsd_committer freebsd_triage 2006-09-18 14:23:18 UTC
kib         2006-09-18 13:23:08 UTC

  FreeBSD src repository

  Modified files:
    sys/fs/devfs         devfs.h devfs_devs.c devfs_vfsops.c 
                         devfs_vnops.c 
  Log:
  Resolve the devfs deadlock caused by LOR between devfs_mount->dm_lock and
  vnode lock in devfs_allocv. Do this by temporary dropping dm_lock around
  vnode locking.
  
  For safe operation, add hold counters for both devfs_mount and devfs_dirent,
  and DE_DOOMED flag for devfs_dirent. The facilities allow to continue after
  dropping of the dm_lock, by making sure that referenced memory does not
  disappear.
  
  Reviewed by:    tegge
  Tested by:      kris
  Approved by:    kan (mentor)
  PR:             kern/102335
  
  Revision  Changes    Path
  1.30      +11 -0     src/sys/fs/devfs/devfs.h
  1.47      +12 -1     src/sys/fs/devfs/devfs_devs.c
  1.51      +20 -4     src/sys/fs/devfs/devfs_vfsops.c
  1.134     +70 -11    src/sys/fs/devfs/devfs_vnops.c
_______________________________________________
cvs-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/cvs-all
To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org"
Comment 4 dfilter service freebsd_committer freebsd_triage 2006-10-30 15:33:49 UTC
kib         2006-10-30 15:33:38 UTC

  FreeBSD src repository

  Modified files:        (Branch: RELENG_6)
    sys/fs/devfs         devfs.h devfs_devs.c devfs_vfsops.c 
                         devfs_vnops.c 
  Log:
  MFC:
  rev. 1.30   src/sys/fs/devfs/devfs.h
  rev. 1.47   src/sys/fs/devfs/devfs_devs.c
  rev. 1.51   src/sys/fs/devfs/devfs_vfsops.c
  rev. 1.134  src/sys/fs/devfs/devfs_vnops.c
  
  Resolve the devfs deadlock caused by LOR between devfs_mount->dm_lock
  and vnode lock in devfs_allocv. Add hold counters for both devfs_mount
  and devfs_dirent, and DE_DOOMED flag for devfs_dirent.
  
  Approved by:    re (hrs)
  PR:             kern/102335
  
  Revision    Changes    Path
  1.22.2.4    +11 -0     src/sys/fs/devfs/devfs.h
  1.36.2.5    +12 -1     src/sys/fs/devfs/devfs_devs.c
  1.44.2.5    +20 -4     src/sys/fs/devfs/devfs_vfsops.c
  1.114.2.10  +70 -11    src/sys/fs/devfs/devfs_vnops.c
_______________________________________________
cvs-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/cvs-all
To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org"
Comment 5 Konstantin Belousov freebsd_committer freebsd_triage 2006-10-31 03:09:02 UTC
State Changed
From-To: open->closed

Fix committed to CURRENT and RELENG_6.