Bug 129148 - [zfs] [panic] panic on concurrent writing & rollback
Summary: [zfs] [panic] panic on concurrent writing & rollback
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 8.0-CURRENT
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Pawel Jakub Dawidek
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-11-24 20:20 UTC by swell.k
Modified: 2009-09-22 13:40 UTC
CC List: 0 users

See Also:


Attachments

Description swell.k 2008-11-24 20:20:03 UTC
	When rolling a dataset back to a snapshot multiple times while writes
	are in progress, there is a chance of hitting a panic.
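The essence of the race: one loop keeps creating files on the dataset while
another loop rolls the dataset back to a snapshot. A condensed sketch of what
the crash.sh reproducer under How-To-Repeat below does (pool/dataset/snapshot
names as in crash.sh, assuming the default mountpoint /q/test):
%%%
# shell 1: keep creating new files on the dataset
i=0; while :; do i=$((i+1)); : > /q/test/foo_$i; done

# shell 2: keep rolling the dataset back to its snapshot
while :; do zfs rollback q/test@last; done
%%%

Running crash.sh produced: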
%%%
# sh crash.sh
lock order reversal:
 1st 0xffffff0002888638 vnode interlock (vnode interlock) @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3699
 2nd 0xffffff0002429710 struct mount mtx (struct mount mtx) @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1050
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x81e
_mtx_lock_flags() at _mtx_lock_flags+0x78
zfs_znode_free() at zfs_znode_free+0x84
zfs_freebsd_inactive() at zfs_freebsd_inactive+0x1a
VOP_INACTIVE_APV() at VOP_INACTIVE_APV+0xb5
vinactive() at vinactive+0x90
vput() at vput+0x25c
vn_close() at vn_close+0xb9
vn_closefile() at vn_closefile+0x7d
_fdrop() at _fdrop+0x23
closef() at closef+0x4d
do_dup() at do_dup+0x351
syscall() at syscall+0x1e7
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (90, FreeBSD ELF64, dup2), rip = 0x80093b08c, rsp = 0x7fffffffe328, rbp = 0x800b0d0a0 ---
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
vfs_badlock() at vfs_badlock+0x95
VOP_INACTIVE_APV() at VOP_INACTIVE_APV+0xc8
vinactive() at vinactive+0x90
vput() at vput+0x25c
vn_close() at vn_close+0xb9
vn_closefile() at vn_closefile+0x7d
_fdrop() at _fdrop+0x23
closef() at closef+0x4d
do_dup() at do_dup+0x351
syscall() at syscall+0x1e7
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (90, FreeBSD ELF64, dup2), rip = 0x80093b08c, rsp = 0x7fffffffe328, rbp = 0x800b0d0a0 ---
VOP_INACTIVE: 0xffffff00028884e0 interlock is locked but should not be
KDB: enter: lock violation
[thread pid 85 tid 100056 ]
Stopped at      kdb_enter+0x3d: movq    $0,0x65c598(%rip)

db> show all locks
Process 85 (sh) thread 0xffffff0002427390 (100056)
exclusive sleep mutex vnode interlock (vnode interlock) r = 0 (0xffffff0002888638) locked @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:3699
exclusive lockmgr zfs (zfs) r = 0 (0xffffff0002888578) locked @ /usr/src/sys/kern/vfs_vnops.c:293

db> show lockedvnods
Locked vnodes

0xffffff00028884e0: tag zfs, type VREG
    usecount 0, writecount 0, refcount 1 mountedhere 0
    flags (VI_DOINGINACT)
 VI_LOCKed    v_object 0xffffff0002886960 ref 0 pages 0
     lock type zfs: EXCL by thread 0xffffff0002427390 (pid 85)
#0 0xffffffff804dfc78 at __lockmgr_args+0x758
#1 0xffffffff8056de19 at vop_stdlock+0x39
#2 0xffffffff8080d77b at VOP_LOCK1_APV+0x9b
#3 0xffffffff805894a7 at _vn_lock+0x57
#4 0xffffffff8058a58e at vn_close+0x6e
#5 0xffffffff8058a6bd at vn_closefile+0x7d
#6 0xffffffff804c7443 at _fdrop+0x23
#7 0xffffffff804c8a6d at closef+0x4d
#8 0xffffffff804c9ed1 at do_dup+0x351
#9 0xffffffff807c9d27 at syscall+0x1e7
#10 0xffffffff807ac85b at Xfast_syscall+0xab

db> show all pcpu
Current CPU: 0

cpuid        = 0
curthread    = 0xffffff0002427390: pid 85 "sh"
curpcb       = 0xfffffffe40180d50
fpcurthread  = none
idlethread   = 0xffffff00021cc720: pid 11 "idle: cpu0"
spin locks held:
%%%

The complete msgbuf, with ps and alltrace output, is here:
http://pastebin.com/f44ad88b3

The panic can also occur with a slightly different message:

%%%
# sh crash.sh
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x70
fault code              = supervisor read data, page not present
instruction pointer     = 0x8:0xffffffff804fb57a
stack pointer           = 0x10:0xfffffffe401997a0
frame pointer           = 0x10:0xfffffffe401997e0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, IOPL = 0
current process         = 179 (zfs)
[thread pid 179 tid 100061 ]
Stopped at      _sx_xlock+0x3a: movq    0x18(%rdi),%rax

db> bt
Tracing pid 179 tid 100061 td 0xffffff000245dab0
_sx_xlock() at _sx_xlock+0x3a
dmu_buf_update_user() at dmu_buf_update_user+0x47
zfs_znode_dmu_fini() at zfs_znode_dmu_fini+0x38
zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0xbe
VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0xb5
vgonel() at vgonel+0x119
vflush() at vflush+0x284
zfs_umount() at zfs_umount+0x105
dounmount() at dounmount+0x2ed
unmount() at unmount+0x24b
syscall() at syscall+0x1e7
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (22, FreeBSD ELF64, unmount), rip = 0x800f401cc, rsp = 0x7fffffffe478, rbp = 0x801202300 ---

db> show all locks
Process 179 (zfs) thread 0xffffff000245dab0 (100061)
exclusive lockmgr zfs (zfs) r = 0 (0xffffff0002693098) locked @ /usr/src/sys/kern/vfs_subr.c:2358
exclusive sleep mutex Giant (Giant) r = 0 (0xffffffff80b5eea0) locked @ /usr/src/sys/kern/vfs_mount.c:1139
exclusive lockmgr zfs (zfs) r = 0 (0xffffff0002693a58) locked @ /usr/src/sys/kern/vfs_mount.c:1207

db> show lockedvnods
Locked vnodes

0xffffff00026939c0: tag zfs, type VDIR
    usecount 1, writecount 0, refcount 1 mountedhere 0xffffff0002432710
    flags ()
     lock type zfs: EXCL by thread 0xffffff000245dab0 (pid 179)
#0 0xffffffff804dfc78 at __lockmgr_args+0x758
#1 0xffffffff8056de19 at vop_stdlock+0x39
#2 0xffffffff8080d77b at VOP_LOCK1_APV+0x9b
#3 0xffffffff805894a7 at _vn_lock+0x57
#4 0xffffffff80577303 at dounmount+0x93
#5 0xffffffff80577adb at unmount+0x24b
#6 0xffffffff807c9d27 at syscall+0x1e7
#7 0xffffffff807ac85b at Xfast_syscall+0xab


0xffffff0002693000: tag zfs, type VREG
    usecount 0, writecount 0, refcount 1 mountedhere 0
    flags (VI_DOOMED)
     lock type zfs: EXCL by thread 0xffffff000245dab0 (pid 179)
#0 0xffffffff804dfc78 at __lockmgr_args+0x758
#1 0xffffffff8056de19 at vop_stdlock+0x39
#2 0xffffffff8080d77b at VOP_LOCK1_APV+0x9b
#3 0xffffffff805894a7 at _vn_lock+0x57
#4 0xffffffff8057fecf at vflush+0x20f
#5 0xffffffff80f5f175 at zfs_umount+0x105
#6 0xffffffff8057755d at dounmount+0x2ed
#7 0xffffffff80577adb at unmount+0x24b
#8 0xffffffff807c9d27 at syscall+0x1e7
#9 0xffffffff807ac85b at Xfast_syscall+0xab

db> show all pcpu
Current CPU: 0

cpuid        = 0
curthread    = 0xffffff000245dab0: pid 179 "zfs"
curpcb       = 0xfffffffe40199d50
fpcurthread  = none
idlethread   = 0xffffff00021cc720: pid 11 "idle: cpu0"
spin locks held:
%%%

Again, the full session with ps and alltrace output is here:
http://pastebin.com/f21e46723

BTW, here is a backup of this message in case it's mangled:
http://pastebin.com/f46eeff65

How-To-Repeat: 
It's not fully deterministic, but the following script triggers the panic
fairly often. If the panic doesn't occur within a minute, there is a chance
it will occur after you interrupt and restart the script.

%%%
#! /bin/sh
# crash.sh

PATH=/sbin:/bin

pool=q
dataset=test
snapshot=last
prefix=foo_
cycles=999999999

zfs destroy -r $pool/$dataset
zfs create $pool/$dataset
zfs snapshot $pool/$dataset@$snapshot

mountpoint=$(zfs get -Ho value mountpoint $pool/$dataset)

# run "$@" repeatedly in a background subshell and record its PID for cleanup
loop() {
    local i=0
    while [ $((i+=1)) -lt $cycles ]; do
	eval $@
    done &
    pids="$pids $!"
}
trap 'kill $pids' int term exit

# juggle these
loop : \>$mountpoint/$prefix\${i}
loop zfs rollback $pool/$dataset@$snapshot

wait
%%%
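
Note: the script assumes an existing pool named "q" (created beforehand with
zpool create) and will destroy and recreate the q/test dataset on each run.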
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2008-11-25 05:12:35 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2009-05-28 23:25:16 UTC
Responsible Changed
From-To: pjd->freebsd-fs

With pjd's permission, reassigning ZFS-related PRs to freebsd-fs.
Comment 3 swell.k 2009-07-11 03:28:03 UTC
This panic makes the rollback feature practically unusable. I usually do:

    # cd /usr/src (svn checkout sources)
    # zfs snapshot q/usr/src@blah
    # zcat ~/some_big_patch.bz2 | patch -Efsp0 -F0
    ...
    # zfs rollback q/usr/src@blah
    # zcat ~/another_big_patch.bz2 | patch -Efsp0 -F0
  BANG! panics here
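
The same sequence as a plain script, for retrying it verbatim (dataset,
snapshot, and patch names are copied from the commands above):

%%%
#! /bin/sh
# rollback-patch.sh: the patch / rollback / patch sequence from above
cd /usr/src || exit 1
zfs snapshot q/usr/src@blah
zcat ~/some_big_patch.bz2 | patch -Efsp0 -F0
zfs rollback q/usr/src@blah
zcat ~/another_big_patch.bz2 | patch -Efsp0 -F0   # panic was hit around here
%%%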

Here is a recent one for
FreeBSD 8.0-BETA1 #0: Sat Jul  4 03:55:14 UTC 2009
    root@almeida.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC i386

--- panic begins here ---
$ qemu -no-kqemu -echr 3 -nographic \
     -hda /dev/zvol/h/home/luser/freebsd-i386 \
     -hdb /dev/zvol/h/home/luser/freebsd-i386-zpool
[...]
# zpool create q ad1
# sh crash.sh
cannot open 'q/test': dataset does not exist
load: 0.90  cmd: sh 66 [runnable] 2.53r 0.28u 1.92s 29% 1808k

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x4c
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc087c9c3
stack pointer           = 0x28:0xc89f5790
frame pointer           = 0x28:0xc89f57b0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, IOPL = 0
current process         = 66 (sh)
[thread pid 66 tid 100041 ]
Stopped at      _sx_xlock+0x43: movl    0x10(%ebx),%eax
db> show all locks
Process 66 (sh) thread 0xc238db40 (100041)
exclusive lockmgr zfs (zfs) r = 0 (0xc2613270) locked @ /usr/src/sys/kern/vfs_subr.c:880
exclusive lockmgr zfs (zfs) r = 0 (0xc2613594) locked @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:856
db> show lockedvnods
Locked vnodes

0xc261353c: tag zfs, type VDIR
    usecount 2, writecount 0, refcount 2 mountedhere 0
    flags (VV_ROOT)
    lock type zfs: EXCL by thread 0xc238db40 (pid 66)

0xc2613218: tag zfs, type VREG
    usecount 0, writecount 0, refcount 1 mountedhere 0
    flags (VI_DOOMED)
    lock type zfs: EXCL by thread 0xc238db40 (pid 66)
db> show all pcpu
Current CPU: 0

cpuid        = 0
dynamic pcpu    = 0x6aed54
curthread    = 0xc238db40: pid 66 "sh"
curpcb       = 0xc89f5d90
fpcurthread  = 0xc238db40: pid 66 "sh"
idlethread   = 0xc2156b40: pid 11 "idle: cpu0"
APIC ID      = 0
currentldt   = 0x50
spin locks held:

db> bt
Tracing pid 66 tid 100041 td 0xc238db40
_sx_xlock(3c,0,c24ba14d,70f,c2617ae0,...) at _sx_xlock+0x43
dmu_buf_update_user(0,c2617ae0,0,0,0,...) at dmu_buf_update_user+0x35
zfs_znode_dmu_fini(c2617ae0,c24c2fed,1114,110b,c26d5000,...) at zfs_znode_dmu_fini+0x43
zfs_freebsd_reclaim(c89f5858,1,0,c2613218,c89f587c,...) at zfs_freebsd_reclaim+0xc0
VOP_RECLAIM_APV(c24c65a0,c89f5858,0,0,c261328c,...) at VOP_RECLAIM_APV+0xa5
vgonel(c261328c,0,c0c6567e,386,0,...) at vgonel+0x1a4
vnlru_free(c0f16ef0,0,c0c6567e,3a1,c2da47ac,...) at vnlru_free+0x2d5
getnewvnode(c24c0cfc,c237778c,c24c65a0,c89f58fc,c25baa80,...) at getnewvnode+0x4a
zfs_znode_cache_constructor(c25bbe00,2,c24c1357,2fd,c2db8880,...) at zfs_znode_cache_constructor+0x2e
zfs_znode_alloc(c26d5498,0,c24c1357,2fd,c89f5978,...) at zfs_znode_alloc+0x35
zfs_mknode(c2617bc8,c89f5a60,c2d97000,c2152080,0,...) at zfs_mknode+0x286
zfs_freebsd_create(c89f5ac8,c89f5ae0,0,0,c89f5ba8,...) at zfs_freebsd_create+0x722
VOP_CREATE_APV(c24c65a0,c89f5ac8,c89f5bd4,c89f5a60,0,...) at VOP_CREATE_APV+0xa5
vn_open_cred(c89f5ba8,c89f5c5c,1a4,0,c2152080,...) at vn_open_cred+0x200
vn_open(c89f5ba8,c89f5c5c,1a4,c238f310,14,...) at vn_open+0x3b
kern_openat(c238db40,ffffff9c,28304378,0,602,...) at kern_openat+0x118
kern_open(c238db40,28304378,0,601,1b6,...) at kern_open+0x35
open(c238db40,c89f5cf8,c,c0c5ee4b,c0d3c0ac,...) at open+0x30
syscall(c89f5d38) at syscall+0x2a3
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (5, FreeBSD ELF32, open), eip = 0x281d8fe3, esp = 0xbfbfe6cc, ebp = 0xbfbfe768 ---
db> ps
  pid  ppid  pgrp   uid   state   wmesg     wchan    cmd
   69     0     0     0  SL      tq->tq_d 0xc257141c [zil_clean]
   68    67    60     0  S+      tx->tx_s 0xc27599e0 zfs
   67    60    60     0  S+      wait     0xc25de7f8 sh
   66    60    60     0  R+      CPU 0               sh
   60    19    60     0  S+      wait     0xc25ded48 sh
   57     0     0     0  SL      tq->tq_d 0xc25715a4 [zil_clean]
   54     0     0     0  SL      zio->io_ 0xc265ab94 [txg_thread_enter]
   53     0     0     0  SL      tx->tx_q 0xc27599e8 [txg_thread_enter]
   52     0     0     0  RL                          [vdev:worker ad1]
   51     0     0     0  SL      tq->tq_d 0xc2571668 [spa_zio]
   50     0     0     0  SL      tq->tq_d 0xc257172c [spa_zio]
   49     0     0     0  SL      tq->tq_d 0xc25717f0 [spa_zio]
   48     0     0     0  SL      tq->tq_d 0xc25718b4 [spa_zio]
   47     0     0     0  SL      tq->tq_d 0xc2571978 [spa_zio]
   46     0     0     0  SL      tq->tq_d 0xc2571a3c [spa_zio]
   45     0     0     0  SL      tq->tq_d 0xc2571b00 [spa_zio]
   44     0     0     0  SL      tq->tq_d 0xc2571bc4 [spa_zio]
   43     0     0     0  SL      tq->tq_d 0xc2571bc4 [spa_zio]
   42     0     0     0  SL      tq->tq_d 0xc2571bc4 [spa_zio]
   41     0     0     0  SL      tq->tq_d 0xc2571bc4 [spa_zio]
   40     0     0     0  SL      tq->tq_d 0xc2571bc4 [spa_zio]
   39     0     0     0  SL      tq->tq_d 0xc2571bc4 [spa_zio]
   38     0     0     0  SL      tq->tq_d 0xc2571bc4 [spa_zio]
   37     0     0     0  SL      tq->tq_d 0xc2571bc4 [spa_zio]
   36     0     0     0  SL      tq->tq_d 0xc2571c88 [spa_zio]
   35     0     0     0  SL      tq->tq_d 0xc2571c88 [spa_zio]
   34     0     0     0  SL      tq->tq_d 0xc2571c88 [spa_zio]
   33     0     0     0  SL      tq->tq_d 0xc2571c88 [spa_zio]
   32     0     0     0  SL      tq->tq_d 0xc2571c88 [spa_zio]
   31     0     0     0  SL      tq->tq_d 0xc2571c88 [spa_zio]
   30     0     0     0  SL      tq->tq_d 0xc2571c88 [spa_zio]
   29     0     0     0  SL      tq->tq_d 0xc2571c88 [spa_zio]
   28     0     0     0  SL      tq->tq_d 0xc2571d4c [spa_zio]
   27     0     0     0  SL      tq->tq_d 0xc2571e10 [spa_zio]
   26     0     0     0  SL      tq->tq_d 0xc2571ed4 [spa_zio]
   25     0     0     0  RL                          [l2arc_feed_thread]
   24     0     0     0  RL                          [arc_reclaim_thread]
   23     0     0     0  RL                          [vaclean]
   22     0     0     0  SL      tq->tq_d 0xc2572048 [system_taskq]
   19     1    19     0  Ss+     wait     0xc2376550 sh
   18     0     0     0  SL      flowclea 0xc0daa4a4 [flowcleaner]
   17     0     0     0  RL                          [softdepflush]
   16     0     0     0  RL                          [vnlru]
   15     0     0     0  RL                          [syncer]
   14     0     0     0  RL                          [bufdaemon]
    9     0     0     0  SL      pgzero   0xc0f23314 [pagezero]
    8     0     0     0  SL      psleep   0xc0f22f3c [vmdaemon]
    7     0     0     0  RL                          [pagedaemon]
    6     0     0     0  SL      waiting_ 0xc0f18d5c [sctp_iterator]
    5     0     0     0  SL      ccb_scan 0xc0d76fd4 [xpt_thrd]
   13     0     0     0  SL      -        0xc0daa4a4 [yarrow]
    4     0     0     0  SL      -        0xc0da8264 [g_down]
    3     0     0     0  SL      -        0xc0da8260 [g_up]
    2     0     0     0  SL      -        0xc0da8258 [g_event]
   12     0     0     0  WL      (threaded)          intr
100029                   I                           [swi0: uart]
100028                   I                           [irq7: ppc0]
100027                   I                           [irq12: psm0]
100026                   I                           [irq1: atkbd0]
100025                   I                           [irq11: ed0]
100024                   I                           [irq15: ata1]
100023                   I                           [irq14: ata0]
100022                   I                           [irq9: acpi0]
100021                   I                           [swi6: task queue]
100020                   I                           [swi6: Giant taskq]
100018                   I                           [swi5: +]
100013                   I                           [swi2: cambio]
100006                   I                           [swi3: vm]
100005                   I                           [swi4: clock]
100004                   I                           [swi1: netisr 0]
   11     0     0     0  RL                          [idle: cpu0]
    1     0     1     0  SLs     wait     0xc2154d48 [init]
   10     0     0     0  SL      audit_wo 0xc0f21f80 [audit]
    0     0     0     0  SLs     (threaded)          kernel
100019                   D       -        0xc2220d40 [thread taskq]
100017                   D       -        0xc2221100 [kqueue taskq]
100016                   D       -        0xc2221140 [acpi_task_2]
100015                   D       -        0xc2221140 [acpi_task_1]
100014                   D       -        0xc2221140 [acpi_task_0]
100010                   D       -        0xc2138940 [firmware taskq]
100000                   D       sched    0xc0da8320 [swapper]
--- panic ends here ---
Comment 4 swell.k 2009-09-14 18:51:57 UTC
pjd@FreeBSD.org writes:

> Synopsis: [zfs] [panic] panic on concurrent writing & rollback
>
> State-Changed-From-To: open->patched
> State-Changed-By: pjd
> State-Changed-When: pon 14 wrz 2009 07:02:21 UTC
> State-Changed-Why: 
> I believe it is already fixed in HEAD. Could you verify?
> If you can't try HEAD, it should be in 8 soon.

I'm no longer able to reproduce it on r197192M, either with the script
or with the patch & rollback & patch sequence.
Comment 5 Pawel Jakub Dawidek freebsd_committer freebsd_triage 2014-06-01 06:57:42 UTC
Responsible Changed
From-To: freebsd-fs->pjd

Let me see...
Comment 6 Pawel Jakub Dawidek freebsd_committer freebsd_triage 2014-06-01 06:57:42 UTC
State Changed
From-To: open->patched

I believe it is already fixed in HEAD. Could you verify? 
If you can't try HEAD, it should be in 8 soon.


Comment 7 Pawel Jakub Dawidek freebsd_committer freebsd_triage 2014-06-01 06:57:42 UTC
Responsible Changed
From-To: freebsd-fs->pjd

I'll take this one.
Comment 8 Pawel Jakub Dawidek freebsd_committer freebsd_triage 2014-06-01 06:57:43 UTC
State Changed
From-To: patched->closed

Fix merged to stable/8.