Bug 145339 - [zfs] deadlock after detaching block device from raidz pool
Summary: [zfs] deadlock after detaching block device from raidz pool
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 8.0-STABLE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-04-03 08:47 UTC by Alex.Bakhtin
Modified: 2017-12-31 22:32 UTC

See Also:


Attachments

Description Alex.Bakhtin 2010-04-03 08:47:07 UTC
Physically detaching a block device while there is intensive writing to the pool causes a deadlock. Tested on 8.0-STABLE/amd64 csup'ed on 02 Apr 2010. gmirror on the same system handles device detach properly. Detaching a device while ZFS is idle doesn't cause any problem.


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x48
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff805815f9
stack pointer           = 0x28:0xffffff8000065b80
frame pointer           = 0x28:0xffffff8000065bb0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3 (g_up)
exclusive spin mutex uart_hwmtx (uart_hwmtx) r = 0 (0xffffff0002a62838) locked @ /usr/src.old/sys/dev/uart/uart_cpu.h:92
exclusive lockmgr zfs (zfs) r = 0 (0xffffff0123079098) locked @ /usr/src.old/sys/kern/vfs_vnops.c:607
exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xffffff000c3728f0) locked @ /usr/src.old/sys/kern/uipc_sockbuf.c:148
exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xffffff000c35a8f0) locked @ /usr/src.old/sys/kern/uipc_sockbuf.c:148
exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xffffff0121a31648) locked @ /usr/src.old/sys/kern/uipc_sockbuf.c:148

0xffffff0123079000: tag zfs, type VREG
    usecount 1, writecount 1, refcount 1 mountedhere 0
    flags ()
    v_object 0xffffff0126114e58 ref 0 pages 0
    lock type zfs: EXCL by thread 0xffffff000c2d7740 (pid 2134)
#0 0xffffffff80579d27 at __lockmgr_args+0x777
#1 0xffffffff80613339 at vop_stdlock+0x39
#2 0xffffffff808d020b at VOP_LOCK1_APV+0x9b
#3 0xffffffff806300d7 at _vn_lock+0x57
#4 0xffffffff806316d8 at vn_write+0x218
#5 0xffffffff805d71e5 at dofilewrite+0x85
#6 0xffffffff805d89e0 at kern_writev+0x60
#7 0xffffffff805d8ae5 at write+0x55
#8 0xffffffff8087b488 at syscall+0x118
#9 0xffffffff80861611 at Xfast_syscall+0xe1


db:0:kdb.enter.default>  bt
Tracing pid 3 tid 100010 td 0xffffff0002899740
_mtx_lock_flags() at _mtx_lock_flags+0x39
vdev_geom_io_intr() at vdev_geom_io_intr+0x62
g_io_schedule_up() at g_io_schedule_up+0xed
g_up_procbody() at g_up_procbody+0x6f
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8000065d30, rbp = 0 ---



tarzan-new# zdb -vvv
storage
    version=14
    name='storage'
    state=0
    txg=578
    pool_guid=3309800284037274155
    hostid=4266611921
    hostname='tarzan-new.private.flydrag.ru'
    vdev_tree
        type='root'
        id=0
        guid=3309800284037274155
        children[0]
                type='raidz'
                id=0
                guid=11076638880661644944
                nparity=1
                metaslab_array=23
                metaslab_shift=36
                ashift=9
                asize=10001970626560
                is_log=0
                children[0]
                        type='disk'
                        id=0
                        guid=134064330288565023
                        path='/dev/ad10'
                        whole_disk=0
                        DTL=33
                children[1]
                        type='disk'
                        id=1
                        guid=6567589632071309972
                        path='/dev/ad12'
                        whole_disk=0
                        DTL=32
                children[2]
                        type='disk'
                        id=2
                        guid=6024702546194706986
                        path='/dev/ad14'
                        whole_disk=0
                        DTL=27
                children[3]
                        type='disk'
                        id=3
                        guid=10837092740689261565
                        path='/dev/ad16'
                        whole_disk=0
                        DTL=31
                children[4]
                        type='disk'
                        id=4
                        guid=4165337351109841378
                        path='/dev/ad18'
                        whole_disk=0
                        DTL=30
tarzan-new#

Core:
http://flydrag.dyndns.org:9090/freebsd/zfs-deadlock/core.txt.9

How-To-Repeat: Install 8.0-STABLE, create a raidz pool,
run the command dd if=/dev/zero of=/zfs/test bs=10m,
then physically detach one hard disk while it is writing.
Comment 1 Remko Lodder freebsd_committer freebsd_triage 2010-04-03 08:53:02 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Reassign to FS team.
Comment 2 Andriy Gapon 2010-04-16 21:09:38 UTC
Are you sure that this is a deadlock?
If so, could you please describe what you see in more detail.

I am asking because to me it seems like a NULL pointer crash:
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address = 0x48

It looks like perhaps zio->io_vd became NULL while an I/O response was traveling
up and vdev_geom_io_intr was not prepared to handle that.
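A minimal sketch of the code path in question, paraphrased from the vdev_geom_io_intr() in the 8.0-STABLE vdev_geom.c discussed in this PR; which pointer is NULL here (io_vd itself, or the vdev_tsd it points to) is exactly the open question:

static void
vdev_geom_io_intr(struct bio *bp)
{
	vdev_geom_ctx_t *ctx;
	zio_t *zio;

	zio = bp->bio_caller1;
	/*
	 * If the vdev was orphaned while this bio was in flight, io_vd
	 * (or the vdev_tsd it points to) may already be NULL here ...
	 */
	ctx = zio->io_vd->vdev_tsd;

	if ((zio->io_error = bp->bio_error) == 0 && bp->bio_resid != 0)
		zio->io_error = EIO;

	/*
	 * ... and this mtx_lock() then dereferences a small offset off a
	 * NULL pointer, which would match a fault address like 0x48.
	 */
	mtx_lock(&ctx->gc_queue_mtx);
	bioq_insert_tail(&ctx->gc_queue, bp);
	wakeup_one(&ctx->gc_queue);
	mtx_unlock(&ctx->gc_queue_mtx);
}

Either way the dereference happens before any NULL check, which is what the patch proposed below (comment 5) tries to guard against.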

> _mtx_lock_flags() at _mtx_lock_flags+0x39
> vdev_geom_io_intr() at vdev_geom_io_intr+0x62
> g_io_schedule_up() at g_io_schedule_up+0xed
> g_up_procbody() at g_up_procbody+0x6f
> fork_exit() at fork_exit+0x12a
> fork_trampoline() at fork_trampoline+0xe


-- 
Andriy Gapon
Comment 3 Alex.Bakhtin 2010-04-21 20:42:20 UTC
Andriy,

    Sorry for the delay, gmail put your mail into the spam folder.

> Are you sure that this is a deadlock?

    Sorry, the problem description wasn't 100 percent clear.

> If yes, could you please describe what you see in more details.

    On GENERIC I discovered a deadlock when I detach a device from the raidz
pool while there is intensive writing to the pool. The box responds to
pings but doesn't respond to the power button (the ACPI request is ignored).
I built a kernel with the following config:

> cat /sys/amd64/conf/DEBUG
include GENERIC

ident          DEBUG

options         ALT_BREAK_TO_DEBUGGER

options         INVARIANTS
options         INVARIANT_SUPPORT
options         WITNESS
options         DEBUG_LOCKS
options         DEBUG_VFS_LOCKS
options         DIAGNOSTIC
options         KDB
options         DDB

options         INCLUDE_CONFIG_FILE

and got this crash. After looking into the crashinfo I assumed that it
crashes in _mtx_lock_flags because of the debugging options (as far as I
can see, there are many asserts in this function), but probably I'm wrong.
I checked the /mnt/crash directory and discovered a full crash archive
gathered by sysutils/bsdcrashtar. Perhaps this info could help
to find the root cause?

 tar tvzf crash.10.tar.gz
drwxr-xr-x  0 root   wheel       0 Apr  3 06:13 crash.10/
lrwxr-xr-x  0 root   wheel       0 Apr  3 06:13 crash.10/machine ->
usr/src.old/sys/amd64/include
drwxr-xr-x  0 root   wheel       0 Apr  3 06:13 crash.10/mnt/
drwxr-xr-x  0 root   wheel       0 Apr  3 06:13 crash.10/usr/
drwxr-xr-x  0 root   wheel       0 Apr  3 06:13 crash.10/boot/
-rwxr-xr-x  0 root   wheel     211 Apr  3 06:13 crash.10/debug.sh
-rw-r--r--  0 root   wheel      50 Apr  3 06:13 crash.10/README
drwxr-xr-x  0 root   wheel       0 Apr  3 06:13 crash.10/boot/kernel/
-r-xr-xr-x  0 root   wheel 12581947 Apr  3 06:13 crash.10/boot/kernel/kernel
-r-xr-xr-x  0 root   wheel 44335787 Apr  3 06:13
crash.10/boot/kernel/kernel.symbols
-r-xr-xr-x  0 root   wheel  1532664 Apr  3 06:13 crash.10/boot/kernel/zfs.ko
-r-xr-xr-x  0 root   wheel 12693960 Apr  3 06:13
crash.10/boot/kernel/zfs.ko.symbols
-r-xr-xr-x  0 root   wheel     9832 Apr  3 06:13
crash.10/boot/kernel/opensolaris.ko
-r-xr-xr-x  0 root   wheel   145808 Apr  3 06:13
crash.10/boot/kernel/opensolaris.ko.symbols
-r-xr-xr-x  0 root   wheel   146048 Apr  3 06:13
crash.10/boot/kernel/geom_mirror.ko
-r-xr-xr-x  0 root   wheel   314512 Apr  3 06:13
crash.10/boot/kernel/geom_mirror.ko.symbols
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/usr/src.old/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/usr/src.old/sys/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/usr/src.old/sys/amd64/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/usr/src.old/sys/cam/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/usr/src.old/sys/ddb/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/usr/src.old/sys/dev/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/usr/src.old/sys/fs/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/usr/src.old/sys/geom/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/usr/src.old/sys/kern/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/modules/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/usr/src.old/sys/cddl/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/usr/src.old/sys/net/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/usr/src.old/sys/nfs/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/nfsserver/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/usr/src.old/sys/rpc/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/security/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/usr/src.old/sys/sys/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/usr/src.old/sys/ufs/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/usr/src.old/sys/vm/
-rw-r--r--  0 root   wheel    83274 Apr  3 06:13
crash.10/usr/src.old/sys/vm/uma_core.c
-rw-r--r--  0 root   wheel    26799 Apr  3 06:13
crash.10/usr/src.old/sys/vm/vm_glue.c
-rw-r--r--  0 root   wheel   104178 Apr  3 06:13
crash.10/usr/src.old/sys/vm/vm_map.c
-rw-r--r--  0 root   wheel    45074 Apr  3 06:13
crash.10/usr/src.old/sys/vm/vm_pageout.c
-rw-r--r--  0 root   wheel     4978 Apr  3 06:13
crash.10/usr/src.old/sys/vm/vm_zeroidle.c
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/ufs/ffs/
-rw-r--r--  0 root   wheel   189645 Apr  3 06:13
crash.10/usr/src.old/sys/ufs/ffs/ffs_softdep.c
-rw-r--r--  0 root   wheel    18465 Apr  3 06:13
crash.10/usr/src.old/sys/sys/buf.h
-rw-r--r--  0 root   wheel     8941 Apr  3 06:13
crash.10/usr/src.old/sys/sys/file.h
-rw-r--r--  0 root   wheel    32408 Apr  3 06:13
crash.10/usr/src.old/sys/sys/mbuf.h
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/security/audit/
-rw-r--r--  0 root   wheel    15538 Apr  3 06:13
crash.10/usr/src.old/sys/security/audit/audit_worker.c
-rw-r--r--  0 root   wheel    30942 Apr  3 06:13
crash.10/usr/src.old/sys/rpc/svc.c
-rw-r--r--  0 root   wheel    13440 Apr  3 06:13
crash.10/usr/src.old/sys/nfsserver/nfs_srvkrpc.c
-rw-r--r--  0 root   wheel     4928 Apr  3 06:13
crash.10/usr/src.old/sys/nfs/nfs_nfssvc.c
-rw-r--r--  0 root   wheel    45178 Apr  3 06:13
crash.10/usr/src.old/sys/net/flowtable.c
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/compat/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/contrib/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/contrib/opensolaris/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/contrib/opensolaris/uts/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/contrib/opensolaris/uts/common/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/contrib/opensolaris/uts/common/fs/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/
-rw-r--r--  0 root   wheel   129959 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
-rw-r--r--  0 root   wheel    17453 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
-rw-r--r--  0 root   wheel   112698 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
-rw-r--r--  0 root   wheel    15102 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c
-rw-r--r--  0 root   wheel    15178 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
-rw-r--r--  0 root   wheel   120600 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
-rw-r--r--  0 root   wheel    63052 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/compat/opensolaris/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/compat/opensolaris/kern/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/compat/opensolaris/sys/
-rw-r--r--  0 root   wheel     3843 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/compat/opensolaris/sys/atomic.h
-rw-r--r--  0 root   wheel     3673 Apr  3 06:13
crash.10/usr/src.old/sys/cddl/compat/opensolaris/kern/opensolaris_taskq.c
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/modules/zfs/
-rw-r--r--  0 root   wheel    21756 Apr  3 06:13
crash.10/usr/src.old/sys/kern/init_main.c
-rw-r--r--  0 root   wheel    11706 Apr  3 06:13
crash.10/usr/src.old/sys/kern/kern_condvar.c
-rw-r--r--  0 root   wheel    24056 Apr  3 06:13
crash.10/usr/src.old/sys/kern/kern_exit.c
-rw-r--r--  0 root   wheel    22111 Apr  3 06:13
crash.10/usr/src.old/sys/kern/kern_fork.c
-rw-r--r--  0 root   wheel    46996 Apr  3 06:13
crash.10/usr/src.old/sys/kern/kern_intr.c
-rw-r--r--  0 root   wheel    25236 Apr  3 06:13
crash.10/usr/src.old/sys/kern/kern_malloc.c
-rw-r--r--  0 root   wheel    23736 Apr  3 06:13
crash.10/usr/src.old/sys/kern/kern_mutex.c
-rw-r--r--  0 root   wheel    79669 Apr  3 06:13
crash.10/usr/src.old/sys/kern/kern_sig.c
-rw-r--r--  0 root   wheel    15720 Apr  3 06:13
crash.10/usr/src.old/sys/kern/kern_synch.c
-rw-r--r--  0 root   wheel    36031 Apr  3 06:13
crash.10/usr/src.old/sys/kern/kern_time.c
-rw-r--r--  0 root   wheel    71672 Apr  3 06:13
crash.10/usr/src.old/sys/kern/sched_ule.c
-rw-r--r--  0 root   wheel    12187 Apr  3 06:13
crash.10/usr/src.old/sys/kern/subr_kdb.c
-rw-r--r--  0 root   wheel    33114 Apr  3 06:13
crash.10/usr/src.old/sys/kern/subr_sleepqueue.c
-rw-r--r--  0 root   wheel    10565 Apr  3 06:13
crash.10/usr/src.old/sys/kern/subr_taskqueue.c
-rw-r--r--  0 root   wheel    34998 Apr  3 06:13
crash.10/usr/src.old/sys/kern/sys_generic.c
-rw-r--r--  0 root   wheel    48706 Apr  3 06:13
crash.10/usr/src.old/sys/kern/tty.c
-rw-r--r--  0 root   wheel    28015 Apr  3 06:13
crash.10/usr/src.old/sys/kern/tty_ttydisc.c
-rw-r--r--  0 root   wheel    93141 Apr  3 06:13
crash.10/usr/src.old/sys/kern/uipc_socket.c
-rw-r--r--  0 root   wheel   109887 Apr  3 06:13
crash.10/usr/src.old/sys/kern/vfs_bio.c
-rw-r--r--  0 root   wheel   110274 Apr  3 06:13
crash.10/usr/src.old/sys/kern/vfs_subr.c
-rw-r--r--  0 root   wheel    19944 Apr  3 06:13
crash.10/usr/src.old/sys/geom/geom_io.c
-rw-r--r--  0 root   wheel     6915 Apr  3 06:13
crash.10/usr/src.old/sys/geom/geom_kern.c
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/fs/devfs/
-rw-r--r--  0 root   wheel    36833 Apr  3 06:13
crash.10/usr/src.old/sys/fs/devfs/devfs_vnops.c
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/dev/fdc/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/dev/md/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/dev/random/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/dev/usb/
-rw-r--r--  0 root   wheel    13618 Apr  3 06:13
crash.10/usr/src.old/sys/dev/usb/usb_process.c
-rw-r--r--  0 root   wheel    11623 Apr  3 06:13
crash.10/usr/src.old/sys/dev/random/randomdev_soft.c
-rw-r--r--  0 root   wheel    31483 Apr  3 06:13
crash.10/usr/src.old/sys/dev/md/md.c
-rw-r--r--  0 root   wheel    49685 Apr  3 06:13
crash.10/usr/src.old/sys/dev/fdc/fdc.c
-rw-r--r--  0 root   wheel    17269 Apr  3 06:13
crash.10/usr/src.old/sys/ddb/db_command.c
-rw-r--r--  0 root   wheel     6005 Apr  3 06:13
crash.10/usr/src.old/sys/ddb/db_main.c
-rw-r--r--  0 root   wheel    15802 Apr  3 06:13
crash.10/usr/src.old/sys/ddb/db_script.c
-rw-r--r--  0 root   wheel   125529 Apr  3 06:13
crash.10/usr/src.old/sys/cam/cam_xpt.c
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/amd64/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/pc/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/xen/
-rw-r--r--  0 root   wheel     1848 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/_bus.h
-rw-r--r--  0 root   wheel     8622 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/_inttypes.h
-rw-r--r--  0 root   wheel     4152 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/_limits.h
-rw-r--r--  0 root   wheel     5605 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/_stdint.h
-rw-r--r--  0 root   wheel     4437 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/_types.h
-rw-r--r--  0 root   wheel     3188 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/acpica_machdep.h
-rw-r--r--  0 root   wheel    14393 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/apicreg.h
-rw-r--r--  0 root   wheel     9118 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/apicvar.h
-rw-r--r--  0 root   wheel     3183 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/asm.h
-rw-r--r--  0 root   wheel     7814 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/asmacros.h
-rw-r--r--  0 root   wheel    16060 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/atomic.h
-rw-r--r--  0 root   wheel    33225 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/bus.h
-rw-r--r--  0 root   wheel     1558 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/bus_dma.h
-rw-r--r--  0 root   wheel     1036 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/clock.h
-rw-r--r--  0 root   wheel     2850 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/cpu.h
-rw-r--r--  0 root   wheel    14534 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/cpufunc.h
-rw-r--r--  0 root   wheel     2218 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/cputypes.h
-rw-r--r--  0 root   wheel     3175 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/db_machdep.h
-rw-r--r--  0 root   wheel     3938 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/elf.h
-rw-r--r--  0 root   wheel     4572 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/endian.h
-rw-r--r--  0 root   wheel     1830 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/exec.h
-rw-r--r--  0 root   wheel     3135 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/float.h
-rw-r--r--  0 root   wheel     2099 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/floatingpoint.h
-rw-r--r--  0 root   wheel     3912 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/fpu.h
-rw-r--r--  0 root   wheel     2808 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/frame.h
-rw-r--r--  0 root   wheel     1867 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/gdb_machdep.h
-rw-r--r--  0 root   wheel     8880 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/ieeefp.h
-rw-r--r--  0 root   wheel     2951 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/in_cksum.h
-rw-r--r--  0 root   wheel     6089 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/intr_machdep.h
-rw-r--r--  0 root   wheel     1503 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/iodev.h
-rw-r--r--  0 root   wheel     1914 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/kdb.h
-rw-r--r--  0 root   wheel     2462 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/legacyvar.h
-rw-r--r--  0 root   wheel     1976 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/limits.h
-rw-r--r--  0 root   wheel     1898 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/mca.h
-rw-r--r--  0 root   wheel     3753 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/md_var.h
-rw-r--r--  0 root   wheel     1605 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/memdev.h
-rw-r--r--  0 root   wheel     1629 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/metadata.h
-rw-r--r--  0 root   wheel     1769 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/minidump.h
-rw-r--r--  0 root   wheel     1595 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/mp_watchdog.h
-rw-r--r--  0 root   wheel     3994 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/mptable.h
-rw-r--r--  0 root   wheel     1787 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/mutex.h
-rw-r--r--  0 root   wheel     1879 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/nexusvar.h
-rw-r--r--  0 root   wheel     5770 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/param.h
-rw-r--r--  0 root   wheel     3397 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/pcb.h
-rw-r--r--  0 root   wheel     2023 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/pci_cfgreg.h
-rw-r--r--  0 root   wheel     7786 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/pcpu.h
-rw-r--r--  0 root   wheel    10731 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/pmap.h
-rw-r--r--  0 root   wheel     4236 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/pmc_mdep.h
-rw-r--r--  0 root   wheel     1949 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/ppireg.h
-rw-r--r--  0 root   wheel     2939 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/proc.h
-rw-r--r--  0 root   wheel     3710 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/psl.h
-rw-r--r--  0 root   wheel     6071 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/profile.h
-rw-r--r--  0 root   wheel     1791 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/ptrace.h
-rw-r--r--  0 root   wheel     4453 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/reg.h
-rw-r--r--  0 root   wheel     2342 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/reloc.h
-rw-r--r--  0 root   wheel     1991 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/resource.h
-rw-r--r--  0 root   wheel     1918 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/runq.h
-rw-r--r--  0 root   wheel    10148 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/segments.h
-rw-r--r--  0 root   wheel     2237 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/setjmp.h
-rw-r--r--  0 root   wheel     2166 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/sf_buf.h
-rw-r--r--  0 root   wheel     1979 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/sigframe.h
-rw-r--r--  0 root   wheel     3377 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/signal.h
-rw-r--r--  0 root   wheel     2314 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/smp.h
-rw-r--r--  0 root   wheel    18029 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/specialreg.h
-rw-r--r--  0 root   wheel     1454 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/stack.h
-rw-r--r--  0 root   wheel     2647 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/stdarg.h
-rw-r--r--  0 root   wheel     3068 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/sysarch.h
-rw-r--r--  0 root   wheel     2125 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/timerreg.h
-rw-r--r--  0 root   wheel     4008 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/trap.h
-rw-r--r--  0 root   wheel     3001 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/tss.h
-rw-r--r--  0 root   wheel     3326 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/ucontext.h
-rw-r--r--  0 root   wheel     3451 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/varargs.h
-rw-r--r--  0 root   wheel     2094 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/vm.h
-rw-r--r--  0 root   wheel     7181 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/vmparam.h
-rw-r--r--  0 root   wheel    10175 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/xen/hypercall.h
-rw-r--r--  0 root   wheel     3418 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/xen/synch_bitops.h
-rw-r--r--  0 root   wheel     9309 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/xen/xen-os.h
-rw-r--r--  0 root   wheel     2728 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/xen/xenfunc.h
-rw-r--r--  0 root   wheel     7859 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/xen/xenpmap.h
-rw-r--r--  0 root   wheel     3537 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/xen/xenvar.h
-rw-r--r--  0 root   wheel     2791 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/pc/bios.h
-rw-r--r--  0 root   wheel     1013 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/include/pc/display.h
-rw-r--r--  0 root   wheel    21900 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/amd64/exception.S
-rw-r--r--  0 root   wheel     3094 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/amd64/locore.S
-rw-r--r--  0 root   wheel    35899 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/amd64/mp_machdep.c
-rw-r--r--  0 root   wheel   127991 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/amd64/pmap.c
-rw-r--r--  0 root   wheel    28692 Apr  3 06:13
crash.10/usr/src.old/sys/amd64/amd64/trap.c
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/mnt/obj/
drwxr-xr-x  0 root   wheel        0 Apr  3 06:13 crash.10/mnt/crash/
-rw-------  0 root   wheel 3331166208 Apr  3 06:13 crash.10/mnt/crash/vmcore.10
-rw-------  0 root   wheel        470 Apr  3 06:13 crash.10/mnt/crash/info.10
drwxr-xr-x  0 root   wheel          0 Apr  3 06:13 crash.10/mnt/obj/usr/
drwxr-xr-x  0 root   wheel          0 Apr  3 06:13 crash.10/mnt/obj/usr/src.old/
drwxr-xr-x  0 root   wheel          0 Apr  3 06:13
crash.10/mnt/obj/usr/src.old/sys/
drwxr-xr-x  0 root   wheel          0 Apr  3 06:13
crash.10/mnt/obj/usr/src.old/sys/DEBUG/
-rw-r--r--  0 root   wheel      92901 Apr  3 06:13
crash.10/mnt/obj/usr/src.old/sys/DEBUG/vnode_if.c


> I am asking because to me it seems like a NULL pointer crash:
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 1; apic id = 01
>> fault virtual address = 0x48
>
> It looks like perhaps zio->io_vd became NULL while an I/O response was traveling
> up and vdev_geom_io_intr was not prepared to handle that.
>
>> _mtx_lock_flags() at _mtx_lock_flags+0x39
>> vdev_geom_io_intr() at vdev_geom_io_intr+0x62
>> g_io_schedule_up() at g_io_schedule_up+0xed
>> g_up_procbody() at g_up_procbody+0x6f
>> fork_exit() at fork_exit+0x12a
>> fork_trampoline() at fork_trampoline+0xe

    If there is any info I can gather on GENERIC, please let me know.
The only crashinfo I have is from the debug kernel.

Alex Bakhtin
Comment 4 korvus 2010-04-22 16:13:26 UTC
I've seen a similar issue in the past while testing hot-removal of RAIDZ
members (glabeled siis(4)-attached devices).  After the /dev/ada* entry
would disappear, the /dev/label/diskXX entry would remain, and the system
would crash shortly afterwards under ZFS I/O.  Here's the panic info in
case it is relevant:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 14
fault virtual address   = 0x48
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff8035f375
stack pointer           = 0x28:0xffffff800006db60
frame pointer           = 0x28:0xffffff800006db70
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2 (g_event)
[thread pid 2 tid 100014 ]
Stopped at      _mtx_lock_flags+0x15:   lock cmpxchgq   %rsi,0x18(%rdi)

db> bt
Tracing pid 2 tid 100014 td 0xffffff00014d4ab0
_mtx_lock_flags() at _mtx_lock_flags+0x15
vdev_geom_release() at vdev_geom_release+0x33
vdev_geom_orphan() at vdev_geom_orphan+0x15c
g_run_events() at g_run_events+0x104
g_event_procbody() at g_event_procbody+0x55
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800006dd30, rbp = 0 ---
Comment 5 Andriy Gapon 2010-04-22 23:55:28 UTC
Can you try this patch?

--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
@@ -603,6 +603,9 @@ vdev_geom_io_intr(struct bio *bp)
 	zio = bp->bio_caller1;
 	ctx = zio->io_vd->vdev_tsd;

+	if (ctx == NULL)
+		return;
+
 	if ((zio->io_error = bp->bio_error) == 0 && bp->bio_resid != 0)
 		zio->io_error = EIO;


-- 
Andriy Gapon
Comment 6 Alex.Bakhtin 2010-05-04 00:23:35 UTC
Andriy,

     I upgraded to today's stable and reproduced the problem. On GENERIC
the system just hangs with the following output:

==========================
ad12: FAILURE - WRITE_DMA48
status=7f<READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> error=0
LBA=2312588250


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x48
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff80593e95
stack pointer           = 0x28:0xffffff8000065ba0
frame pointer           = 0x28:0xffffff8000065bb0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3 (g_up)
trap number             = 12
panic: page fault
cpuid = 1

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80545a28
stack pointer           = 0x28:0xffffff80eada2a40
frame pointer           = 0x28:0xffffff80eada2a90
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (spa_zio)
trap number             = 12
==========================


With GENERIC + DDB/KDB enabled I got the following (it seems that
the first time I detached the device there was no active transaction
- I can try to reproduce):
==========================
ad12: FAILURE - device detached


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x48
fault code              = supervisor write data, page not present

instruction pointer     = 0x20:0xffffffff805a0345
Fatal double fault
stack pointer           = 0x28:0xffffff800006aba0
rip = 0xffffffff808085ad
frame pointer           = 0x28:0xffffff800006abb0
rsp = 0xffffff80ead87000
code segment            = base rx0, limit 0xfffff, type 0x1b
rbp = 0xffffff80ead87070
                        = DPL 0, pres 1, long 1, def32 0, gran 1
cpuid = 0; processor eflags     = apic id = 00
interrupt enabled, panic: double fault
resume, cpuid = 0
IOPL = 0
KDB: enter: panic
c[thread pid 0 tid 100113 ]
Stopped at      kdb_enter+0x3d: movq    $0,0x69cee0(%rip)
==========================

And another one
==========================
ad12: FAILURE - WRITE_DMA
status=7f<READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> error=0
LBA=111033498


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x48
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff805a0345
stack pointer           = 0x28:0xffffff800006aba0
frame pointer           = 0x28:0xffffff800006abb0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3 (g_up)
[thread pid 3 tid 100011 ]
Stopped at      _mtx_lock_flags+0x15:   lock cmpxchgq   %rsi,0x18(%rdi)
db:0:kdb.enter.default> capture on
==========================

And with your patch the system doesn't detect that the device is detached
and seems to be deadlocked (it doesn't respond to the power button):

==========================
acpi0: suspend request ignored (not ready yet)
acpi0: request to enter state S5 failed (err 6)
acpi0: suspend request ignored (not ready yet)
acpi0: request to enter state S5 failed (err 6)

==========================

    So, I can still easily reproduce this problem on 8-STABLE. Your
simple patch helps to avoid the page fault but deadlocks the system. Are
you sure that you can just return at this point? Probably it makes
sense to set some error flag before returning?

Alex Bakhtin

2010/4/23 Andriy Gapon <avg@icyb.net.ua>:
>
> Can you try this patch?
>
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> @@ -603,6 +603,9 @@ vdev_geom_io_intr(struct bio *bp)
>        zio = bp->bio_caller1;
>        ctx = zio->io_vd->vdev_tsd;
>
> +       if (ctx == NULL)
> +               return;
> +
>        if ((zio->io_error = bp->bio_error) == 0 && bp->bio_resid != 0)
>                zio->io_error = EIO;
>
>
> --
> Andriy Gapon
>
Comment 7 Andriy Gapon 2010-05-13 06:44:52 UTC
on 04/05/2010 02:23 Alex Bakhtin said the following:
> 
>     So, I can still easily reproduce this problem on 8-STABLE. Your
> simple patch helps to avoid the page fault but deadlocks the system. Are
> you sure that you can just return at this point? Probably it makes
> sense to set some error flag before returning?

You are right, my simple patch is far from correct.
And properly fixing the problem is not trivial.

Some issues:
1. vdev_geom_release() sets vdev_tsd to NULL before shutting down the
corresponding gc_queue; because of that, bios that may later come via
vdev_geom_io_intr() cannot be mapped to their gc_queue, and thus there is no
choice but to drop them on the floor.
2. The shutdown logic in vdev_geom_worker() does not seem to be reliable.  I
think that the vdev thread may get stuck forever if a bio happens to be on
the gc_queue when vdev_geom_release() is called.  In that case the gc_state
check may be skipped and the gc_queue may never be woken up again (see the
sketch below).
3. I am not sure if pending zios are taken care of when vdev_geom_release() is
called.  If not, they may get stuck forever.
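For context, a condensed sketch of the shutdown handshake these issues refer to, paraphrased from the vdev_geom_release()/vdev_geom_worker() pair that r208142 later removes (see the diff in comment 8); the comments mark the spots issues 1 and 2 point at:

static void
vdev_geom_release(vdev_t *vd)
{
	vdev_geom_ctx_t *ctx;

	ctx = vd->vdev_tsd;
	vd->vdev_tsd = NULL;		/* issue 1: cleared before the queue is drained */

	mtx_lock(&ctx->gc_queue_mtx);
	ctx->gc_state = 1;		/* ask the worker to exit */
	wakeup_one(&ctx->gc_queue);
	while (ctx->gc_state != 2)	/* wait for the worker to acknowledge */
		msleep(&ctx->gc_state, &ctx->gc_queue_mtx, 0, "vgeom:w", 0);
	mtx_unlock(&ctx->gc_queue_mtx);
	mtx_destroy(&ctx->gc_queue_mtx);
	kmem_free(ctx, sizeof(*ctx));
}

static void
vdev_geom_worker(void *arg)
{
	vdev_geom_ctx_t *ctx = arg;
	struct bio *bp;

	for (;;) {
		mtx_lock(&ctx->gc_queue_mtx);
		bp = bioq_takefirst(&ctx->gc_queue);
		if (bp == NULL) {
			/*
			 * The exit request (gc_state == 1) is only noticed
			 * when the queue is empty; per issue 2 above, if a
			 * bio is sitting on the queue at release time this
			 * check is skipped on that pass and the thread may
			 * never be woken again.
			 */
			if (ctx->gc_state == 1) {
				ctx->gc_state = 2;
				wakeup_one(&ctx->gc_state);
				mtx_unlock(&ctx->gc_queue_mtx);
				kthread_exit();
			}
			msleep(&ctx->gc_queue, &ctx->gc_queue_mtx,
			    PRIBIO | PDROP, "vgeom:io", 0);
			continue;
		}
		mtx_unlock(&ctx->gc_queue_mtx);
		/* ... complete the bio's zio and loop ... */
	}
}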

Hopefully Pawel can help us here.

> 2010/4/23 Andriy Gapon <avg@icyb.net.ua>:
>> Can you try this patch?
>>
>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>> @@ -603,6 +603,9 @@ vdev_geom_io_intr(struct bio *bp)
>>        zio = bp->bio_caller1;
>>        ctx = zio->io_vd->vdev_tsd;
>>
>> +       if (ctx == NULL)
>> +               return;
>> +
>>        if ((zio->io_error = bp->bio_error) == 0 && bp->bio_resid != 0)
>>                zio->io_error = EIO;
>>
>>
>> --
>> Andriy Gapon
>>


-- 
Andriy Gapon
Comment 8 dfilter service freebsd_committer freebsd_triage 2010-05-16 12:56:57 UTC
Author: pjd
Date: Sun May 16 11:56:42 2010
New Revision: 208142
URL: http://svn.freebsd.org/changeset/base/208142

Log:
  The whole point of having dedicated worker thread for each leaf VDEV was to
  avoid calling zio_interrupt() from geom_up thread context. It turns out that
  when provider is forcibly removed from the system and we kill worker thread
  there can still be some ZIOs pending. To complete pending ZIOs when there is
  no worker thread anymore we still have to call zio_interrupt() from geom_up
  context. To avoid this race just remove use of worker threads altogether.
  This should be more or less fine, because I also thought that zio_interrupt()
  does more work, but it only makes small UMA allocation with M_WAITOK.
  It also saves one context switch per I/O request.
  
  PR:		kern/145339
  Reported by:	Alex Bakhtin <Alex.Bakhtin@gmail.com>
  MFC after:	1 week

Modified:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
==============================================================================
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	Sun May 16 11:17:21 2010	(r208141)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	Sun May 16 11:56:42 2010	(r208142)
@@ -47,31 +47,6 @@ struct g_class zfs_vdev_class = {
 
 DECLARE_GEOM_CLASS(zfs_vdev_class, zfs_vdev);
 
-typedef struct vdev_geom_ctx {
-	struct g_consumer *gc_consumer;
-	int gc_state;
-	struct bio_queue_head gc_queue;
-	struct mtx gc_queue_mtx;
-} vdev_geom_ctx_t;
-
-static void
-vdev_geom_release(vdev_t *vd)
-{
-	vdev_geom_ctx_t *ctx;
-
-	ctx = vd->vdev_tsd;
-	vd->vdev_tsd = NULL;
-
-	mtx_lock(&ctx->gc_queue_mtx);
-	ctx->gc_state = 1;
-	wakeup_one(&ctx->gc_queue);
-	while (ctx->gc_state != 2)
-		msleep(&ctx->gc_state, &ctx->gc_queue_mtx, 0, "vgeom:w", 0);
-	mtx_unlock(&ctx->gc_queue_mtx);
-	mtx_destroy(&ctx->gc_queue_mtx);
-	kmem_free(ctx, sizeof(*ctx));
-}
-
 static void
 vdev_geom_orphan(struct g_consumer *cp)
 {
@@ -96,8 +71,7 @@ vdev_geom_orphan(struct g_consumer *cp)
 		ZFS_LOG(1, "Destroyed geom %s.", gp->name);
 		g_wither_geom(gp, error);
 	}
-	vdev_geom_release(vd);
-
+	vd->vdev_tsd = NULL;
 	vd->vdev_remove_wanted = B_TRUE;
 	spa_async_request(vd->vdev_spa, SPA_ASYNC_REMOVE);
 }
@@ -188,52 +162,6 @@ vdev_geom_detach(void *arg, int flag __u
 	}
 }
 
-static void
-vdev_geom_worker(void *arg)
-{
-	vdev_geom_ctx_t *ctx;
-	zio_t *zio;
-	struct bio *bp;
-
-	thread_lock(curthread);
-	sched_prio(curthread, PRIBIO);
-	thread_unlock(curthread);
-
-	ctx = arg;
-	for (;;) {
-		mtx_lock(&ctx->gc_queue_mtx);
-		bp = bioq_takefirst(&ctx->gc_queue);
-		if (bp == NULL) {
-			if (ctx->gc_state == 1) {
-				ctx->gc_state = 2;
-				wakeup_one(&ctx->gc_state);
-				mtx_unlock(&ctx->gc_queue_mtx);
-				kthread_exit();
-			}
-			msleep(&ctx->gc_queue, &ctx->gc_queue_mtx,
-			    PRIBIO | PDROP, "vgeom:io", 0);
-			continue;
-		}
-		mtx_unlock(&ctx->gc_queue_mtx);
-		zio = bp->bio_caller1;
-		zio->io_error = bp->bio_error;
-		if (bp->bio_cmd == BIO_FLUSH && bp->bio_error == ENOTSUP) {
-			vdev_t *vd;
-
-			/*
-			 * If we get ENOTSUP, we know that no future
-			 * attempts will ever succeed.  In this case we
-			 * set a persistent bit so that we don't bother
-			 * with the ioctl in the future.
-			 */
-			vd = zio->io_vd;
-			vd->vdev_nowritecache = B_TRUE;
-		}
-		g_destroy_bio(bp);
-		zio_interrupt(zio);
-	}
-}
-
 static uint64_t
 nvlist_get_guid(nvlist_t *list)
 {
@@ -488,7 +416,6 @@ vdev_geom_open_by_path(vdev_t *vd, int c
 static int
 vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)
 {
-	vdev_geom_ctx_t *ctx;
 	struct g_provider *pp;
 	struct g_consumer *cp;
 	int error, owned;
@@ -557,19 +484,9 @@ vdev_geom_open(vdev_t *vd, uint64_t *psi
 	}
 
 	cp->private = vd;
-
-	ctx = kmem_zalloc(sizeof(*ctx), KM_SLEEP);
-	bioq_init(&ctx->gc_queue);
-	mtx_init(&ctx->gc_queue_mtx, "zfs:vdev:geom:queue", NULL, MTX_DEF);
-	ctx->gc_consumer = cp;
-	ctx->gc_state = 0;
-
-	vd->vdev_tsd = ctx;
+	vd->vdev_tsd = cp;
 	pp = cp->provider;
 
-	kproc_kthread_add(vdev_geom_worker, ctx, &zfsproc, NULL, 0, 0,
-	    "zfskern", "vdev %s", pp->name);
-
 	/*
 	 * Determine the actual size of the device.
 	 */
@@ -592,50 +509,49 @@ vdev_geom_open(vdev_t *vd, uint64_t *psi
 static void
 vdev_geom_close(vdev_t *vd)
 {
-	vdev_geom_ctx_t *ctx;
 	struct g_consumer *cp;
 
-	if ((ctx = vd->vdev_tsd) == NULL)
-		return;
-	if ((cp = ctx->gc_consumer) == NULL)
+	cp = vd->vdev_tsd;
+	if (cp == NULL)
 		return;
-	vdev_geom_release(vd);
+	vd->vdev_tsd = NULL;
 	g_post_event(vdev_geom_detach, cp, M_WAITOK, NULL);
 }
 
 static void
 vdev_geom_io_intr(struct bio *bp)
 {
-	vdev_geom_ctx_t *ctx;
 	zio_t *zio;
 
 	zio = bp->bio_caller1;
-	ctx = zio->io_vd->vdev_tsd;
-
-	if ((zio->io_error = bp->bio_error) == 0 && bp->bio_resid != 0)
+	zio->io_error = bp->bio_error;
+	if (zio->io_error == 0 && bp->bio_resid != 0)
 		zio->io_error = EIO;
+	if (bp->bio_cmd == BIO_FLUSH && bp->bio_error == ENOTSUP) {
+		vdev_t *vd;
 
-	mtx_lock(&ctx->gc_queue_mtx);
-	bioq_insert_tail(&ctx->gc_queue, bp);
-	wakeup_one(&ctx->gc_queue);
-	mtx_unlock(&ctx->gc_queue_mtx);
+		/*
+		 * If we get ENOTSUP, we know that no future
+		 * attempts will ever succeed.  In this case we
+		 * set a persistent bit so that we don't bother
+		 * with the ioctl in the future.
+		 */
+		vd = zio->io_vd;
+		vd->vdev_nowritecache = B_TRUE;
+	}
+	g_destroy_bio(bp);
+	zio_interrupt(zio);
 }
 
 static int
 vdev_geom_io_start(zio_t *zio)
 {
 	vdev_t *vd;
-	vdev_geom_ctx_t *ctx;
 	struct g_consumer *cp;
 	struct bio *bp;
 	int error;
 
-	cp = NULL;
-
 	vd = zio->io_vd;
-	ctx = vd->vdev_tsd;
-	if (ctx != NULL)
-		cp = ctx->gc_consumer;
 
 	if (zio->io_type == ZIO_TYPE_IOCTL) {
 		/* XXPOLICY */
@@ -664,6 +580,7 @@ vdev_geom_io_start(zio_t *zio)
 		return (ZIO_PIPELINE_CONTINUE);
 	}
 sendreq:
+	cp = vd->vdev_tsd;
 	if (cp == NULL) {
 		zio->io_error = ENXIO;
 		return (ZIO_PIPELINE_CONTINUE);
Comment 9 Alex.Bakhtin 2010-05-17 16:36:39 UTC
Pawel,

   I tested your patch in the following zfs configurations (all on
5x2TB WD20EARS drives):

1. raidz1 on top of physical disks.
2. raidz1 on top of geli
3. raidz2 on top of physical disks.

   In all three cases it seems that the problem was fixed - I can't
crash zfs in vdev_geom when unplugging the disk.

   Unfortunately, three times I got a deadlock in zfs after plugging the
vdevs back in under load. It happens several seconds after the zpool online
command. I'm not 100 percent sure that the deadlocks are related to this
patch, but... I'm going to do some additional testing with patched
and unpatched kernels.

2010/5/13  <pjd@freebsd.org>:
> Synopsis: [zfs] deadlock after detaching block device from raidz pool
>
> State-Changed-From-To: open->feedback
> State-Changed-By: pjd
> State-Changed-When: czw 13 maj 2010 09:33:20 UTC
> State-Changed-Why:
> Could you try this patch:
>
>        http://people.freebsd.org/~pjd/patches/vdev_geom.c.3.patch
>
> It is against most recent HEAD. If it is rejected on 8-STABLE, just grab
> entire vdev_geom.c from HEAD and patch this.
>
>
> Responsible-Changed-From-To: freebsd-fs->pjd
> Responsible-Changed-By: pjd
> Responsible-Changed-When: czw 13 maj 2010 09:33:20 UTC
> Responsible-Changed-Why:
> I'll take this one.
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=145339
>
Comment 10 Alex.Bakhtin 2010-05-23 00:40:38 UTC
Pawel,

	I did some additional testing. Now I'm 95 percent sure that this
deadlock was introduced by this patch. I tried patched and non-patched
GENERIC kernels. It seems that it is harder to reproduce this deadlock on
raidz1 than on raidz2. With raidz2 I tried to detach and reattach two
disks at the same time, and the deadlock is 100% reproducible on the
patched kernel but I can't reproduce it on the non-patched kernel.

	How to reproduce:

1. Create a raidz2 pool.
2. Detach two devices while the pool is idle.
3. Start writing to the pool (dd if=/dev/zero of=/storage/test bs=1m).
4. atacontrol detach/attach to bring the disks back.
5. Online the two disks at the same time (zpool online storage adX adY).
6. Wait some time (in my testing, several seconds, less than one minute) -
all disk activity stops. After that it's impossible to abort the
zpool online or dd command. Also it's impossible to reboot without a
hard reset.

	If you need a core from the deadlocked kernel, please let me know.

Alex Bakhtin

Comment 11 dfilter service freebsd_committer freebsd_triage 2010-05-24 11:09:46 UTC
Author: pjd
Date: Mon May 24 10:09:36 2010
New Revision: 208487
URL: http://svn.freebsd.org/changeset/base/208487

Log:
  MFC r207920,r207934,r207936,r207937,r207970,r208142,r208147,r208148,r208166,
  r208454,r208455,r208458:
  
  r207920:
  
  Back out r205134. It is not stable.
  
  r207934:
  
  Add missing new line characters to the warnings.
  
  r207936:
  
  Eventhough r203504 eliminates taste traffic provoked by vdev_geom.c,
  ZFS still like to open all vdevs, close them and open them again,
  which in turn provokes taste traffic anyway.
  
  I don't know of any clean way to fix it, so do it the hard way - if we can't
  open provider for writing just retry 5 times with 0.5 pauses. This should
  elimitate accidental races caused by other classes tasting providers created on
  top of our vdevs.
  
  Reported by:	James R. Van Artsdalen <james-freebsd-fs2@jrv.org>
  Reported by:	Yuri Pankov <yuri.pankov@gmail.com>
  
  r207937:
  
  I added vfs_lowvnodes event, but it was only used for a short while and now
  it is totally unused. Remove it.
  
  r207970:
  
  When there is no memory or KVA, try to help by reclaiming some vnodes.
  This helps with 'kmem_map too small' panics.
  
  No objections from:	kib
  Tested by:		Alexander V. Ribchansky <shurik@zk.informjust.ua>
  
  r208142:
  
  The whole point of having dedicated worker thread for each leaf VDEV was to
  avoid calling zio_interrupt() from geom_up thread context. It turns out that
  when provider is forcibly removed from the system and we kill worker thread
  there can still be some ZIOs pending. To complete pending ZIOs when there is
  no worker thread anymore we still have to call zio_interrupt() from geom_up
  context. To avoid this race just remove use of worker threads altogether.
  This should be more or less fine, because I also thought that zio_interrupt()
  does more work, but it only makes small UMA allocation with M_WAITOK.
  It also saves one context switch per I/O request.
  
  PR:		kern/145339
  Reported by:	Alex Bakhtin <Alex.Bakhtin@gmail.com>
  
  r208147:
  
  Add task structure to zio and use it instead of allocating one.
  This eliminates the only place where we can sleep when calling zio_interrupt().
  As a side-effect this can actually improve performance a little as we
  allocate one less thing for every I/O.
  
  Prodded by:	kib
  
  r208148:
  
  Allow to configure UMA usage for ZIO data via loader and turn it on by
  default for amd64. On i386 I saw performance degradation when UMA was used,
  but for amd64 it should help.
  
  r208166:
  
  Fix userland build by making io_task available only for the kernel and by
  providing taskq_dispatch_safe() macro.
  
  r208454:
  
  Remove ZIO_USE_UMA from arc.c as well.
  
  r208455:
  
  ZIO_USE_UMA is no longer used.
  
  r208458:
  
  Create UMA zones unconditionally.

Added:
  stable/8/sys/cddl/compat/opensolaris/sys/taskq.h
     - copied unchanged from r208147, head/sys/cddl/compat/opensolaris/sys/taskq.h
Modified:
  stable/8/cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h
  stable/8/sys/cddl/compat/opensolaris/kern/opensolaris_taskq.c
  stable/8/sys/cddl/compat/opensolaris/sys/dnlc.h
  stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
  stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
  stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
  stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
  stable/8/sys/kern/vfs_subr.c
  stable/8/sys/modules/zfs/Makefile
  stable/8/sys/sys/eventhandler.h
Directory Properties:
  stable/8/cddl/contrib/opensolaris/   (props changed)
  stable/8/cddl/contrib/opensolaris/cmd/zdb/   (props changed)
  stable/8/cddl/contrib/opensolaris/cmd/zfs/   (props changed)
  stable/8/cddl/contrib/opensolaris/lib/libzfs/   (props changed)
  stable/8/sys/   (props changed)
  stable/8/sys/amd64/include/xen/   (props changed)
  stable/8/sys/cddl/contrib/opensolaris/   (props changed)
  stable/8/sys/contrib/dev/acpica/   (props changed)
  stable/8/sys/contrib/pf/   (props changed)
  stable/8/sys/dev/xen/xenpci/   (props changed)
  stable/8/sys/geom/sched/   (props changed)

Modified: stable/8/cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h
==============================================================================
--- stable/8/cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h	Mon May 24 07:04:00 2010	(r208486)
+++ stable/8/cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h	Mon May 24 10:09:36 2010	(r208487)
@@ -343,6 +343,9 @@ extern void	taskq_wait(taskq_t *);
 extern int	taskq_member(taskq_t *, void *);
 extern void	system_taskq_init(void);
 
+#define	taskq_dispatch_safe(tq, func, arg, task)			\
+	taskq_dispatch((tq), (func), (arg), TQ_SLEEP)
+
 #define	XVA_MAPSIZE	3
 #define	XVA_MAGIC	0x78766174
 

Modified: stable/8/sys/cddl/compat/opensolaris/kern/opensolaris_taskq.c
==============================================================================
--- stable/8/sys/cddl/compat/opensolaris/kern/opensolaris_taskq.c	Mon May 24 07:04:00 2010	(r208486)
+++ stable/8/sys/cddl/compat/opensolaris/kern/opensolaris_taskq.c	Mon May 24 10:09:36 2010	(r208487)
@@ -40,12 +40,6 @@ __FBSDID("$FreeBSD$");
 
 static uma_zone_t taskq_zone;
 
-struct ostask {
-	struct task	ost_task;
-	task_func_t	*ost_func;
-	void		*ost_arg;
-};
-
 taskq_t *system_taskq = NULL;
 
 static void
@@ -140,3 +134,32 @@ taskq_dispatch(taskq_t *tq, task_func_t 
 
 	return ((taskqid_t)(void *)task);
 }
+
+#define	TASKQ_MAGIC	0x74541c
+
+static void
+taskq_run_safe(void *arg, int pending __unused)
+{
+	struct ostask *task = arg;
+
+	ASSERT(task->ost_magic == TASKQ_MAGIC);
+	task->ost_func(task->ost_arg);
+	task->ost_magic = 0;
+}
+
+taskqid_t
+taskq_dispatch_safe(taskq_t *tq, task_func_t func, void *arg,
+    struct ostask *task)
+{
+
+	ASSERT(task->ost_magic != TASKQ_MAGIC);
+
+	task->ost_magic = TASKQ_MAGIC;
+	task->ost_func = func;
+	task->ost_arg = arg;
+
+	TASK_INIT(&task->ost_task, 0, taskq_run_safe, task);
+	taskqueue_enqueue(tq->tq_queue, &task->ost_task);
+
+	return ((taskqid_t)(void *)task);
+}

Modified: stable/8/sys/cddl/compat/opensolaris/sys/dnlc.h
==============================================================================
--- stable/8/sys/cddl/compat/opensolaris/sys/dnlc.h	Mon May 24 07:04:00 2010	(r208486)
+++ stable/8/sys/cddl/compat/opensolaris/sys/dnlc.h	Mon May 24 10:09:36 2010	(r208487)
@@ -35,6 +35,6 @@
 #define	dnlc_update(dvp, name, vp)	do { } while (0)
 #define	dnlc_remove(dvp, name)		do { } while (0)
 #define	dnlc_purge_vfsp(vfsp, count)	(0)
-#define	dnlc_reduce_cache(percent)	EVENTHANDLER_INVOKE(vfs_lowvnodes, (int)(intptr_t)(percent))
+#define	dnlc_reduce_cache(percent)	do { } while (0)
 
 #endif	/* !_OPENSOLARIS_SYS_DNLC_H_ */

Copied: stable/8/sys/cddl/compat/opensolaris/sys/taskq.h (from r208147, head/sys/cddl/compat/opensolaris/sys/taskq.h)
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ stable/8/sys/cddl/compat/opensolaris/sys/taskq.h	Mon May 24 10:09:36 2010	(r208487, copy of r208147, head/sys/cddl/compat/opensolaris/sys/taskq.h)
@@ -0,0 +1,44 @@
+/*-
+ * Copyright (c) 2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD$
+ */
+
+#ifndef _OPENSOLARIS_SYS_TASKQ_H_
+#define	_OPENSOLARIS_SYS_TASKQ_H_
+
+#include_next <sys/taskq.h>
+
+struct ostask {
+	struct task	 ost_task;
+	task_func_t	*ost_func;
+	void		*ost_arg;
+	int		 ost_magic;
+};
+
+taskqid_t taskq_dispatch_safe(taskq_t *tq, task_func_t func, void *arg,
+    struct ostask *task);
+
+#endif	/* _OPENSOLARIS_SYS_TASKQ_H_ */

Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
==============================================================================
--- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Mon May 24 07:04:00 2010	(r208486)
+++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Mon May 24 10:09:36 2010	(r208487)
@@ -195,11 +195,6 @@ SYSCTL_QUAD(_vfs_zfs, OID_AUTO, arc_min,
 SYSCTL_INT(_vfs_zfs, OID_AUTO, mdcomp_disable, CTLFLAG_RDTUN,
     &zfs_mdcomp_disable, 0, "Disable metadata compression");
 
-#ifdef ZIO_USE_UMA
-extern kmem_cache_t	*zio_buf_cache[];
-extern kmem_cache_t	*zio_data_buf_cache[];
-#endif
-
 /*
  * Note that buffers can be in one of 6 states:
  *	ARC_anon	- anonymous (discussed below)
@@ -620,11 +615,6 @@ static buf_hash_table_t buf_hash_table;
 
 uint64_t zfs_crc64_table[256];
 
-#ifdef ZIO_USE_UMA
-extern kmem_cache_t	*zio_buf_cache[];
-extern kmem_cache_t	*zio_data_buf_cache[];
-#endif
-
 /*
  * Level 2 ARC
  */
@@ -2192,14 +2182,15 @@ arc_reclaim_needed(void)
 	return (0);
 }
 
+extern kmem_cache_t	*zio_buf_cache[];
+extern kmem_cache_t	*zio_data_buf_cache[];
+
 static void
 arc_kmem_reap_now(arc_reclaim_strategy_t strat)
 {
-#ifdef ZIO_USE_UMA
 	size_t			i;
 	kmem_cache_t		*prev_cache = NULL;
 	kmem_cache_t		*prev_data_cache = NULL;
-#endif
 
 #ifdef _KERNEL
 	if (arc_meta_used >= arc_meta_limit) {
@@ -2224,7 +2215,6 @@ arc_kmem_reap_now(arc_reclaim_strategy_t
 	if (strat == ARC_RECLAIM_AGGR)
 		arc_shrink();
 
-#ifdef ZIO_USE_UMA
 	for (i = 0; i < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT; i++) {
 		if (zio_buf_cache[i] != prev_cache) {
 			prev_cache = zio_buf_cache[i];
@@ -2235,7 +2225,6 @@ arc_kmem_reap_now(arc_reclaim_strategy_t
 			kmem_cache_reap_now(zio_data_buf_cache[i]);
 		}
 	}
-#endif
 	kmem_cache_reap_now(buf_cache);
 	kmem_cache_reap_now(hdr_cache);
 }

Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
==============================================================================
--- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h	Mon May 24 07:04:00 2010	(r208486)
+++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h	Mon May 24 10:09:36 2010	(r208487)
@@ -316,6 +316,11 @@ struct zio {
 
 	/* FMA state */
 	uint64_t	io_ena;
+
+#ifdef _KERNEL
+	/* FreeBSD only. */
+	struct ostask	io_task;
+#endif
 };
 
 extern zio_t *zio_null(zio_t *pio, spa_t *spa,

Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
==============================================================================
--- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	Mon May 24 07:04:00 2010	(r208486)
+++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	Mon May 24 10:09:36 2010	(r208487)
@@ -47,31 +47,6 @@ struct g_class zfs_vdev_class = {
 
 DECLARE_GEOM_CLASS(zfs_vdev_class, zfs_vdev);
 
-typedef struct vdev_geom_ctx {
-	struct g_consumer *gc_consumer;
-	int gc_state;
-	struct bio_queue_head gc_queue;
-	struct mtx gc_queue_mtx;
-} vdev_geom_ctx_t;
-
-static void
-vdev_geom_release(vdev_t *vd)
-{
-	vdev_geom_ctx_t *ctx;
-
-	ctx = vd->vdev_tsd;
-	vd->vdev_tsd = NULL;
-
-	mtx_lock(&ctx->gc_queue_mtx);
-	ctx->gc_state = 1;
-	wakeup_one(&ctx->gc_queue);
-	while (ctx->gc_state != 2)
-		msleep(&ctx->gc_state, &ctx->gc_queue_mtx, 0, "vgeom:w", 0);
-	mtx_unlock(&ctx->gc_queue_mtx);
-	mtx_destroy(&ctx->gc_queue_mtx);
-	kmem_free(ctx, sizeof(*ctx));
-}
-
 static void
 vdev_geom_orphan(struct g_consumer *cp)
 {
@@ -96,8 +71,7 @@ vdev_geom_orphan(struct g_consumer *cp)
 		ZFS_LOG(1, "Destroyed geom %s.", gp->name);
 		g_wither_geom(gp, error);
 	}
-	vdev_geom_release(vd);
-
+	vd->vdev_tsd = NULL;
 	vd->vdev_remove_wanted = B_TRUE;
 	spa_async_request(vd->vdev_spa, SPA_ASYNC_REMOVE);
 }
@@ -188,52 +162,6 @@ vdev_geom_detach(void *arg, int flag __u
 	}
 }
 
-static void
-vdev_geom_worker(void *arg)
-{
-	vdev_geom_ctx_t *ctx;
-	zio_t *zio;
-	struct bio *bp;
-
-	thread_lock(curthread);
-	sched_prio(curthread, PRIBIO);
-	thread_unlock(curthread);
-
-	ctx = arg;
-	for (;;) {
-		mtx_lock(&ctx->gc_queue_mtx);
-		bp = bioq_takefirst(&ctx->gc_queue);
-		if (bp == NULL) {
-			if (ctx->gc_state == 1) {
-				ctx->gc_state = 2;
-				wakeup_one(&ctx->gc_state);
-				mtx_unlock(&ctx->gc_queue_mtx);
-				kthread_exit();
-			}
-			msleep(&ctx->gc_queue, &ctx->gc_queue_mtx,
-			    PRIBIO | PDROP, "vgeom:io", 0);
-			continue;
-		}
-		mtx_unlock(&ctx->gc_queue_mtx);
-		zio = bp->bio_caller1;
-		zio->io_error = bp->bio_error;
-		if (bp->bio_cmd == BIO_FLUSH && bp->bio_error == ENOTSUP) {
-			vdev_t *vd;
-
-			/*
-			 * If we get ENOTSUP, we know that no future
-			 * attempts will ever succeed.  In this case we
-			 * set a persistent bit so that we don't bother
-			 * with the ioctl in the future.
-			 */
-			vd = zio->io_vd;
-			vd->vdev_nowritecache = B_TRUE;
-		}
-		g_destroy_bio(bp);
-		zio_interrupt(zio);
-	}
-}
-
 static uint64_t
 nvlist_get_guid(nvlist_t *list)
 {
@@ -396,7 +324,7 @@ vdev_geom_attach_by_guid_event(void *arg
 					continue;
 				ap->cp = vdev_geom_attach(pp);
 				if (ap->cp == NULL) {
-					printf("ZFS WARNING: Unable to attach to %s.",
+					printf("ZFS WARNING: Unable to attach to %s.\n",
 					    pp->name);
 					continue;
 				}
@@ -488,7 +416,6 @@ vdev_geom_open_by_path(vdev_t *vd, int c
 static int
 vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)
 {
-	vdev_geom_ctx_t *ctx;
 	struct g_provider *pp;
 	struct g_consumer *cp;
 	int error, owned;
@@ -530,10 +457,19 @@ vdev_geom_open(vdev_t *vd, uint64_t *psi
 		ZFS_LOG(1, "Provider %s not found.", vd->vdev_path);
 		error = ENOENT;
 	} else if (cp->acw == 0 && (spa_mode & FWRITE) != 0) {
+		int i;
+
 		g_topology_lock();
-		error = g_access(cp, 0, 1, 0);
+		for (i = 0; i < 5; i++) {
+			error = g_access(cp, 0, 1, 0);
+			if (error == 0)
+				break;
+			g_topology_unlock();
+			tsleep(vd, 0, "vdev", hz / 2);
+			g_topology_lock();
+		}
 		if (error != 0) {
-			printf("ZFS WARNING: Unable to open %s for writing (error=%d).",
+			printf("ZFS WARNING: Unable to open %s for writing (error=%d).\n",
 			    vd->vdev_path, error);
 			vdev_geom_detach(cp, 0);
 			cp = NULL;
@@ -548,19 +484,9 @@ vdev_geom_open(vdev_t *vd, uint64_t *psi
 	}
 
 	cp->private = vd;
-
-	ctx = kmem_zalloc(sizeof(*ctx), KM_SLEEP);
-	bioq_init(&ctx->gc_queue);
-	mtx_init(&ctx->gc_queue_mtx, "zfs:vdev:geom:queue", NULL, MTX_DEF);
-	ctx->gc_consumer = cp;
-	ctx->gc_state = 0;
-
-	vd->vdev_tsd = ctx;
+	vd->vdev_tsd = cp;
 	pp = cp->provider;
 
-	kproc_kthread_add(vdev_geom_worker, ctx, &zfsproc, NULL, 0, 0,
-	    "zfskern", "vdev %s", pp->name);
-
 	/*
 	 * Determine the actual size of the device.
 	 */
@@ -583,50 +509,49 @@ vdev_geom_open(vdev_t *vd, uint64_t *psi
 static void
 vdev_geom_close(vdev_t *vd)
 {
-	vdev_geom_ctx_t *ctx;
 	struct g_consumer *cp;
 
-	if ((ctx = vd->vdev_tsd) == NULL)
+	cp = vd->vdev_tsd;
+	if (cp == NULL)
 		return;
-	if ((cp = ctx->gc_consumer) == NULL)
-		return;
-	vdev_geom_release(vd);
+	vd->vdev_tsd = NULL;
 	g_post_event(vdev_geom_detach, cp, M_WAITOK, NULL);
 }
 
 static void
 vdev_geom_io_intr(struct bio *bp)
 {
-	vdev_geom_ctx_t *ctx;
 	zio_t *zio;
 
 	zio = bp->bio_caller1;
-	ctx = zio->io_vd->vdev_tsd;
-
-	if ((zio->io_error = bp->bio_error) == 0 && bp->bio_resid != 0)
+	zio->io_error = bp->bio_error;
+	if (zio->io_error == 0 && bp->bio_resid != 0)
 		zio->io_error = EIO;
+	if (bp->bio_cmd == BIO_FLUSH && bp->bio_error == ENOTSUP) {
+		vdev_t *vd;
 
-	mtx_lock(&ctx->gc_queue_mtx);
-	bioq_insert_tail(&ctx->gc_queue, bp);
-	wakeup_one(&ctx->gc_queue);
-	mtx_unlock(&ctx->gc_queue_mtx);
+		/*
+		 * If we get ENOTSUP, we know that no future
+		 * attempts will ever succeed.  In this case we
+		 * set a persistent bit so that we don't bother
+		 * with the ioctl in the future.
+		 */
+		vd = zio->io_vd;
+		vd->vdev_nowritecache = B_TRUE;
+	}
+	g_destroy_bio(bp);
+	zio_interrupt(zio);
 }
 
 static int
 vdev_geom_io_start(zio_t *zio)
 {
 	vdev_t *vd;
-	vdev_geom_ctx_t *ctx;
 	struct g_consumer *cp;
 	struct bio *bp;
 	int error;
 
-	cp = NULL;
-
 	vd = zio->io_vd;
-	ctx = vd->vdev_tsd;
-	if (ctx != NULL)
-		cp = ctx->gc_consumer;
 
 	if (zio->io_type == ZIO_TYPE_IOCTL) {
 		/* XXPOLICY */
@@ -655,6 +580,7 @@ vdev_geom_io_start(zio_t *zio)
 		return (ZIO_PIPELINE_CONTINUE);
 	}
 sendreq:
+	cp = vd->vdev_tsd;
 	if (cp == NULL) {
 		zio->io_error = ENXIO;
 		return (ZIO_PIPELINE_CONTINUE);

Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
==============================================================================
--- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c	Mon May 24 07:04:00 2010	(r208486)
+++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c	Mon May 24 10:09:36 2010	(r208487)
@@ -33,6 +33,17 @@
 #include <sys/zio_compress.h>
 #include <sys/zio_checksum.h>
 
+#if defined(__amd64__)
+static int zio_use_uma = 1;
+#else
+static int zio_use_uma = 0;
+#endif
+SYSCTL_DECL(_vfs_zfs);
+SYSCTL_NODE(_vfs_zfs, OID_AUTO, zio, CTLFLAG_RW, 0, "ZFS ZIO");
+TUNABLE_INT("vfs.zfs.zio.use_uma", &zio_use_uma);
+SYSCTL_INT(_vfs_zfs_zio, OID_AUTO, use_uma, CTLFLAG_RDTUN, &zio_use_uma, 0,
+    "Use uma(9) for ZIO allocations");
+
 /*
  * ==========================================================================
  * I/O priority table
@@ -69,10 +80,8 @@ char *zio_type_name[ZIO_TYPES] = {
  * ==========================================================================
  */
 kmem_cache_t *zio_cache;
-#ifdef ZIO_USE_UMA
 kmem_cache_t *zio_buf_cache[SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT];
 kmem_cache_t *zio_data_buf_cache[SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT];
-#endif
 
 #ifdef _KERNEL
 extern vmem_t *zio_alloc_arena;
@@ -88,13 +97,10 @@ extern vmem_t *zio_alloc_arena;
 void
 zio_init(void)
 {
-#ifdef ZIO_USE_UMA
 	size_t c;
-#endif
 	zio_cache = kmem_cache_create("zio_cache", sizeof (zio_t), 0,
 	    NULL, NULL, NULL, NULL, NULL, 0);
 
-#ifdef ZIO_USE_UMA
 	/*
 	 * For small buffers, we want a cache for each multiple of
 	 * SPA_MINBLOCKSIZE.  For medium-size buffers, we want a cache
@@ -138,7 +144,6 @@ zio_init(void)
 		if (zio_data_buf_cache[c - 1] == NULL)
 			zio_data_buf_cache[c - 1] = zio_data_buf_cache[c];
 	}
-#endif
 
 	zio_inject_init();
 }
@@ -146,7 +151,6 @@ zio_init(void)
 void
 zio_fini(void)
 {
-#ifdef ZIO_USE_UMA
 	size_t c;
 	kmem_cache_t *last_cache = NULL;
 	kmem_cache_t *last_data_cache = NULL;
@@ -164,7 +168,6 @@ zio_fini(void)
 		}
 		zio_data_buf_cache[c] = NULL;
 	}
-#endif
 
 	kmem_cache_destroy(zio_cache);
 
@@ -186,15 +189,14 @@ zio_fini(void)
 void *
 zio_buf_alloc(size_t size)
 {
-#ifdef ZIO_USE_UMA
 	size_t c = (size - 1) >> SPA_MINBLOCKSHIFT;
 
 	ASSERT(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT);
 
-	return (kmem_cache_alloc(zio_buf_cache[c], KM_PUSHPAGE));
-#else
-	return (kmem_alloc(size, KM_SLEEP));
-#endif
+	if (zio_use_uma)
+		return (kmem_cache_alloc(zio_buf_cache[c], KM_PUSHPAGE));
+	else
+		return (kmem_alloc(size, KM_SLEEP));
 }
 
 /*
@@ -206,43 +208,40 @@ zio_buf_alloc(size_t size)
 void *
 zio_data_buf_alloc(size_t size)
 {
-#ifdef ZIO_USE_UMA
 	size_t c = (size - 1) >> SPA_MINBLOCKSHIFT;
 
 	ASSERT(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT);
 
-	return (kmem_cache_alloc(zio_data_buf_cache[c], KM_PUSHPAGE));
-#else
-	return (kmem_alloc(size, KM_SLEEP));
-#endif
+	if (zio_use_uma)
+		return (kmem_cache_alloc(zio_data_buf_cache[c], KM_PUSHPAGE));
+	else
+		return (kmem_alloc(size, KM_SLEEP));
 }
 
 void
 zio_buf_free(void *buf, size_t size)
 {
-#ifdef ZIO_USE_UMA
 	size_t c = (size - 1) >> SPA_MINBLOCKSHIFT;
 
 	ASSERT(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT);
 
-	kmem_cache_free(zio_buf_cache[c], buf);
-#else
-	kmem_free(buf, size);
-#endif
+	if (zio_use_uma)
+		kmem_cache_free(zio_buf_cache[c], buf);
+	else
+		kmem_free(buf, size);
 }
 
 void
 zio_data_buf_free(void *buf, size_t size)
 {
-#ifdef ZIO_USE_UMA
 	size_t c = (size - 1) >> SPA_MINBLOCKSHIFT;
 
 	ASSERT(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT);
 
-	kmem_cache_free(zio_data_buf_cache[c], buf);
-#else
-	kmem_free(buf, size);
-#endif
+	if (zio_use_uma)
+		kmem_cache_free(zio_data_buf_cache[c], buf);
+	else
+		kmem_free(buf, size);
 }
 
 /*
@@ -908,8 +907,8 @@ zio_taskq_dispatch(zio_t *zio, enum zio_
 	if (t == ZIO_TYPE_WRITE && zio->io_vd && zio->io_vd->vdev_aux)
 		t = ZIO_TYPE_NULL;
 
-	(void) taskq_dispatch(zio->io_spa->spa_zio_taskq[t][q],
-	    (task_func_t *)zio_execute, zio, TQ_SLEEP);
+	(void) taskq_dispatch_safe(zio->io_spa->spa_zio_taskq[t][q],
+	    (task_func_t *)zio_execute, zio, &zio->io_task);
 }
 
 static boolean_t
@@ -2220,9 +2219,9 @@ zio_done(zio_t *zio)
 			 * Reexecution is potentially a huge amount of work.
 			 * Hand it off to the otherwise-unused claim taskq.
 			 */
-			(void) taskq_dispatch(
+			(void) taskq_dispatch_safe(
 			    spa->spa_zio_taskq[ZIO_TYPE_CLAIM][ZIO_TASKQ_ISSUE],
-			    (task_func_t *)zio_reexecute, zio, TQ_SLEEP);
+			    (task_func_t *)zio_reexecute, zio, &zio->io_task);
 		}
 		return (ZIO_PIPELINE_STOP);
 	}

Modified: stable/8/sys/kern/vfs_subr.c
==============================================================================
--- stable/8/sys/kern/vfs_subr.c	Mon May 24 07:04:00 2010	(r208486)
+++ stable/8/sys/kern/vfs_subr.c	Mon May 24 10:09:36 2010	(r208487)
@@ -800,7 +800,6 @@ vnlru_proc(void)
 		}
 		mtx_unlock(&mountlist_mtx);
 		if (done == 0) {
-			EVENTHANDLER_INVOKE(vfs_lowvnodes, desiredvnodes / 10);
 #if 0
 			/* These messages are temporary debugging aids */
 			if (vnlru_nowhere < 5)
@@ -822,6 +821,19 @@ static struct kproc_desc vnlru_kp = {
 };
 SYSINIT(vnlru, SI_SUB_KTHREAD_UPDATE, SI_ORDER_FIRST, kproc_start,
     &vnlru_kp);
+ 
+static void
+vfs_lowmem(void *arg __unused)
+{
+
+	/*
+	 * On low memory condition free 1/8th of the free vnodes.
+	 */
+	mtx_lock(&vnode_free_list_mtx);
+	vnlru_free(freevnodes / 8);
+	mtx_unlock(&vnode_free_list_mtx);
+}
+EVENTHANDLER_DEFINE(vm_lowmem, vfs_lowmem, NULL, 0);
 
 /*
  * Routines having to do with the management of the vnode table.

Modified: stable/8/sys/modules/zfs/Makefile
==============================================================================
--- stable/8/sys/modules/zfs/Makefile	Mon May 24 07:04:00 2010	(r208486)
+++ stable/8/sys/modules/zfs/Makefile	Mon May 24 10:09:36 2010	(r208487)
@@ -63,9 +63,6 @@ ZFS_SRCS=	${ZFS_OBJS:C/.o$/.c/}
 SRCS+=	${ZFS_SRCS}
 SRCS+=	vdev_geom.c
 
-# Use UMA for ZIO allocation.
-CFLAGS+=-DZIO_USE_UMA
-
 # Use FreeBSD's namecache.
 CFLAGS+=-DFREEBSD_NAMECACHE
 

Modified: stable/8/sys/sys/eventhandler.h
==============================================================================
--- stable/8/sys/sys/eventhandler.h	Mon May 24 07:04:00 2010	(r208486)
+++ stable/8/sys/sys/eventhandler.h	Mon May 24 10:09:36 2010	(r208487)
@@ -183,10 +183,6 @@ typedef void (*vm_lowmem_handler_t)(void
 #define	LOWMEM_PRI_DEFAULT	EVENTHANDLER_PRI_FIRST
 EVENTHANDLER_DECLARE(vm_lowmem, vm_lowmem_handler_t);
 
-/* Low vnodes event */
-typedef void (*vfs_lowvnodes_handler_t)(void *, int);
-EVENTHANDLER_DECLARE(vfs_lowvnodes, vfs_lowvnodes_handler_t);
-
 /* Root mounted event */
 typedef void (*mountroot_handler_t)(void *);
 EVENTHANDLER_DECLARE(mountroot, mountroot_handler_t);
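
The core of the merged change above is that vdev_geom no longer funnels completed bios through a per-vdev worker thread and its vdev_geom_ctx_t queue: vdev_geom_io_intr() now finishes the zio directly from the GEOM g_up path, and zio_taskq_dispatch()/zio_done() hand work to the ZIO taskqs through taskq_dispatch_safe(), using the struct ostask pre-allocated inside each zio (io_task), so the dispatch path never has to allocate memory. The ZIO_USE_UMA compile-time option likewise becomes the vfs.zfs.zio.use_uma loader tunable. A minimal sketch of that pre-allocated-dispatch pattern, using only the stock taskqueue(9) API and hypothetical names (io_done_ctx, io_done_dispatch -- this is not the committed opensolaris_taskq.c code), might look like this:

/*
 * Sketch only (hypothetical names, not the committed compat code):
 * the completion context carries its own pre-allocated struct task,
 * so handing work to a taskqueue from the GEOM up-path never
 * allocates memory and therefore cannot sleep or fail there.
 */
#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/queue.h>
#include <sys/taskqueue.h>

struct io_done_ctx {
	struct task	 idc_task;	/* pre-allocated, lives inside the I/O */
	void		(*idc_func)(void *);
	void		*idc_arg;
};

static void
io_done_run(void *context, int pending __unused)
{
	struct io_done_ctx *idc = context;

	idc->idc_func(idc->idc_arg);
}

/* Queue func(arg) on tq without any allocation in this path. */
static void
io_done_dispatch(struct taskqueue *tq, struct io_done_ctx *idc,
    void (*func)(void *), void *arg)
{
	idc->idc_func = func;
	idc->idc_arg = arg;
	TASK_INIT(&idc->idc_task, 0, io_done_run, idc);
	(void)taskqueue_enqueue(tq, &idc->idc_task);
}

The committed taskq_dispatch_safe() appears to apply the same idea to the OpenSolaris-compat taskq_t, with the storage coming from the struct ostask embedded in each zio, which is why the zio.c hunks above can drop the TQ_SLEEP dispatch calls.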
Comment 12 Eugene M. Zheganin 2011-06-17 20:26:46 UTC
Can kern/157534 be related to this bug?
It seems I tested it after the last batch of MFC commits mentioned here,
and the problem is still there.

Sorry if it isn't.
Comment 13 Pawel Jakub Dawidek freebsd_committer freebsd_triage 2014-06-01 07:17:00 UTC
State Changed
From-To: open->feedback

Could you try this patch:

http://people.freebsd.org/~pjd/patches/vdev_geom.c.3.patch

It is against the most recent HEAD. If it does not apply cleanly on
8-STABLE, just grab the entire vdev_geom.c from HEAD and apply the
patch to that.


Comment 14 Pawel Jakub Dawidek freebsd_committer freebsd_triage 2014-06-01 07:17:00 UTC
Responsible Changed
From-To: freebsd-fs->pjd

I'll take this one.
Comment 15 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:58:28 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped