Summary:   [zfs] deadlock after detaching block device from raidz pool
Product:   Base System              Reporter: Alex.Bakhtin
Component: kern                     Assignee: freebsd-bugs (Nobody) <bugs>
Status:    Open
Severity:  Affects Only Me
Priority:  Normal
Version:   8.0-STABLE
Hardware:  Any
OS:        Any
Description
Alex.Bakhtin
2010-04-03 08:47:07 UTC
Responsible Changed From-To: freebsd-bugs->freebsd-fs
Reassign to FS team.

Are you sure that this is a deadlock? If yes, could you please describe what
you see in more detail. I am asking because to me it seems like a NULL
pointer crash:

> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address = 0x48

It looks like perhaps zio->io_vd became NULL while an I/O response was
traveling up and vdev_geom_io_intr was not prepared to handle that.

> _mtx_lock_flags() at _mtx_lock_flags+0x39
> vdev_geom_io_intr() at vdev_geom_io_intr+0x62
> g_io_schedule_up() at g_io_schedule_up+0xed
> g_up_procbody() at g_up_procbody+0x6f
> fork_exit() at fork_exit+0x12a
> fork_trampoline() at fork_trampoline+0xe

-- 
Andriy Gapon

Andriy,

Sorry for the delay, gmail put your mail into my spam folder.

> Are you sure that this is a deadlock?

Sorry, the problem description was not entirely clear.

> If yes, could you please describe what you see in more details.

On GENERIC I hit a deadlock when I detach a device from the raidz pool
while there is intensive writing to the pool. The box responds to pings but
does not respond to the power button (the ACPI request is ignored). I built
a kernel with the following config:

> cat /sys/amd64/conf/DEBUG
include GENERIC
ident DEBUG
options ALT_BREAK_TO_DEBUGGER
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options DEBUG_LOCKS
options DEBUG_VFS_LOCKS
options DIAGNOSTIC
options KDB
options DDB
options INCLUDE_CONFIG_FILE

and got this crash. After looking into the crashinfo I assumed that it
crashes in _mtx_lock_flags because of the debugging options (as far as I
can see, there are many asserts in this function), but probably I'm wrong.
I checked the /mnt/crash directory and discovered a full crash archive
gathered by sysutils/bsdcrashtar. Perhaps this info could help to find the
root cause?
tar tvzf crash.10.tar.gz
drwxr-xr-x  0 root  wheel          0 Apr  3 06:13 crash.10/
lrwxr-xr-x  0 root  wheel          0 Apr  3 06:13 crash.10/machine -> usr/src.old/sys/amd64/include
-rwxr-xr-x  0 root  wheel        211 Apr  3 06:13 crash.10/debug.sh
-rw-r--r--  0 root  wheel         50 Apr  3 06:13 crash.10/README
-r-xr-xr-x  0 root  wheel   12581947 Apr  3 06:13 crash.10/boot/kernel/kernel
-r-xr-xr-x  0 root  wheel   44335787 Apr  3 06:13 crash.10/boot/kernel/kernel.symbols
-r-xr-xr-x  0 root  wheel    1532664 Apr  3 06:13 crash.10/boot/kernel/zfs.ko
-r-xr-xr-x  0 root  wheel   12693960 Apr  3 06:13 crash.10/boot/kernel/zfs.ko.symbols
-r-xr-xr-x  0 root  wheel       9832 Apr  3 06:13 crash.10/boot/kernel/opensolaris.ko
-r-xr-xr-x  0 root  wheel     145808 Apr  3 06:13 crash.10/boot/kernel/opensolaris.ko.symbols
-r-xr-xr-x  0 root  wheel     146048 Apr  3 06:13 crash.10/boot/kernel/geom_mirror.ko
-r-xr-xr-x  0 root  wheel     314512 Apr  3 06:13 crash.10/boot/kernel/geom_mirror.ko.symbols
[the archive also contains the kernel source files referenced by the crash
under crash.10/usr/src.old/sys/ -- kern/, geom/, vm/, ufs/, fs/, dev/,
ddb/, cam/, net/, nfs/, nfsserver/, rpc/, security/, sys/, amd64/, and
cddl/, including cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c,
zio.c, arc.c, spa.c, txg.c, dsl_pool.c and zfs_vnops.c]
-rw-------  0 root  wheel 3331166208 Apr  3 06:13 crash.10/mnt/crash/vmcore.10
-rw-------  0 root  wheel        470 Apr  3 06:13 crash.10/mnt/crash/info.10
-rw-r--r--  0 root  wheel      92901 Apr  3 06:13 crash.10/mnt/obj/usr/src.old/sys/DEBUG/vnode_if.c

> I am asking because to me it seems like a NULL pointer crash:
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 1; apic id = 01
>> fault virtual address = 0x48
>
> It looks like perhaps zio->io_vd became NULL while an I/O response was
> traveling up and vdev_geom_io_intr was not prepared to handle that.
>
>> _mtx_lock_flags() at _mtx_lock_flags+0x39
>> vdev_geom_io_intr() at vdev_geom_io_intr+0x62
>> g_io_schedule_up() at g_io_schedule_up+0xed
>> g_up_procbody() at g_up_procbody+0x6f
>> fork_exit() at fork_exit+0x12a
>> fork_trampoline() at fork_trampoline+0xe

If there is any info I can gather on GENERIC - please let me know. The
only crashinfo I have is from the debug kernel.

Alex Bakhtin

I've seen a similar issue in the past while testing hot-removal of RAIDZ
members (glabeled siis(4)-attached devices). After the /dev/ada* entry
would disappear, the /dev/label/diskXX entry would remain and crash
shortly down the line with ZFS I/O. Here's the panic info in case it is
relevant:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 14
fault virtual address   = 0x48
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff8035f375
stack pointer           = 0x28:0xffffff800006db60
frame pointer           = 0x28:0xffffff800006db70
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2 (g_event)
[thread pid 2 tid 100014 ]
Stopped at _mtx_lock_flags+0x15: lock cmpxchgq %rsi,0x18(%rdi)
db> bt
Tracing pid 2 tid 100014 td 0xffffff00014d4ab0
_mtx_lock_flags() at _mtx_lock_flags+0x15
vdev_geom_release() at vdev_geom_release+0x33
vdev_geom_orphan() at vdev_geom_orphan+0x15c
g_run_events() at g_run_events+0x104
g_event_procbody() at g_event_procbody+0x55
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800006dd30, rbp = 0 ---

Can you try this patch?
--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
@@ -603,6 +603,9 @@ vdev_geom_io_intr(struct bio *bp)
 	zio = bp->bio_caller1;
 	ctx = zio->io_vd->vdev_tsd;
 
+	if (ctx == NULL)
+		return;
+
 	if ((zio->io_error = bp->bio_error) == 0 && bp->bio_resid != 0)
 		zio->io_error = EIO;

-- 
Andriy Gapon

Andriy,

Upgraded to today's stable and reproduced the problem. On GENERIC the
system just hangs with the following output:

==========================
ad12: FAILURE - WRITE_DMA48 status=7f<READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> error=0 LBA=2312588250
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x48
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff80593e95
stack pointer           = 0x28:0xffffff8000065ba0
frame pointer           = 0x28:0xffffff8000065bb0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3 (g_up)
trap number             = 12
panic: page fault
cpuid = 1
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80545a28
stack pointer           = 0x28:0xffffff80eada2a40
frame pointer           = 0x28:0xffffff80eada2a90
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (spa_zio)
trap number             = 12
==========================

With GENERIC + DDB/KDB enabled I got the following (it seems that the
first time I detached the device there was no active transaction - I can
try to reproduce):
==========================
ad12: FAILURE - device detached
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x48
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff805a0345
Fatal double fault
stack pointer           = 0x28:0xffffff800006aba0
rip = 0xffffffff808085ad
frame pointer           = 0x28:0xffffff800006abb0
rsp = 0xffffff80ead87000
code segment            = base rx0, limit 0xfffff, type 0x1b
rbp = 0xffffff80ead87070
                        = DPL 0, pres 1, long 1, def32 0, gran 1
cpuid = 0;
processor eflags        = apic id = 00
interrupt enabled,
panic: double fault
resume,
cpuid = 0
IOPL = 0
KDB: enter: panic
[thread pid 0 tid 100113 ]
Stopped at kdb_enter+0x3d: movq $0,0x69cee0(%rip)
==========================

(the two CPUs' console output is interleaved above)

And another one:

==========================
ad12: FAILURE - WRITE_DMA status=7f<READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> error=0 LBA=111033498

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x48
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff805a0345
stack pointer           = 0x28:0xffffff800006aba0
frame pointer           = 0x28:0xffffff800006abb0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3 (g_up)
[thread pid 3 tid 100011 ]
Stopped at _mtx_lock_flags+0x15: lock cmpxchgq %rsi,0x18(%rdi)
db:0:kdb.enter.default> capture on
==========================

And with your patch the system doesn't detect that the device is detached
and seems to be deadlocked (doesn't respond to the power button):
==========================
acpi0: suspend request ignored (not ready yet)
acpi0: request to enter state S5 failed (err 6)
acpi0: suspend request ignored (not ready yet)
acpi0: request to enter state S5 failed (err 6)
==========================

So, I can still easily reproduce this problem on 8-STABLE. Your simple
patch helps to avoid the page fault but deadlocks the system. Are you sure
that you can just return at this point? Probably it makes sense to set
some error flag before returning?

Alex Bakhtin

2010/4/23 Andriy Gapon <avg@icyb.net.ua>:
>
> Can you try this patch?
>
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> @@ -603,6 +603,9 @@ vdev_geom_io_intr(struct bio *bp)
>  	zio = bp->bio_caller1;
>  	ctx = zio->io_vd->vdev_tsd;
>
> +	if (ctx == NULL)
> +		return;
> +
>  	if ((zio->io_error = bp->bio_error) == 0 && bp->bio_resid != 0)
>  		zio->io_error = EIO;
>
>
> --
> Andriy Gapon

on 04/05/2010 02:23 Alex Bakhtin said the following:
> So, I can still easily reproduce this problem on 8-STABLE. Your
> simple patch helps to avoid the page fault but deadlocks the system.
> Are you sure that you can just return at this point? Probably it makes
> sense to set some error flag before returning?

You are correct, my simple patch is far from being correct. And properly
fixing the problem is not trivial. Some issues:

1. vdev_geom_release() sets vdev_tsd to NULL before shutting down the
corresponding gc_queue; because of that, bios that may later come via
vdev_geom_io_intr() can not be mapped to their gc_queue and thus there is
no choice but to drop them on the floor.

2. Shutdown logic in vdev_geom_worker() does not seem to be reliable.
I think that the vdev thread may get stuck forever if a bio happens to be
on gc_queue when vdev_geom_release() is called. In that case the gc_state
check may be skipped and gc_queue may never be woken up again.

3. I am not sure if pending zios are taken care of when
vdev_geom_release() is called. If not, then they may get stuck forever.

Hopefully Pawel can help us here.

-- 
Andriy Gapon

Author: pjd
Date:   Sun May 16 11:56:42 2010
New Revision: 208142

URL: http://svn.freebsd.org/changeset/base/208142

Log:
  The whole point of having a dedicated worker thread for each leaf VDEV
  was to avoid calling zio_interrupt() from the geom_up thread context.
  It turns out that when a provider is forcibly removed from the system
  and we kill the worker thread, there can still be some ZIOs pending.
  To complete pending ZIOs when there is no worker thread anymore we
  still have to call zio_interrupt() from the geom_up context. To avoid
  this race just remove the use of worker threads altogether. This
  should be more or less fine, because I had also thought that
  zio_interrupt() does more work, but it only makes a small UMA
  allocation with M_WAITOK. It also saves one context switch per I/O
  request.
PR:		kern/145339
Reported by:	Alex Bakhtin <Alex.Bakhtin@gmail.com>
MFC after:	1 week

Modified:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
==============================================================================
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	Sun May 16 11:17:21 2010	(r208141)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c	Sun May 16 11:56:42 2010	(r208142)
@@ -47,31 +47,6 @@ struct g_class zfs_vdev_class = {
 
 DECLARE_GEOM_CLASS(zfs_vdev_class, zfs_vdev);
 
-typedef struct vdev_geom_ctx {
-	struct g_consumer *gc_consumer;
-	int gc_state;
-	struct bio_queue_head gc_queue;
-	struct mtx gc_queue_mtx;
-} vdev_geom_ctx_t;
-
-static void
-vdev_geom_release(vdev_t *vd)
-{
-	vdev_geom_ctx_t *ctx;
-
-	ctx = vd->vdev_tsd;
-	vd->vdev_tsd = NULL;
-
-	mtx_lock(&ctx->gc_queue_mtx);
-	ctx->gc_state = 1;
-	wakeup_one(&ctx->gc_queue);
-	while (ctx->gc_state != 2)
-		msleep(&ctx->gc_state, &ctx->gc_queue_mtx, 0, "vgeom:w", 0);
-	mtx_unlock(&ctx->gc_queue_mtx);
-	mtx_destroy(&ctx->gc_queue_mtx);
-	kmem_free(ctx, sizeof(*ctx));
-}
-
 static void
 vdev_geom_orphan(struct g_consumer *cp)
 {
@@ -96,8 +71,7 @@ vdev_geom_orphan(struct g_consumer *cp)
 		ZFS_LOG(1, "Destroyed geom %s.", gp->name);
 		g_wither_geom(gp, error);
 	}
-	vdev_geom_release(vd);
-
+	vd->vdev_tsd = NULL;
 	vd->vdev_remove_wanted = B_TRUE;
 	spa_async_request(vd->vdev_spa, SPA_ASYNC_REMOVE);
 }
@@ -188,52 +162,6 @@ vdev_geom_detach(void *arg, int flag __u
 	}
 }
 
-static void
-vdev_geom_worker(void *arg)
-{
-	vdev_geom_ctx_t *ctx;
-	zio_t *zio;
-	struct bio *bp;
-
-	thread_lock(curthread);
-	sched_prio(curthread, PRIBIO);
-	thread_unlock(curthread);
-
-	ctx = arg;
-	for (;;) {
-		mtx_lock(&ctx->gc_queue_mtx);
-		bp = bioq_takefirst(&ctx->gc_queue);
-		if (bp == NULL) {
-			if (ctx->gc_state == 1) {
-				ctx->gc_state = 2;
-				wakeup_one(&ctx->gc_state);
-				mtx_unlock(&ctx->gc_queue_mtx);
-				kthread_exit();
-			}
-			msleep(&ctx->gc_queue, &ctx->gc_queue_mtx,
-			    PRIBIO | PDROP, "vgeom:io", 0);
-			continue;
-		}
-		mtx_unlock(&ctx->gc_queue_mtx);
-		zio = bp->bio_caller1;
-		zio->io_error = bp->bio_error;
-		if (bp->bio_cmd == BIO_FLUSH && bp->bio_error == ENOTSUP) {
-			vdev_t *vd;
-
-			/*
-			 * If we get ENOTSUP, we know that no future
-			 * attempts will ever succeed.  In this case we
-			 * set a persistent bit so that we don't bother
-			 * with the ioctl in the future.
-			 */
-			vd = zio->io_vd;
-			vd->vdev_nowritecache = B_TRUE;
-		}
-		g_destroy_bio(bp);
-		zio_interrupt(zio);
-	}
-}
-
 static uint64_t
 nvlist_get_guid(nvlist_t *list)
 {
@@ -488,7 +416,6 @@ vdev_geom_open_by_path(vdev_t *vd, int c
 static int
 vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift)
 {
-	vdev_geom_ctx_t *ctx;
 	struct g_provider *pp;
 	struct g_consumer *cp;
 	int error, owned;
@@ -557,19 +484,9 @@ vdev_geom_open(vdev_t *vd, uint64_t *psi
 	}
 	cp->private = vd;
-
-	ctx = kmem_zalloc(sizeof(*ctx), KM_SLEEP);
-	bioq_init(&ctx->gc_queue);
-	mtx_init(&ctx->gc_queue_mtx, "zfs:vdev:geom:queue", NULL, MTX_DEF);
-	ctx->gc_consumer = cp;
-	ctx->gc_state = 0;
-
-	vd->vdev_tsd = ctx;
+	vd->vdev_tsd = cp;
 	pp = cp->provider;
 
-	kproc_kthread_add(vdev_geom_worker, ctx, &zfsproc, NULL, 0, 0,
-	    "zfskern", "vdev %s", pp->name);
-
 	/*
 	 * Determine the actual size of the device.
 	 */
@@ -592,50 +509,49 @@ vdev_geom_open(vdev_t *vd, uint64_t *psi
 static void
 vdev_geom_close(vdev_t *vd)
 {
-	vdev_geom_ctx_t *ctx;
 	struct g_consumer *cp;
 
-	if ((ctx = vd->vdev_tsd) == NULL)
-		return;
-	if ((cp = ctx->gc_consumer) == NULL)
+	cp = vd->vdev_tsd;
+	if (cp == NULL)
 		return;
-	vdev_geom_release(vd);
+	vd->vdev_tsd = NULL;
 	g_post_event(vdev_geom_detach, cp, M_WAITOK, NULL);
 }
 
 static void
 vdev_geom_io_intr(struct bio *bp)
 {
-	vdev_geom_ctx_t *ctx;
 	zio_t *zio;
 
 	zio = bp->bio_caller1;
-	ctx = zio->io_vd->vdev_tsd;
-
-	if ((zio->io_error = bp->bio_error) == 0 && bp->bio_resid != 0)
+	zio->io_error = bp->bio_error;
+	if (zio->io_error == 0 && bp->bio_resid != 0)
 		zio->io_error = EIO;
+	if (bp->bio_cmd == BIO_FLUSH && bp->bio_error == ENOTSUP) {
+		vdev_t *vd;
 
-	mtx_lock(&ctx->gc_queue_mtx);
-	bioq_insert_tail(&ctx->gc_queue, bp);
-	wakeup_one(&ctx->gc_queue);
-	mtx_unlock(&ctx->gc_queue_mtx);
+		/*
+		 * If we get ENOTSUP, we know that no future
+		 * attempts will ever succeed.  In this case we
+		 * set a persistent bit so that we don't bother
+		 * with the ioctl in the future.
+		 */
+		vd = zio->io_vd;
+		vd->vdev_nowritecache = B_TRUE;
+	}
+	g_destroy_bio(bp);
+	zio_interrupt(zio);
 }
 
 static int
 vdev_geom_io_start(zio_t *zio)
 {
 	vdev_t *vd;
-	vdev_geom_ctx_t *ctx;
 	struct g_consumer *cp;
 	struct bio *bp;
 	int error;
 
-	cp = NULL;
-
 	vd = zio->io_vd;
-	ctx = vd->vdev_tsd;
-	if (ctx != NULL)
-		cp = ctx->gc_consumer;
 
 	if (zio->io_type == ZIO_TYPE_IOCTL) {
 		/* XXPOLICY */
@@ -664,6 +580,7 @@ vdev_geom_io_start(zio_t *zio)
 		return (ZIO_PIPELINE_CONTINUE);
 	}
 sendreq:
+	cp = vd->vdev_tsd;
 	if (cp == NULL) {
 		zio->io_error = ENXIO;
 		return (ZIO_PIPELINE_CONTINUE);

_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"

Pawel,

I tested your patch in the following zfs configurations (all on 5x2TB
WD20EARS drives):

1.
raidz1 on top of physical disks. 2. raidz1 on top of geli 3. raidz2 on top of physical disks. In all three cases it seems that the problem was fixed - I can't crash zfs in vdev_geom when unplugging the disk. Unfortunately, 3 times I got a deadlock in zfs after plugging vdevs back under load. It happens several seconds after the zpool online command. I'm not 100 percent sure that the deadlocks are related to this patch, but... I'm going to make some additional testing with patched and not patched kernels. 2010/5/13 <pjd@freebsd.org>: > Synopsis: [zfs] deadlock after detaching block device from raidz pool > > State-Changed-From-To: open->feedback > State-Changed-By: pjd > State-Changed-When: Thu 13 May 2010 09:33:20 UTC > State-Changed-Why: > Could you try this patch: > > http://people.freebsd.org/~pjd/patches/vdev_geom.c.3.patch > > It is against most recent HEAD. If it is rejected on 8-STABLE, just grab > entire vdev_geom.c from HEAD and patch this. > > > Responsible-Changed-From-To: freebsd-fs->pjd > Responsible-Changed-By: pjd > Responsible-Changed-When: Thu 13 May 2010 09:33:20 UTC > Responsible-Changed-Why: > I'll take this one. > > http://www.freebsd.org/cgi/query-pr.cgi?pr=145339 > Pawel, I made some additional testing. Now I'm 95 percent sure that this deadlock was introduced by this patch. I tried patched and non-patched GENERIC kernels. It seems that it is harder to reproduce this deadlock on raidz1 than on raidz2. With raidz2 I tried to detach and reattach two disks at the same time, and the deadlock is 100% reproducible on the patched kernel but I can't reproduce it on the non-patched kernel. How to reproduce: 1. Create raidz2 pool 2. Detach two devices while the pool is idle. 3. Start writing to the pool (dd if=/dev/zero of=/storage/test bs=1m) 4. atacontrol detach/attach to have the disks back. 5. Online two disks at the same time (zpool online storage adX adY). 6. 
Wait some time (in my testing, several seconds, less than one minute) - all disk activity stops. After that it's impossible to abort the zpool online or dd command. Also it's impossible to reboot without a hard reset. If you need a core from the deadlocked kernel, please let me know. Alex Bakhtin 
Author: pjd Date: Mon May 24 10:09:36 2010 New Revision: 208487 URL: http://svn.freebsd.org/changeset/base/208487 Log: MFC r207920,r207934,r207936,r207937,r207970,r208142,r208147,r208148,r208166, r208454,r208455,r208458: r207920: Back out r205134. It is not stable. r207934: Add missing new line characters to the warnings. r207936: Even though r203504 eliminates taste traffic provoked by vdev_geom.c, ZFS still likes to open all vdevs, close them and open them again, which in turn provokes taste traffic anyway. I don't know of any clean way to fix it, so do it the hard way - if we can't open the provider for writing, just retry 5 times with 0.5 second pauses. This should eliminate accidental races caused by other classes tasting providers created on top of our vdevs. Reported by: James R. Van Artsdalen <james-freebsd-fs2@jrv.org> Reported by: Yuri Pankov <yuri.pankov@gmail.com> r207937: I added the vfs_lowvnodes event, but it was only used for a short while and now it is totally unused. Remove it. r207970: When there is no memory or KVA, try to help by reclaiming some vnodes. This helps with 'kmem_map too small' panics. No objections from: kib Tested by: Alexander V. Ribchansky <shurik@zk.informjust.ua> r208142: The whole point of having a dedicated worker thread for each leaf VDEV was to avoid calling zio_interrupt() from the geom_up thread context. It turns out that when a provider is forcibly removed from the system and we kill the worker thread, there can still be some ZIOs pending. To complete pending ZIOs when there is no worker thread anymore, we still have to call zio_interrupt() from geom_up context. To avoid this race, just remove the use of worker threads altogether. This should be more or less fine, because I also thought that zio_interrupt() does more work, but it only makes a small UMA allocation with M_WAITOK. It also saves one context switch per I/O request. 
PR: kern/145339 Reported by: Alex Bakhtin <Alex.Bakhtin@gmail.com> r208147: Add task structure to zio and use it instead of allocating one. This eliminates the only place where we can sleep when calling zio_interrupt(). As a side-effect this can actually improve performance a little as we allocate one less thing for every I/O. Prodded by: kib r208148: Allow to configure UMA usage for ZIO data via loader and turn it on by default for amd64. On i386 I saw performance degradation when UMA was used, but for amd64 it should help. r208166: Fix userland build by making io_task available only for the kernel and by providing taskq_dispatch_safe() macro. r208454: Remove ZIO_USE_UMA from arc.c as well. r208455: ZIO_USE_UMA is no longer used. r208458: Create UMA zones unconditionally. Added: stable/8/sys/cddl/compat/opensolaris/sys/taskq.h - copied unchanged from r208147, head/sys/cddl/compat/opensolaris/sys/taskq.h Modified: stable/8/cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h stable/8/sys/cddl/compat/opensolaris/kern/opensolaris_taskq.c stable/8/sys/cddl/compat/opensolaris/sys/dnlc.h stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c stable/8/sys/kern/vfs_subr.c stable/8/sys/modules/zfs/Makefile stable/8/sys/sys/eventhandler.h Directory Properties: stable/8/cddl/contrib/opensolaris/ (props changed) stable/8/cddl/contrib/opensolaris/cmd/zdb/ (props changed) stable/8/cddl/contrib/opensolaris/cmd/zfs/ (props changed) stable/8/cddl/contrib/opensolaris/lib/libzfs/ (props changed) stable/8/sys/ (props changed) stable/8/sys/amd64/include/xen/ (props changed) stable/8/sys/cddl/contrib/opensolaris/ (props changed) stable/8/sys/contrib/dev/acpica/ (props changed) stable/8/sys/contrib/pf/ (props changed) stable/8/sys/dev/xen/xenpci/ (props changed) 
stable/8/sys/geom/sched/ (props changed) Modified: stable/8/cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h ============================================================================== --- stable/8/cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h Mon May 24 07:04:00 2010 (r208486) +++ stable/8/cddl/contrib/opensolaris/lib/libzpool/common/sys/zfs_context.h Mon May 24 10:09:36 2010 (r208487) @@ -343,6 +343,9 @@ extern void taskq_wait(taskq_t *); extern int taskq_member(taskq_t *, void *); extern void system_taskq_init(void); +#define taskq_dispatch_safe(tq, func, arg, task) \ + taskq_dispatch((tq), (func), (arg), TQ_SLEEP) + #define XVA_MAPSIZE 3 #define XVA_MAGIC 0x78766174 Modified: stable/8/sys/cddl/compat/opensolaris/kern/opensolaris_taskq.c ============================================================================== --- stable/8/sys/cddl/compat/opensolaris/kern/opensolaris_taskq.c Mon May 24 07:04:00 2010 (r208486) +++ stable/8/sys/cddl/compat/opensolaris/kern/opensolaris_taskq.c Mon May 24 10:09:36 2010 (r208487) @@ -40,12 +40,6 @@ __FBSDID("$FreeBSD$"); static uma_zone_t taskq_zone; -struct ostask { - struct task ost_task; - task_func_t *ost_func; - void *ost_arg; -}; - taskq_t *system_taskq = NULL; static void @@ -140,3 +134,32 @@ taskq_dispatch(taskq_t *tq, task_func_t return ((taskqid_t)(void *)task); } + +#define TASKQ_MAGIC 0x74541c + +static void +taskq_run_safe(void *arg, int pending __unused) +{ + struct ostask *task = arg; + + ASSERT(task->ost_magic == TASKQ_MAGIC); + task->ost_func(task->ost_arg); + task->ost_magic = 0; +} + +taskqid_t +taskq_dispatch_safe(taskq_t *tq, task_func_t func, void *arg, + struct ostask *task) +{ + + ASSERT(task->ost_magic != TASKQ_MAGIC); + + task->ost_magic = TASKQ_MAGIC; + task->ost_func = func; + task->ost_arg = arg; + + TASK_INIT(&task->ost_task, 0, taskq_run_safe, task); + taskqueue_enqueue(tq->tq_queue, &task->ost_task); + + return ((taskqid_t)(void *)task); +} Modified: 
stable/8/sys/cddl/compat/opensolaris/sys/dnlc.h ============================================================================== --- stable/8/sys/cddl/compat/opensolaris/sys/dnlc.h Mon May 24 07:04:00 2010 (r208486) +++ stable/8/sys/cddl/compat/opensolaris/sys/dnlc.h Mon May 24 10:09:36 2010 (r208487) @@ -35,6 +35,6 @@ #define dnlc_update(dvp, name, vp) do { } while (0) #define dnlc_remove(dvp, name) do { } while (0) #define dnlc_purge_vfsp(vfsp, count) (0) -#define dnlc_reduce_cache(percent) EVENTHANDLER_INVOKE(vfs_lowvnodes, (int)(intptr_t)(percent)) +#define dnlc_reduce_cache(percent) do { } while (0) #endif /* !_OPENSOLARIS_SYS_DNLC_H_ */ Copied: stable/8/sys/cddl/compat/opensolaris/sys/taskq.h (from r208147, head/sys/cddl/compat/opensolaris/sys/taskq.h) ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ stable/8/sys/cddl/compat/opensolaris/sys/taskq.h Mon May 24 10:09:36 2010 (r208487, copy of r208147, head/sys/cddl/compat/opensolaris/sys/taskq.h) @@ -0,0 +1,44 @@ +/*- + * Copyright (c) 2010 Pawel Jakub Dawidek <pjd@FreeBSD.org> + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD$ + */ + +#ifndef _OPENSOLARIS_SYS_TASKQ_H_ +#define _OPENSOLARIS_SYS_TASKQ_H_ + +#include_next <sys/taskq.h> + +struct ostask { + struct task ost_task; + task_func_t *ost_func; + void *ost_arg; + int ost_magic; +}; + +taskqid_t taskq_dispatch_safe(taskq_t *tq, task_func_t func, void *arg, + struct ostask *task); + +#endif /* _OPENSOLARIS_SYS_TASKQ_H_ */ Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c Mon May 24 07:04:00 2010 (r208486) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c Mon May 24 10:09:36 2010 (r208487) @@ -195,11 +195,6 @@ SYSCTL_QUAD(_vfs_zfs, OID_AUTO, arc_min, SYSCTL_INT(_vfs_zfs, OID_AUTO, mdcomp_disable, CTLFLAG_RDTUN, &zfs_mdcomp_disable, 0, "Disable metadata compression"); -#ifdef ZIO_USE_UMA -extern kmem_cache_t *zio_buf_cache[]; -extern kmem_cache_t *zio_data_buf_cache[]; -#endif - /* * Note that buffers can be in one of 6 states: * ARC_anon - anonymous (discussed below) @@ -620,11 +615,6 @@ static buf_hash_table_t buf_hash_table; uint64_t zfs_crc64_table[256]; -#ifdef ZIO_USE_UMA -extern kmem_cache_t *zio_buf_cache[]; -extern kmem_cache_t *zio_data_buf_cache[]; -#endif - /* * Level 2 ARC */ @@ -2192,14 +2182,15 @@ arc_reclaim_needed(void) return (0); } +extern kmem_cache_t *zio_buf_cache[]; +extern kmem_cache_t 
*zio_data_buf_cache[]; + static void arc_kmem_reap_now(arc_reclaim_strategy_t strat) { -#ifdef ZIO_USE_UMA size_t i; kmem_cache_t *prev_cache = NULL; kmem_cache_t *prev_data_cache = NULL; -#endif #ifdef _KERNEL if (arc_meta_used >= arc_meta_limit) { @@ -2224,7 +2215,6 @@ arc_kmem_reap_now(arc_reclaim_strategy_t if (strat == ARC_RECLAIM_AGGR) arc_shrink(); -#ifdef ZIO_USE_UMA for (i = 0; i < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT; i++) { if (zio_buf_cache[i] != prev_cache) { prev_cache = zio_buf_cache[i]; @@ -2235,7 +2225,6 @@ arc_kmem_reap_now(arc_reclaim_strategy_t kmem_cache_reap_now(zio_data_buf_cache[i]); } } -#endif kmem_cache_reap_now(buf_cache); kmem_cache_reap_now(hdr_cache); } Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h Mon May 24 07:04:00 2010 (r208486) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h Mon May 24 10:09:36 2010 (r208487) @@ -316,6 +316,11 @@ struct zio { /* FMA state */ uint64_t io_ena; + +#ifdef _KERNEL + /* FreeBSD only. 
*/ + struct ostask io_task; +#endif }; extern zio_t *zio_null(zio_t *pio, spa_t *spa, Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Mon May 24 07:04:00 2010 (r208486) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c Mon May 24 10:09:36 2010 (r208487) @@ -47,31 +47,6 @@ struct g_class zfs_vdev_class = { DECLARE_GEOM_CLASS(zfs_vdev_class, zfs_vdev); -typedef struct vdev_geom_ctx { - struct g_consumer *gc_consumer; - int gc_state; - struct bio_queue_head gc_queue; - struct mtx gc_queue_mtx; -} vdev_geom_ctx_t; - -static void -vdev_geom_release(vdev_t *vd) -{ - vdev_geom_ctx_t *ctx; - - ctx = vd->vdev_tsd; - vd->vdev_tsd = NULL; - - mtx_lock(&ctx->gc_queue_mtx); - ctx->gc_state = 1; - wakeup_one(&ctx->gc_queue); - while (ctx->gc_state != 2) - msleep(&ctx->gc_state, &ctx->gc_queue_mtx, 0, "vgeom:w", 0); - mtx_unlock(&ctx->gc_queue_mtx); - mtx_destroy(&ctx->gc_queue_mtx); - kmem_free(ctx, sizeof(*ctx)); -} - static void vdev_geom_orphan(struct g_consumer *cp) { @@ -96,8 +71,7 @@ vdev_geom_orphan(struct g_consumer *cp) ZFS_LOG(1, "Destroyed geom %s.", gp->name); g_wither_geom(gp, error); } - vdev_geom_release(vd); - + vd->vdev_tsd = NULL; vd->vdev_remove_wanted = B_TRUE; spa_async_request(vd->vdev_spa, SPA_ASYNC_REMOVE); } @@ -188,52 +162,6 @@ vdev_geom_detach(void *arg, int flag __u } } -static void -vdev_geom_worker(void *arg) -{ - vdev_geom_ctx_t *ctx; - zio_t *zio; - struct bio *bp; - - thread_lock(curthread); - sched_prio(curthread, PRIBIO); - thread_unlock(curthread); - - ctx = arg; - for (;;) { - mtx_lock(&ctx->gc_queue_mtx); - bp = bioq_takefirst(&ctx->gc_queue); - if (bp == NULL) { - if (ctx->gc_state == 1) { - ctx->gc_state = 2; - wakeup_one(&ctx->gc_state); - mtx_unlock(&ctx->gc_queue_mtx); - kthread_exit(); - } - msleep(&ctx->gc_queue, &ctx->gc_queue_mtx, - 
PRIBIO | PDROP, "vgeom:io", 0); - continue; - } - mtx_unlock(&ctx->gc_queue_mtx); - zio = bp->bio_caller1; - zio->io_error = bp->bio_error; - if (bp->bio_cmd == BIO_FLUSH && bp->bio_error == ENOTSUP) { - vdev_t *vd; - - /* - * If we get ENOTSUP, we know that no future - * attempts will ever succeed. In this case we - * set a persistent bit so that we don't bother - * with the ioctl in the future. - */ - vd = zio->io_vd; - vd->vdev_nowritecache = B_TRUE; - } - g_destroy_bio(bp); - zio_interrupt(zio); - } -} - static uint64_t nvlist_get_guid(nvlist_t *list) { @@ -396,7 +324,7 @@ vdev_geom_attach_by_guid_event(void *arg continue; ap->cp = vdev_geom_attach(pp); if (ap->cp == NULL) { - printf("ZFS WARNING: Unable to attach to %s.", + printf("ZFS WARNING: Unable to attach to %s.\n", pp->name); continue; } @@ -488,7 +416,6 @@ vdev_geom_open_by_path(vdev_t *vd, int c static int vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *ashift) { - vdev_geom_ctx_t *ctx; struct g_provider *pp; struct g_consumer *cp; int error, owned; @@ -530,10 +457,19 @@ vdev_geom_open(vdev_t *vd, uint64_t *psi ZFS_LOG(1, "Provider %s not found.", vd->vdev_path); error = ENOENT; } else if (cp->acw == 0 && (spa_mode & FWRITE) != 0) { + int i; + g_topology_lock(); - error = g_access(cp, 0, 1, 0); + for (i = 0; i < 5; i++) { + error = g_access(cp, 0, 1, 0); + if (error == 0) + break; + g_topology_unlock(); + tsleep(vd, 0, "vdev", hz / 2); + g_topology_lock(); + } if (error != 0) { - printf("ZFS WARNING: Unable to open %s for writing (error=%d).", + printf("ZFS WARNING: Unable to open %s for writing (error=%d).\n", vd->vdev_path, error); vdev_geom_detach(cp, 0); cp = NULL; @@ -548,19 +484,9 @@ vdev_geom_open(vdev_t *vd, uint64_t *psi } cp->private = vd; - - ctx = kmem_zalloc(sizeof(*ctx), KM_SLEEP); - bioq_init(&ctx->gc_queue); - mtx_init(&ctx->gc_queue_mtx, "zfs:vdev:geom:queue", NULL, MTX_DEF); - ctx->gc_consumer = cp; - ctx->gc_state = 0; - - vd->vdev_tsd = ctx; + vd->vdev_tsd = cp; pp = 
cp->provider; - kproc_kthread_add(vdev_geom_worker, ctx, &zfsproc, NULL, 0, 0, - "zfskern", "vdev %s", pp->name); - /* * Determine the actual size of the device. */ @@ -583,50 +509,49 @@ vdev_geom_open(vdev_t *vd, uint64_t *psi static void vdev_geom_close(vdev_t *vd) { - vdev_geom_ctx_t *ctx; struct g_consumer *cp; - if ((ctx = vd->vdev_tsd) == NULL) + cp = vd->vdev_tsd; + if (cp == NULL) return; - if ((cp = ctx->gc_consumer) == NULL) - return; - vdev_geom_release(vd); + vd->vdev_tsd = NULL; g_post_event(vdev_geom_detach, cp, M_WAITOK, NULL); } static void vdev_geom_io_intr(struct bio *bp) { - vdev_geom_ctx_t *ctx; zio_t *zio; zio = bp->bio_caller1; - ctx = zio->io_vd->vdev_tsd; - - if ((zio->io_error = bp->bio_error) == 0 && bp->bio_resid != 0) + zio->io_error = bp->bio_error; + if (zio->io_error == 0 && bp->bio_resid != 0) zio->io_error = EIO; + if (bp->bio_cmd == BIO_FLUSH && bp->bio_error == ENOTSUP) { + vdev_t *vd; - mtx_lock(&ctx->gc_queue_mtx); - bioq_insert_tail(&ctx->gc_queue, bp); - wakeup_one(&ctx->gc_queue); - mtx_unlock(&ctx->gc_queue_mtx); + /* + * If we get ENOTSUP, we know that no future + * attempts will ever succeed. In this case we + * set a persistent bit so that we don't bother + * with the ioctl in the future. 
+ */ + vd = zio->io_vd; + vd->vdev_nowritecache = B_TRUE; + } + g_destroy_bio(bp); + zio_interrupt(zio); } static int vdev_geom_io_start(zio_t *zio) { vdev_t *vd; - vdev_geom_ctx_t *ctx; struct g_consumer *cp; struct bio *bp; int error; - cp = NULL; - vd = zio->io_vd; - ctx = vd->vdev_tsd; - if (ctx != NULL) - cp = ctx->gc_consumer; if (zio->io_type == ZIO_TYPE_IOCTL) { /* XXPOLICY */ @@ -655,6 +580,7 @@ vdev_geom_io_start(zio_t *zio) return (ZIO_PIPELINE_CONTINUE); } sendreq: + cp = vd->vdev_tsd; if (cp == NULL) { zio->io_error = ENXIO; return (ZIO_PIPELINE_CONTINUE); Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c Mon May 24 07:04:00 2010 (r208486) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c Mon May 24 10:09:36 2010 (r208487) @@ -33,6 +33,17 @@ #include <sys/zio_compress.h> #include <sys/zio_checksum.h> +#if defined(__amd64__) +static int zio_use_uma = 1; +#else +static int zio_use_uma = 0; +#endif +SYSCTL_DECL(_vfs_zfs); +SYSCTL_NODE(_vfs_zfs, OID_AUTO, zio, CTLFLAG_RW, 0, "ZFS ZIO"); +TUNABLE_INT("vfs.zfs.zio.use_uma", &zio_use_uma); +SYSCTL_INT(_vfs_zfs_zio, OID_AUTO, use_uma, CTLFLAG_RDTUN, &zio_use_uma, 0, + "Use uma(9) for ZIO allocations"); + /* * ========================================================================== * I/O priority table @@ -69,10 +80,8 @@ char *zio_type_name[ZIO_TYPES] = { * ========================================================================== */ kmem_cache_t *zio_cache; -#ifdef ZIO_USE_UMA kmem_cache_t *zio_buf_cache[SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT]; kmem_cache_t *zio_data_buf_cache[SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT]; -#endif #ifdef _KERNEL extern vmem_t *zio_alloc_arena; @@ -88,13 +97,10 @@ extern vmem_t *zio_alloc_arena; void zio_init(void) { -#ifdef ZIO_USE_UMA size_t c; -#endif zio_cache = kmem_cache_create("zio_cache", sizeof 
(zio_t), 0, NULL, NULL, NULL, NULL, NULL, 0); -#ifdef ZIO_USE_UMA /* * For small buffers, we want a cache for each multiple of * SPA_MINBLOCKSIZE. For medium-size buffers, we want a cache @@ -138,7 +144,6 @@ zio_init(void) if (zio_data_buf_cache[c - 1] == NULL) zio_data_buf_cache[c - 1] = zio_data_buf_cache[c]; } -#endif zio_inject_init(); } @@ -146,7 +151,6 @@ zio_init(void) void zio_fini(void) { -#ifdef ZIO_USE_UMA size_t c; kmem_cache_t *last_cache = NULL; kmem_cache_t *last_data_cache = NULL; @@ -164,7 +168,6 @@ zio_fini(void) } zio_data_buf_cache[c] = NULL; } -#endif kmem_cache_destroy(zio_cache); @@ -186,15 +189,14 @@ zio_fini(void) void * zio_buf_alloc(size_t size) { -#ifdef ZIO_USE_UMA size_t c = (size - 1) >> SPA_MINBLOCKSHIFT; ASSERT(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT); - return (kmem_cache_alloc(zio_buf_cache[c], KM_PUSHPAGE)); -#else - return (kmem_alloc(size, KM_SLEEP)); -#endif + if (zio_use_uma) + return (kmem_cache_alloc(zio_buf_cache[c], KM_PUSHPAGE)); + else + return (kmem_alloc(size, KM_SLEEP)); } /* @@ -206,43 +208,40 @@ zio_buf_alloc(size_t size) void * zio_data_buf_alloc(size_t size) { -#ifdef ZIO_USE_UMA size_t c = (size - 1) >> SPA_MINBLOCKSHIFT; ASSERT(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT); - return (kmem_cache_alloc(zio_data_buf_cache[c], KM_PUSHPAGE)); -#else - return (kmem_alloc(size, KM_SLEEP)); -#endif + if (zio_use_uma) + return (kmem_cache_alloc(zio_data_buf_cache[c], KM_PUSHPAGE)); + else + return (kmem_alloc(size, KM_SLEEP)); } void zio_buf_free(void *buf, size_t size) { -#ifdef ZIO_USE_UMA size_t c = (size - 1) >> SPA_MINBLOCKSHIFT; ASSERT(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT); - kmem_cache_free(zio_buf_cache[c], buf); -#else - kmem_free(buf, size); -#endif + if (zio_use_uma) + kmem_cache_free(zio_buf_cache[c], buf); + else + kmem_free(buf, size); } void zio_data_buf_free(void *buf, size_t size) { -#ifdef ZIO_USE_UMA size_t c = (size - 1) >> SPA_MINBLOCKSHIFT; ASSERT(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT); - 
kmem_cache_free(zio_data_buf_cache[c], buf); -#else - kmem_free(buf, size); -#endif + if (zio_use_uma) + kmem_cache_free(zio_data_buf_cache[c], buf); + else + kmem_free(buf, size); } /* @@ -908,8 +907,8 @@ zio_taskq_dispatch(zio_t *zio, enum zio_ if (t == ZIO_TYPE_WRITE && zio->io_vd && zio->io_vd->vdev_aux) t = ZIO_TYPE_NULL; - (void) taskq_dispatch(zio->io_spa->spa_zio_taskq[t][q], - (task_func_t *)zio_execute, zio, TQ_SLEEP); + (void) taskq_dispatch_safe(zio->io_spa->spa_zio_taskq[t][q], + (task_func_t *)zio_execute, zio, &zio->io_task); } static boolean_t @@ -2220,9 +2219,9 @@ zio_done(zio_t *zio) * Reexecution is potentially a huge amount of work. * Hand it off to the otherwise-unused claim taskq. */ - (void) taskq_dispatch( + (void) taskq_dispatch_safe( spa->spa_zio_taskq[ZIO_TYPE_CLAIM][ZIO_TASKQ_ISSUE], - (task_func_t *)zio_reexecute, zio, TQ_SLEEP); + (task_func_t *)zio_reexecute, zio, &zio->io_task); } return (ZIO_PIPELINE_STOP); } Modified: stable/8/sys/kern/vfs_subr.c ============================================================================== --- stable/8/sys/kern/vfs_subr.c Mon May 24 07:04:00 2010 (r208486) +++ stable/8/sys/kern/vfs_subr.c Mon May 24 10:09:36 2010 (r208487) @@ -800,7 +800,6 @@ vnlru_proc(void) } mtx_unlock(&mountlist_mtx); if (done == 0) { - EVENTHANDLER_INVOKE(vfs_lowvnodes, desiredvnodes / 10); #if 0 /* These messages are temporary debugging aids */ if (vnlru_nowhere < 5) @@ -822,6 +821,19 @@ static struct kproc_desc vnlru_kp = { }; SYSINIT(vnlru, SI_SUB_KTHREAD_UPDATE, SI_ORDER_FIRST, kproc_start, &vnlru_kp); + +static void +vfs_lowmem(void *arg __unused) +{ + + /* + * On low memory condition free 1/8th of the free vnodes. + */ + mtx_lock(&vnode_free_list_mtx); + vnlru_free(freevnodes / 8); + mtx_unlock(&vnode_free_list_mtx); +} +EVENTHANDLER_DEFINE(vm_lowmem, vfs_lowmem, NULL, 0); /* * Routines having to do with the management of the vnode table. 
Modified: stable/8/sys/modules/zfs/Makefile ============================================================================== --- stable/8/sys/modules/zfs/Makefile Mon May 24 07:04:00 2010 (r208486) +++ stable/8/sys/modules/zfs/Makefile Mon May 24 10:09:36 2010 (r208487) @@ -63,9 +63,6 @@ ZFS_SRCS= ${ZFS_OBJS:C/.o$/.c/} SRCS+= ${ZFS_SRCS} SRCS+= vdev_geom.c -# Use UMA for ZIO allocation. -CFLAGS+=-DZIO_USE_UMA - # Use FreeBSD's namecache. CFLAGS+=-DFREEBSD_NAMECACHE Modified: stable/8/sys/sys/eventhandler.h ============================================================================== --- stable/8/sys/sys/eventhandler.h Mon May 24 07:04:00 2010 (r208486) +++ stable/8/sys/sys/eventhandler.h Mon May 24 10:09:36 2010 (r208487) @@ -183,10 +183,6 @@ typedef void (*vm_lowmem_handler_t)(void #define LOWMEM_PRI_DEFAULT EVENTHANDLER_PRI_FIRST EVENTHANDLER_DECLARE(vm_lowmem, vm_lowmem_handler_t); -/* Low vnodes event */ -typedef void (*vfs_lowvnodes_handler_t)(void *, int); -EVENTHANDLER_DECLARE(vfs_lowvnodes, vfs_lowvnodes_handler_t); - /* Root mounted event */ typedef void (*mountroot_handler_t)(void *); EVENTHANDLER_DECLARE(mountroot, mountroot_handler_t); _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" Can kern/157534 be related to this bug? Seems like I tested it after the last MFC commit bunch mentioned here, and it's still there. Sorry if it can't. State Changed From-To: open->feedback Could you try this patch: http://people.freebsd.org/~pjd/patches/vdev_geom.c.3.patch It is against most recent HEAD. If it is rejected on 8-STABLE, just grab entire vdev_geom.c from HEAD and patch this. Responsible Changed From-To: freebsd-fs->pjd I'll take this one. For bugs matching the following criteria: Status: In Progress Changed: (is less than) 2014-06-01 Reset to default assignee and clear in-progress tags. 