I upgraded 10.0-RELEASE to 10.0-STABLE and can no longer mount root; ZFS panics on boot. After rebooting with the 10.0-RELEASE ISO I can zpool import without errors. I do not have a dump, because the panic happens before the filesystem is mounted.

panic: double fault

stacktrace:
vdev_queue_io_to_issue
vdev_queue_io
zio_vdev_io_start
zio_execute
vdev_mirror_io_start
zio_vdev_io_start
zio_execute
zio_gang_tree_assemble
zio_gang_assemble
zio_execute
spa_load_verify_cb
traverse_visitbp
traverse_visitbp
traverse_dnode
traverse_visitbp
traverse_visitbp
Responsible Changed From-To: freebsd-bugs->freebsd-fs
Over to maintainer(s).
What version of stable, and what zpool layout are you using on what hardware?

Regards
Steve
10.0-STABLE #0 r265265, running in VMware. I have run this virtual machine since FreeBSD 7. Problems with ZFS started with the 10.0 release: sometimes it panicked at boot during the initial mount after an unclean shutdown, but restarting the virtual machine fixed the problem. The problem reported here occurs after a clean shutdown and the update to 10-STABLE. ZFS version 5, pool version 5000. The architecture is i386 (32-bit). If you know how to make a panic dump work during the initial mount, let me know and I will capture more data.
Can you try building a debug kernel with the attached patch? It should allow you to configure a dump device before the root device is mounted, by specifying it in /boot/loader.conf:

kern.shutdown.defaultdumpdev="ada4p3"

Ensure you choose a valid device, such as a swap device. If all goes well, that will enable you to get a proper stack trace from the dump.

Regards
Steve
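For anyone following along, the resulting loader.conf fragment would look something like this ("ada4p3" is only the example device from above; substitute your own swap partition):

```
# /boot/loader.conf -- with the patched debug kernel installed
# (device name is an example; point it at your own swap device)
kern.shutdown.defaultdumpdev="ada4p3"
```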
Any luck with the patch Radim?
With your patch, I was able to capture a crash dump. This patch should be added to mainstream FreeBSD to help people capture boot panics. Here is the beginning of the core.txt file; let me know if you need more info.

Trying to mount root from zfs:root []...

Fatal double fault:
eip = 0xc0cb801e
esp = 0xd93eff98
ebp = 0xd93f0300
cpuid = 0; apic id = 00
panic: double fault
cpuid = 0
KDB: enter: panic

Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/krpc.ko.symbols...done.
Loaded symbols for /boot/kernel/krpc.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/if_vmx.ko.symbols...done.
Loaded symbols for /boot/kernel/if_vmx.ko.symbols
Reading symbols from /boot/kernel/cc_hd.ko.symbols...done.
Loaded symbols for /boot/kernel/cc_hd.ko.symbols
Reading symbols from /boot/kernel/h_ertt.ko.symbols...done.
Loaded symbols for /boot/kernel/h_ertt.ko.symbols

#0  doadump (textdump=0) at pcpu.h:233
233     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump (textdump=0) at pcpu.h:233
#1  0xc04c87d1 in db_dump (dummy=-1066847139, dummy2=0, dummy3=-1, dummy4=0xc0a808b4 "") at /usr/src/sys/ddb/db_command.c:543
#2  0xc04c82cb in db_command (cmd_table=<value optimized out>) at /usr/src/sys/ddb/db_command.c:449
#3  0xc04c8010 in db_command_loop () at /usr/src/sys/ddb/db_command.c:502
#4  0xc04ca8a1 in db_trap (type=<value optimized out>, code=<value optimized out>) at /usr/src/sys/ddb/db_main.c:231
#5  0xc0693bd4 in kdb_trap (type=<value optimized out>, code=<value optimized out>, tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:656
#6  0xc093918f in trap (frame=<value optimized out>) at /usr/src/sys/i386/i386/trap.c:712
#7  0xc092336c in calltrap () at /usr/src/sys/i386/i386/exception.s:170
#8  0xc069345d in kdb_enter (why=0xc09a7ef8 "panic", msg=<value optimized out>) at cpufunc.h:71
#9  0xc065710f in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:823
#10 0xc0939acb in dblfault_handler () at /usr/src/sys/i386/i386/trap.c:1072
#11 0xc0cb801e in vdev_queue_io_to_issue (vq=0xc46a4b00) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:471
#12 0xc0cb7fb8 in vdev_queue_io (zio=0xc4855000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:744
#13 0xc0cd84ee in zio_vdev_io_start (ziop=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2607
#14 0xc0cd4c18 in zio_execute (zio=0xc4855000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1350
#15 0xc0cb74e4 in vdev_mirror_io_start (zio=0xc4823894) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c:284
> This could be something that already has a fix in head but wasn't expected
> to get triggered without TRIM queueing, which isn't in 10-stable yet. The fix
> is also likely to change based on feedback from the openzfs guys, hence isn't
> in 10-stable yet.

The only workaround to get 10-STABLE to boot without a panic is to boot 10.0 and then import/export the pool.

265046 is already merged in 10-stable. Now building stable with 265152 and 265321 merged.
----- Original Message -----
From: "Radim Kolar" <hsn@sendmail.cz>

>> This could be something that already has a fix in head but wasn't expected
>> to get triggered without TRIM queueing, which isn't in 10-stable yet. The fix
>> is also likely to change based on feedback from the openzfs guys, hence isn't
>> in 10-stable yet.
>
> The only workaround to get 10-STABLE to boot without a panic is to boot 10.0 and then import/export the pool.
>
> 265046 is already merged in 10-stable.

Ah yes, I did merge that one just in case.

> Now building stable with 265152 and 265321 merged.

You shouldn't need 265152, as that's just for queued TRIMs and may just confuse things further. I'm not sure 265321 on its own will make a difference here, as that's a fix for a stack overflow, and since your stack in the trace is only 14 deep it shouldn't be your case. Once you're done building, and if it does still panic, can you confirm the code line at the panic location with kgdb, as well as the details of the zio being processed, under pretty print.
> 265046 is already merged in 10-stable.
> Now building stable with 265152 and 265321 merged.

These 2 patches did not remove the panic on boot. I am compiling the kernel with:

makeoptions DEBUG=-g
makeoptions WITH_CTF=1

Are there any options I can use for creating a more debuggable dump, such as adding -O0 somewhere?
Crash stack from the patched kernel:

#7  0xc092336c in calltrap () at /usr/src/sys/i386/i386/exception.s:170
#8  0xc069345d in kdb_enter (why=0xc09a7ef8 "panic", msg=<value optimized out>) at cpufunc.h:71
#9  0xc065710f in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:823
#10 0xc0939acb in dblfault_handler () at /usr/src/sys/i386/i386/trap.c:1072
#11 0xc0cb8187 in vdev_queue_io_to_issue (vq=0xc46ab300) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:489
#12 0xc0cb8119 in vdev_queue_io (zio=0xc486c5b8) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:768
#13 0xc0cd8704 in zio_vdev_io_start (ziop=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2617
#14 0xc0cd4e28 in zio_execute (zio=0xc486c5b8) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1357
#15 0xc0cb7604 in vdev_mirror_io_start (zio=0xc476f000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c:284
#16 0xc0cd854f in zio_vdev_io_start (ziop=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2537
#17 0xc0cd4e28 in zio_execute (zio=0xc476f000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1357
#18 0xc0ca0ddc in spa_load_verify_cb (zilog=0x0, bp=<value optimized out>, dnp=0xc4b0fa00, arg=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:1887
#19 0xc0c5eb79 in traverse_visitbp (td=0xd93f1120, dnp=0xc46ab300, bp=0xc4b18100, zb=0xd93f06b0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c:272
#20 0xc0c5edb0 in traverse_visitbp (td=0xd93f1120, dnp=0xc4b0fa00, bp=0x80, zb=0xd93f07a8) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c:306

The failing line 489 is:

if (avl_numnodes(&vq->vq_active_tree) >= zfs_vdev_max_active)
        return (ZIO_PRIORITY_NUM_QUEUEABLE);
----- Original Message -----
From: "Radim Kolar" <hsn@sendmail.cz>

> The failing line 489 is:
>
> if (avl_numnodes(&vq->vq_active_tree) >= zfs_vdev_max_active)
>         return (ZIO_PRIORITY_NUM_QUEUEABLE);

Ok, so that's what I thought it was. Could you see what vq is?
> Ok, so that's what I thought it was. Could you see what vq is?

(kgdb) up 11
#11 0xc0cb8187 in vdev_queue_io_to_issue (vq=0xc46ab300)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:489
489         if (avl_numnodes(&vq->vq_active_tree) >= zfs_vdev_max_active)
Current language:  auto; currently minimal
(kgdb) print vq
$1 = (vdev_queue_t *) 0xc46ab300
(kgdb) print *vq
$2 = {vq_vdev = 0xc46ab000, vq_class = {{vqc_active = 0, vqc_queued_tree = {
        avl_root = 0x0, avl_compar = 0xc0cb7de0 <vdev_queue_timestamp_compare>,
        avl_offset = 476, avl_numnodes = 0, avl_size = 732}}, {
      vqc_active = 0, vqc_queued_tree = {avl_root = 0x0,
        avl_compar = 0xc0cb7de0 <vdev_queue_timestamp_compare>,
        avl_offset = 476, avl_numnodes = 0, avl_size = 732}}, {
      vqc_active = 1, vqc_queued_tree = {avl_root = 0x0,
        avl_compar = 0xc0cb7d70 <vdev_queue_offset_compare>,
        avl_offset = 476, avl_numnodes = 0, avl_size = 732}}, {
      vqc_active = 0, vqc_queued_tree = {avl_root = 0x0,
        avl_compar = 0xc0cb7d70 <vdev_queue_offset_compare>,
        avl_offset = 476, avl_numnodes = 0, avl_size = 732}}, {
      vqc_active = 0, vqc_queued_tree = {avl_root = 0xc486c794,
        avl_compar = 0xc0cb7d70 <vdev_queue_offset_compare>,
        avl_offset = 476, avl_numnodes = 1, avl_size = 732}}, {
      vqc_active = 0, vqc_queued_tree = {avl_root = 0x0,
        avl_compar = 0xc0cb7d70 <vdev_queue_offset_compare>,
        avl_offset = 476, avl_numnodes = 0, avl_size = 732}}},
  vq_active_tree = {avl_root = 0xc476fa70,
    avl_compar = 0xc0cb7d70 <vdev_queue_offset_compare>, avl_offset = 476,
    avl_numnodes = 1, avl_size = 732}, vq_last_offset = 5941966336,
  vq_io_complete_ts = 7702783485, vq_lock = {lock_object = {
      lo_name = 0xc0d6b0ff "vq->vq_lock", lo_flags = 40960000, lo_data = 0,
      lo_witness = 0x0}, sx_lock = 3290710800}}
(kgdb)
The line number will probably be wrong; I have already seen such cases with clang in 10.0 when optimization is used. I am rebuilding the kernel without optimization now.
Seems this didn't get through the first time, so resending.

Hmm, now that is odd, as vq->vq_active_tree looks quite sane with avl_numnodes = 1, so there's one outstanding zio in the active tree. What might be of interest here is that the stack indicates your zio is a scrubbing read, so there is something not quite right on the pool.

Is there somewhere you can tar up:
1. The kernel dump
2. Your current sources
3. Your current kernel

Then email me the link privately to smh at freebsd.org, so I can have a look in more detail?
I am having trouble compiling a kernel with optimization disabled to get the accurate line where it crashes. I ran into two problems while trying to compile ZFS without optimization:

kern/190101
kern/190103
For Google and other FreeBSD users: to compile the FreeBSD kernel and its modules without optimization, use

makeoptions DEBUG="-g -O0"

in the kernel config file.
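A minimal sketch of a custom kernel config carrying those options (the name MYDEBUG is hypothetical; `include GENERIC` pulls in the stock configuration so only the overrides need to be listed):

```
# sys/i386/conf/MYDEBUG -- hypothetical debug config
include         GENERIC
ident           MYDEBUG

makeoptions     DEBUG="-g -O0"   # full debug symbols, no optimization
makeoptions     WITH_CTF=1       # CTF data for kgdb/DTrace
```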
After recompiling the kernel and ZFS with -O0, I get a very different stack trace. I can't get a dump with your patch in this case. Can you identify which kernel component failed? Is it the MPT driver?

panic
dblfault_handler
cpu_search
cpu_search_lowest
sched_lowest
sched_pickcpu
sched_add
intr_event_schedule_thread
intr_event_handle
intr_execute_handlers
lapic_handle_intr
Xapis_isr1
bus_space_write_4
mpt_write
mpt_send_cmd
mpt_execute_req
bus_dmamap_load_ccb
mpt_start
mpt_action
xpt_run_devq
xpt_action_default
scsi_action
xpt_action
dastart
xpt_run_allocq
xpt_schedule
dareprobe
g_disk_access
g_access
g_part_access
g_access
vdev_geom_attach_taster
m_attach_taster
vdev_geom_read_pool_label
spa_generate_rootconf
spa_import_rootpool
zfs_mount
vfs_domount_first
vfs_domount
vfs_donmount
kernel_mount
parse_mount
vfs_mountroot_parse
This looks like a possible stack exhaustion. -- Andriy Gapon
Silly question: are you using the i386 and not the amd64 architecture? If so, can you try adding the following to your kernel config:

options KSTACK_PAGES=4

Regards
Steve
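For anyone hitting the same panic, the addition would sit in a GENERIC-derived config like this (MYKERNEL is a placeholder name; on i386 the default is KSTACK_PAGES=2, i.e. 8 KB per kernel stack):

```
# sys/i386/conf/MYKERNEL -- hypothetical config
include         GENERIC
ident           MYKERNEL

options         KSTACK_PAGES=4   # 16 KB kernel stacks instead of the
                                 # i386 default of 2 pages (8 KB)
```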
Yes, it's i386. I will recompile the kernel and report results.
With "options KSTACK_PAGES=4" in the kernel config it does not panic anymore. Please commit a patch making this the default value on i386 (32-bit).
This is already noted in UPDATING. With i386 being aging tech which really shouldn't be used to run ZFS, and with such machines often having small amounts of memory, this should be left to those who do require it to build a custom kernel, I'm afraid.
It's not just about building a custom kernel: GENERIC from 10-STABLE panics on i386 when mounting a ZFS root, too. Maybe ZFS should not be used on i386, but it should not panic the system. If you are against changing the default kernel configuration on i386, then add another warning to the boot messages, similar to this one:

ZFS WARNING: Recommended minimum kmem_size is 512MB; expect unstable behavior.
Consider tuning vm.kmem_size and vm.kmem_size_max in /boot/loader.conf.