I have root on ZFS and am running 10-STABLE. Starting around March 15 I began getting the following panic when mounting root:

Fatal trap 18: integer divide fault while in kernel mode
Fatal trap 18: integer divide fault while in kernel mode
cpuid = 0; cpuid = 1; apic id = 00
apic id = 01
instruction pointer = 0x20:0xffffffff81a7f92d
instruction pointer = 0x28:0xffffffff81a7f92d
stack pointer = 0x28:0xfffffe0097a8b100
stack pointer = 0x28:0xfffffe0097a12ba0
frame pointer = 0x28:0xfffffe0097a8b120
frame pointer = 0x28:0xfffffe0097a12bc0
code segment = base rx0, limit 0xfffff, type 0x1b
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = processor eflags = interrupt enabled,
interrupt enabled, resume, resume, IOPL = 0
IOPL = 0
current process = current process = 0 (system_taskq_1) 1198
(zpool)
trap number = 18
tpanic: integer divide fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80910f30 at kdb_backtrace+0x60
#1 0xffffffff808d6645 at panic+0x155
#2 0xffffffff80cc9a9f at trap_fatal+0x38f
#3 0xffffffff80cc96fc at trap+0x72c
#4 0xffffffff80cb04f2 at calltrap+0x8
#5 0xffffffff81a8c4e6 at vdev_mirror_io_start+0x146
#6 0xffffffff81aa7205 at zio_vdev_io_start+0x185
#7 0xffffffff81aa4352 at zio_execute+0x162
#8 0xffffffff81a2f83d at arc_read+0x87d
#9 0xffffffff81a4917e at traverse_prefetcher+0x13e
#10 0xffffffff81a48580 at traverse_visitbp+0x270
#11 0xffffffff81a48749 at traverse_visitbp+0x439
#12 0xffffffff81a48fa3 at traverse_dnode+0x83
#13 0xffffffff81a48d45 at traverse_visitbp+0xa35
#14 0xffffffff81a48749 at traverse_visitbp+0x439
#15 0xffffffff81a48fa3 at traverse_dnode+0x83
#16 0xffffffff81a48a0c at traverse_visitbp+0x6fc
#17 0xffffffff81a482b6 at traverse_prefetch_thread+0x96

I downloaded the latest snapshot of 10-STABLE, booted it, and ran:

~ # zpool import -fR /mnt zroot

The error was the same.
Responsible Changed From-To: freebsd-bugs->freebsd-fs
Over to maintainer(s).
A recent 11-CURRENT snapshot also panics, with a Solaris assertion:

zio->io_error != 0 failed at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c, line 576
I'm seeing the exact same panic. I'm wondering if there's any workaround to recover from this, as I cannot import the zpool even when trying from a LiveCD. Also, running "zdb -e -bcsvL zroot" segfaults. My system is FreeBSD 10.0-RELEASE on an HP ProLiant ML110 G7, with ZFS on root. The pool consists of two mirrored 500 GB hard drives.
Can you use kgdb, with pretty printing on, to print out the following:
1. zio
2. each mirror_child_t entry in mm
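For reference, once a crash dump is available, a kgdb session along these lines should get at those structures. This is a rough sketch only: the vmcore path, the frame number, and the mm_child field name are assumptions taken from the backtrace and my reading of vdev_mirror.c, and may differ on your build.

kgdb /boot/kernel/kernel /var/crash/vmcore.0
(kgdb) set print pretty on
(kgdb) frame 5                  # the vdev_mirror_io_start frame from the backtrace
(kgdb) print *zio               # the zio being issued
(kgdb) print *mm                # the mirror map
(kgdb) print mm->mm_child[0]    # first mirror_child_t entry
(kgdb) print mm->mm_child[1]    # second mirror_child_t entry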
(In reply to 6yearold from comment #0)

Please provide the output from the kernel dump, which includes line numbers.
Sorry, I'm not sure how to do that, as I have a ZFS-on-root install. The kernel panic happens immediately upon trying to mount the filesystem during startup. Booting from an OpenIndiana live CD panics the kernel too, and inserting the hard drives into another computer produces the same outcome. If you have any tips on providing debugging info, or on data recovery from the pool :), please let me know.
Created attachment 149271 [details]
early dump dev

If you can build a kernel on another machine, you can use this patch to configure the dump device via a tunable which is available prior to mounting root.

From the loader prompt run:

set kern.shutdown.dumpdev_early=<dev>

e.g.

set kern.shutdown.dumpdev_early=/dev/ada0p3

This should be set to either an empty partition / disk or your swap partition.

After that boot using:

boot

It will go through the usual process but should also dump the panic to the device listed above.

Once you have the panic on disk you will need to boot with something like mfsbsd (which doesn't panic). After booting, edit /etc/rc.conf and add:

dumpdev="AUTO"

Then run:

/etc/rc.d/dumpon start

If the kernel you're running is not the same as the kernel which created the dump, ensure you copy that kernel to /boot/kernel/, including all symbols.

Now run:

/etc/rc.d/savecore start

This will copy the crash off the dump device to /var/crash/ and perform a trace on it, which you can copy off along with the kernel to perform additional debugging.

Additional info which may well be helpful is the output from:

zdb -l <pool device>

e.g.

zdb -l /dev/ada0p2

Hope this all makes sense.
Correction: the set line shouldn't include /dev/, e.g.

set kern.shutdown.dumpdev_early=ada0p3
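Putting the corrected steps from comment #7 together, the whole capture sequence looks roughly like this. It is a sketch only: ada0p3 is just an example device, and the path to the crashed system's kernel is hypothetical and depends on where you mount it from mfsbsd.

OK set kern.shutdown.dumpdev_early=ada0p3     # loader prompt of the patched kernel
OK boot
# the mount-root panic is now written to ada0p3

# after rebooting into mfsbsd, recover the dump:
echo 'dumpdev="AUTO"' >> /etc/rc.conf
/etc/rc.d/dumpon start
cp /mnt/panicking-system/boot/kernel/* /boot/kernel/   # hypothetical path: the kernel that crashed, with symbols
/etc/rc.d/savecore start
ls /var/crash/                                         # vmcore.* and core.txt.* should appear here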
I will try to get the first part done tomorrow. Second part (i.e. zdb -l output) is a little easier so here it is: root@:~ # zpool import pool: zroot id: 8676718972300662759 state: ONLINE status: The pool was last accessed by another system. action: The pool can be imported using its name or numeric identifier and the '-f' flag. see: http://illumos.org/msg/ZFS-8000-EY config: zroot ONLINE mirror-0 ONLINE gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a ONLINE gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a ONLINE root@:~ # zdb -l /dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a -------------------------------------------- LABEL 0 -------------------------------------------- version: 5000 name: 'zroot' state: 0 txg: 54060 pool_guid: 8676718972300662759 hostid: 4131752584 hostname: '' top_guid: 8851846128134668664 guid: 12362166153701700454 vdev_children: 1 vdev_tree: type: 'mirror' id: 0 guid: 8851846128134668664 metaslab_array: 33 metaslab_shift: 32 ashift: 12 asize: 495807365120 is_log: 0 create_txg: 4 children[0]: type: 'disk' id: 0 guid: 12362166153701700454 path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a' phys_path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a' whole_disk: 1 create_txg: 4 children[1]: type: 'disk' id: 1 guid: 497212309658488509 path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a' phys_path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a' whole_disk: 1 create_txg: 4 features_for_read: -------------------------------------------- LABEL 1 -------------------------------------------- version: 5000 name: 'zroot' state: 0 txg: 54060 pool_guid: 8676718972300662759 hostid: 4131752584 hostname: '' top_guid: 8851846128134668664 guid: 12362166153701700454 vdev_children: 1 vdev_tree: type: 'mirror' id: 0 guid: 8851846128134668664 metaslab_array: 33 metaslab_shift: 32 ashift: 12 asize: 495807365120 is_log: 0 create_txg: 4 children[0]: type: 'disk' id: 0 guid: 12362166153701700454 path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a' phys_path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a' whole_disk: 1 create_txg: 4 children[1]: type: 'disk' id: 1 guid: 497212309658488509 path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a' phys_path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a' whole_disk: 1 create_txg: 4 features_for_read: -------------------------------------------- LABEL 2 -------------------------------------------- version: 5000 name: 'zroot' state: 0 txg: 54060 pool_guid: 8676718972300662759 hostid: 4131752584 hostname: '' top_guid: 8851846128134668664 guid: 12362166153701700454 vdev_children: 1 vdev_tree: type: 'mirror' id: 0 guid: 8851846128134668664 metaslab_array: 33 metaslab_shift: 32 ashift: 12 asize: 495807365120 is_log: 0 create_txg: 4 children[0]: type: 'disk' id: 0 guid: 12362166153701700454 path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a' phys_path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a' whole_disk: 1 create_txg: 4 children[1]: type: 'disk' id: 1 guid: 497212309658488509 path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a' phys_path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a' whole_disk: 1 create_txg: 4 features_for_read: -------------------------------------------- LABEL 3 -------------------------------------------- version: 5000 name: 'zroot' state: 0 txg: 54060 pool_guid: 8676718972300662759 hostid: 4131752584 hostname: '' top_guid: 8851846128134668664 guid: 12362166153701700454 vdev_children: 1 vdev_tree: type: 'mirror' id: 0 guid: 8851846128134668664 metaslab_array: 33 metaslab_shift: 32 ashift: 12 asize: 
495807365120 is_log: 0 create_txg: 4 children[0]: type: 'disk' id: 0 guid: 12362166153701700454 path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a' phys_path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a' whole_disk: 1 create_txg: 4 children[1]: type: 'disk' id: 1 guid: 497212309658488509 path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a' phys_path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a' whole_disk: 1 create_txg: 4 features_for_read: root@:~ # zdb -l /dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a -------------------------------------------- LABEL 0 -------------------------------------------- version: 5000 name: 'zroot' state: 0 txg: 54060 pool_guid: 8676718972300662759 hostid: 4131752584 hostname: '' top_guid: 8851846128134668664 guid: 497212309658488509 vdev_children: 1 vdev_tree: type: 'mirror' id: 0 guid: 8851846128134668664 metaslab_array: 33 metaslab_shift: 32 ashift: 12 asize: 495807365120 is_log: 0 create_txg: 4 children[0]: type: 'disk' id: 0 guid: 12362166153701700454 path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a' phys_path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a' whole_disk: 1 create_txg: 4 children[1]: type: 'disk' id: 1 guid: 497212309658488509 path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a' phys_path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a' whole_disk: 1 create_txg: 4 features_for_read: -------------------------------------------- LABEL 1 -------------------------------------------- version: 5000 name: 'zroot' state: 0 txg: 54060 pool_guid: 8676718972300662759 hostid: 4131752584 hostname: '' top_guid: 8851846128134668664 guid: 497212309658488509 vdev_children: 1 vdev_tree: type: 'mirror' id: 0 guid: 8851846128134668664 metaslab_array: 33 metaslab_shift: 32 ashift: 12 asize: 495807365120 is_log: 0 create_txg: 4 children[0]: type: 'disk' id: 0 guid: 12362166153701700454 path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a' phys_path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a' whole_disk: 1 create_txg: 4 children[1]: type: 'disk' id: 1 guid: 497212309658488509 path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a' phys_path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a' whole_disk: 1 create_txg: 4 features_for_read: -------------------------------------------- LABEL 2 -------------------------------------------- version: 5000 name: 'zroot' state: 0 txg: 54060 pool_guid: 8676718972300662759 hostid: 4131752584 hostname: '' top_guid: 8851846128134668664 guid: 497212309658488509 vdev_children: 1 vdev_tree: type: 'mirror' id: 0 guid: 8851846128134668664 metaslab_array: 33 metaslab_shift: 32 ashift: 12 asize: 495807365120 is_log: 0 create_txg: 4 children[0]: type: 'disk' id: 0 guid: 12362166153701700454 path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a' phys_path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a' whole_disk: 1 create_txg: 4 children[1]: type: 'disk' id: 1 guid: 497212309658488509 path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a' phys_path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a' whole_disk: 1 create_txg: 4 features_for_read: -------------------------------------------- LABEL 3 -------------------------------------------- version: 5000 name: 'zroot' state: 0 txg: 54060 pool_guid: 8676718972300662759 hostid: 4131752584 hostname: '' top_guid: 8851846128134668664 guid: 497212309658488509 vdev_children: 1 vdev_tree: type: 'mirror' id: 0 guid: 8851846128134668664 metaslab_array: 33 metaslab_shift: 32 ashift: 12 asize: 495807365120 is_log: 0 create_txg: 4 children[0]: type: 
'disk' id: 0 guid: 12362166153701700454 path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a' phys_path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a' whole_disk: 1 create_txg: 4 children[1]: type: 'disk' id: 1 guid: 497212309658488509 path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a' phys_path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a' whole_disk: 1 create_txg: 4 features_for_read:
An alternative to applying the patch and building: if you have a recent stable/10 or head that you can boot, you can use:

set dumpdev=<device>

Thanks to emaste for pointing this out.
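For example, from the loader prompt of such a kernel (the device name is just an example):

OK set dumpdev=ada0p3
OK boot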
Having trouble getting this info, since I'm not running STABLE and I can't see a reasonably quick way to get the sources. I see now that the bug title indicates STABLE, but I'm on 10.0-RELEASE.

In case it helps, when I run zdb from an mfsbsd image I now get a source code reference to exactly where it fails:

root@mfsbsd:~ # zdb -b -e zroot

Traversing all blocks to verify nothing leaked ...

6.95G completed ( 228MB/s) estimated time remaining: 0hr 00min 19sec
Assertion failed: (bp->blk_pad[0] == 0), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line 2978.
Abort (core dumped)
(In reply to Dan Milgram from comment #11)
> Assertion failed: (bp->blk_pad[0] == 0), file
> /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line 2978.
> Abort (core dumped)

There's at least one change that directly affects this assert in 10.1, so I would recommend upgrading to at least that, or to stable/10, to check whether it's already fixed.
Thanks, I tried with FreeBSD 10.1 RC4 - it fails with the same assert at essentially the same place (i.e. a different line number due to whatever other changes were made, but the same function and the same relative position within it):

6.90G completed ( 233MB/s) estimated time remaining: 0hr 00min 18sec
Assertion failed: (bp->blk_pad[0] == 0), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line 3088.
Abort
(In reply to Dan Milgram from comment #13)
> Assertion failed: (bp->blk_pad[0] == 0), file
> /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line 3088.
> Abort

That's basically saying something wrote to the padding bytes, which is unexpected, so either there's a coding error or something nasty is happening and the data in that memory location is being overwritten by something else.

You might want to run zdb via gdb and put a breakpoint at that location. When it trips, output the state of the zio; that might help show what's going on.
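Something along these lines should do it. A sketch only, assuming a zdb/libzpool built with debug symbols: the line number matches the 10.1 RC4 source quoted above, the bp variable name is taken from the assertion itself, and gdb may ask to make the breakpoint pending on the shared libzpool, which is fine.

gdb --args zdb -b -e zroot
(gdb) break zio.c:3088          # the failing ASSERT(bp->blk_pad[0] == 0)
(gdb) run
# when the breakpoint fires:
(gdb) print *bp                 # the block pointer being checked
(gdb) print/x bp->blk_pad
(gdb) backtrace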
Thanks very much for your help! I re-attempted an import of the zpool after zdb failed, and 10.1 RC4 seems to have done the trick for me. It imported without problems, and zpool status doesn't show any errors.

This may be moot at this point, but trying to run zdb through gdb doesn't seem to work for me (no debugging symbols). I'm not sure if I'm just not running it correctly, or if I would have to build a custom zdb that includes debugging info. If there's any info I can provide that would be of value, though, let me know.
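In case it's still useful, one way to get a zdb with debug symbols is to rebuild libzpool and zdb from a source tree matching the installed release. This is a rough sketch: the zdb source directory and the exact make invocation are from memory and may differ between releases.

# assumes /usr/src is checked out at the same revision as the running system
cd /usr/src/cddl/lib/libzpool
make obj && make DEBUG_FLAGS=-g all install
cd /usr/src/cddl/usr.sbin/zdb
make obj && make DEBUG_FLAGS=-g all install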
batch change:

For bugs that match the following
- Status Is In progress AND
- Untouched since 2018-01-01 AND
- Affects Base System OR Documentation

DO:

Reset to open status.

Note: I did a quick pass, but if you are getting this email it might be worthwhile to double check whether this bug ought to be closed.
Seems to be fixed per comment #15?