Bug 188187 - [zfs] [panic] 10-stable: Kernel panic on zpool import: integer divide fault
Summary: [zfs] [panic] 10-stable: Kernel panic on zpool import: integer divide fault
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 10.0-STABLE
Hardware: Any Any
: Normal Affects Only Me
Assignee: freebsd-fs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-02 07:40 UTC by Gleb Popov
Modified: 2018-05-29 09:52 UTC (History)
2 users (show)

See Also:


Attachments
early dump dev (3.39 KB, patch)
2014-11-10 22:51 UTC, Steven Hartland
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Gleb Popov freebsd_committer 2014-04-02 07:40:00 UTC
I have root on ZFS and using 10-STABLE. Somewhere from March 15 i started to get following panic when mounting root:

Fatal trap 18: integer divide fault while in kernel mode
Fatal trap 18: integer divide fault while in kernel mode
cpuid = 0; cpuid = 1; apic id = 00
apic id = 01
instruction pointer     = 0x20:0xffffffff81a7f92d
instruction pointer     = 0x28:0xffffffff81a7f92d
stack pointer           = 0x28:0xfffffe0097a8b100
stack pointer           = 0x28:0xfffffe0097a12ba0
frame pointer           = 0x28:0xfffffe0097a8b120
frame pointer           = 0x28:0xfffffe0097a12bc0
code segment            = base rx0, limit 0xfffff, type 0x1b
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = processor eflags       = interrupt enabled, interrupt enabled, resume, resume, IOPL = 0
IOPL = 0
current process         = current process        = 0 (system_taskq_1) 1198 (zpool)
trap number             = 18
tpanic: integer divide fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80910f30 at kdb_backtrace+0x60
#1 0xffffffff808d6645 at panic+0x155
#2 0xffffffff80cc9a9f at trap_fatal+0x38f
#3 0xffffffff80cc96fc at trap+0x72c
#4 0xffffffff80cb04f2 at calltrap+0x8
#5 0xffffffff81a8c4e6 at vdev_mirror_io_start+0x146
#6 0xffffffff81aa7205 at zio_vdev_io_start+0x185
#7 0xffffffff81aa4352 at zio_execute+0x162
#8 0xffffffff81a2f83d at arc_read+0x87d
#9 0xffffffff81a4917e at traverse_prefetcher+0x13e
#10 0xffffffff81a48580 at traverse_visitbp+0x270
#11 0xffffffff81a48749 at traverse_visitbp+0x439
#12 0xffffffff81a48fa3 at traverse_dnode+0x83
#13 0xffffffff81a48d45 at traverse_visitbp+0xa35
#14 0xffffffff81a48749 at traverse_visitbp+0x439
#15 0xffffffff81a48fa3 at traverse_dnode+0x83
#16 0xffffffff81a48a0c at traverse_visitbp+0x6fc
#17 0xffffffff81a482b6 at traverse_prefetch_thread+0x96

I've downloaded latest snapshot of 10-STABLE, booted it and ran

~ # zpool import -fR /mnt zroot

The error was same.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2014-04-02 10:47:24 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 2 Gleb Popov freebsd_committer 2014-04-30 11:17:31 UTC
Recent 11-CURRENT snapshot also panics with

solaris assertion: zio->io_error != 0 failed
at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c,
line 576
Comment 3 Dan Milgram 2014-11-08 20:43:48 UTC
I'm seeing the same exact panic error. I'm wondering if there's any workaround to recover from this, as I can not import the zpool when trying to do so from LiveCD. Also, running "zdb -e -bcsvL zroot" seg faults.

My system is FreeBSD 10.0 Release on an HP Proliant ML110 G7, with ZFS on root.
The pool consists of 2 mirrored hard drives 500GB.
Comment 4 Steven Hartland freebsd_committer 2014-11-08 22:05:25 UTC
Can you use kdgb to print out the following with pretty print on:-
1. zio
2. each mirror_child_t entry in mm
Comment 5 Steven Hartland freebsd_committer 2014-11-08 22:07:06 UTC
(In reply to 6yearold from comment #0)
> I have root on ZFS and using 10-STABLE. Somewhere from March 15 i started to
> get following panic when mounting root:
> 
> Fatal trap 18: integer divide fault while in kernel mode
> Fatal trap 18: integer divide fault while in kernel mode
> cpuid = 0; cpuid = 1; apic id = 00
> apic id = 01
> instruction pointer     = 0x20:0xffffffff81a7f92d
> instruction pointer     = 0x28:0xffffffff81a7f92d
> stack pointer           = 0x28:0xfffffe0097a8b100
> stack pointer           = 0x28:0xfffffe0097a12ba0
> frame pointer           = 0x28:0xfffffe0097a8b120
> frame pointer           = 0x28:0xfffffe0097a12bc0
> code segment            = base rx0, limit 0xfffff, type 0x1b
> code segment            = base rx0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = processor eflags       = interrupt enabled,
> interrupt enabled, resume, resume, IOPL = 0
> IOPL = 0
> current process         = current process        = 0 (system_taskq_1) 1198
> (zpool)
> trap number             = 18
> tpanic: integer divide fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff80910f30 at kdb_backtrace+0x60
> #1 0xffffffff808d6645 at panic+0x155
> #2 0xffffffff80cc9a9f at trap_fatal+0x38f
> #3 0xffffffff80cc96fc at trap+0x72c
> #4 0xffffffff80cb04f2 at calltrap+0x8
> #5 0xffffffff81a8c4e6 at vdev_mirror_io_start+0x146
> #6 0xffffffff81aa7205 at zio_vdev_io_start+0x185
> #7 0xffffffff81aa4352 at zio_execute+0x162
> #8 0xffffffff81a2f83d at arc_read+0x87d
> #9 0xffffffff81a4917e at traverse_prefetcher+0x13e
> #10 0xffffffff81a48580 at traverse_visitbp+0x270
> #11 0xffffffff81a48749 at traverse_visitbp+0x439
> #12 0xffffffff81a48fa3 at traverse_dnode+0x83
> #13 0xffffffff81a48d45 at traverse_visitbp+0xa35
> #14 0xffffffff81a48749 at traverse_visitbp+0x439
> #15 0xffffffff81a48fa3 at traverse_dnode+0x83
> #16 0xffffffff81a48a0c at traverse_visitbp+0x6fc
> #17 0xffffffff81a482b6 at traverse_prefetch_thread+0x96
> 
> I've downloaded latest snapshot of 10-STABLE, booted it and ran
> 
> ~ # zpool import -fR /mnt zroot
> 
> The error was same.

please provide the output from the kernel dump which includes line numbers.
Comment 6 Dan Milgram 2014-11-10 22:16:56 UTC
Sorry, Iā€™m not sure how to do that as I have a ZFS on root install. The kernel panic happens immediately upon trying to mount the filesystem during startup.
When booting from an OpenIndiana live CD, the kernel panics too. Also, inserting the hard drives into another computer produces the same outcome.

If you have any tips with respect to providing debugging info, or data recovery from the pool :),  please let me know.
Comment 7 Steven Hartland freebsd_committer 2014-11-10 22:51:05 UTC
Created attachment 149271 [details]
early dump dev

If you can build a kernel on another machine you can use this patch to allow you to configure the dump device via a tunable which will available prior to mounting root.

From the loader prompt run:
set kern.shutdown.dumpdev_early=<dev>
e.g.
set kern.shutdown.dumpdev_early=/dev/ada0p3

This should be set to either an empty partition / disk or your swap partition.

After that boot using:
boot

It will go through the usual process but should also dump the panic to the device listed above.

Once you have the panic on disk you will need to boot with something like mfsbsd (which doesn't panic). After booted edit /etc/rc.conf and add:
dumpdev="AUTO"

Then run:
/etc/rc.d/dumpon start

If the kernel your running is not the same as the kernel which created the dump ensure you copy that kernel to /boot/kernel/ including all symbols.

Now run:
/etc/rc.d/savecore start

This will copy the crash off the dump device to /var/crash/ and perform a trace on it which you can copy off along with the kernel to perform additional debugging on.

Additional info which may well be helpful is the output from:
zdb -l <pool device>
e.g. zdb -l /dev/ada0p2

Hope this all make sense.
Comment 8 Steven Hartland freebsd_committer 2014-11-10 22:56:45 UTC
Correction the set line shouldn't include /dev/ e.g.
set kern.shutdown.dumpdev_early=ada0p3
Comment 9 Dan Milgram 2014-11-10 23:09:55 UTC
I will try to get the first part done tomorrow. Second part (i.e. zdb -l output) is a little easier so here it is:

root@:~ # zpool import
   pool: zroot
     id: 8676718972300662759
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
	the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

	zroot                                           ONLINE
	  mirror-0                                      ONLINE
	    gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a  ONLINE
	    gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a  ONLINE


root@:~ # zdb -l /dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 54060
    pool_guid: 8676718972300662759
    hostid: 4131752584
    hostname: ''
    top_guid: 8851846128134668664
    guid: 12362166153701700454
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 8851846128134668664
        metaslab_array: 33
        metaslab_shift: 32
        ashift: 12
        asize: 495807365120
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 12362166153701700454
            path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a'
            phys_path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 497212309658488509
            path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a'
            phys_path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a'
            whole_disk: 1
            create_txg: 4
    features_for_read:
--------------------------------------------
LABEL 1
--------------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 54060
    pool_guid: 8676718972300662759
    hostid: 4131752584
    hostname: ''
    top_guid: 8851846128134668664
    guid: 12362166153701700454
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 8851846128134668664
        metaslab_array: 33
        metaslab_shift: 32
        ashift: 12
        asize: 495807365120
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 12362166153701700454
            path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a'
            phys_path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 497212309658488509
            path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a'
            phys_path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a'
            whole_disk: 1
            create_txg: 4
    features_for_read:
--------------------------------------------
LABEL 2
--------------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 54060
    pool_guid: 8676718972300662759
    hostid: 4131752584
    hostname: ''
    top_guid: 8851846128134668664
    guid: 12362166153701700454
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 8851846128134668664
        metaslab_array: 33
        metaslab_shift: 32
        ashift: 12
        asize: 495807365120
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 12362166153701700454
            path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a'
            phys_path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 497212309658488509
            path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a'
            phys_path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a'
            whole_disk: 1
            create_txg: 4
    features_for_read:
--------------------------------------------
LABEL 3
--------------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 54060
    pool_guid: 8676718972300662759
    hostid: 4131752584
    hostname: ''
    top_guid: 8851846128134668664
    guid: 12362166153701700454
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 8851846128134668664
        metaslab_array: 33
        metaslab_shift: 32
        ashift: 12
        asize: 495807365120
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 12362166153701700454
            path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a'
            phys_path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 497212309658488509
            path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a'
            phys_path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a'
            whole_disk: 1
            create_txg: 4
    features_for_read:



root@:~ # zdb -l /dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 54060
    pool_guid: 8676718972300662759
    hostid: 4131752584
    hostname: ''
    top_guid: 8851846128134668664
    guid: 497212309658488509
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 8851846128134668664
        metaslab_array: 33
        metaslab_shift: 32
        ashift: 12
        asize: 495807365120
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 12362166153701700454
            path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a'
            phys_path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 497212309658488509
            path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a'
            phys_path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a'
            whole_disk: 1
            create_txg: 4
    features_for_read:
--------------------------------------------
LABEL 1
--------------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 54060
    pool_guid: 8676718972300662759
    hostid: 4131752584
    hostname: ''
    top_guid: 8851846128134668664
    guid: 497212309658488509
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 8851846128134668664
        metaslab_array: 33
        metaslab_shift: 32
        ashift: 12
        asize: 495807365120
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 12362166153701700454
            path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a'
            phys_path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 497212309658488509
            path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a'
            phys_path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a'
            whole_disk: 1
            create_txg: 4
    features_for_read:
--------------------------------------------
LABEL 2
--------------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 54060
    pool_guid: 8676718972300662759
    hostid: 4131752584
    hostname: ''
    top_guid: 8851846128134668664
    guid: 497212309658488509
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 8851846128134668664
        metaslab_array: 33
        metaslab_shift: 32
        ashift: 12
        asize: 495807365120
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 12362166153701700454
            path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a'
            phys_path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 497212309658488509
            path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a'
            phys_path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a'
            whole_disk: 1
            create_txg: 4
    features_for_read:
--------------------------------------------
LABEL 3
--------------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 54060
    pool_guid: 8676718972300662759
    hostid: 4131752584
    hostname: ''
    top_guid: 8851846128134668664
    guid: 497212309658488509
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 8851846128134668664
        metaslab_array: 33
        metaslab_shift: 32
        ashift: 12
        asize: 495807365120
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 12362166153701700454
            path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a'
            phys_path: '/dev/gptid/b1f0a207-641c-11e4-b9e7-28924a36bc5a'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 497212309658488509
            path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a'
            phys_path: '/dev/gptid/b260cfdf-641c-11e4-b9e7-28924a36bc5a'
            whole_disk: 1
            create_txg: 4
    features_for_read:
Comment 10 Steven Hartland freebsd_committer 2014-11-10 23:27:13 UTC
Alternative to applying the patch and building is if you have a recent stable/10 or head you can boot then you can use:
set dumpdev=<device>

Thanks to emaste for pointing this out.
Comment 11 Dan Milgram 2014-11-11 20:35:54 UTC
Having trouble getting this info since I'm not running STABLE and I can't see a reasonably quick way to get the sources. I see now the bug title indicates STABLE, but I'm on 10.0 RELEASE.

In case helps, when I run zdb on an MfBSD image I now get a source code reference to where it fails exactly:

root@mfsbsd:~ # zdb -b -e zroot

Traversing all blocks to verify nothing leaked ...

6.95G completed ( 228MB/s) estimated time remaining: 0hr 00min 19sec        Assertion failed: (bp->blk_pad[0] == 0), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line 2978.
Abort (core dumped)
Comment 12 Steven Hartland freebsd_committer 2014-11-11 22:13:07 UTC
(In reply to Dan Milgram from comment #11)
> Having trouble getting this info since I'm not running STABLE and I can't
> see a reasonably quick way to get the sources. I see now the bug title
> indicates STABLE, but I'm on 10.0 RELEASE.
> 
> In case helps, when I run zdb on an MfBSD image I now get a source code
> reference to where it fails exactly:
> 
> root@mfsbsd:~ # zdb -b -e zroot
> 
> Traversing all blocks to verify nothing leaked ...
> 
> 6.95G completed ( 228MB/s) estimated time remaining: 0hr 00min 19sec       
> Assertion failed: (bp->blk_pad[0] == 0), file
> /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/
> fs/zfs/zio.c, line 2978.
> Abort (core dumped)

there's at least one change that directly effects this assert in 10.1 so I would recommend upgrading to at least that or stable/10 to check if its already fixed.
Comment 13 Dan Milgram 2014-11-12 18:02:15 UTC
Thanks, I tried w/ FreeBSD 10.1 RC4 - fails w/ the same assert at essentially the same place (i.e. different LoC due to whatever other changes were made but same function, and same relative line within)

6.90G completed ( 233MB/s) estimated time remaining: 0hr 00min 18sec        Assertion failed: (bp->blk_pad[0] == 0), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line 3088.
Abort
Comment 14 Steven Hartland freebsd_committer 2014-11-12 18:24:15 UTC
(In reply to Dan Milgram from comment #13)
> Thanks, I tried w/ FreeBSD 10.1 RC4 - fails w/ the same assert at
> essentially the same place (i.e. different LoC due to whatever other changes
> were made but same function, and same relative line within)
> 
> 6.90G completed ( 233MB/s) estimated time remaining: 0hr 00min 18sec       
> Assertion failed: (bp->blk_pad[0] == 0), file
> /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/
> fs/zfs/zio.c, line 3088.
> Abort

That's basically saying something wrote the padding bytes which is unexpected so either a coding error or something nasty is happening and the data in that memory location is being overwritten by something else.

You might want to run the zdb via gdb and put a break point at that location. When it trips output the state of the zio. Might help show what's going on.
Comment 15 Dan Milgram 2014-11-12 21:15:07 UTC
Thanks very much for your help! I re-attempted another import on the zpool after zdb failed, and 10.1 RC4 seems to have done the trick for me. It imported without problem, and zpool status doesn't show any errors. 

This may be moot at at this point but trying to run zdb through gdb doesn't seem to work for me (no debugging symbols). Not sure if I'm just not running it correctly, or if I would have to build a custom zdb that includes debugging info. If there's any info I can provide that would be of value though, let me know though.
Comment 16 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:47:13 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 17 Andriy Gapon freebsd_committer 2018-05-29 09:52:32 UTC
Seems to be fixed per comment #15 ?