Bug 229007

Summary: [zfs] kernel crashes on remove disk
Product: Base System Reporter: Sergei Masharov <serzh>
Component: kernAssignee: Alexander Motin <mav>
Status: Closed FIXED    
Severity: Affects Some People CC: allanjude, emaste, gldisater, maurizio1018, mav
Priority: ---    
Version: CURRENT   
Hardware: amd64   
OS: Any   

Description Sergei Masharov 2018-06-14 08:56:58 UTC
11.2-RC2

I tried to test new zpool device removal feature.

create new pool from one physical drive (zpool create ztmp ada2)
add additional drive to the pool (zpool add ztmp ada3)
remove first disk from the pool (zpool remove ztmp ada2)

system crashes and crashes again after reboot, until I remove these disks.

Also crashes if I try to import this pool later. Only works if I import this pool in read-only mode.
Comment 1 Sergei Masharov 2018-06-14 11:42:42 UTC
also crashes on files:

root@:/ # mount -t tmpfs tmpfs /tmp
root@:/ # dd if=/dev/zero of=/tmp/z1 bs=1G count=0 seek=1
0+0 records in
0+0 records out
0 bytes transferred in 0.000010 secs (0 bytes/sec)
root@:/ # dd if=/dev/zero of=/tmp/z2 bs=1G count=0 seek=1
0+0 records in
0+0 records out
0 bytes transferred in 0.000010 secs (0 bytes/sec)
root@:/ # zpool create zm /tmp/z1
root@:/ # zpool add zm /tmp/z2
root@:/ # zpool list -v zm
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zm          1.88G   154K  1.87G        -         -     0%     0%  1.00x  ONLINE  -
  /tmp/z1    960M   145K   960M        -         -     0%     0%
  /tmp/z2    960M  8.50K   960M        -         -     0%     0%
root@:/ # zpool remove zm /tmp/z1
root@:/ # zpool list -v zm
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zm             960M   176K   960M        -         -     0%     0%  1.00x  ONLINE  -
  /tmp/z2      960M   176K   960M        -         -     0%     0%
root@:/ # dd if=/dev/zero of=/tmp/z0 bs=1M count=0 seek=100
0+0 records in
0+0 records out
0 bytes transferred in 0.000010 secs (0 bytes/sec)
root@:/ # zpool add zm /tmp/z0
root@:/ # zpool list -v zm
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zm            1.02G   138K  1.02G        -         -     0%     0%  1.00x  ONLINE  -
  /tmp/z2      960M   138K   960M        -         -     0%     0%
  /tmp/z0       80M      0    80M        -         -     0%     0%
root@:/ # zpool remove zm /tmp/z2
cannot remove /tmp/z2: out of space
root@:/ # zpool remove zm /tmp/z2
cannot remove /tmp/z2: out of space
root@:/ # zpool remove zm /tmp/z2
cannot remove /tmp/z2: out of space
root@:/ # zpool add zm /tmp/z1
root@:/ # zpool remove zm /tmp/z2

and system crashes
Comment 2 Jeremy Faulkner 2018-06-22 00:56:08 UTC
Also occurs on current (r335317):

db> trace
Tracing pid 0 tid 100536 td 0xfffff8001149b000
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe004cfa8810
vpanic() at vpanic+0x1c0/frame 0xfffffe004cfa8870
panic() at panic+0x43/frame 0xfffffe004cfa88d0
assfail3() at assfail3+0x2c/frame 0xfffffe004cfa88f0
vdev_indirect_io_start() at vdev_indirect_io_start+0x6b/frame 0xfffffe004cfa8910
zio_vdev_io_start() at zio_vdev_io_start+0x4c6/frame 0xfffffe004cfa8970
zio_execute() at zio_execute+0x17c/frame 0xfffffe004cfa89c0
zio_nowait() at zio_nowait+0xc4/frame 0xfffffe004cfa89f0
vdev_mirror_io_start() at vdev_mirror_io_start+0x3fa/frame 0xfffffe004cfa8a70
zio_vdev_io_start() at zio_vdev_io_start+0x208/frame 0xfffffe004cfa8ad0
zio_execute() at zio_execute+0x17c/frame 0xfffffe004cfa8b20
taskqueue_run_locked() at taskqueue_run_locked+0x14c/frame 0xfffffe004cfa8b80
taskqueue_thread_loop() at taskqueue_thread_loop+0x88/frame 0xfffffe004cfa8bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe004cfa8bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004cfa8bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db>
Comment 3 Jeremy Faulkner 2018-08-17 14:53:36 UTC
FreeBSD 12.0-ALPHA2 #3 r337947: Fri Aug 17 10:29:32 EDT 2018

Starting file system checks:
panic: solaris assert: zio->io_type == ZIO_TYPE_WRITE (0x3 == 0x2), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_indirect.c, line: 1267
cpuid = 3
time = 1534517459
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe004d1f37e0
vpanic() at vpanic+0x1a3/frame 0xfffffe004d1f3840
panic() at panic+0x43/frame 0xfffffe004d1f38a0
assfail3() at assfail3+0x2c/frame 0xfffffe004d1f38c0
vdev_indirect_io_start() at vdev_indirect_io_start+0xae/frame 0xfffffe004d1f3900
zio_vdev_io_start() at zio_vdev_io_start+0x4b5/frame 0xfffffe004d1f3960
zio_execute() at zio_execute+0x18c/frame 0xfffffe004d1f39b0
zio_nowait() at zio_nowait+0xc4/frame 0xfffffe004d1f39e0
vdev_mirror_io_start() at vdev_mirror_io_start+0x42a/frame 0xfffffe004d1f3a70
zio_vdev_io_start() at zio_vdev_io_start+0x1fe/frame 0xfffffe004d1f3ad0
zio_execute() at zio_execute+0x18c/frame 0xfffffe004d1f3b20
taskqueue_run_locked() at taskqueue_run_locked+0x14c/frame 0xfffffe004d1f3b80
taskqueue_thread_loop() at taskqueue_thread_loop+0x88/frame 0xfffffe004d1f3bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe004d1f3bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004d1f3bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100538 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db>
Comment 4 Jeremy Faulkner 2018-10-10 00:06:23 UTC
October 2018 OpenZFS Leadership Meeting listed mav@ as maintainer for zfs device removal. CC'd
Comment 5 Allan Jude freebsd_committer freebsd_triage 2018-10-10 02:26:07 UTC
I crashed differently.

Fatal trap 9: general protection fault while in kernel mode
cpuid = 3; apic id = 03
instruction pointer     = 0x20:0xffffffff827ccc3f
stack pointer           = 0x28:0xfffffe002d962390
frame pointer           = 0x28:0xfffffe002d962390
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (zio_write_issue_1)


#10 0xffffffff827ccc3f in vdev_queue_offset_compare (x1=0xfffff800843ea830, x2=0xdeadc0dedeadbe8e) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:320
        z1 = 0xfffff800843ea830
        z2 = 0xdeadc0dedeadbe8e
        cmp = <optimized out>
#11 0xffffffff826fc174 in avl_find (tree=0xfffff80005f28740, value=0xfffff800843ea830, where=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/common/avl/avl.c:268
        node = 0xfffff80003d98a80
        off = 592
        prev = 0xdeadc0dedeadc0de
        diff = <optimized out>
#12 avl_add (tree=0xfffff80005f28740, new_node=0xfffff800843ea830) at /usr/src/sys/cddl/contrib/opensolaris/common/avl/avl.c:644
        where = <optimized out>
#13 0xffffffff827cdbbc in vdev_queue_pending_add (vq=0xfffff80005f285b8, zio=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:472
        spa = <optimized out>
        __right = <optimized out>
        __left = <optimized out>


and

Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 01
instruction pointer     = 0x20:0xffffffff826fbd1f
stack pointer           = 0x28:0xfffffe002d8e09d8
frame pointer           = 0x28:0xfffffe002d8e0a10
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (zio_write_intr_3)


#10 0xffffffff826fbd1f in avl_rotation (tree=0xfffff8000d01f740, node=0xfffff80003d8b668, balance=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/common/avl/avl.c:379
        right = 0
        left = <optimized out>
        left_heavy = <optimized out>
        right_heavy = <optimized out>
        parent = 0xdeadc0dedeadc0d8
        child = 0xfffff80003dd9250
        which_child = 1
        child_bal = -1
        gchild = <optimized out>
        gleft = <optimized out>
        gright = <optimized out>
        cright = 0x0
#11 0xffffffff826fc3e3 in avl_remove (tree=0xfffff8000d01f740, data=0xfffff80006fd4830) at /usr/src/sys/cddl/contrib/opensolaris/common/avl/avl.c:818
        off = <optimized out>
        delete = <optimized out>
        left = <optimized out>
        right = <optimized out>
        node = 0xdeadc0dedeadc0d8
        parent = 0xfffff80003d8b668
        which_child = <optimized out>
        old_balance = <optimized out>
        new_balance = 0
        tmp = <optimized out>
#12 0xffffffff827cdd42 in vdev_queue_pending_remove (vq=0xfffff8000d01f5b8, zio=0xfffff80006fd4830) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:490
        spa = <optimized out>
        __right = <optimized out>
        __left = <optimized out>
#13 vdev_queue_io_done (zio=0xfffff80006fd4830) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:937
        vq = 0xfffff8000d01f5b8
        nio = <optimized out>
#14 0xffffffff827fe7f1 in zio_vdev_io_done (zio=0xfffff80006fd4830) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3366
        vd = 0xfffff8000d01f000
        unexpected_error = <error reading variable unexpected_error (Cannot access memory at address 0x0)>
        ops = <optimized out>
#15 0xffffffff827f903c in zio_execute (zio=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1784
        pipeline = <optimized out>
        stage = <optimized out>
Comment 6 Allan Jude freebsd_committer freebsd_triage 2018-10-11 04:25:00 UTC
Possibly related to two different crashes discussed in: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=228750
Comment 7 Sergei Masharov 2018-10-11 10:46:14 UTC
(In reply to Allan Jude from comment #6)
I think they are related. In my case I also see "indirect" devices in problem pools.
Comment 8 Allan Jude freebsd_committer freebsd_triage 2018-10-11 14:03:09 UTC
(In reply to Sergei Masharov from comment #7)
An indirect vdev is what you get once a remove has progressed.

Do you have a stack trace or crash dump from that point in the process?
Comment 9 Jeremy Faulkner 2018-10-11 15:58:28 UTC
tar zcvf core-dump-zfs-devrem.tar.gz /boot/kernel /usr/lib/debug/boot/kernel /var/crash/*.0

https://drive.google.com/open?id=1SUTm-hBgIhkkTmYNridU9zZqVMSIlA3T

root@devremoval:~ # uname -a
FreeBSD devremoval 12.0-ALPHA7 FreeBSD 12.0-ALPHA7 r338859 GENERIC  amd64

I haven't updated this bhyve vm recently but it's been unable to boot if three of it's drives are enabled.


panic: solaris assert: zio->io_type == ZIO_TYPE_WRITE (0x3 == 0x2), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_indirect.c, line: 1267
cpuid = 1
time = 1539270204
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe004d1677e0
vpanic() at vpanic+0x1a3/frame 0xfffffe004d167840
panic() at panic+0x43/frame 0xfffffe004d1678a0
assfail3() at assfail3+0x2c/frame 0xfffffe004d1678c0
vdev_indirect_io_start() at vdev_indirect_io_start+0xae/frame 0xfffffe004d167900
zio_vdev_io_start() at zio_vdev_io_start+0x4b5/frame 0xfffffe004d167960
zio_execute() at zio_execute+0x18c/frame 0xfffffe004d1679b0
zio_nowait() at zio_nowait+0xc4/frame 0xfffffe004d1679e0
vdev_mirror_io_start() at vdev_mirror_io_start+0x42a/frame 0xfffffe004d167a70
zio_vdev_io_start() at zio_vdev_io_start+0x1fe/frame 0xfffffe004d167ad0
zio_execute() at zio_execute+0x18c/frame 0xfffffe004d167b20
taskqueue_run_locked() at taskqueue_run_locked+0x10c/frame 0xfffffe004d167b80
taskqueue_thread_loop() at taskqueue_thread_loop+0x88/frame 0xfffffe004d167bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe004d167bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004d167bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100510 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db>
Comment 10 Alexander Motin freebsd_committer freebsd_triage 2018-10-11 21:11:52 UTC
At least some those crashes (zio->io_type == ZIO_TYPE_WRITE (0x3 == 0x2)) are caused by lack of TRIM support in device removal code.  This patch fixes alike crashes for me: https://reviews.freebsd.org/D17523
Comment 11 Jeremy Faulkner 2018-10-12 00:58:58 UTC
The patch allowed my bhyve virtual machine to boot and import the pool that had previously had a device removed. I attempted to remove an additional device from the pool and got a new panic:

root@devremoval:~ # zpool status
  pool: devrem
 state: ONLINE
  scan: none requested
remove: Removal of vdev 2 copied 4.36G in 1659h54m, completed on Wed Aug 29 20:37:33 2018
    108K memory used for removed device mappings
config:

        NAME          STATE     READ WRITE CKSUM
        devrem        ONLINE       0     0     0
          vtbd1       ONLINE       0     0     0
          vtbd2       ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          vtbd0p3   ONLINE       0     0     0

errors: No known data errors
root@devremoval:~ # zpool remove devrem vtbd2
panic: solaris assert: rc->rc_count == number, file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c, lin              e: 94
cpuid = 1
time = 1539304693
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe004d13a8f0
vpanic() at vpanic+0x1a3/frame 0xfffffe004d13a950
panic() at panic+0x43/frame 0xfffffe004d13a9b0
assfail() at assfail+0x1a/frame 0xfffffe004d13a9c0
refcount_destroy_many() at refcount_destroy_many+0x2b/frame 0xfffffe004d13a9f0
abd_free() at abd_free+0x18d/frame 0xfffffe004d13aa20
spa_vdev_copy_segment_write_done() at spa_vdev_copy_segment_write_done+0x20/frame 0xfffffe004d13aa50
zio_done() at zio_done+0xf21/frame 0xfffffe004d13aad0
zio_execute() at zio_execute+0x18c/frame 0xfffffe004d13ab20
taskqueue_run_locked() at taskqueue_run_locked+0x10c/frame 0xfffffe004d13ab80
taskqueue_thread_loop() at taskqueue_thread_loop+0x88/frame 0xfffffe004d13abb0
fork_exit() at fork_exit+0x84/frame 0xfffffe004d13abf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004d13abf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100501 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db>


 tar zcvf core-dump-zfs-devrem-2.tar.gz /boot/kernel /usr/lib/debug/boot/kernel /var/crash/*.1

https://drive.google.com/open?id=1eBBJC3s3CnHRCQL8dNxEIBfNyAVy5S9K
Comment 12 Alexander Motin freebsd_committer freebsd_triage 2018-10-12 01:14:42 UTC
This is a different issue.  I reproduced it too, but still thinking about proper solution.  Quick and dirty patch looks like this:

--- zio.c       (revision 339291)
+++ zio.c       (working copy)
@@ -1180,7 +1180,8 @@
        }

        /* Not all IO types require vdev io done stage e.g. free */
-       if (!(pio->io_pipeline & ZIO_STAGE_VDEV_IO_DONE))
+       if (type == pio->io_type &&
+           !(pio->io_pipeline & ZIO_STAGE_VDEV_IO_DONE))
                pipeline &= ~ZIO_STAGE_VDEV_IO_DONE;

        if (vd->vdev_ops->vdev_op_leaf) {
Comment 13 commit-hook freebsd_committer freebsd_triage 2018-10-12 15:14:30 UTC
A commit references this bug:

Author: mav
Date: Fri Oct 12 15:14:22 UTC 2018
New revision: 339329
URL: https://svnweb.freebsd.org/changeset/base/339329

Log:
  Add ZIO_TYPE_FREE support for indirect vdevs.

  Upstream code expects only ZIO_TYPE_READ and some ZIO_TYPE_WRITE
  requests to removed (indirect) vdevs, while on FreeBSD there is also
  ZIO_TYPE_FREE (TRIM).  ZIO_TYPE_FREE requests do not have the data
  buffers, so don't need the pointer adjustment.

  PR:		228750, 229007
  Reviewed by:	allanjude, sef
  Approved by:	re (kib)
  MFC after:	1 week
  Differential Revision:	https://reviews.freebsd.org/D17523

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_indirect.c
Comment 14 commit-hook freebsd_committer freebsd_triage 2018-10-15 21:59:45 UTC
A commit references this bug:

Author: mav
Date: Mon Oct 15 21:59:25 UTC 2018
New revision: 339372
URL: https://svnweb.freebsd.org/changeset/base/339372

Log:
  Skip VDEV_IO_DONE stage only for ZIO_TYPE_FREE.

  Device removal code uses zio_vdev_child_io() with ZIO_TYPE_NULL parent,
  that never happened before.  It confused FreeBSD-specific TRIM code,
  which does not use VDEV_IO_DONE for logical ZIO_TYPE_FREE ZIOs.  As
  result of that stage being skipped device removal ZIOs leaked references
  and memory that supposed to be freed by VDEV_IO_DONE, making it stuck.

  It is a quick patch rather then a nice fix, but hopefully we'll be able
  to drop it all together when alternative TRIM implementation finally get
  landed.

  PR:		228750, 229007
  Discussed with:	allanjude, avg, smh
  Approved by:	re (delphij)
  MFC after:	5 days
  Sponsored by:	iXsystems, Inc.

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
Comment 15 Alexander Motin freebsd_committer freebsd_triage 2018-10-15 22:03:54 UTC
I believe ZFS device removal should now work in FreeBSD head.
Comment 16 Sergei Masharov 2018-10-16 14:29:02 UTC
When will be the patch for 11.2-RELEASE?
Comment 17 commit-hook freebsd_committer freebsd_triage 2018-10-19 04:31:30 UTC
A commit references this bug:

Author: mav
Date: Fri Oct 19 04:30:26 UTC 2018
New revision: 339440
URL: https://svnweb.freebsd.org/changeset/base/339440

Log:
  MFC r339329: Add ZIO_TYPE_FREE support for indirect vdevs.

  Upstream code expects only ZIO_TYPE_READ and some ZIO_TYPE_WRITE
  requests to removed (indirect) vdevs, while on FreeBSD there is also
  ZIO_TYPE_FREE (TRIM).  ZIO_TYPE_FREE requests do not have the data
  buffers, so don't need the pointer adjustment.

  PR:	228750, 229007

Changes:
_U  stable/11/
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_indirect.c
Comment 18 commit-hook freebsd_committer freebsd_triage 2018-10-19 04:37:38 UTC
A commit references this bug:

Author: mav
Date: Fri Oct 19 04:37:28 UTC 2018
New revision: 339441
URL: https://svnweb.freebsd.org/changeset/base/339441

Log:
  MFC r339372: Skip VDEV_IO_DONE stage only for ZIO_TYPE_FREE.

  Device removal code uses zio_vdev_child_io() with ZIO_TYPE_NULL parent,
  that never happened before.  It confused FreeBSD-specific TRIM code,
  which does not use VDEV_IO_DONE for logical ZIO_TYPE_FREE ZIOs.  As
  result of that stage being skipped device removal ZIOs leaked references
  and memory that supposed to be freed by VDEV_IO_DONE, making it stuck.

  It is a quick patch rather then a nice fix, but hopefully we'll be able
  to drop it all together when alternative TRIM implementation finally get
  landed.

  PR:	228750, 229007

Changes:
_U  stable/11/
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c