Bug 228750 - panic on zfs mirror removal
Summary: panic on zfs mirror removal
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.1-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Alexander Motin
URL:
Keywords: panic
Depends on:
Blocks:
 
Reported: 2018-06-04 18:24 UTC by Roger Hammerstein
Modified: 2019-09-04 23:04 UTC
CC: 9 users

See Also:


Description Roger Hammerstein 2018-06-04 18:24:20 UTC
Seen on 11.2-BETA2 and 11.2-RC1.

Steps to reproduce:

zpool create test mirror da1 da2 mirror da3 da4 mirror da5 da6

zpool remove test mirror-1

It then panics.

It also seems to panic after a reboot when trying to import, scrub, or destroy the pool.



# kgdb /boot/kernel/kernel /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: page fault
cpuid = 8
KDB: stack backtrace:
#0 0xffffffff80b3d407 at kdb_backtrace+0x67
#1 0xffffffff80af6a77 at vpanic+0x177
#2 0xffffffff80af68f3 at panic+0x43
#3 0xffffffff80f77f6f at trap_fatal+0x35f
#4 0xffffffff80f77fc9 at trap_pfault+0x49
#5 0xffffffff80f77797 at trap+0x2c7
#6 0xffffffff80f5744c at calltrap+0x8
#7 0xffffffff824f01d7 at vdev_indirect_io_start_cb+0x37
#8 0xffffffff824efe58 at vdev_indirect_remap+0x2f8
#9 0xffffffff824efb3d at vdev_indirect_io_start+0x2d
#10 0xffffffff8251ac9e at zio_vdev_io_start+0x2ae
#11 0xffffffff8251774c at zio_execute+0xac
#12 0xffffffff8251706b at zio_nowait+0xcb
#13 0xffffffff824f38ef at vdev_mirror_io_start+0x3ff
#14 0xffffffff8251ab52 at zio_vdev_io_start+0x162
#15 0xffffffff8251774c at zio_execute+0xac
#16 0xffffffff80b4ec14 at taskqueue_run_locked+0x154
#17 0xffffffff80b4fd78 at taskqueue_thread_loop+0x98
Uptime: 1m46s
Dumping 1886 out of 49109 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%


Reading symbols from /boot/kernel/ums.ko...Reading symbols from /usr/lib/debug//boot/kernel/ums.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ums.ko
Reading symbols from /boot/kernel/pf.ko...Reading symbols from /usr/lib/debug//boot/kernel/pf.ko.debug...done.
done.
Loaded symbols for /boot/kernel/pf.ko
Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from
/usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
#0  doadump (textdump=<value optimized out>) at pcpu.h:229
229     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb)




(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:229
#1  0xffffffff80af668b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:383
#2  0xffffffff80af6ab1 in vpanic (fmt=<value optimized out>, ap=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:776
#3  0xffffffff80af68f3 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:707
#4  0xffffffff80f77f6f in trap_fatal (frame=0xfffffe0c58d30720, eva=0) at /usr/src/sys/amd64/amd64/trap.c:875
#5  0xffffffff80f77fc9 in trap_pfault (frame=0xfffffe0c58d30720, usermode=0) at pcpu.h:229
#6  0xffffffff80f77797 in trap (frame=0xfffffe0c58d30720) at /usr/src/sys/amd64/amd64/trap.c:415
#7  0xffffffff80f5744c in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
#8  0xffffffff82476994 in abd_get_offset (sabd=0x0, off=0)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:443
#9  0xffffffff824f01d7 in vdev_indirect_io_start_cb (split_offset=<value optimized out>, vd=0xfffff800237fd000,
    offset=1258659328, size=3584, arg=0xfffff8006ac49000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_indirect.c:1082
#10 0xffffffff824efe58 in vdev_indirect_remap (vd=<value optimized out>, offset=<value optimized out>,
    asize=<value optimized out>, func=0xffffffff824f01a0 <vdev_indirect_io_start_cb>, arg=0xfffff8006ac49000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_indirect.c:1041
#11 0xffffffff824efb3d in vdev_indirect_io_start (zio=0xfffff8006ac49000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_indirect.c:1099
#12 0xffffffff8251ac9e in zio_vdev_io_start (zio=0xfffff8006ac49000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3297
#13 0xffffffff8251774c in zio_execute (zio=0xfffff8006ac49000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1768
#14 0xffffffff8251706b in zio_nowait (zio=0xfffff8006ac49000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1826
#15 0xffffffff824f38ef in vdev_mirror_io_start (zio=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c:557
#16 0xffffffff8251ab52 in zio_vdev_io_start (zio=0xfffff8006ad1c000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3166
#17 0xffffffff8251774c in zio_execute (zio=0xfffff8006ad1c000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1768
#18 0xffffffff80b4ec14 in taskqueue_run_locked (queue=0xfffff8006a839900) at /usr/src/sys/kern/subr_taskqueue.c:463
#19 0xffffffff80b4fd78 in taskqueue_thread_loop (arg=<value optimized out>) at /usr/src/sys/kern/subr_taskqueue.c:755
#20 0xffffffff80aba0b3 in fork_exit (callout=0xffffffff80b4fce0 <taskqueue_thread_loop>, arg=0xfffff8006a7b81f0,
    frame=0xfffffe0c58d30c00) at /usr/src/sys/kern/kern_fork.c:1054
#21 0xffffffff80f5836e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:957
#22 0x0000000000000000 in ?? ()
(kgdb)








--------------------------------------



kgdb /boot/kernel/kernel /var/crash/vmcore.1

Unread portion of the kernel message buffer:
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
current process         = 0 (zio_free_issue_6_1)
trap number             = 12
fault code              = supervisor read data, page not present
                        = DPL 0, pres 1, long 1, def32 0, gran 1
instruction pointer     = 0x20:0xffffffff82476994
stack pointer           = 0x28:0xfffffe0c58b217e0
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (zio_free_issue_3_1)
frame pointer           = 0x28:0xfffffe0c58b21810
panic: page fault
cpuid = 5
KDB: stack backtrace:

#0 0xffffffff80b3d407 at kdb_backtrace+0x67
#1 0xffffffff80af6a77 at vpanic+0x177
#2 0xffffffff80af68f3 at panic+0x43
#3 0xffffffff80f77f6f at trap_fatal+0x35f
#4 0xffffffff80f77fc9 at trap_pfault+0x49
#5 0xffffffff80f77797 at trap+0x2c7
#6 0xffffffff80f5744c at calltrap+0x8
#7 0xffffffff824f01d7 at vdev_indirect_io_start_cb+0x37
#8 0xffffffff824efe58 at vdev_indirect_remap+0x2f8
#9 0xffffffff824efb3d at vdev_indirect_io_start+0x2d
#10 0xffffffff8251ac9e at zio_vdev_io_start+0x2ae
#11 0xffffffff8251774c at zio_execute+0xac
#12 0xffffffff8251706b at zio_nowait+0xcb
#13 0xffffffff824f38ef at vdev_mirror_io_start+0x3ff
#14 0xffffffff8251ab52 at zio_vdev_io_start+0x162
#15 0xffffffff8251774c at zio_execute+0xac
#16 0xffffffff80b4ec14 at taskqueue_run_locked+0x154
#17 0xffffffff80b4fd78 at taskqueue_thread_loop+0x98
Uptime: 16m41s
Dumping 1894 out of 49109 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%






#0  doadump (textdump=<value optimized out>) at pcpu.h:229
229     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb)
(kgdb) bt


#0  doadump (textdump=<value optimized out>) at pcpu.h:229
#1  0xffffffff80af668b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:383
#2  0xffffffff80af6ab1 in vpanic (fmt=<value optimized out>, ap=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:776
#3  0xffffffff80af68f3 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:707
#4  0xffffffff80f77f6f in trap_fatal (frame=0xfffffe0c58adb720, eva=0) at /usr/src/sys/amd64/amd64/trap.c:875
#5  0xffffffff80f77fc9 in trap_pfault (frame=0xfffffe0c58adb720, usermode=0) at pcpu.h:229
#6  0xffffffff80f77797 in trap (frame=0xfffffe0c58adb720) at /usr/src/sys/amd64/amd64/trap.c:415
#7  0xffffffff80f5744c in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
#8  0xffffffff82476994 in abd_get_offset (sabd=0x0, off=0)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:443
#9  0xffffffff824f01d7 in vdev_indirect_io_start_cb (split_offset=<value optimized out>, vd=0xfffff8002373f800,
    offset=1225929216, size=2560, arg=0xfffff801c43c9000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_indirect.c:1082
#10 0xffffffff824efe58 in vdev_indirect_remap (vd=<value optimized out>, offset=<value optimized out>,
    asize=<value optimized out>, func=0xffffffff824f01a0 <vdev_indirect_io_start_cb>, arg=0xfffff801c43c9000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_indirect.c:1041
#11 0xffffffff824efb3d in vdev_indirect_io_start (zio=0xfffff801c43c9000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_indirect.c:1099
#12 0xffffffff8251ac9e in zio_vdev_io_start (zio=0xfffff801c43c9000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3297
#13 0xffffffff8251774c in zio_execute (zio=0xfffff801c43c9000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1768
#14 0xffffffff8251706b in zio_nowait (zio=0xfffff801c43c9000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1826
#15 0xffffffff824f38ef in vdev_mirror_io_start (zio=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c:557
#16 0xffffffff8251ab52 in zio_vdev_io_start (zio=0xfffff801c4419820)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3166
#17 0xffffffff8251774c in zio_execute (zio=0xfffff801c4419820)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1768
#18 0xffffffff80b4ec14 in taskqueue_run_locked (queue=0xfffff80168168100) at /usr/src/sys/kern/subr_taskqueue.c:463
#19 0xffffffff80b4fd78 in taskqueue_thread_loop (arg=<value optimized out>) at /usr/src/sys/kern/subr_taskqueue.c:755
#20 0xffffffff80aba0b3 in fork_exit (callout=0xffffffff80b4fce0 <taskqueue_thread_loop>, arg=0xfffff800220f6f60,
    frame=0xfffffe0c58adbc00) at /usr/src/sys/kern/kern_fork.c:1054
#21 0xffffffff80f5836e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:957
#22 0x0000000000000000 in ?? ()


Current language:  auto; currently minimal
(kgdb)


# zpool import
   pool: test
     id: 632784374722369342
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        test          ONLINE
          mirror-0    ONLINE
            da1       ONLINE
            da2       ONLINE
          indirect-1  ONLINE
          mirror-2    ONLINE
            da5       ONLINE
            da6       ONLINE

It then panics when trying to import the pool.
Comment 1 Glen Barber freebsd_committer 2018-06-12 11:34:34 UTC
Can you provide dmesg(8) output, so we know what type of hardware is involved?
Comment 2 Glen Barber freebsd_committer 2018-06-12 17:28:54 UTC
FYI, this is an issue on 12-CURRENT as well.
Comment 3 crest 2018-09-05 08:49:39 UTC
I just ran into the same problem on FreeBSD 11.2-p2. Please at least add a warning if the feature is not (yet) usable.
Comment 4 Roger Hammerstein 2018-09-13 16:27:00 UTC
On 12.0-ALPHA5 (r338620M), the panic today is:

ZFS storage pool version: features support (5000)
panic: solaris assert: rc->rc_count == number, file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c, line: 94
cpuid = 2
time = 1536840299
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0091cb87b0
vpanic() at vpanic+0x1a3/frame 0xfffffe0091cb8810
panic() at panic+0x43/frame 0xfffffe0091cb8870
assfail() at assfail+0x1a/frame 0xfffffe0091cb8880
refcount_destroy_many() at refcount_destroy_many+0x2b/frame 0xfffffe0091cb88b0
abd_free() at abd_free+0x18d/frame 0xfffffe0091cb88e0
spa_vdev_copy_segment_write_done() at spa_vdev_copy_segment_write_done+0x20/frame 0xfffffe0091cb8910
zio_done() at zio_done+0xf21/frame 0xfffffe0091cb8990
zio_execute() at zio_execute+0x18c/frame 0xfffffe0091cb89e0
taskqueue_run_locked() at taskqueue_run_locked+0x10c/frame 0xfffffe0091cb8a40
taskqueue_thread_loop() at taskqueue_thread_loop+0x88/frame 0xfffffe0091cb8a70
fork_exit() at fork_exit+0x84/frame 0xfffffe0091cb8ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0091cb8ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100477 ]
Stopped at      kdb_enter+0x3b: movq      $0,kdb_why
db>
Comment 5 Allan Jude freebsd_committer 2018-10-11 04:25:04 UTC
(In reply to Roger Hammerstein from comment #4)
That panic is different, and may warrant its own PR.
See also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229007
Comment 6 Alexander Motin freebsd_committer 2018-10-11 21:08:12 UTC
I think those crashes are caused by the lack of TRIM support in the device removal code. This patch fixes similar crashes for me: https://reviews.freebsd.org/D17523
Comment 7 Alexander Motin freebsd_committer 2018-10-11 21:09:58 UTC
(In reply to Alexander Motin from comment #6)
I meant the original crashes. The last one is a different issue, also related to TRIM, which I have already identified and am looking for a solution to.
Comment 8 commit-hook freebsd_committer 2018-10-12 15:14:26 UTC
A commit references this bug:

Author: mav
Date: Fri Oct 12 15:14:22 UTC 2018
New revision: 339329
URL: https://svnweb.freebsd.org/changeset/base/339329

Log:
  Add ZIO_TYPE_FREE support for indirect vdevs.

  Upstream code expects only ZIO_TYPE_READ and some ZIO_TYPE_WRITE
  requests to removed (indirect) vdevs, while on FreeBSD there is also
  ZIO_TYPE_FREE (TRIM).  ZIO_TYPE_FREE requests do not have the data
  buffers, so don't need the pointer adjustment.

  PR:		228750, 229007
  Reviewed by:	allanjude, sef
  Approved by:	re (kib)
  MFC after:	1 week
  Differential Revision:	https://reviews.freebsd.org/D17523

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_indirect.c
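The commit message above can be illustrated with a small user-space model. This is a minimal sketch, not the vdev_indirect.c change itself: the `sk_` types and `sk_child_buffer()` are hypothetical stand-ins, assuming a parent I/O is split into remapped segments and each child gets a pointer into the parent's buffer at `split_offset`. The backtraces earlier in this report show the kernel counterpart, `abd_get_offset()`, being called with `sabd=0x0`, which is what a TRIM (ZIO_TYPE_FREE) request without a data buffer produces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ZFS I/O types; not the kernel definitions. */
typedef enum {
	SK_ZIO_TYPE_NULL,
	SK_ZIO_TYPE_READ,
	SK_ZIO_TYPE_WRITE,
	SK_ZIO_TYPE_FREE	/* FreeBSD TRIM: carries no data buffer */
} sk_zio_type_t;

/* Hypothetical parent I/O being split across remapped segments. */
typedef struct {
	sk_zio_type_t	type;
	char		*data;	/* NULL for SK_ZIO_TYPE_FREE */
	size_t		size;
} sk_zio_t;

/*
 * Buffer for the child I/O covering the segment at split_offset.
 * Without the FREE check this would compute data + split_offset for
 * every request, like the unpatched path, whose kernel equivalent
 * (abd_get_offset()) faulted on a NULL abd for TRIM requests.
 * A FREE request has no buffer, so no pointer adjustment is made.
 */
static char *
sk_child_buffer(const sk_zio_t *parent, size_t split_offset)
{
	if (parent->type == SK_ZIO_TYPE_FREE)
		return (NULL);

	assert(parent->data != NULL);
	assert(split_offset <= parent->size);
	return (parent->data + split_offset);
}
```

For a READ or WRITE the child sees the parent's buffer shifted by `split_offset`; for a FREE the child simply gets no buffer, which is all a TRIM of the remapped range needs.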
Comment 9 Roger Hammerstein 2018-10-13 16:55:42 UTC
I moved the 11.2 test machine to 12, but I can make a new 11.2 test later.

The panic in Comment 4 is still occurring in the latest 12 at r339345.
Can you take a look at that?

FreeBSD 12.0-ALPHA9 (GENERIC) #3 r339345M
Comment 10 Allan Jude freebsd_committer 2018-10-13 17:19:13 UTC
(In reply to Roger Hammerstein from comment #9)
There is a 2nd patch you'll need as well.

See here:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229007#c12
Comment 11 Roger Hammerstein 2018-10-13 18:09:16 UTC
(In reply to Allan Jude from comment #10)

Yes, that patch works.

root@freebsd12:~ # zpool remove test mirror-1
root@freebsd12:~ #

root@freebsd12:~ # zpool status
  pool: test
 state: ONLINE
  scan: none requested
remove: Evacuation of mirror-1 in progress since Sat Oct 13 10:03:42 2018
    5.81M copied out of 5.81M at 850K/s, 100.00% done, 0h0m to go
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da9     ONLINE       0     0     0
            da8     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da6     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da4     ONLINE       0     0     0



errors: No known data errors
root@freebsd12:~ #
root@freebsd12:~ # zpool status
  pool: test
 state: ONLINE
  scan: none requested
remove: Removal of vdev 1 copied 5.81M in 0h0m, completed on Sat Oct 13 10:03:51 2018
    384 memory used for removed device mappings
config:

        NAME          STATE     READ WRITE CKSUM
        test          ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da9       ONLINE       0     0     0
            da8       ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            da5       ONLINE       0     0     0
            da4       ONLINE       0     0     0

errors: No known data errors
root@freebsd12:~ #



And for good measure, removing a second vdev:

root@freebsd12:~ # zpool remove test mirror-2
root@freebsd12:~ # zpool status
  pool: test
 state: ONLINE
  scan: none requested
remove: Evacuation of mirror-2 in progress since Sat Oct 13 10:05:49 2018
    7.15M copied out of 7.15M at 1.79M/s, 100.00% done, 0h0m to go
    384 memory used for removed device mappings
config:

        NAME          STATE     READ WRITE CKSUM
        test          ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da9       ONLINE       0     0     0
            da8       ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            da5       ONLINE       0     0     0
            da4       ONLINE       0     0     0

errors: No known data errors
root@freebsd12:~ #

root@freebsd12:~ # zpool status
  pool: test
 state: ONLINE
  scan: none requested
remove: Removal of vdev 2 copied 7.15M in 0h0m, completed on Sat Oct 13 10:05:54 2018
    912 memory used for removed device mappings
config:

        NAME          STATE     READ WRITE CKSUM
        test          ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da9       ONLINE       0     0     0
            da8       ONLINE       0     0     0

errors: No known data errors
root@freebsd12:~ #



root@freebsd12:~ # zpool clear test
root@freebsd12:~ # zpool status test
  pool: test
 state: ONLINE
  scan: none requested
remove: Removal of vdev 2 copied 7.15M in 0h0m, completed on Sat Oct 13 10:05:54 2018
    912 memory used for removed device mappings
config:

        NAME          STATE     READ WRITE CKSUM
        test          ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da9       ONLINE       0     0     0
            da8       ONLINE       0     0     0

errors: No known data errors
root@freebsd12:~ #


root@freebsd12:~ # zpool export test
root@freebsd12:~ # zpool import test
zpool status test
root@freebsd12:~ # zpool status test
  pool: test
 state: ONLINE
  scan: none requested
remove: Removal of vdev 2 copied 7.15M in 0h0m, completed on Sat Oct 13 10:05:54 2018
    912 memory used for removed device mappings
config:

        NAME          STATE     READ WRITE CKSUM
        test          ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da9       ONLINE       0     0     0
            da8       ONLINE       0     0     0

errors: No known data errors
root@freebsd12:~ #

FreeBSD freebsd12 12.0-ALPHA9 FreeBSD 12.0-ALPHA9 #3 r339345M
Comment 12 Roger Hammerstein 2018-10-14 18:24:22 UTC
11.2-STABLE also works (for one removal) with the second patch from bug 229007, comment 12.

FreeBSD freebsd11 11.2-STABLE FreeBSD 11.2-STABLE #0 r339346: Sat Oct 13 15:30:06 EDT 2018   


Without the second patch, the removal never seemed to finish, even after leaving it overnight:


root@freebsd11:~ # zpool create test mirror /dev/da8 /dev/da7 mirror /dev/da6 /dev/da5 mirror /dev/da4 /dev/da3 mirror /dev/da2 /dev/da1
root@freebsd11:~ # zpool status
  pool: test
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da7     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da5     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da3     ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da1     ONLINE       0     0     0

errors: No known data errors
root@freebsd11:~ # cp -a /usr/src /test/
root@freebsd11:~ # zpool remove test mirror-2
root@freebsd11:~ #
root@freebsd11:~ # zpool status
  pool: test
 state: ONLINE
  scan: none requested
remove: Evacuation of mirror-2 in progress since Sat Oct 13 15:44:31 2018
    22.2M copied out of 22.2M at 2.46M/s, 100.00% done, 0h0m to go
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da7     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da5     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da3     ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da1     ONLINE       0     0     0


errors: No known data errors
root@freebsd11:~ # 


root@freebsd11:~ # zpool status
  pool: test
 state: ONLINE
  scan: none requested
remove: Evacuation of mirror-2 in progress since Sat Oct 13 15:44:31 2018
    22.2M copied out of 22.2M at 169K/s, 100.00% done, 0h0m to go
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da7     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da5     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da3     ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da1     ONLINE       0     0     0

errors: No known data errors
root@freebsd11:~ #



root@freebsd11:/usr/src # zpool status
  pool: test
 state: ONLINE
  scan: none requested
remove: Evacuation of mirror-2 in progress since Sat Oct 13 15:44:31 2018
    22.2M copied out of 22.2M at 6.68K/s, 100.00% done, 0h0m to go
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da7     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da5     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da3     ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da1     ONLINE       0     0     0

errors: No known data errors
root@freebsd11:/usr/src #


After leaving it overnight:


root@freebsd11:/usr/src # zpool status
  pool: test
 state: ONLINE
  scan: none requested
remove: Evacuation of mirror-2 in progress since Sat Oct 13 15:44:31 2018
    22.2M copied out of 22.2M at 355/s, 100.00% done, 0h0m to go
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da7     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da5     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da3     ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da1     ONLINE       0     0     0

errors: No known data errors



After rebooting into the new kernel with the patch from bug 229007:



root@freebsd11:~ # zpool status
  pool: test
 state: ONLINE
  scan: none requested
remove: Evacuation of mirror-2 in progress since Sat Oct 13 15:44:31 2018
    1 copied out of 22.2M at 1/s, 0.00% done, (copy is slow, no estimated time)
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da7     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da5     ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da3     ONLINE       0     0     0
          mirror-3  ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da1     ONLINE       0     0     0

errors: No known data errors


root@freebsd11:~ # zpool status
  pool: test
 state: ONLINE
  scan: none requested
remove: Removal of vdev 2 copied 22.2M in 18h20m, completed on Sun Oct 14 10:04:46 2018
    816 memory used for removed device mappings
config:

        NAME          STATE     READ WRITE CKSUM
        test          ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da8       ONLINE       0     0     0
            da7       ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            da6       ONLINE       0     0     0
            da5       ONLINE       0     0     0
          mirror-3    ONLINE       0     0     0
            da2       ONLINE       0     0     0
            da1       ONLINE       0     0     0

errors: No known data errors



But then a second removal,

 zpool remove test mirror-0

panicked:

KDB: stack backtrace:
#0 0xffffffff80b40df7 at kdb_backtrace+0x67
#1 0xffffffff80afa337 at vpanic+0x177
#2 0xffffffff80afa1b3 at panic+0x43
#3 0xffffffff80f7c38f at trap_fatal+0x35f
#4 0xffffffff80f7c3e9 at trap_pfault+0x49
#5 0xffffffff80f7ba8c at trap+0x29c
#6 0xffffffff80f5bfcc at calltrap+0x8
#7 0xffffffff824bc89b at vdev_indirect_io_start+0x9b
#8 0xffffffff824e9fa9 at zio_vdev_io_start+0x2a9
#9 0xffffffff824e68ec at zio_execute+0xbc
#10 0xffffffff824e61fb at zio_nowait+0xcb
#11 0xffffffff824c248f at vdev_mirror_io_start+0x41f
#12 0xffffffff824e9e5c at zio_vdev_io_start+0x15c
#13 0xffffffff824e68ec at zio_execute+0xbc
#14 0xffffffff80b52694 at taskqueue_run_locked+0x154
#15 0xffffffff80b537f8 at taskqueue_thread_loop+0x98
#16 0xffffffff80abd963 at fork_exit+0x83
#17 0xffffffff80f5cf8e at fork_trampoline+0xe
Uptime: 11m46s



But I think not everything that is in head has been MFCed back to 11 yet.
Comment 13 Allan Jude freebsd_committer 2018-10-14 18:26:27 UTC
(In reply to Roger Hammerstein from comment #12)
When posting a panic, please include the message that comes before the backtrace as well.

While the backtrace is very important, we also need the actual error message.
Comment 14 Roger Hammerstein 2018-10-15 00:28:50 UTC
(In reply to Allan Jude from comment #13)

Actually, there is no panic message on the console.


kgdb /boot/kernel/kernel /var/crash/vmcore.2


Unread portion of the kernel message buffer:




Fatal trap 12: page fault while in kernel mode
Fatal trap 12: page fault while in kernel mode

cpuid = 19; apic id = 26

fault virtual address   = 0x0
fault code              = supervisor read data, page not present

instruction pointer     = 0x20:0xffffffff8243df34

Fatal trap 12: page fault while in kernel mode
stack pointer           = 0x28:0xfffffe023ca77750
frame pointer           = 0x28:0xfffffe023ca77780
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (zio_free_issue_2_3)
trap number             = 12
panic: page fault
cpuid = 19
KDB: stack backtrace:
#0 0xffffffff80b40df7 at kdb_backtrace+0x67

#1 0xffffffff80afa337 at vpanic+0x177
#2 0xffffffff80afa1b3 at panic+0x43
#3 0xffffffff80f7c38f at trap_fatal+0x35f
#4 0xffffffff80f7c3e9 at trap_pfault+0x49
#5 0xffffffff80f7ba8c at trap+0x29c
#6 0xffffffff80f5bfcc at calltrap+0x8
#7 0xffffffff824bc89b at vdev_indirect_io_start+0x9b
#8 0xffffffff824e9fa9 at zio_vdev_io_start+0x2a9
#9 0xffffffff824e68ec at zio_execute+0xbc
#10 0xffffffff824e61fb at zio_nowait+0xcb
#11 0xffffffff824c248f at vdev_mirror_io_start+0x41f
#12 0xffffffff824e9e5c at zio_vdev_io_start+0x15c
#13 0xffffffff824e68ec at zio_execute+0xbc
#14 0xffffffff80b52694 at taskqueue_run_locked+0x154
#15 0xffffffff80b537f8 at taskqueue_thread_loop+0x98
#16 0xffffffff80abd963 at fork_exit+0x83
#17 0xffffffff80f5cf8e at fork_trampoline+0xe
Comment 15 commit-hook freebsd_committer 2018-10-15 21:59:41 UTC
A commit references this bug:

Author: mav
Date: Mon Oct 15 21:59:24 UTC 2018
New revision: 339372
URL: https://svnweb.freebsd.org/changeset/base/339372

Log:
  Skip VDEV_IO_DONE stage only for ZIO_TYPE_FREE.

  Device removal code uses zio_vdev_child_io() with a ZIO_TYPE_NULL parent,
  which never happened before.  It confused FreeBSD-specific TRIM code,
  which does not use VDEV_IO_DONE for logical ZIO_TYPE_FREE ZIOs.  As a
  result of that stage being skipped, device removal ZIOs leaked references
  and memory that were supposed to be freed by VDEV_IO_DONE, making it stuck.

  It is a quick patch rather than a nice fix, but hopefully we'll be able
  to drop it altogether when the alternative TRIM implementation finally
  lands.

  PR:		228750, 229007
  Discussed with:	allanjude, avg, smh
  Approved by:	re (delphij)
  MFC after:	5 days
  Sponsored by:	iXsystems, Inc.

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
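A minimal sketch of the rule in this log, using hypothetical `sk_` names rather than the real zio.c code, which is not reproduced here. `sk_skip_io_done_broad()` is an assumed, illustrative over-broad check (not the actual pre-patch condition) showing how a removal child under a ZIO_TYPE_NULL parent could be swept up together with TRIM; `sk_skip_io_done_fixed()` encodes the stated fix of skipping VDEV_IO_DONE only for ZIO_TYPE_FREE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the ZFS I/O types; not the kernel definitions. */
typedef enum {
	SK_ZIO_TYPE_NULL,
	SK_ZIO_TYPE_READ,
	SK_ZIO_TYPE_WRITE,
	SK_ZIO_TYPE_FREE
} sk_zio_type_t;

/*
 * Hypothetical over-broad rule: skip VDEV_IO_DONE for anything that is
 * not a plain READ or WRITE.  A ZIO_TYPE_NULL parent used by device
 * removal also matches, so the references and memory that VDEV_IO_DONE
 * is supposed to release would leak and the removal would stall.
 */
static bool
sk_skip_io_done_broad(sk_zio_type_t checked_type)
{
	return (checked_type != SK_ZIO_TYPE_READ &&
	    checked_type != SK_ZIO_TYPE_WRITE);
}

/* Rule stated in the commit log: skip VDEV_IO_DONE only for FREE (TRIM). */
static bool
sk_skip_io_done_fixed(sk_zio_type_t checked_type)
{
	return (checked_type == SK_ZIO_TYPE_FREE);
}

/* Self-check of the intended behaviour of both rules. */
static void
sk_check_rules(void)
{
	assert(sk_skip_io_done_broad(SK_ZIO_TYPE_NULL));	/* would leak */
	assert(!sk_skip_io_done_fixed(SK_ZIO_TYPE_NULL));	/* runs VDEV_IO_DONE */
	assert(sk_skip_io_done_fixed(SK_ZIO_TYPE_FREE));	/* TRIM still skips */
}
```

The design point the log describes is that the skip decision keys on exactly one type instead of excluding a list, so an unusual I/O shape such as a removal child defaults to the full pipeline.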
Comment 16 Alexander Motin freebsd_committer 2018-10-15 22:05:29 UTC
I believe ZFS device removal should now work in FreeBSD head.  I'll merge the fixes to stable/11 in a week.
Comment 17 commit-hook freebsd_committer 2018-10-19 04:31:25 UTC
A commit references this bug:

Author: mav
Date: Fri Oct 19 04:30:26 UTC 2018
New revision: 339440
URL: https://svnweb.freebsd.org/changeset/base/339440

Log:
  MFC r339329: Add ZIO_TYPE_FREE support for indirect vdevs.

  Upstream code expects only ZIO_TYPE_READ and some ZIO_TYPE_WRITE
  requests to removed (indirect) vdevs, while on FreeBSD there is also
  ZIO_TYPE_FREE (TRIM).  ZIO_TYPE_FREE requests do not have the data
  buffers, so don't need the pointer adjustment.

  PR:	228750, 229007

Changes:
_U  stable/11/
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_indirect.c
Comment 18 commit-hook freebsd_committer 2018-10-19 04:37:41 UTC
A commit references this bug:

Author: mav
Date: Fri Oct 19 04:37:28 UTC 2018
New revision: 339441
URL: https://svnweb.freebsd.org/changeset/base/339441

Log:
  MFC r339372: Skip VDEV_IO_DONE stage only for ZIO_TYPE_FREE.

  Device removal code uses zio_vdev_child_io() with a ZIO_TYPE_NULL parent,
  which never happened before.  It confused FreeBSD-specific TRIM code,
  which does not use VDEV_IO_DONE for logical ZIO_TYPE_FREE ZIOs.  As a
  result of that stage being skipped, device removal ZIOs leaked references
  and memory that were supposed to be freed by VDEV_IO_DONE, making it stuck.

  It is a quick patch rather than a nice fix, but hopefully we'll be able
  to drop it altogether when the alternative TRIM implementation finally
  lands.

  PR:	228750, 229007

Changes:
_U  stable/11/
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
Comment 19 Alexander Motin freebsd_committer 2018-10-19 04:39:15 UTC
Merged to stable/11.
Comment 20 miguelmclara 2019-09-04 19:35:34 UTC
Sorry to bring this up again, but was this fixed in 12.0 too?

I just removed a device, and zpool status did show that the removal had started, but then the system panicked with a similar error. In my case I see:

panic: solaris assert: ((offset) & ((1ULL << vd->vdev_ashift) - 1)) == 0 (0xa00 == 0x0), file /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c, line: 3561

The fact that it shows vdev_ashift leads me to believe that the cause is slightly different, but still related to device removal.
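The assertion quoted above requires `offset` to be a multiple of 2^ashift, and the reported left-hand side, 0xa00, is the remainder `offset & ((1 << ashift) - 1)`. Since bit 11 is set in 0xa00, the mask must have at least 12 bits, so the vdev doing the check had an ashift of 12 or more; an offset like 0xa00 (2560 bytes, 5 × 512) is perfectly aligned for ashift 9. The sketch below reproduces that arithmetic, assuming a hypothetical offset of 0xa00 (the real offset is not in the report); `sk_misalignment()` and `sk_is_aligned()` are illustrative names, not ZFS functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low bits of offset below 2^ashift; the ZFS assertion requires this to be 0. */
static uint64_t
sk_misalignment(uint64_t offset, unsigned ashift)
{
	assert(ashift < 64);	/* shifting 1ULL by 64 or more is undefined */
	return (offset & ((1ULL << ashift) - 1));
}

static bool
sk_is_aligned(uint64_t offset, unsigned ashift)
{
	return (sk_misalignment(offset, ashift) == 0);
}
```

With the removed vdev at ashift 9 (as zdb shows in the next comment), 512-byte-aligned offsets are normal on the source device but fail the check against a vdev using 4 KiB (ashift 12) sectors, which fits the reporter's guess, although the report does not show the other vdev's ashift.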
Comment 21 miguelmclara 2019-09-04 23:04:08 UTC
I imported the pool read-only and I see:

remove: Evacuation of label/zfs1 in progress since Wed Sep  4 17:37:55 2019
    29.5K copied out of 837G at 1/s, 0.00% done, (copy is slow, no estimated time)



With zdb I can also see that the ashift is not the same, and I'm guessing that is why I get the panic:

                metaslab_array: 33
                metaslab_shift: 33
                ashift: 9
                asize: 1000199946240
                is_log: 0
                removing: 1



I suppose it's probably easier to rebuild the pool.