Bug 237637 - ZFS kernel panic after removing a vdev
Summary: ZFS kernel panic after removing a vdev
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.2-RELEASE
Hardware: Any Any
Assignee: freebsd-fs mailing list
Keywords: panic
Reported: 2019-04-29 05:52 UTC by Alex Bihlmaier
Modified: 2019-05-09 01:52 UTC

Description Alex Bihlmaier 2019-04-29 05:52:36 UTC
Hi,
on FreeBSD 11.2-RELEASE-p9 I removed a vdev from a ZFS pool. ZFS then started "evacuating" the data on that device, which took about 40 minutes.
When zpool status reported 100%, the system panicked and rebooted.

According to man 7 zpool-features, the device_removal feature is supported.

Current zpool status (after invoking, the kernel immediately crashes):
  pool: zfspool
 state: ONLINE
  scan: scrub repaired 0 in 4h10m with 0 errors on Sat Mar 16 23:39:35 2019
remove: Removal of vdev 4 copied 49.9G in 0h40m, completed on Sun Apr 21 21:01:31 2019
    1.49M memory used for removed device mappings
config:

	NAME          STATE     READ WRITE CKSUM
	zfspool       ONLINE       0     0     0
	  da1         ONLINE       0     0     0
	  da2         ONLINE       0     0     0
	  da0         ONLINE       0     0     0

errors: No known data errors

Example kernel panic:
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0x0
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff8246e994
stack pointer	        = 0x28:0xfffffe02384547e0
frame pointer	        = 0x28:0xfffffe0238454810
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 0 (zio_free_issue_6_6)
trap number		= 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80b3d5b7 at kdb_backtrace+0x67
#1 0xffffffff80af6b57 at vpanic+0x177
#2 0xffffffff80af69d3 at panic+0x43
#3 0xffffffff80f77fdf at trap_fatal+0x35f
#4 0xffffffff80f78039 at trap_pfault+0x49
#5 0xffffffff80f77807 at trap+0x2c7
#6 0xffffffff80f580cc at calltrap+0x8
#7 0xffffffff824e81d7 at vdev_indirect_io_start_cb+0x37
#8 0xffffffff824e7e58 at vdev_indirect_remap+0x2f8
#9 0xffffffff824e7b3d at vdev_indirect_io_start+0x2d
#10 0xffffffff82512cae at zio_vdev_io_start+0x2ae
#11 0xffffffff8250f75c at zio_execute+0xac
#12 0xffffffff8250f07b at zio_nowait+0xcb
#13 0xffffffff824eb8ef at vdev_mirror_io_start+0x3ff
#14 0xffffffff82512b62 at zio_vdev_io_start+0x162
#15 0xffffffff8250f75c at zio_execute+0xac
#16 0xffffffff80b4edc4 at taskqueue_run_locked+0x154
#17 0xffffffff80b4ff28 at taskqueue_thread_loop+0x98
Uptime: 5d9h32m23s
Dumping 719 out of 8157 MB:..3%..12%..21%..32%..41%..52%..61%..72%..81%..92%
Dump complete
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
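The backtrace runs through vdev_indirect_io_start, vdev_indirect_remap and vdev_indirect_io_start_cb. These belong to the indirect-vdev layer that remains after device removal, and are what the "1.49M memory used for removed device mappings" line in the status output refers to. The trap reports a supervisor read at fault virtual address 0x0 inside the callback, which is consistent with a NULL pointer being dereferenced there. To show roughly what that remap step does, here is a minimal, simplified model in Python. It assumes the removed vdev's address space is described by a sorted list of non-overlapping ranges, each pointing at a destination vdev and offset. `MappingEntry`, `remap` and the field names are hypothetical and are not the actual ZFS data structures or code. The sketch does not identify the cause of the panic.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical, simplified model of one removed-device mapping entry
# (not the real ZFS on-disk or in-memory structure).
@dataclass(frozen=True)
class MappingEntry:
    src_offset: int  # offset on the removed (now indirect) vdev
    size: int        # length of the contiguous copied range
    dst_vdev: int    # id of the vdev that received the data
    dst_offset: int  # offset on the destination vdev


def remap(mapping: List[MappingEntry], offset: int, size: int,
          cb: Callable[[int, int, int], None]) -> None:
    """Split an I/O on the removed vdev into per-destination segments.

    `mapping` must be sorted by src_offset and non-overlapping.
    Calls cb(dst_vdev, dst_offset, length) once per segment.
    """
    starts = [e.src_offset for e in mapping]
    end = offset + size
    while offset < end:
        # Find the last entry starting at or before `offset`.
        i = bisect_right(starts, offset) - 1
        if i < 0 or offset >= mapping[i].src_offset + mapping[i].size:
            # The model raises here; kernel code has no such safety net.
            raise LookupError(f"offset {offset} is not mapped")
        e = mapping[i]
        inner = offset - e.src_offset
        length = min(end, e.src_offset + e.size) - offset
        cb(e.dst_vdev, e.dst_offset + inner, length)
        offset += length


if __name__ == "__main__":
    mapping = [
        MappingEntry(0, 4096, dst_vdev=1, dst_offset=8192),
        MappingEntry(4096, 8192, dst_vdev=2, dst_offset=0),
    ]
    segments = []
    # A 4 KiB read straddling the two entries is split into two segments.
    remap(mapping, 2048, 4096, lambda v, o, n: segments.append((v, o, n)))
    print(segments)  # [(1, 10240, 2048), (2, 0, 2048)]
```

In this model each segment of a single read can land on a different destination vdev, which is why the remap step issues one callback per segment rather than one per I/O.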

The expected behaviour after device removal would be a usable, albeit smaller, ZFS pool.

Thanks,
thal