Created attachment 213430 [details]
snapshot of kernel panic

Hey, I removed a ZIL device from my ZFS pool. I ended up fully removing the device from my server, as well as regenerating /boot/zfs/zpool.cache afterwards. The server now gets a kernel panic on boot. Unusually, it *immediately* reboots despite kern.panic_reboot_wait_time=1 being set in loader.conf. I managed to catch a blurry shot of the panic, attached. To save you from deciphering it, the panic is a failing assert right here:

https://svnweb.freebsd.org/base/release/12.1.0/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c?revision=354337&view=markup#l5222

I was able to boot into mfsbsd, import and mount the pool with no problem, and have removed and regenerated zpool.cache a few times. I'm not sure what else to do, and this seems like a bug. Thanks!
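For reference, the cache regeneration I've been doing from the mfsbsd environment looks roughly like this (a sketch; the pool name "tank" and the /mnt altroot are my setup, adjust as needed):

```shell
# Import the pool under an alternate root without touching the cache yet
zpool import -f -R /mnt tank

# Write a fresh cache file, then copy it onto the pool's boot filesystem
zpool set cachefile=/tmp/zpool.cache tank
cp /tmp/zpool.cache /mnt/boot/zfs/zpool.cache

# Export cleanly before rebooting
zpool export tank
```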
Has the disk been physically removed? Can you show zpool status -v and gpart show output after importing the pool into mfsbsd?
At first, I tried booting with the disk still attached, since I was going to repurpose it as a cache device. However, now the disk is entirely removed. Note that I also have 2 USB disks (mfsbsd itself, and an 18TB external drive) attached. They're at the bottom of the gpart output.

gpart show:

=>          34  7814037101  diskid/DISK-WD-WCC4EKKD0A8P  GPT  (3.6T)
            34           6      - free -       (3.0K)
            40        1024   1  freebsd-boot   (512K)
          1064     4194304   2  freebsd-swap   (2.0G)
       4195368  7809841760   3  freebsd-zfs    (3.6T)
    7814037128           7      - free -       (3.5K)

=>          40  15628053088  diskid/DISK-VAHDA7WL  GPT  (7.3T)
            40         1024   1  freebsd-boot   (512K)
          1064      4194304   2  freebsd-swap   (2.0G)
       4195368  15623857760   3  freebsd-zfs    (7.3T)

=>          34  7814037101  diskid/DISK-WD-WCC4E0478835  GPT  (3.6T)
            34           6      - free -       (3.0K)
            40        1024   1  freebsd-boot   (512K)
          1064     4194304   2  freebsd-swap   (2.0G)
       4195368  7809841760   3  freebsd-zfs    (3.6T)
    7814037128           7      - free -       (3.5K)

=>          34  7814037101  diskid/DISK-WD-WCC4E1262418  GPT  (3.6T)
            34           6      - free -       (3.0K)
            40        1024   1  freebsd-boot   (512K)
          1064     4194304   2  freebsd-swap   (2.0G)
       4195368  7809841760   3  freebsd-zfs    (3.6T)
    7814037128           7      - free -       (3.5K)

=>          34  7814037101  diskid/DISK-WD-WCC4E2VZV3E1  GPT  (3.6T)
            34           6      - free -       (3.0K)
            40        1024   1  freebsd-boot   (512K)
          1064     4194304   2  freebsd-swap   (2.0G)
       4195368  7809841760   3  freebsd-zfs    (3.6T)
    7814037128           7      - free -       (3.5K)

=>          34  7814037101  ada5  GPT  (3.6T)
            34           6      - free -       (3.0K)
            40        1024   1  freebsd-boot   (512K)
          1064     4194304   2  freebsd-swap   (2.0G)
       4195368  7809841760   3  freebsd-zfs    (3.6T)
    7814037128           7      - free -       (3.5K)

=>          34  7814037101  diskid/DISK-WD-WCC4E1965981  GPT  (3.6T)
            34           6      - free -       (3.0K)
            40        1024   1  freebsd-boot   (512K)
          1064     4194304   2  freebsd-swap   (2.0G)
       4195368  7809841760   3  freebsd-zfs    (3.6T)
    7814037128           7      - free -       (3.5K)

=>          34  7814037101  diskid/DISK-WD-WCC4E2050088  GPT  (3.6T)
            34           6      - free -       (3.0K)
            40        1024   1  freebsd-boot   (512K)
          1064     4194304   2  freebsd-swap   (2.0G)
       4195368  7809841760   3  freebsd-zfs    (3.6T)
    7814037128           7      - free -       (3.5K)

=>      40  655344  da0  GPT  (7.5G) [CORRUPT]
        40     472   1  freebsd-boot  (236K)
       512  654872   2  freebsd-ufs   (320M)

=>      40  655344  diskid/DISK-07AA16081C285D19  GPT  (7.5G) [CORRUPT]
        40     472   1  freebsd-boot  (236K)
       512  654872   2  freebsd-ufs   (320M)

=>          40  39065624496  da1  GPT  (18T)
            40  39065624496   1  freebsd-zfs  (18T)

=>          40  39065624496  diskid/DISK-575542533239343130393639  GPT  (18T)
            40  39065624496   1  freebsd-zfs  (18T)

zpool status -v:

  pool: tank
 state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: scrub repaired 0 in 0 days 06:42:00 with 0 errors on Sat Apr 11 08:36:28 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            diskid/DISK-WD-WCC4E2VZV3E1p3               ONLINE       0     0     0
            gptid/c74650ad-c61c-11e3-8b42-d0509909d8a6  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            diskid/DISK-WD-WCC4E0478835p3               ONLINE       0     0     0
            diskid/DISK-WD-WCC4E1262418p3               ONLINE       0     0     0
          mirror-2                                      ONLINE       0     0     0
            diskid/DISK-WD-WCC4E1965981p3               ONLINE       0     0     0  block size: 512B configured, 4096B native
            diskid/DISK-VAHDA7WLp3                      ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-4                                      ONLINE       0     0     0
            diskid/DISK-WD-WCC4E2050088p3               ONLINE       0     0     0
            diskid/DISK-WD-WCC4EKKD0A8Pp3               ONLINE       0     0     0

errors: No known data errors
I am wondering if DISK-VAHDA7WL could be a problem. It has a 7+ TB partition mirrored with a 3+ TB partition in the pool. If there's any garbage that looks like a valid ZFS label in the unused portion of the larger partition, that might confuse ZFS.
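One way to check for stray labels (a sketch; device names are taken from the gpart output above, and zdb must be pointed at whatever the device is currently called):

```shell
# Dump the four ZFS labels zdb can find on the large partition.
# A label whose pool guid or txg doesn't match the others would be suspect.
zdb -l /dev/diskid/DISK-VAHDA7WLp3
```

Since ZFS keeps two labels at the start and two at the end of a vdev, a leftover label near the end of the oversized partition would show up here without having to break the mirror.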
Is there a way that I can verify that? While that's one of the newer disks, I have previously rebooted with it installed. I can also try breaking the mirror and rebooting, but only if that's the sole way to verify.
I have now tried removing the larger hard disk and rebooting, and I still get the same panic.
I came across a similar panic on a newer OpenZFS system running 13.0-CURRENT from 31 Dec:

VERIFY(nvlist_lookup_uint64(configs[i], ZPOOL_CONFIG_POOL_TXG, &txg) == 0) failed

In this case, the panic did not happen after removing the ZIL device from the pool; there was only a panic when executing bectl list after removing the ZIL. However, if I tried to add the ZIL back into the pool, I saw the panic on that statement. Should this be related, the test case in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252396 will cause a similar fault after re-adding the ZIL to the pool.
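The sequence that hit it for me was roughly the following (a sketch, not the exact test case from the linked PR; the pool name "tank" and the log device name are placeholders):

```shell
# Remove the separate log (ZIL) device from the pool
zpool remove tank ada6p4

# On the 13.0-CURRENT box, this panicked after the removal
bectl list

# Re-adding the log device triggered the same VERIFY panic
zpool add tank log ada6p4
```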