Bug 256368 - FreeBSD 13 does not boot after ZIL remove: panic: VERIFY(nvlist_lookup_uint64(configs[i], ZPOOL_CONFIG_POOL_TXG, &txg)
Summary: FreeBSD 13 does not boot after ZIL remove: panic: VERIFY(nvlist_lookup_uint6...
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-STABLE
Hardware: Any
OS: Any
Importance: --- Affects Many People
Assignee: freebsd-fs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-06-02 07:09 UTC by Rumen Palov
Modified: 2021-06-23 10:10 UTC
CC List: 1 user

See Also:


Attachments
Panic after removing ZIL (248.19 KB, image/jpeg)
2021-06-17 20:36 UTC, ruben

Description Rumen Palov 2021-06-02 07:09:00 UTC
Hello all,

We are facing a bug with a bootable zpool.

It can be reproduced very easily.

1) Install FreeBSD 13 from the official images with a ZFS root partition, and add one disk as a ZIL (SLOG) device.

2) Write a few megabytes to a file, e.g. /root/example.txt.

3) Remove the ZIL.

*) Up to this point, if you restart the OS between steps, it boots successfully.

4) Add the same or a new HDD as ZIL.

5) Reboot OS

After step 5 the OS falls into a boot loop. FreeBSD boots, reaches the root-mount stage of the boot sequence, and then a kernel panic occurs with the message:

panic: VERIFY(nvlist_lookup_uint64(configs[i], ZPOOL_CONFIG_POOL_TXG, &txg)

6) To restore the normal boot process, you need to boot from live media, import the pool, and remove the ZIL.
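For reference, steps 1-4 can be approximated on a throwaway file-backed pool; the pool name and file paths below are made up for illustration, and the commands require root on a FreeBSD system with ZFS:

```shell
# Sketch of the reproduction steps on a throwaway file-backed pool
# (hypothetical names/paths; needs root on a FreeBSD 13 system with ZFS).
truncate -s 256m /tmp/pool.img /tmp/slog.img
zpool create testpool /tmp/pool.img log /tmp/slog.img     # step 1: pool with a ZIL
dd if=/dev/random of=/testpool/example.txt bs=1m count=8  # step 2: write some data
zpool remove testpool /tmp/slog.img                       # step 3: remove the ZIL
zpool add -f testpool log /tmp/slog.img                   # step 4: re-add it (-f: old label)
# step 5: with the real boot pool, rebooting at this point triggers the panic
```

Note that the panic itself only manifests when the affected pool is the boot pool, so a test pool like this only exercises the vdev add/remove operations, not the failing boot path.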

A search of the bug database found a few similar reports for older FreeBSD versions.

Cheers
Rumen Palov
Comment 1 ruben 2021-06-17 20:36:38 UTC
Created attachment 225899
Panic after removing ZIL

Observed the same panic while staging a ZIL replacement in VMware.

After clearing out the log vdev (and other special-purpose vdevs such as cache and special) and even removing zpool.cache, the panic remains.

zdb -e / zdb -l don't show any trace of the log device.
Comment 2 ruben 2021-06-18 17:49:12 UTC
I tried:

* zeroing the log device
* running bonnie++ -c 2 -n 2048:64:16:512 to make sure that no uberblock, as seen with zdb -lllu, would still refer to the previous state (checked against the timestamps in the zdb output)
  * did this prior to attaching the ZIL vdev
  * did this after attaching the ZIL vdev
* completely resilvering the mirror the ZIL was intended for, by breaking the mirror one way, re-adding a clean device, and after resilvering doing the same the other way around

So my assumption is that this is not caused by misinterpreted data in the ZIL or in the uberblocks, but somewhere beyond that, and that the condition survives a resilver.

This doesn't seem to be triggered on 12.2-p4.
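The checks described above can be expressed roughly as follows; the device name da1 is a placeholder, and the commands require root on the affected system:

```shell
# Placeholder device name (da1); run as root on a system with ZFS.
dd if=/dev/zero of=/dev/da1 bs=1m   # zero the former log device
zdb -l /dev/da1                     # confirm no ZFS labels remain on it
zdb -lllu /dev/da1                  # dump labels plus uberblocks (check timestamps)
```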
Comment 3 ruben 2021-06-23 10:10:27 UTC
Hard-removing the SLOG device used for the ZIL (causing the pool to degrade), and then using a boot ISO/memstick to zpool replace the missing SLOG device, is a (not for the faint of heart) workaround which doesn't trigger the condition in my VM lab.

I would advise that workaround only if you have a backup of the pool (e.g. it is a mirror, so you can rebuild the mirror if things go wrong anyway). It is probably not applicable to single-disk/raidz pools.
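From the install media's live shell, the workaround above amounts to roughly the following; the pool name zroot, the GUID, and the replacement device da2 are all hypothetical:

```shell
# From the FreeBSD install media's live shell, after physically detaching
# the SLOG device (pool name, GUID, and device below are hypothetical).
zpool import -f -R /mnt zroot              # import the degraded boot pool
zpool status zroot                         # note the GUID of the UNAVAIL log device
zpool replace zroot 1234567890123456 da2   # replace the missing SLOG by GUID
zpool export zroot                         # export cleanly, then reboot
```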