Bug 219411 - [zfs] Stale zpool.cache corrupts renamed (but not exported) root zpool on boot
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-STABLE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-fs (Nobody)
Reported: 2017-05-20 03:55 UTC by eborisch+FreeBSD
Modified: 2017-05-20 20:16 UTC

Description eborisch+FreeBSD 2017-05-20 03:55:11 UTC
If the root zpool is renamed from a LiveCD but not exported, and the zpool.cache file is not removed, the pool is imported twice on the next boot (under the old name and the new name), and corruption soon follows.

Discussed on mailing list: https://lists.freebsd.org/pipermail/freebsd-fs/2017-May/024717.html

Steps to reproduce (don't do it to your live system!):

Install FreeBSD root-on-zfs
Reboot into LiveCD 
Rename the installed root pool [1] [2] [3]
Reboot into installed OS.
Two pools then appear in the output of zpool list and zpool status: one under the original name and one under the new name. Errors are reported soon afterwards, and the pool quickly becomes corrupted (it will no longer import). To force the issue when testing, start a scrub on either pool.

Notes:
[1] To do this, a 'zpool import -f ...' is required, as you can't export your root pool while it is, well, your root pool. There was some discussion on the mailing list about whether this is enough to just say "well, you've shot yourself in the foot, congrats, not a bug." I contend it is not, as it is the existence of the stale cache file that seems to cause the problem; the zpool itself is perfectly happy after being imported in this way. A bad config (cache) file shouldn't corrupt a zpool on boot. Halt the boot, by all means, if necessary, but don't trash the pool.

[2] If you mount the pool at this point and remove the old zpool.cache file, the problem does not appear on reboot and everything appears to be OK.
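
The workaround in note [2] could be scripted roughly as follows. This is only a sketch: it assumes the renamed pool's root filesystem is already mounted under an alternate root such as /mnt from the LiveCD, and the two cache file paths are the usual FreeBSD locations rather than anything stated in this report. The function name remove_stale_cache is made up for illustration.

```shell
#!/bin/sh
# Sketch: remove stale zpool.cache files from a root pool mounted under an altroot.
# ASSUMPTION: the pool's root filesystem is already mounted at "$1" (e.g. /mnt).
# ASSUMPTION: the cache file lives at one of the usual FreeBSD locations below.
remove_stale_cache() {
    altroot="${1:?usage: remove_stale_cache ALTROOT}"
    for rel in boot/zfs/zpool.cache etc/zfs/zpool.cache; do
        cache="$altroot/$rel"
        if [ -f "$cache" ]; then
            # Report each file that was actually removed.
            rm -f -- "$cache" && echo "removed $cache"
        fi
    done
}
```

From the LiveCD this would be invoked as 'remove_stale_cache /mnt' before rebooting into the installed OS.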

[3] If you export the pool before rebooting at this point, with or without removing the old zpool.cache file, the problem also doesn't occur.

It seems that some combination of the bootloader's initial opening of a "currently imported" pool and the handoff to the kernel (and later parsing of the zpool.cache file) mishandles the case where the cache file describes the pool under a different name but with the same GUID, so the same pool is improperly imported a second time.