Bug 268100 - 13.1's loader.efi is not able to find a ZFS pool if it was checkpointed on 12.3
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: loader, needs-qa
Depends on:
Blocks:
 
Reported: 2022-12-01 16:22 UTC by ml
Modified: 2023-03-07 23:17 UTC
CC List: 6 users

See Also:


Attachments

Description ml 2022-12-01 16:22:46 UTC
Hello.
Suppose you:
_ use UEFI;
_ boot from ZFS;
_ are on 12.3;
_ checkpoint the root pool;
_ upgrade to 13.1.

Now 13.1's loader.efi won't be able to find your pool and the machine won't boot.
If you are lucky you'll be able to see "can not read checkpoint data", but this message might scroll fast out of sight.

It would not be a big problem if this were documented.

N.B.
The above checklist reflects my case, but it could possibly be relaxed (e.g. WRT system versions) or narrowed (pool version or features?).
Comment 1 Warner Losh freebsd_committer freebsd_triage 2022-12-04 17:33:56 UTC
I need more specific details on how to recreate this. Why can't loader.efi see it? What are the characteristics of the ZFS snapshot that gives the loader grief?

Does this persist in a zfs send -> zfs receive? If so, can I get that dataset?

How important is this to your work?
Comment 2 ml 2022-12-04 18:49:58 UTC
(In reply to Warner Losh from comment #1)

First off, there are no "characteristics of the ZFS snapshot": we are talking about "zpool checkpoint", not "zfs snap".

I have no idea on why loader.efi can't boot from a checkpointed ZFS pool: 
I only saw the message above ("can not read checkpoint data").

I don't have the original problematic pool anymore: it was a production server and I needed to boot it ASAP, so I just issued "zpool checkpoint -d" (that is, as soon as I realized what the problem was).
I guess that, in order to recreate the situation, the procedure would be: install 12.3 with UEFI+ZFS, checkpoint the root pool, upgrade to 13.1 (maybe upgrading the boot loader is enough). The only thing I could add is that it was a zraid5 pool (3 disks); not sure it matters.

Importance is "less than bulk" to me: it would have saved me three hours of spreading panic if I had known; now I'll simply check if a checkpoint exists before upgrading (or, if I forget, boot from a USB key and remove it later).
My only goal was to let other people know, so maybe they won't be hit so hard by this.
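
The pre-upgrade check described above could be sketched roughly as follows. This is only an illustration, not a tested procedure: the pool name "zroot" and the helper names are assumptions, and it relies on the read-only "checkpoint" pool property reporting "-" when no checkpoint exists, which should be verified on the release actually in use.

```shell
#!/bin/sh
# Sketch: detect a pool checkpoint before upgrading the boot loader.
# Assumption: "zpool get -H -o value checkpoint <pool>" prints "-" when
# the pool has no checkpoint, and a size (e.g. "72K") when it has one.

# Interpret the value of the "checkpoint" property.
# Returns 0 if a checkpoint appears to exist, 1 otherwise.
checkpoint_present() {
    case "$1" in
        ""|-) return 1 ;;   # empty (query failed) or "-": none detected
        *)    return 0 ;;   # anything else: a checkpoint is consuming space
    esac
}

# Hypothetical wrapper: query a pool by name (e.g. "zroot").
# Note that a failed query is reported as "no checkpoint", so make sure
# the pool name is correct before trusting a negative result.
pool_has_checkpoint() {
    checkpoint_present "$(zpool get -H -o value checkpoint "$1" 2>/dev/null)"
}

# Example use before an upgrade ("zroot" is an assumed pool name):
#   if pool_has_checkpoint zroot; then
#       echo "zroot is checkpointed; discard it before upgrading:"
#       echo "  zpool checkpoint -d zroot"
#   fi
#
# If the machine already fails to boot, the same discard can be done
# from rescue media (again only a sketch):
#   zpool import -f -N zroot     # import without mounting datasets
#   zpool checkpoint -d zroot    # discard the checkpoint
#   zpool export zroot
```

Discarding the checkpoint gives up the ability to rewind the pool to it, so this only makes sense once the checkpoint is no longer needed.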