Bug 264282 - BIOS boot from GELI encrypted broken / 'currdev' set to wrong string
Summary: BIOS boot from GELI encrypted broken / 'currdev' set to wrong string
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 12.4-RELEASE
Hardware: amd64 Any
: --- Affects Some People
Assignee: Toomas Soome
URL: https://github.com/freebsd/freebsd-sr...
Keywords:
Depends on:
Blocks:
 
Reported: 2022-05-27 12:04 UTC by yamagi
Modified: 2024-01-10 03:12 UTC (History)
7 users (show)

See Also:
grahamperrin: mfc-stable12?


Attachments
Remove special dv_name for GELI disks, handle them like normal disks. (492 bytes, patch)
2022-06-03 11:55 UTC, yamagi
no flags Details | Diff
Prebuild /boot/loader with the patch applied, can be used to recover bricked systems (488.00 KB, application/octet-stream)
2022-06-03 11:56 UTC, yamagi
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description yamagi 2022-05-27 12:04:42 UTC
Hello,
in 13.1-RELEASE and -CURRENT as of 18054d0220cf it's impossible to boot from an GELI encrypted root volume, because the 'currdev' loader variable is set to a wrong string. Thus the loader is unable to load the boot data from the disk. I've bisect this in 13.1-RELEASE. It was broken in commit bc9154a208248, which was cherry picked from b4cb3fe0e39a.

Setup
-----
I've tested BIOS boot from GPT with / on GELI. / is decrypted by gptboot. The bootchain is BIOS -> pmbr (read from the MBR) -> gptboot (read from a freebsd-boot partition) -> /boot/loader.

geli show:
=>      40  41942960  vtbd0  GPT  (20G)
        40       256      1  freebsd-boot  (128K)
       296   4194304      2  freebsd-swap  (2.0G)
   4194600  37748400      3  freebsd-ufs  (18G)

It doesn't matter if it's real hardware or - like in this example - an VM. It happens regardless how many devices are attached.

Problem
-------
Try to boot the system. /boot/loader errors out with "ERROR: cannot open /boot/lua/loader.lua: no such file or directory." This is caused by the currdev variable get set to the wrong string:

# show currdev
gelidisk0p3:

The 'geli' at the beginning of the string is wrong. lsdev lists the device with its correct name, disk0p3. loaders build before bc9154a208248 are working fine, currdev is set to disk0p3.

Impact
------
This makes it impossible to boot from an encrypted /. At least not without manual interactions, like typing the correct path into the loader prompt.  I've testes only BIOS with GPT and UFS. I don't know if other combinations are also impacted.
Comment 1 ArturB 2022-05-28 23:02:34 UTC
I confirm this is a problem with the mentioned encrypted setup - it bricked one of my installs.

The dirty workaround might be to replace the /boot/loader with the one from the 13.0-RELEASE, that is, the last version when this disk encryption setup seems to have worked.

Great job on tracking the bug.
Comment 2 yamagi 2022-06-03 11:55:14 UTC
I had a deeper look. This is a long standing bug, it's been there since forever. bc9154a208248 exposed it and broke the BIOS bootloader on UFS on GELI encrypted devices. ZFS is not broken, because it uses another code path.

The bootloader tracks devices in `devsw` structs. These structs have a field `dv_name`, storing the name of the device type. For everything which qualifies as a harddisk the string is just "disk". The  BIOS loader (i386 loader in the code) derives the actual devicename from `dv_name` and some unit number magic, drive C: ends up as unit 0. That happens in `i386_fmtdev()`.

GELI employs a hack: GELI devices are treated like disks, the GELI initialisation code hijacks the `devsw` struct of the given disk and overwrites it with a specialized version. The hack is explained in stand/libsa/geli/gelidev.c:46. All lines are against 14-CURRENT as of 1326017849ee. This specialized version sets `dv_name` to "gelidisk". That's were the wrong string comes from.

Until bc9154a208248 the `currdev` variable got set before GELI was initalized. `dv_name` was still set to "disk". But with bc9154a208248 `currdev` is set after GELI was initialized and `dv_name` is already "gelidisk". The code doesn't take that into account and we end up with a wrong string.

I don't know enough about the bootloader to decide what the best fix is. We could add logic to handle "gelidisk", since GELI disks are just disks that looks overly complicated and error prone. We can introduce a new device type `DEVT_GELI` or something like that. That would require a lot of code handling that type, most of it would be a copy of the generic disk code. The easiest way is to change "gelidisk" to just "disk". That should be okay, because there's no code which handles "gelidisk" and GELI disks are disks.

The attached patch does exactly that. A oneliner, tested on my VM. I'm also attaching a prebuild loader with the patch applied. It can be used to recover bricked system, just replace /boot/loader with it.
Comment 3 yamagi 2022-06-03 11:55:59 UTC
Created attachment 234418 [details]
Remove special dv_name for GELI disks, handle them like normal disks.
Comment 4 yamagi 2022-06-03 11:56:41 UTC
Created attachment 234419 [details]
Prebuild /boot/loader with the patch applied, can be used to recover bricked systems
Comment 5 Alastair Hogge 2022-06-06 03:52:30 UTC
(In reply to yamagi from comment #3)

Hi yamagi,

Thanks for the amazing effort. I have tested your patch on three main-n256001-f1294737023 systems; they all booted expected and without intervention past the GELI passphrase prompt.

Thanks.
Comment 6 Allan Jude freebsd_committer freebsd_triage 2022-06-10 20:46:26 UTC
Does this only impact UFS?
Comment 7 yamagi 2022-06-11 07:13:32 UTC
It does NOT impact ZFS. I haven't looked into the details. I thing it has something to do with ZFS ending up with a currdev like 'zfs:system/ROOT/default:'. Since the device type isn't 'disk', the disk <-> gelidisk problem doesn't matter. Or something like that. That ZFS isn't affected is properly the reason why this wasn't noticed before. Most users boot from ZFS these days.

I haven't tested cd9660, ext2fs and msdosfs. From reading the source I would guess that they're affected, because they're using more or less the same path as UFS.
Comment 8 Kyle Evans freebsd_committer freebsd_triage 2022-06-19 06:55:28 UTC
Over to tsoome@ as committer of the identified commit... feel free to reassign.
Comment 9 commit-hook freebsd_committer freebsd_triage 2022-06-20 07:11:28 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e417249016efcca73c9edad21b94b1315bc44601

commit e417249016efcca73c9edad21b94b1315bc44601
Author:     Toomas Soome <tsoome@FreeBSD.org>
AuthorDate: 2022-06-20 06:51:44 +0000
Commit:     Toomas Soome <tsoome@FreeBSD.org>
CommitDate: 2022-06-20 07:10:14 +0000

    loader: GELI encrypted disk should still use device name disk

    geli_probe_and_attach() does pick geli_devsw structure for
    encrypted disks, the implementation depends on device
    name "disk" when device type is DEVT_DISK, but geli_devsw is
    setting name field "gelidisk".

    PR:             264282
    Submitted by:   yamagi@yamagi.org
    Reported by:    yamagi@yamagi.org
    MFC after:      2 weeks

 stand/libsa/geli/gelidev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 10 commit-hook freebsd_committer freebsd_triage 2022-07-04 06:16:24 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=67f8b6d985dbc85682d6f8b2dd5117068c19ca59

commit 67f8b6d985dbc85682d6f8b2dd5117068c19ca59
Author:     Toomas Soome <tsoome@FreeBSD.org>
AuthorDate: 2022-06-20 06:51:44 +0000
Commit:     Toomas Soome <tsoome@FreeBSD.org>
CommitDate: 2022-07-04 06:14:43 +0000

    loader: GELI encrypted disk should still use device name disk

    geli_probe_and_attach() does pick geli_devsw structure for
    encrypted disks, the implementation depends on device
    name "disk" when device type is DEVT_DISK, but geli_devsw is
    setting name field "gelidisk".

    PR:             264282
    Submitted by:   yamagi@yamagi.org
    Reported by:    yamagi@yamagi.org

    (cherry picked from commit e417249016efcca73c9edad21b94b1315bc44601)

 stand/libsa/geli/gelidev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 11 Graham Perrin freebsd_committer freebsd_triage 2022-12-13 04:21:05 UTC
<http://markmail.org/message/zwnm4lgzbvbtucei> in the midst of <https://lists.freebsd.org/archives/freebsd-hackers/2022-December/001714.html> discussion about a possible patch for 12.4-RELEASE.
Comment 12 Mark Linimon freebsd_committer freebsd_triage 2024-01-10 03:12:22 UTC
^Triage: MFC to 12 now Overcome By Events.