Bug 256422 - bhyve and Centos/Rocky 8.4 no boot after install
Summary: bhyve and Centos/Rocky 8.4 no boot after install
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bhyve
Version: 13.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-virtualization (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-06-05 00:31 UTC by dave
Modified: 2021-06-10 16:38 UTC
CC: 3 users

See Also:


Description dave 2021-06-05 00:31:47 UTC
Installing CentOS or Rocky 8.4 results in a failed boot. The initial install works, but on reboot I get this while loading:

Starting webhost04a
  * found guest in /storage/vm/webhost04a
  * booting...
BdsDxe: failed to load Boot0001 "UEFI bhyve-NVMe NVME-4-0" from PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,01-00-68-C1-20-FC-9C-58): Not Found  

Logging from vm-bhyve:  

Jun 04 17:18:00: booting 
Jun 04 17:18:00:  [bhyve options: -c 8 -m 16G -Hwl bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd -U 62ff48d0-c58d-11eb-9187-f8bc1251963e]
Jun 04 17:18:00:  [bhyve devices: -s 0,hostbridge -s 31,lpc -s 4:0,nvme,/dev/zvol/storage/vm/webhost04a/disk0 -s 5:0,virtio-net,tap0,mac=58:9c:fc:07:6d:b7 -s 6:0,fbuf,tcp=192.168.1.150:5900 -s 7:0,xhci,tablet]  


Note, Rocky 8.3 and CentOS 8.3 both install and boot fine with exactly the same configs in vm-bhyve.
Comment 1 Chuck Tuffli freebsd_committer 2021-06-07 14:44:53 UTC
Does the guest boot if you change the device from nvme to ahci?
Comment 2 dave 2021-06-07 15:56:09 UTC
Installing the guest again using virtio-blk instead of nvme results in a working guest.

Same guest, just change nvme to ahci-hd, works.

It appears something about NVMe handling during UEFI boot changed in the new RHEL 8.4 and its successors.
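
For anyone reproducing this, the emulation switch is a one-line change in the vm-bhyve guest config. A hypothetical excerpt for this kind of guest (the actual webhost04a config isn't attached to this bug; option names assume the standard vm-bhyve format):

loader="uefi"
cpu=8
memory=16G
# "nvme" fails to boot 8.4; "ahci-hd" or "virtio-blk" boots fine
disk0_type="nvme"
disk0_name="disk0"
disk0_dev="zvol"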
Comment 3 Peter Grehan freebsd_committer 2021-06-07 20:28:25 UTC
I was able to repro this with Alma 8.3/8.4 (identical to CentOS 8.3/8.4).

With a file-backed image on ZFS, I forced the sectorsize parameter to both 4K and 512; neither made a difference in getting the system to boot.

The error appears to be in the EFI loader on CentOS:

"
BdsDxe: loading Boot0001 "UEFI bhyve-NVMe NVME-4-0" from PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,01-00-DD-44-20-FC-9C-58)
BdsDxe: starting Boot0001 "UEFI bhyve-NVMe NVME-4-0" from PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,01-00-DD-44-20-FC-9C-58)
Unexpected return from initial read: Device Error, buffersize 0
Failed to load image \EFI\almalinux\grubx64.efi: Device Error
start_image() returned Device Error
StartImage failed: Device Error
"
Comment 4 Jason Tubnor 2021-06-09 02:18:58 UTC
Has this been tested against bare metal that has UEFI and NVMe?

I got the same as grehan@ when testing with both CentOS 8.4 and Stream. Observations suggest there is something up with the CentOS EFI shim for GRUB.

I have done testing against the following, fully updated as of 20210609 12:10 +10UTC:

openSUSE Tumbleweed
Ubuntu impish 21.10 nightly
Artix (Arch) GRUB 2.04-10 Linux 5.12.8

None of these experienced any issues with the NVMe device presented by bhyve.
Comment 5 Jason Tubnor 2021-06-09 03:41:28 UTC
I successfully updated CentOS from 8.3 to 8.4 and it is running fine on an NVMe bhyve device.

It is looking more like an issue with how the installer determines the boot device and then writes it and the GRUB components to storage.
Comment 6 Chuck Tuffli freebsd_committer 2021-06-09 23:28:11 UTC
Is it possible to recompile pci_nvme.c and enable debug in the failing case? I.e. change the code to:

    static int nvme_debug = 1;
Comment 7 Peter Grehan freebsd_committer 2021-06-10 15:24:59 UTC
This looks to be an edge condition in the EFI NVMe driver, caused by the large maximum data transfer size advertised by bhyve NVMe (2MB) and the increase in size of grubx64.efi from 1.9MB in CentOS 8.3 to 2.3MB in CentOS 8.4.

In 8.4, EFI attempts to read 2MB of grubx64.efi. However, the buffer starts at a non-page-aligned address, so PRP1 in the command descriptor is used with an offset. PRP2 points to a PRP list, but with a 2MB transfer size all 512 PRP entries in a page will be used. Since the first buffer was unaligned, there is a small amount left over at the end, and EFI is putting garbage into that last entry.
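
To put numbers on that boundary (my illustration, not code from pci_nvme.c, though the 4K page size and 512-entry list follow the NVME_MPSMIN_BYTES and NVME_PRP2_ITEMS definitions shown in the patch below): a page-aligned 2MB read touches 512 pages, so the PRP list only needs 511 entries after PRP1; shift the buffer by a few bytes and it touches 513 pages, which needs exactly 512 list entries and sits right at the point where a chained PRP list would normally kick in.

#include <stdio.h>
#include <stdint.h>

#define MPS_BYTES       4096                            /* minimum page size, assuming MPSMIN = 0 */
#define PRP_LIST_MAX    (MPS_BYTES / sizeof(uint64_t))  /* 512 entries per PRP list page */

/* PRP list entries needed once PRP1 covers the first (possibly offset) chunk. */
static size_t
prp_list_entries(uint64_t buf, size_t len)
{
        size_t pages = (buf % MPS_BYTES + len + MPS_BYTES - 1) / MPS_BYTES;

        return (pages > 1 ? pages - 1 : 0);
}

int
main(void)
{
        size_t len = 2 * 1024 * 1024;   /* the 2MB grubx64.efi read */

        printf("aligned:   %zu of %zu entries\n", prp_list_entries(0x200000, len), PRP_LIST_MAX);
        printf("unaligned: %zu of %zu entries\n", prp_list_entries(0x200010, len), PRP_LIST_MAX);
        return (0);
}

This prints "511 of 512" for the aligned case and "512 of 512" for the unaligned one.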

(Copying the smaller 8.3 grubx64.efi to an 8.4 system resulted in a successful boot).

A suggested fix is to drop the advertised MDTS to something that isn't right on the verge of requiring a chained PRP list. QEMU defaults to 512KB, and h/w I've looked at advertises 256K. E.g.:

--- a/usr.sbin/bhyve/pci_nvme.c
+++ b/usr.sbin/bhyve/pci_nvme.c
@@ -106,7 +106,7 @@ static int nvme_debug = 0;
 #define        NVME_MPSMIN_BYTES       (1 << (12 + NVME_MPSMIN))
 
 #define        NVME_PRP2_ITEMS         (PAGE_SIZE/sizeof(uint64_t))
-#define        NVME_MDTS               9
+#define        NVME_MDTS               7

(or 8)

8.4 boots fine with this change.
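
For context on those numbers (my reading of the MDTS encoding, not part of the patch): MDTS is a power-of-two multiplier on the CAP.MPSMIN page size, so with the 4K minimum page from NVME_MPSMIN_BYTES the advertised limits work out as follows.

#include <stdio.h>

#define NVME_MPSMIN             0
#define NVME_MPSMIN_BYTES       (1 << (12 + NVME_MPSMIN))      /* 4KB, assuming MPSMIN = 0 as in pci_nvme.c */

int
main(void)
{
        /* Maximum data transfer size = 2^MDTS * minimum page size. */
        for (int mdts = 7; mdts <= 9; mdts++)
                printf("MDTS %d -> %d KB\n", mdts, (NVME_MPSMIN_BYTES << mdts) / 1024);
        return (0);
}

That gives 512KB for 7 (matching the QEMU default mentioned above), 1MB for 8, and 2MB for the current value of 9.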
Comment 8 Jason Tubnor 2021-06-10 16:38:03 UTC
I can confirm the patch from grehan@ works as described. Tested against:

CentOS 8.4
Windows Server 2022
OpenBSD 6.9

No regression was introduced for the latter two, which were existing guests on our system.