Bug 202687 - UEFI boot doesn't work on VMware virtual machines due to device path is broken
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Ed Maste
URL:
Keywords: uefi
Depends on:
Blocks: 203349
Reported: 2015-08-27 03:44 UTC by Qi Zhang
Modified: 2015-12-23 03:01 UTC

See Also:
koobs: mfc-stable10?


Attachments
Screenshot of the UEFI boot (43.97 KB, image/jpeg)
2015-08-27 03:44 UTC, Qi Zhang

Description Qi Zhang 2015-08-27 03:44:28 UTC
Created attachment 160393
Screenshot of the UEFI boot

I downloaded FreeBSD-10.2-RELEASE-amd64-uefi-dvd1.iso and installed it on a virtual machine (on vSphere/Fusion/Workstation). After booting, the VM hangs at:

Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [/boot/kernel/kernel]...
Start @ 0xffffffff802dfc70 ...

(refer to the attachment)

This issue was also found in the FreeBSD 10.1 release.

Based on our developer's investigation with the FreeBSD 10.1 release, the bootloader corrupts one of our device paths.
Before control passes to the bootloader, there is a handle whose Device Path looks like this:

0xcce4618: 0x02 0x01 0x0c 0x00 0xd0 0x41 0x03 0x0a
0xcce4620: 0x00 0x00 0x00 0x00 0x01 0x01 0x06 0x00
0xcce4628: 0x01 0x07 0x03 0x01 0x08 0x00 0x01 0x00
0xcce4630: 0x00 0x00 0x04 0x02 0x18 0x00 0x00 0x00
0xcce4638: 0x00 0x00 0x14 0x00 0x00 0x00 0x00 0x00
0xcce4640: 0x00 0x00 0x04 0x00 0x00 0x00 0x00 0x00
0xcce4648: 0x00 0x00 0x7f 0xff 0x04 0x00 0xaf 0xaf
                     ^^^^^^^^^ ^^^^^^^^^
                     end dev path length=4

That's the device path for the El Torito partition on the ATAPI CD drive.
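
To make the dump easier to read, here is a small Python sketch that walks the bytes node by node. It assumes the standard UEFI device path layout (a 1-byte type, a 1-byte subtype, then a 2-byte little-endian length that covers the whole node); the names `ORIGINAL`, `NODE_NAMES` and `walk` are made up for this illustration and are not taken from the loader.

```python
import struct

# Bytes transcribed from the dump above (0xcce4618 onward), one row per string.
ORIGINAL = bytes.fromhex(
    "02010c00d041030a"
    "0000000001010600"
    "0107030108000100"
    "0000040218000000"
    "0000140000000000"
    "0000040000000000"
    "00007fff0400afaf"
)

# (type, subtype) pairs as defined for UEFI device path nodes.
NODE_NAMES = {
    (0x02, 0x01): "ACPI",
    (0x01, 0x01): "PCI",
    (0x03, 0x01): "ATAPI",
    (0x04, 0x02): "CD-ROM (El Torito)",
    (0x7F, 0xFF): "End of entire device path",
}


def walk(path):
    """Return (offset, name, length) for each node, up to and including End."""
    nodes = []
    offset = 0
    while offset + 4 <= len(path):
        node_type, subtype, length = struct.unpack_from("<BBH", path, offset)
        name = NODE_NAMES.get((node_type, subtype), "unknown")
        nodes.append((offset, name, length))
        # Stop at the End node, or at a length too small to step over.
        if node_type == 0x7F or length < 4:
            break
        offset += length
    return nodes


for offset, name, length in walk(ORIGINAL):
    print(f"+{offset:#04x}  {name:<26} length={length}")
```

Read this way, the path is ACPI, PCI, ATAPI, CD-ROM, End, and the two trailing 0xaf bytes lie past the End node, so they are not part of the path.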

Very shortly afterwards, the Device Path is truncated in a way that is not valid:

0xcce4618: 0x02 0x01 0x0c 0x00 0xd0 0x41 0x03 0x0a
0xcce4620: 0x00 0x00 0x00 0x00 0x01 0x01 0x06 0x00
0xcce4628: 0x01 0x07 0x03 0x01 0x08 0x00 0x01 0x00
0xcce4630: 0x00 0x00 0x7f 0xff 0x18 0x00 0x00 0x00
                     ^^^^^^^^^ ^^^^^^^^^
                     end dev path length=0x18
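
The reason this second form is invalid can be checked mechanically. The sketch below is a hypothetical validator, not firmware or loader code: it assumes the usual node header (type, subtype, 2-byte little-endian length) and that an End-of-entire-path node (0x7f, 0xff) must carry a length of exactly 4. `TRUNCATED` and `check_device_path` are illustrative names.

```python
import struct

END_TYPE, END_ENTIRE_SUBTYPE, END_LENGTH = 0x7F, 0xFF, 4

# Bytes transcribed from the truncated dump above.
TRUNCATED = bytes.fromhex(
    "02010c00d041030a"
    "0000000001010600"
    "0107030108000100"
    "00007fff18000000"
)


def check_device_path(path):
    """Return None if the path is well formed, else a description of the first problem."""
    offset = 0
    while True:
        if offset + 4 > len(path):
            return f"ran off the buffer at offset {offset} without an End node"
        node_type, subtype, length = struct.unpack_from("<BBH", path, offset)
        if length < 4:
            return f"node at offset {offset} has length {length}, below the 4-byte header"
        if node_type == END_TYPE and subtype == END_ENTIRE_SUBTYPE:
            if length != END_LENGTH:
                return f"End node at offset {offset} has length {length:#x}, expected 4"
            return None
        offset += length


print(check_device_path(TRUNCATED))
```

The End marker has been written over the type and subtype of the CD-ROM node, but that node's old length of 0x18 is still in place.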


After digging through the bootloader source code, we found that the breakage is in the following file, which also introduces one of the most awesomely deadpan uses of an expletive in error-handling code:
https://svnweb.freebsd.org/base/projects/uefi/sys/boot/efi/libefi/efipart.c?annotate=247380&pathrev=247380

The bootloader should copy the Device Path before truncating it, and it should fix the length of the last node as it truncates the path.
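
The suggested behaviour can be sketched in Python as follows. This only illustrates the logic (copy first, then end the copy with a correctly sized End node); it is not the loader's C code and not the change that was eventually committed. `END_NODE`, `parent_path` and `firmware` are hypothetical names.

```python
import struct

END_NODE = bytes([0x7F, 0xFF, 0x04, 0x00])  # End-entire node with length 4


def parent_path(path):
    """Return a new path without the last node before End; the input is not modified."""
    offset = 0
    last = None  # offset of the most recent non-End node
    while True:
        node_type, _subtype, length = struct.unpack_from("<BBH", path, offset)
        if node_type == 0x7F:
            break
        if length < 4:
            raise ValueError(f"bad node length {length} at offset {offset}")
        last = offset
        offset += length
    if last is None:
        return bytes(END_NODE)
    # Slicing copies, so the firmware's buffer is left as it was.
    return bytes(path[:last]) + END_NODE


# Stand-in for the firmware-owned buffer; a bytearray so any mutation would show.
firmware = bytearray.fromhex(
    "02010c00d041030a0000000001010600"
    "01070301080001000000040218000000"
    "00001400000000000000040000000000"
    "00007fff0400afaf"
)
before = bytes(firmware)
parent = parent_path(firmware)
print(parent.hex(" "))
print(firmware == before)
```

The result is a separate 30-byte path (ACPI, PCI, ATAPI, End) whose End node has length 4, while the handle's original Device Path still describes the El Torito partition.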


- Qi
Comment 1 Ed Maste freebsd_committer 2015-09-01 17:49:37 UTC
> This issue was also found in the FreeBSD 10.1 release.
>
> Based on our developer's investigation with the FreeBSD 10.1 release, the bootloader corrupts one of our device paths.

That is a different issue and should be fixed by the change in PR 197641 (but please confirm). That change is in FreeBSD 10.2.

I suspect you are running into another set of issues which have been fixed in HEAD and stable/10, but did not make it in time for 10.2-RELEASE.

See PR 191564 and related PRs for the details of those issues -- can you please retest with a stable/10 or HEAD snapshot?
Comment 2 zofrex 2015-09-05 02:42:57 UTC
I encountered these same symptoms using 10.2-RELEASE and VMware Fusion 7.1.2.

I tested this with a few other versions (using the snapshot install media hosted by FreeBSD) to see if they worked. Results:

10.2-RELEASE - fails
10.2-STABLE (20150903-r287435) - fails
11.0-CURRENT (20150722-r285794) - works
11.0-CURRENT (20150903-r287437) - works

- zofrex
Comment 3 Ed Maste freebsd_committer 2015-09-08 15:42:48 UTC
I don't believe any of the failures reported in 10.2 and later are the issue reported in this PR (device path corruption), but we need more debugging information. Note that the symptom described here (nothing after "Start @ 0xffff....") will be the same for any panic or crash in early kernel startup.


(In reply to zofrex from comment #2)
> 10.2-RELEASE - fails

Failure here is not surprising, as this does not have Marcel's efifb fixes.

> 10.2-STABLE (20150903-r287435) - fails

Further debugging is needed here, and I'm surprised that this fails. Are you able to attach a virtual serial port and see if you can obtain a backtrace?

> 11.0-CURRENT (20150722-r285794) - works
> 11.0-CURRENT (20150903-r287437) - works

Thanks, this is good to know.
Comment 4 zofrex 2015-09-08 20:20:35 UTC
Ed - if that's possible in VMware I can certainly give it a go, but I'd appreciate you pointing me in the right direction :)

Some additional data I discovered by accident - I had this same issue (or rather, the same symptoms) on a physical machine - Intel motherboard. The exact same boot disk booted fine on a Supermicro board so it wasn't an issue with the USB stick or the image. This was with 10.2-RELEASE. I can dig up the exact model numbers of the motherboards next week if that's an interesting data point.
Comment 5 Ed Maste freebsd_committer 2015-09-08 20:45:06 UTC
> Ed - if that's possible in VMWare I can certainly give it a go, but I'd
> appreciate you pointing me in the right direction :)

I'm not sure how to do it in VMware -- in VirtualBox there's a "Serial Ports" config setting, which can be set to 'Raw File' and then have a path provided. All virtual serial output ends up in the file.

For Fusion, Google turned up:

https://pubs.vmware.com/fusion-5/index.jsp#com.vmware.fusion.help.doc/GUID-F1E20E9E-7588-4F3B-A0FC-A5FA7A68CFB4.html

although there's a clear mistake in that page (it refers to "the output file of the virtual parallel port") so I'm not sure that it's well-tested.

I'm not particularly interested in results from 10.2-RELEASE, unfortunately: there are known issues that will have these symptoms, so in the event that 10.2-RELEASE fails on given hardware or VM the first step is going to be to reproduce with a -STABLE snapshot anyhow.
Comment 6 Ed Maste freebsd_committer 2015-12-16 17:50:22 UTC
Qi Zhang, can you retest with a recent stable/10 snapshot? The original issue that was investigated on 10.1 is certainly fixed. If you can reproduce the problem on the snapshot, a backtrace from the virtual serial port will be very useful.
Comment 7 Ed Maste freebsd_committer 2015-12-21 21:19:35 UTC
Please test stable/10 after r292551; it fixes crashes during startup on certain systems / EFI implementations.

https://svnweb.freebsd.org/base?view=revision&revision=292551
Comment 8 Ed Maste freebsd_committer 2015-12-23 01:09:07 UTC
dim@ tested 10.2R and a stable/10 snapshot and this is fixed. The fix will be available in FreeBSD 10.3.

10.2:

OK boot
Booting...
Start @ 0xffffffff802dfc70 ...
EFI framebuffer information:
addr, size     0xf4000000, 0x300000
dimensions     1024 x 768
stride         1024
masks          0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000
panic: BIOS smap did not include a basemem segment!
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80984e30 at ??+0
#1 0xffffffff809489e6 at ??+0
#2 0xffffffff809488b3 at ??+0
#3 0xffffffff80d35f6a at ??+0
#4 0xffffffff802dfc94 at ??+0
Uptime: 1s

stable/10 after r292551:

OK boot
Booting...
Start @ 0xffffffff802e05d0 ...
EFI framebuffer information:
addr, size     0xf4000000, 0x300000
dimensions     1024 x 768
stride         1024
masks          0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000
Copyright (c) 1992-2015 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.2-STABLE #0 r292636: Wed Dec 23 00:35:22 CET 2015
    root@fleptest:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
VT(efifb): resolution 1024x768
CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (3999.99-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x306c3  Family=0x6  Model=0x3c  Stepping=3
  Features=0xfa3fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,DTS,MMX,FXSR,SSE,SSE2,SS>
  Features2=0x9ed83203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,HV>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
Hypervisor: Origin = "VMwareVMware"
[...]
Comment 9 Qi Zhang 2015-12-23 03:01:20 UTC
Hi Ed,
Sorry for the delay, and thanks for verifying it on 10.x. Our OS lead for FreeBSD, Yanhui He <yanhuih@vmware.com>, has tested UEFI installation with FreeBSD-11.0-CURRENT-amd64-20151102-r290273-disc1.iso, and it works well.


Thanks
- Qi