|Summary:||UEFI boot doesn't work on VMware virtual machines because the device path is broken|
|Product:||Base System||Reporter:||Qi Zhang <qiz>|
|Component:||kern||Assignee:||Ed Maste <emaste>|
|Severity:||Affects Some People||CC:||emaste, freebsdbugs, lli, vmware-gos-qa|
|Bug Depends on:|
Description Qi Zhang 2015-08-27 03:44:28 UTC
Created attachment 160393: Screenshot of the UEFI boot

I downloaded FreeBSD-10.2-RELEASE-amd64-uefi-dvd1.iso and installed it on a virtual machine (on vSphere/Fusion/Workstation). After booting, the VM hangs at:

    Hit [Enter] to boot immediately, or any other key for command prompt.
    Booting [/boot/kernel/kernel]...
    Start @ 0xffffffff802dfc70 ...

(refer to the attachment)

This issue was also found in the FreeBSD 10.1 release. Based on our developer's investigation of 10.1, the bootloader corrupts one of our device paths, as follows.

Before control is passed to the bootloader, there is a handle whose Device Path looks like this:

    0xcce4618: 0x02 0x01 0x0c 0x00 0xd0 0x41 0x03 0x0a
    0xcce4620: 0x00 0x00 0x00 0x00 0x01 0x01 0x06 0x00
    0xcce4628: 0x01 0x07 0x03 0x01 0x08 0x00 0x01 0x00
    0xcce4630: 0x00 0x00 0x04 0x02 0x18 0x00 0x00 0x00
    0xcce4638: 0x00 0x00 0x14 0x00 0x00 0x00 0x00 0x00
    0xcce4640: 0x00 0x00 0x04 0x00 0x00 0x00 0x00 0x00
    0xcce4648: 0x00 0x00 0x7f 0xff 0x04 0x00 0xaf 0xaf
                         ^^^^^^^^^ ^^^^^^^^^
                         end dev   path length=4

That is the device path for the El Torito partition on the ATAPI CD drive. Very shortly afterwards, the Device Path is truncated in a way that is not valid:

    0xcce4618: 0x02 0x01 0x0c 0x00 0xd0 0x41 0x03 0x0a
    0xcce4620: 0x00 0x00 0x00 0x00 0x01 0x01 0x06 0x00
    0xcce4628: 0x01 0x07 0x03 0x01 0x08 0x00 0x01 0x00
    0xcce4630: 0x00 0x00 0x7f 0xff 0x18 0x00 0x00 0x00
                         ^^^^^^^^^ ^^^^^^^^^
                         end dev   path length=0x18

The end-of-path node (type 0x7f, sub-type 0xff) has been written over the CD-ROM media node at 0xcce4632, but its length field still holds that node's 0x18 instead of 4, and the original path has been modified in place.

After digging through the bootloader source code, we found that the breakage was introduced in the same change as one of the most awesomely deadpan uses of an expletive in error-handling code:

https://svnweb.freebsd.org/base/projects/uefi/sys/boot/efi/libefi/efipart.c?annotate=247380&pathrev=247380

The bootloader should copy the Device Path before truncating it, and it should fix the length of the last node as it truncates the path.

- Qi
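To make the corruption in the two dumps concrete, here is a minimal C sketch that treats the device path as raw bytes. It assumes the UEFI node layout: a 1-byte Type, a 1-byte SubType and a 2-byte little-endian Length that counts the 4-byte header, with the path terminated by an End Entire Device Path node (Type 0x7f, SubType 0xff, Length 4). `dp_validate` walks the nodes and rejects a path whose end node has the wrong length, as in the second dump; `dp_copy_without_last_node` follows the fix suggested in the report by copying first and then writing a complete end node, length included. Both function names are hypothetical, and this is not the loader's efipart.c code, which works with the `EFI_DEVICE_PATH` structure from its EFI headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed UEFI node layout: Type (1 byte), SubType (1 byte), Length
 * (2 bytes, little-endian, counting the 4-byte header itself). */
#define DP_HDR_LEN    4
#define DP_END_TYPE   0x7f /* End of Hardware Device Path */
#define DP_END_ENTIRE 0xff /* End Entire Device Path */

/* Read the little-endian Length field of the node starting at `node`. */
static uint16_t dp_node_len(const uint8_t *node)
{
    return (uint16_t)(node[2] | (node[3] << 8));
}

/* Walk the device path in buf[0..buflen). Return the number of nodes before
 * the end node, or -1 if the path is malformed: a node shorter than its
 * header, a node running past the buffer, no end node at all, or an end
 * node whose Length is not exactly 4 (the corruption in the second dump). */
static int dp_validate(const uint8_t *buf, size_t buflen)
{
    size_t off = 0;
    int nodes = 0;

    while (off + DP_HDR_LEN <= buflen) {
        const uint8_t *node = buf + off;
        uint16_t len = dp_node_len(node);

        if (node[0] == DP_END_TYPE && node[1] == DP_END_ENTIRE)
            return (len == DP_HDR_LEN) ? nodes : -1;
        if (len < DP_HDR_LEN || len > buflen - off)
            return -1;
        off += len;
        nodes++;
    }
    return -1; /* ran off the buffer without finding an end node */
}

/* Hypothetical helper: return a newly allocated copy of the path with its
 * last node (the one before the end node) replaced by a complete end node.
 * The caller's buffer is never written to. Returns NULL on error; on
 * success stores the new length in *outlen, and the caller must free(). */
static uint8_t *dp_copy_without_last_node(const uint8_t *buf, size_t buflen,
                                          size_t *outlen)
{
    int nodes = dp_validate(buf, buflen);
    size_t off = 0, last = 0;
    uint8_t *copy;

    assert(outlen != NULL);
    if (nodes < 1)
        return NULL;
    for (int i = 0; i < nodes; i++) {   /* locate the last real node */
        last = off;
        off += dp_node_len(buf + off);
    }
    copy = malloc(last + DP_HDR_LEN);
    if (copy == NULL)
        return NULL;
    memcpy(copy, buf, last);            /* copy first, then truncate */
    copy[last + 0] = DP_END_TYPE;
    copy[last + 1] = DP_END_ENTIRE;
    copy[last + 2] = DP_HDR_LEN;        /* fix the Length: low byte */
    copy[last + 3] = 0;                 /* and high byte */
    *outlen = last + DP_HDR_LEN;
    return copy;
}
```

Fed the 56 bytes of the first dump, `dp_validate` should report four nodes (ACPI, PCI, ATAPI, CD-ROM media), while the 32 bytes of the second dump are rejected because the end node at offset 0x1a still carries length 0x18. Copying the first dump with `dp_copy_without_last_node` gives a 30-byte path whose first 28 bytes match the second dump but whose end node has length 4, and the source buffer is left untouched.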
Comment 1 Ed Maste 2015-09-01 17:49:37 UTC
> This issue was also found in FreeBSD 10.1 release.
>
> Based on our developer's investigation with FreeBSD 10.1 release, it was found that the bootloader corrupted one of our device path, which looked like this:

That is a different issue and should be fixed by the change in PR 197641 (but please confirm). That change is in FreeBSD 10.2.

I suspect you are running into another set of issues which have been fixed in HEAD and stable/10, but did not make it in time for 10.2-RELEASE. See PR 191564 and related PRs for the details of those issues -- can you please retest with a stable/10 or HEAD snapshot?
Comment 2 zofrex 2015-09-05 02:42:57 UTC
I encountered these same symptoms using 10.2-RELEASE and VMWare Fusion 7.1.2. I tested this with a few other versions (using the snapshot install media hosted by FreeBSD) to see if they worked. Results:

10.2-RELEASE - fails
10.2-STABLE (20150903-r287435) - fails
11.0-CURRENT (20150722-r285794) - works
11.0-CURRENT (20150903-r287437) - works

- zofrex
Comment 3 Ed Maste 2015-09-08 15:42:48 UTC
I don't believe any of the failures reported in 10.2 and later are the issue reported in this PR (device path corruption), but we need more debugging information. Note that the symptom described here (nothing after "Start @ 0xffff....") will be the same for any panic or crash in early kernel startup.

(In reply to zofrex from comment #2)
> 10.2-RELEASE - fails

Failure here is not surprising, as this does not have Marcel's efifb fixes.

> 10.2-STABLE (20150903-r287435) - fails

Further debugging is needed here, and I'm surprised that this fails. Are you able to attach a virtual serial port and see if you can obtain a backtrace?

> 11.0-CURRENT (20150722-r285794) - works
> 11.0-CURRENT (20150903-r287437) - works

Thanks, this is good to know.
Comment 4 zofrex 2015-09-08 20:20:35 UTC
Ed - if that's possible in VMWare I can certainly give it a go, but I'd appreciate you pointing me in the right direction :)

Some additional data I discovered by accident: I had this same issue (or rather, the same symptoms) on a physical machine with an Intel motherboard. The exact same boot disk booted fine on a Supermicro board, so it wasn't an issue with the USB stick or the image. This was with 10.2-RELEASE. I can dig up the exact model numbers of the motherboards next week if that's an interesting data point.
Comment 5 Ed Maste 2015-09-08 20:45:06 UTC
> Ed - if that's possible in VMWare I can certainly give it a go, but I'd
> appreciate you pointing me in the right direction :)

I'm not sure how to do it in VMware -- in VirtualBox there's a "Serial Ports" config setting, which can be set to 'Raw File' and then have a path provided. All virtual serial output ends up in the file. For Fusion, Google turned up

https://pubs.vmware.com/fusion-5/index.jsp#com.vmware.fusion.help.doc/GUID-F1E20E9E-7588-4F3B-A0FC-A5FA7A68CFB4.html

although there's a clear mistake in that page (it refers to "the output file of the virtual parallel port"), so I'm not sure that it's well-tested.

I'm not particularly interested in results from 10.2-RELEASE, unfortunately: there are known issues that will have these symptoms, so in the event that 10.2-RELEASE fails on given hardware or a VM, the first step is going to be to reproduce with a -STABLE snapshot anyhow.
Comment 6 Ed Maste 2015-12-16 17:50:22 UTC
Qi Zhang, can you retest with a recent stable/10 snapshot? The original issue that was investigated on 10.1 is certainly fixed. If you can reproduce the problem on the snapshot, a backtrace from the virtual serial port will be very useful.
Comment 7 Ed Maste 2015-12-21 21:19:35 UTC
Please test stable/10 after r292551, it will fix crashes during startup on certain systems / EFI implementations. https://svnweb.freebsd.org/base?view=revision&revision=292551
Comment 8 Ed Maste 2015-12-23 01:09:07 UTC
dim@ tested 10.2R and a stable/10 snapshot, and this is fixed. The fix will be available in FreeBSD 10.3.

10.2:

    OK boot
    Booting...
    Start @ 0xffffffff802dfc70 ...
    EFI framebuffer information:
    addr, size 0xf4000000, 0x300000
    dimensions 1024 x 768
    stride 1024
    masks 0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000
    panic: BIOS smap did not include a basemem segment!
    cpuid = 0
    KDB: stack backtrace:
    #0 0xffffffff80984e30 at ??+0
    #1 0xffffffff809489e6 at ??+0
    #2 0xffffffff809488b3 at ??+0
    #3 0xffffffff80d35f6a at ??+0
    #4 0xffffffff802dfc94 at ??+0
    Uptime: 1s

Stable/10 after r292551:

    OK boot
    Booting...
    Start @ 0xffffffff802e05d0 ...
    EFI framebuffer information:
    addr, size 0xf4000000, 0x300000
    dimensions 1024 x 768
    stride 1024
    masks 0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000
    Copyright (c) 1992-2015 The FreeBSD Project.
    Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
            The Regents of the University of California. All rights reserved.
    FreeBSD is a registered trademark of The FreeBSD Foundation.
    FreeBSD 10.2-STABLE #0 r292636: Wed Dec 23 00:35:22 CET 2015
        root@fleptest:/usr/obj/usr/src/sys/GENERIC amd64
    FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
    VT(efifb): resolution 1024x768
    CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (3999.99-MHz K8-class CPU)
      Origin="GenuineIntel"  Id=0x306c3  Family=0x6  Model=0x3c  Stepping=3
      Features=0xfa3fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,DTS,MMX,FXSR,SSE,SSE2,SS>
      Features2=0x9ed83203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,HV>
      AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
      AMD Features2=0x1<LAHF>
      TSC: P-state invariant
    Hypervisor: Origin = "VMwareVMware"
    [...]