As I mentioned in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195819#c24, I found the I couldn't boot UEFI VM (i.e. Hyper-V Generation-2 VM) with the 10.3 uefi iso, the 11-alpha iso and 11-beta1 iso. The symptom is the same (it looks the loader fails to load the kernel, or less likely the kernel crashes at the very early point where the uefi fb driver fails to load?). I'll upload the screenshot FYI. The bug happens on Windows Server 2012 R2, 2016 TP5 host or Windows 8.1, 10 hosts. Since the bug also exists in the 10.3 UEFI iso (which was released in March), I think the bug was introduced before that time. Note: the 10.2 uefi iso can boot (see bug 195819) to the "Welcome to FreeBSD" installation screen with "set machdep.disable_tsc_calibration=1" and UP VM (SMP VM still hangs), so the loader in 10.2 is good.
Created attachment 173534 [details] uefi vm can't boot. This is the screenshot.
By bisecting, I found the first "bad" commit for this bug is: https://github.com/freebsd/freebsd/commit/6471c2fc7c1fced2b5d2073b1629aa76588c61e2 (it changed EFI_STAGING_SIZE from 32MB to 48MB). It looks Windows Server 2012 R2 Hypervisor doesn't support UEFI VM boot with the 48MB? Trying to dig into this. Any help is appreciated!
It looks like AllocatePages() succeeded. Is that correct? If yes, then it's possible that there's a Hyper-V bug. Try to obtain the memory map before the call to AllocatePages and then again after the call to AllocatePages. The memory map should show a difference of exactly the amount of pages being allocated. Some unused memory region should be smaller by the same amount. Any discrepancy is an indication of a bug in AllocatePages. If the memory map looks fine, then the problem may be later: at the time the loader is ready to jump into the kernel. The loader calls ExitBootServices around that time for example... HTH,
(In reply to Marcel Moolenaar from comment #3) Hi Marcel, Thank you for the quick help! Yes, I checked all the AllocatePages() calls and they all succeeded, i.e. returning 0. I found the crash happened in elf64_exec() -> trampoline() -> efi_copy_finish -> *dst++ = *src++; In efi_copy_finish(), I added some printf's to dump the values of the varilables: /boot/kernel/kernel text=0xfe3048 data=0x128b68+0x207fa0 syms=[0x8+0x146f88+0x8+ Booting... Start @ 0xffffffff802e2640 ... EFI framebuffer information: addr, size 0xf8000000, 0x800000 dimensions 1024 x 768 stride 1024 masks 0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000 efi_copy_finish : staging=0xf37cb000 efi_copy_finish : staging_end=0xf67cb000 efi_copy_finish : staging_offset=0xf35cb000 efi_copy_finish : src=0xf37cb000, dst=0x200000, last=0xf67cb000 If I change the line last = (uint64_t *)staging_end; to last = (uint64_t *)staging + (1024*1024*45); The crash won't happen and the kernel can boot fine. I'm using the releng/10.3 branch, where EFI_STAGING_SIZE is 48MB. This is to say, the kernel can boot fine if I use EFI_STAGING_SIZE=45MB. Any idea? Why do you think is it a Hyper-V firmware bug in AllocatePages()? I'm not familar with UEFI Boot Services. :-) I'll check the memory map before/after the call to AllocatePages(). I'm going to use sys/boot/efi/loader/main.c: command_memmap() as an example to call GetMemoryMap.
(In reply to Dexuan Cui from comment #4) > last = (uint64_t *)staging + (1024*1024*45); I meant last = (uint64_t *) (staging + (1024*1024*45)); (I missed a pair of parentheses)
Check in the EFI memory map whether there's runtime-persistent memory at 0x200000 + 45MB (or abouts). Runtime persistent memory are memory allocations of type runtime, firmware, (e.g. ACPI non-reclaimable), etc. I.e anything that the kernel can't use during runtime. If the memory at 0x200000 isn't free during runtime, then we have a real problem... Some background: FreeBSD makes an invalid or at least a questionable assumption that the memory map under EFI is compatible with the memory layout as seen under the BIOS. As time goes by, this will probably be less and less valid. The staging area is a side-effect of the kernel wanting to be loaded in a contiguous memory region, starting at 0x200000. Something that is can't be done under EFI without a staging area. To wit: The old (and now removed) Itanium kernel used a virtual address space for the kernel. The contiguous virtual kernel address mapped to underlying EFI allocated pages that didn't have to be contiguous. On Itanium the physical memory map could be without addressable memory in the first 4GB even, so no amount of staging would be able to handle that. Hopefully the problem is much more trivial than all of that...
(In reply to Marcel Moolenaar from comment #6) I'll be attaching the memory map before and after efi_copy_init() -> BS->AllocatePages (..., 48MB, ...) BTW, in my test, STAGE_PAGES <= 45MB is OK, and >=46MB will cause the crash.
Created attachment 179977 [details] memory map before vs. after allocating 48MB memory
(In reply to Marcel Moolenaar from comment #6) Differences between the maps are: 1) In the first red rectangle, we can see 48MB memory (0x3000 pages) is allocated from ConvventionalMemory to LoaderData and this is the expected correct behavior. 2) In the second red rectangle, we can see 4 pages are allocated, probably by the firmmare itself. I guess we can ignore this. 3) No other difference except the above. So, Hyper-V's AllocatePages should be correct. :-) But there is indeed some BootServicesData memory starting at 0x2f73000 with 0x118d pages (i.e. starting at 47.449 MB with the length == 17.55MB)!!! "elf64_exec() -> trampoline() -> efi_copy_finish -> *dst++ = *src++;" tries to overwrite the memory between 2MB and 2+48=50MB, and we get the crash -- I guess Hyper-V has some mechanism to prevent the guest from writing into that BootServideData memory block. With STAGE_PAGES <= 45MB, we don't touch that BootServiceData memory block due to the fact 2+45 < 47.449 and hence we don't get the crash. Now, it looks to me this is a bug of FreeBSD? It looks the hardcoded macro for the 2MB kernel base is LOADER_ADDRESS. I guess it's not easy to make it a runtime dynamic value, but we should have a long term plan to fix this. And can we have a short term workaround for Hyper-V? e.g. making STAGE_PAGES a dynamic variable and using 45MB if the loader detecets the underlying hypervisor is Hyper-V??? :-) Thanks Marcel for the effective help and let's brainstorm.
I just realized that efi_copy_finish() is called via trampoline(). I presume that this means that it runs with the temporary mapping that was created in elf64_exec(). We only map the 1GB of physical memory (using 2MB pages). This means that the EFI allocation must be placed below 0x40000000. However, it is placed at 0xf37cb000. Maybe the problem is that for smaller allocations, EFI assigns low memory (i.e. under 1GB), but for larger allocations it favors high memory? To test this theory, change efi_copy_init() and instead of passing AllocateAnyPages as the first argument, pass AllocateMaxAddress as the first argument and put a maximum address of 1GB in the staging variable before the call to AllocatePages (see also elf64_exec). Either the allocation fails, or we're guaranteed to have memory below 1GB. HTH,
(In reply to Dexuan Cui from comment #9) In efi_copy_finish(), I added the below new line while (src < last) { + printf("trying to write: %lx(%p) to %lx(%p)\n", *src, src, *dst, dst); *dst++ = *src++; } and got the below log ... trying to write: 0(0xf653dfb0) to 0(0x2f72fb0) trying to write: 0(0xf653dfb8) to 0(0x2f72fb8) trying to write: 0(0xf653dfc0) to 0(0x2f72fc0) trying to write: 0(0xf653dfc8) to 0(0x2f72fc8) trying to write: 0(0xf653dfd0) to 0(0x2f72fd0) trying to write: 0(0xf653dfd8) to 0(0x2f72fd8) trying to write: 0(0xf653dfe0) to 0(0x2f72fe0) trying to write: 0(0xf653dfe8) to 0(0x2f72fe8) trying to write: 0(0xf653dff0) to 0(0x2f72ff0) trying to write: 0(0xf653dff8) to 0(0x2f72ff8) trying to write: 0(0xf653e000) to 2f74023(0x2f73000) This means the crash happened when dst==0x2f73000, i.e. the first 8-bytes of the BootServicesData. I suspect Hyper-V has a mechanism to mark the range of memory as read-only.
(In reply to Marcel Moolenaar from comment #10) Hi Mercel, You're correct about the second bug -- we don't hit the second bug just because we are lucky: when accessing 0xf37cb000, we actually access 0x337cb000, since 0xf - 0x4 * 3 == 0x3. This is my understanding. I made the below changes and got the new log: + staging = 0x40000000; /* 1GB */ - status = BS->AllocatePages(AllocateAnyPages, EfiLoaderData, + status = BS->AllocatePages(AllocateMaxAddress, EfiLoaderData, STAGE_PAGES, &staging); efi_copy_finish: calling trampoline efi_copy_finish: staging=3d000000 efi_copy_finish: staging_end=40000000 efi_copy_finish: staging_offset=3ce00000 efi_copy_finish: src=0x3d000000, dst=0x200000, last=0x40000000 ... trying to write: 0(0x3fd72fe0) to 0(0x2f72fe0) trying to write: 0(0x3fd72fe8) to 0(0x2f72fe8) trying to write: 0(0x3fd72ff0) to 0(0x2f72ff0) trying to write: 0(0x3fd72ff8) to 0(0x2f72ff8) trying to write: 0(0x3fd73000) to 2f74023(0x2f73000) Dexuan: Crash!!!
Created attachment 180002 [details] new mem map before/after the AllocatePages with staging below 1G
BTW, anyone knows how to change the kernel base address (physical address)? Currently it should be 2MB, and I tried to change it to 128MB like this: --- a/sys/conf/ldscript.amd64 +++ b/sys/conf/ldscript.amd64 SECTIONS { /* Read-only sections, merged into text segment: */ - kernphys = CONSTANT (MAXPAGESIZE); + kernphys = 0x8000000; . = kernbase + kernphys + SIZEOF_HEADERS; .interp : { *(.interp) } .hash : { *(.hash) } But it looks this doesn't work: in efi_copy_finish(), the 'dst' showed 0 rather than 128MB. I did "make cleanworld; make buildkernel build world".
Maybe because of sys/boot/common/load_elf.c, line 329?
(In reply to Marcel Moolenaar from comment #15) Probably. IMO this means we can't freely change the 2MB kernphys.
I think the complexity of having the kernel at any other physical address is what has us do the staging/copying. It was a quick-n-dirty mechanism that avoided a lot of work and complexity -- which is ok if you don't know it's worth/needed to go through all that hassle. And I guess it looks like we now hit a case that warrants us to start looking at a real solution. As an example (for inspiration): For Itanium I had the kernel link against a fixed virtual address. The loader built the VA-to-PA mapping based on where EFI allocated blobs of memory. The mapping was loaded/activated prior to booting the kernel and the loader gave the kernel all the information it needed to work with the mapping. This makes it possible to allocate memory before the VM system is up and running. Ultimately the mapping needs to be incorporated into the VM system and this is where different CPU architectures have different challenges and solutions. Note that there are advantages to having the kernel link against a virtual address. In general it makes it easier to load or relocate the kernel anywhere and this enables a few capabilities that other OSes already have and then some. There are also downsides. You may need to support a large VA range if you want to support pre-loading CD-ROM images or run entirely form a memory disk that's preloaded. A few GB of address space would be good to have. Anyway: It's probably time that to you restate this bug into an architectural (x86-specific for now) problem and have a discussion on the arch@ mailing list. We need a more people involved to bring this to a closure. Good luck
(In reply to Marcel Moolenaar from comment #17) Thanks very much for sharing the insights, Marcel! BTW, last year, people already reported an issue with mfs-based images + EFI_STAGING_SIZE=1024MB: https://lists.freebsd.org/pipermail/freebsd-hackers/2016-November/050142.html
(In reply to Dexuan Cui from comment #18) It looks this is indeed a complex issue and a lots of work is required to have a thorough fix. For now, I'm planning to make a patch to verify the assumption that the physical memory in the range [2MB, 2MB+EFI_STAGING_SIZE) is indeed Conventional Memory that doesn't contain firmware/loader data/code: if 'staging_end' is not in the same Conventional Memory block as 'staging', we print a warning and decrease 'staging_end' to the end of the memory block. This won't introduce new issues or risks, but can indeed help in the cases like this bug. :-)
(In reply to Dexuan Cui from comment #19) I posted https://reviews.freebsd.org/D9686 for this bug.
A commit references this bug: Author: dexuan Date: Thu Mar 2 07:25:50 UTC 2017 New revision: 314547 URL: https://svnweb.freebsd.org/changeset/base/314547 Log: loader.efi: reduce the size of the staging area if necessary The loader assumes physical memory in [2MB, 2MB + EFI_STAGING_SIZE) is Conventional Memory, but actually it may not, e.g. in the case of Hyper-V Generation-2 VM (i.e. UEFI VM) running on Windows Server 2012 R2 host, there is a BootServiceData memory block at the address 47.449MB and the memory is not writable. Without the patch, the loader will crash in efi_copy_finish(): see PR 211746. The patch verifies the end of the staging area, and reduces its size if necessary. This way, the loader will not try to write into the BootServiceData memory any longer. Thank Marcel Moolenaar for helping me on this issue! The patch also allocates the staging area in the first 1GB memory. See the comment in the patch for this. PR: 211746 Reviewed by: marcel, kib, sephe Approved by: sephe (mentor) MFC after: 2 weeks Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D9686 Changes: head/sys/boot/efi/loader/copy.c
(In reply to commit-hook from comment #21) Some people reported the fix was breaking their hosts. Chris H is the first person to dump the EFI memory map to me: on the host there is a 1MB LoaderData memory range, which splits the big Conventional Memory range into a small one (15MB) and a big one: the small one is too small to hold the staging area. Marcel did remind me of this: https://reviews.freebsd.org/D9686?id=25414#inline-56907 But unluckily I didn't realize the importance, since I didn't really have access to an EFI-boot physical FreeBSD host and FreeBSD VM running on Hyper-V didn't expose the issue. I'll attach the screen shot of Chris's mem map FYI. And I'm going to make a patch shortly.
Created attachment 180552 [details] uefi-mem-map-on-native.png
(In reply to Dexuan Cui from comment #23) FYI: Let me attach another sample of unusual EFI memory map from Alex's Supermicro A1SRM-2758F host. We can notice the 2MB range's type is BootServicesData instead of the usual ConventionalMemory: BootServicesData 000000200000 000000000000 00000040 UC WC WT WB Luckily it seems we can write the staging area into this range without any issue.
Created attachment 180585 [details] EFI mem map from Supermicro A1SRM-2758F host
(In reply to Dexuan Cui from comment #25) FYI: Let me attach 1 more sample of unusual EFI memory map from Roberto's physical host. We can notice there is an unusual 4MB range of BootServicesCode at [12MB, 16MB)...
Created attachment 180660 [details] The unusual EFI memory map on Roberto's physical host
(In reply to Dexuan Cui from comment #27) > We can notice there is a 4MB BootServicesCode range at [12MB, 16MB) ... loader.efi just writes into this range by force -- this is unsafe anyway! To fix this correctly & thoroughly, IMO we need a relocatable kernel, but that would require a lot of complicated long term work: https://reviews.freebsd.org/D9686?id=25414#inline-56969 For now, I suggest we should only apply the idea "reduce the size of the staging area if necessary" to VM running on Hyper-V, we should restore the old behavior on physical machines since that has been working for people for a long period of time, though it's potentially unsafe. I think in the loader we can use CPUID to tell if we're running on Hyper-V or not. I'm making a patch for this...
A commit references this bug: Author: dexuan Date: Tue Mar 14 08:12:14 UTC 2017 New revision: 315235 URL: https://svnweb.freebsd.org/changeset/base/315235 Log: loader.efi: use stricter check for Hyper-V Some other hypervisors like Xen can pretend to be Hyper-V but obviously they can't implement all Hyper-V features. Let's make sure we're genuine Hyper-V here. Also fix some minor coding style issues. PR: 211746 MFC after: 2 weeks Sponsored by: Microsoft Changes: head/sys/boot/efi/loader/copy.c
(In reply to Dexuan Cui from comment #27) Thanks for collecting the memory maps and resolving any issues!
A commit references this bug: Author: dexuan Date: Thu Mar 30 12:41:21 UTC 2017 New revision: 316272 URL: https://svnweb.freebsd.org/changeset/base/316272 Log: MFC: 314547, 314770, 314828, 314891, 314956, 314962, 315235 r314547 loader.efi: reduce the size of the staging area if necessary The loader assumes physical memory in [2MB, 2MB + EFI_STAGING_SIZE) is Conventional Memory, but actually it may not, e.g. in the case of Hyper-V Generation-2 VM (i.e. UEFI VM) running on Windows Server 2012 R2 host, there is a BootServiceData memory block at the address 47.449MB and the memory is not writable. Without the patch, the loader will crash in efi_copy_finish(): see PR 211746. The patch verifies the end of the staging area, and reduces its size if necessary. This way, the loader will not try to write into the BootServiceData memory any longer. Thank Marcel Moolenaar for helping me on this issue! The patch also allocates the staging area in the first 1GB memory. See the comment in the patch for this. PR: 211746 Reviewed by: marcel, kib, sephe Approved by: sephe (mentor) Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D9686 r314770 loader.efi: fix recent UEFI-boot regression on physical machines This patch fixes my recent patch "loader.efi: reduce the size of the staging area if necessary", which causes EFI-boot failure on physical machines since Mar 2: on the host there is a 1MB LoaderData memory range, which splits the big Conventional Memory range into a small one (15MB) and a big one: the small one is too small to hold the staging area. We can actually use the LoaderData range safely, because when amd64_tramp -> efi_copy_finish() starts to run, we're almost at the very end of the efi loader code and we're going to "return" to the kernel entry, so we're pretty sure we won't access any loader data any more. For people who are interested in the details: please see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211746#c22 PS, some people also reported the regression happened to FreeBSD VM running on Bhyve in EFI mode. This patch should resolve it too, though I don't have such a setup to test. Reviewed by: sephe Approved by: sephe (mentor) Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D9904 r314828 loader.efi: fix an off-by-one bug in efi_verify_staging_size() Also remove the warning message: it may not be unusual to see the memory range containing 2MB is not of EfiConventionalMemory. Sponsored by: Microsoft r314891 loader.efi: finally fix the off-by-one bug in efi_verify_staging_size() r314828(loader.efi: fix an off-by-one bug in efi_verify_staging_size()) doesn't really fix the bug and this patch adds the missing part. It's a shame that I didn't make everything correct at the very beginning... Sponsored by: Microsoft r314956 loader.efi: only reduce the size of the staging area on Hyper-V Doing this on physical hosts turns out to be problematic, e.g. see comment 24 and 28 in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211746. To fix the real underlying issue correctly & thoroughly, IMO we need a relocatable kernel, but that would require a lot of complicated long term work: https://reviews.freebsd.org/D9686?id=25414#inline-56969 For now, let's only apply efi_verify_staging_size() to VMs running on Hyper-V, and restore the old behavior on physical machines since that has been working for people for a long period of time, though that's potentially unsafe... Sponsored by: Microsoft r314962 loader.efi: only include the machine/ header files on x86 The 2 files may not exist on other archs like aarch64 and hence we can have a build failure there. Reported by: lwhsu Sponsored by: Microsoft r315235 loader.efi: use stricter check for Hyper-V Some other hypervisors like Xen can pretend to be Hyper-V but obviously they can't implement all Hyper-V features. Let's make sure we're genuine Hyper-V here. Also fix some minor coding style issues. PR: 211746 Sponsored by: Microsoft PR: 211746 Changes: _U stable/11/ stable/11/sys/boot/efi/loader/copy.c
A commit references this bug: Author: dexuan Date: Thu Mar 30 12:51:44 UTC 2017 New revision: 316273 URL: https://svnweb.freebsd.org/changeset/base/316273 Log: MFC: 314547, 314770, 314828, 314891, 314956, 314962, 315235 r314547 loader.efi: reduce the size of the staging area if necessary The loader assumes physical memory in [2MB, 2MB + EFI_STAGING_SIZE) is Conventional Memory, but actually it may not, e.g. in the case of Hyper-V Generation-2 VM (i.e. UEFI VM) running on Windows Server 2012 R2 host, there is a BootServiceData memory block at the address 47.449MB and the memory is not writable. Without the patch, the loader will crash in efi_copy_finish(): see PR 211746. The patch verifies the end of the staging area, and reduces its size if necessary. This way, the loader will not try to write into the BootServiceData memory any longer. Thank Marcel Moolenaar for helping me on this issue! The patch also allocates the staging area in the first 1GB memory. See the comment in the patch for this. Reviewed by: marcel, kib, sephe Approved by: sephe (mentor) Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D9686 r314770 loader.efi: fix recent UEFI-boot regression on physical machines This patch fixes my recent patch "loader.efi: reduce the size of the staging area if necessary", which causes EFI-boot failure on physical machines since Mar 2: on the host there is a 1MB LoaderData memory range, which splits the big Conventional Memory range into a small one (15MB) and a big one: the small one is too small to hold the staging area. We can actually use the LoaderData range safely, because when amd64_tramp -> efi_copy_finish() starts to run, we're almost at the very end of the efi loader code and we're going to "return" to the kernel entry, so we're pretty sure we won't access any loader data any more. For people who are interested in the details: please see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211746#c22 PS, some people also reported the regression happened to FreeBSD VM running on Bhyve in EFI mode. This patch should resolve it too, though I don't have such a setup to test. Reviewed by: sephe Approved by: sephe (mentor) Sponsored by: Microsoft Differential Revision: https://reviews.freebsd.org/D9904 r314828 loader.efi: fix an off-by-one bug in efi_verify_staging_size() Also remove the warning message: it may not be unusual to see the memory range containing 2MB is not of EfiConventionalMemory. Sponsored by: Microsoft r314891 loader.efi: finally fix the off-by-one bug in efi_verify_staging_size() r314828(loader.efi: fix an off-by-one bug in efi_verify_staging_size()) doesn't really fix the bug and this patch adds the missing part. It's a shame that I didn't make everything correct at the very beginning... Sponsored by: Microsoft r314956 loader.efi: only reduce the size of the staging area on Hyper-V Doing this on physical hosts turns out to be problematic, e.g. see comment 24 and 28 in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211746. To fix the real underlying issue correctly & thoroughly, IMO we need a relocatable kernel, but that would require a lot of complicated long term work: https://reviews.freebsd.org/D9686?id=25414#inline-56969 For now, let's only apply efi_verify_staging_size() to VMs running on Hyper-V, and restore the old behavior on physical machines since that has been working for people for a long period of time, though that's potentially unsafe... Sponsored by: Microsoft r314962 loader.efi: only include the machine/ header files on x86 The 2 files may not exist on other archs like aarch64 and hence we can have a build failure there. Reported by: lwhsu Sponsored by: Microsoft r315235 loader.efi: use stricter check for Hyper-V Some other hypervisors like Xen can pretend to be Hyper-V but obviously they can't implement all Hyper-V features. Let's make sure we're genuine Hyper-V here. Also fix some minor coding style issues. Sponsored by: Microsoft PR: 211746 Changes: _U stable/10/ stable/10/sys/boot/efi/loader/copy.c
Let's close the bug, since the fix is in HEAD, and has also been MFC'd to stable/10 and stable/11.