Created attachment 245990: Vultr FreeBSD 14.0-RC3 crash

I noticed this while upgrading one of my Vultr VMs from 13.2 to 14.0-RC3. I managed to reproduce it with a new VM and FreeBSD-14.0-RC3-amd64-bootonly.iso from FreeBSD's public FTP site.

A text version of the page fault (copied from the screenshot as best I could):

current process = 0 (swapper)
rdi: 0000000000000000 rsi: 0000000000000000 rdx: 0000000000000000
rcx: 0000000000000000 r8: 0000000000000000 r9: fffffe00463a1000
rax: 0000000000000000 rbx: fffff80003610900 rbp: ffffffff828b6e00
r10: 0000000000000000 r11: fffff8000302ed10 r12: fffffe0044f51000
r13: 0000000000000000 r14: 0000000000000000 r15: fffffe0044f51000
trap number = 12
panic: page fault
cpuid = 0
time = 2
KDB: stack backtrace:
#0 0xffffffff80b9002d at kdb_backtrace+0x5d
#1 0xffffffff80b43132 at vpanic+0x132
#2 0xffffffff80b42ff3 at panic+0x43
#3 0xffffffff8100c85c at trap_fatal+0x48c
#4 0xffffffff8100c8af at trap_pfault+0x4f
#5 0xffffffff80fe3818 at calltrap+0x8
#6 0xffffffff80f89a5b at vmbus_intrhook+0x27b
#7 0xffffffff80b1afe1 at run_interrupt_driven_config_hooks+0xd1
#8 0xffffffff88670431 at boot_run_interrupt_driven_config_hooks+0x21
#9 0xffffffff80acc3c5 at mi_startup+0xb5
#10 0xffffffff80376023 at btext+0x23
Uptime: 2s
Automatic reboot in 15 seconds - press a key on the console to abort
Hi,

I looked up the reported faulting address with objdump/addr2line. It appears to be the result of a bad call to acpi_get_handle(), whose definition is expanded from line 280 of acpivar.h.

Consider the following two lines in vmbus_doattach(), added in e7a9817b8d32 (Sept 2023):

    dev_res = devclass_get_device(devclass_find("vmbus_res"), 0);
    handle = acpi_get_handle(dev_res);

There is no NULL check for dev_res, which means that if the vmbus_res0 device is not found (i.e. not attached), the following call to acpi_get_handle() triggers a page fault.

Now, _why_ vmbus_res0 can't be found, I cannot guess. It has similar attachment criteria to vmbus0.

Strangely, my Vultr VM doesn't run on Hyper-V; instead, the kern.vm_guest sysctl reports "kvm". So this is as far as I can go with testing/debugging. Let me tag the maintainers.
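To make the failure mode above concrete, here is a small userland model (not kernel code) of the lookup and the missing NULL check. Every `model_` name is a hypothetical stand-in for device_t, devclass_get_device() and acpi_get_handle(); the model simply assumes, for illustration, that an ivar accessor has to read through the device structure (here, to reach its parent bus) before it can return anything, so a NULL dev_res is dereferenced immediately.

```c
#include <stddef.h>
#include <string.h>

/*
 * Userland model only: hypothetical stand-ins for the kernel's device_t,
 * devclass lookup and ACPI ivar accessor, not the real FreeBSD APIs.
 */
typedef void *model_acpi_handle_t;

struct model_device {
	const char		*devclass;	/* e.g. "vmbus_res" */
	int			 unit;		/* e.g. 0 */
	struct model_device	*parent;	/* e.g. acpi0 */
	model_acpi_handle_t	 handle;	/* ivar held by the parent bus */
};

/* Like devclass_get_device(devclass_find(name), unit): NULL if no such unit. */
struct model_device *
model_get_device(struct model_device **devs, size_t ndevs, const char *name,
    int unit)
{
	for (size_t i = 0; i < ndevs; i++) {
		if (strcmp(devs[i]->devclass, name) == 0 && devs[i]->unit == unit)
			return (devs[i]);
	}
	return (NULL);
}

/*
 * Like an ivar accessor: it reads through dev to reach the parent bus,
 * so passing NULL dereferences a NULL pointer (a page fault in the kernel).
 */
model_acpi_handle_t
model_get_handle(struct model_device *dev)
{
	struct model_device *bus = dev->parent;	/* faults when dev == NULL */

	return (bus != NULL ? dev->handle : NULL);
}

/* Guarded lookup: returns 0 and leaves *out untouched if vmbus_res0 is absent. */
int
model_lookup_vmbus_res_handle(struct model_device **devs, size_t ndevs,
    model_acpi_handle_t *out)
{
	struct model_device *dev_res;

	dev_res = model_get_device(devs, ndevs, "vmbus_res", 0);
	if (dev_res == NULL)
		return (0);	/* the check missing from the original code */
	*out = model_get_handle(dev_res);
	return (1);
}
```

On a device list that contains vmbus_res0 the guarded lookup returns the handle; on a list without it (as on QEMU with hv-synic) it returns 0 instead of calling the accessor with NULL.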
(In reply to Mitchell Horne from comment #1)
Based on your analysis, I initially thought Vultr incorrectly included Hyper-V devices in the VM's config. Later I found this on the affected VM:

```
# sysctl kern.vm_guest
kern.vm_guest: hv
```

I can confirm this from the dmesg log of a verbose boot (FreeBSD 13.2). This is interesting: not all of Vultr's hypervisors are KVM.
Created attachment 246010: dmesg of affected VM
Created attachment 246012: Patch against releng/14.0

Good news: based on Mitchell's analysis, I made this patch (against releng/14.0). The affected VM now boots fine!
(In reply to Zhenlei Huang from comment #0)
Can you please collect the acpidump -dt output and share it here? This code path should only be hit if the environment is Hyper-V based.
Most likely the system is on gen1 Hyper-V, but we can confirm after checking the acpidump -dt output.
Also, please share the dmesg output, since gen1 also has vmbus_res0. From the dmesg of a gen1 VM in Azure:

vmbus_res0: <Hyper-V Vmbus Resource> irq 5,7 on acpi0
In gen1, this is the device tree:

acpi0 pcib0 vmbus0 hvet0 storvsc0 storvsc1 hvheartbeat0 hvkvp0 hvshutdown0 hvtimesync0 storvsc2 storvsc3 hvkbd0 hn0 pcib1 pci1 mlx5_core0 pci0 hostb0 isab0 isa0 orm0 vga0 atapci0 ata0 ata1 vgapci0 atdma0 attimer0 atrtc0 atkbdc0 atkbd0 psm0 psmcpnp0 fpupnp0 uart0 uart1 fdc0 fd0 acpi_sysresource0 acpi_sysresource1 vmbus_res0

and in gen2:

nexus0 acpi0 acpi_syscontainer0 vmbus0 hvhid0 hidbus0 hms0 hvkbd0 hvheartbeat0 hvkvp0 hvshutdown0 hvtimesync0 hn0 storvsc0 storvsc1 pcib0 pci0 mlx5_core0 uart0 uart1 vmbus_res0

So in both cases vmbus_res0 is present: it is a pseudo device that has been made a child of acpi and owns the resources of vmbus.
Created attachment 246019: acpidump from vultr VM

(In reply to schakrabarti@microsoft.com from comment #6)
> Most likely the system is on gen1 Hyper-V, but we can confirm after
> checking the acpidump -dt output.

See the attachment "acpidump from vultr VM".
(In reply to schakrabarti@microsoft.com from comment #7)
> Also please share the dmesg output, as in gen1 also we have vmbus_res0.
> From dmesg in gen1 VM in Azure:
> vmbus_res0: <Hyper-V Vmbus Resource> irq 5,7 on acpi0

I've uploaded the dmesg. There are no vmbus_res devices in the dmesg output.
*** UPDATE ***

To be clear, the regression happens only on Vultr VMs installed from a custom ISO. A normal installation, i.e. selecting the FreeBSD server image during VM creation, is not affected.
I contacted Vultr, and their system administrator, Albert, confirmed that they are using QEMU. I believe that for custom ISO installations, Hyper-V is emulated [1]. Hyper-V is probably not fully emulated, hence the guest VM lacks vmbus_res devices. The patch therefore still applies, to fix this corner case.

1. https://fuchsia.googlesource.com/third_party/qemu/+/refs/tags/v7.0.0-rc0/docs/hyperv.txt
See also "Hyper-V Enlightenments" in the QEMU documentation [2].

2. https://www.qemu.org/docs/master/system/i386/hyperv.html
(In reply to Zhenlei Huang from comment #12)
Oh interesting... "what could go wrong?" :D

Anyway, it is good that the scope of the problem is limited to custom ISO installations, but it is still undesirable. You can consider your patch 'Reviewed by: mhorne'.

Let's see what Souradeep says, but if you can sneak the fix into 14.0-RELEASE, that would be excellent. Otherwise, it could be distributed as an Errata Notice after the fact.
I managed to reproduce this with QEMU 7.2.5 on a Debian 12.2.0 host.

```
# uname -a
Linux debian 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux
# qemu-system-x86_64 --version
QEMU emulator version 7.2.5 (Debian 1:7.2+dfsg-7+deb12u2)
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
```

A minimal script to reproduce it (be sure to load kvm / kvm_intel first; AMD has not been tested yet):

```
#!/bin/sh
qemu-system-x86_64 \
  -vnc 0.0.0.0:1,password=on \
  -monitor stdio \
  --enable-kvm \
  --cpu host,hv-vpindex,hv-synic \
  --smp 1 \
  --m 512M \
  --cdrom FreeBSD-14.0-RC3-amd64-bootonly.iso
```

Vultr's enabled feature flags should be equivalent to:

```
--enable-kvm \
--cpu host,hv-relaxed,hv-vapic,hv-vpindex,hv-synic,hv-time,hv-stimer,hv-xmm-input
```

I've tested the patch with QEMU, and it still works :)
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=63bf943d4af17799cef21e2bb78dd28003ce1ce5

commit 63bf943d4af17799cef21e2bb78dd28003ce1ce5
Author: Zhenlei Huang <zlei@FreeBSD.org>
AuthorDate: 2023-11-02 09:07:11 +0000
Commit: Zhenlei Huang <zlei@FreeBSD.org>
CommitDate: 2023-11-02 09:07:11 +0000

Hyper-V: vmbus: Add NULL check for vmbus_res

QEMU emulates Hyper-V [1] but lacks the emulation for vmbus_res, thus no coherence information is available. Add NULL check for it and fallback to no coherence.

This will prevent FreeBSD guests from panic on QEMU with the Hyper-V enlightenment hv-synic enabled.

For real Hyper-V, both gen1 and gen2 have vmbus_res then they are not affected by this change.

1. https://www.qemu.org/docs/master/system/i386/hyperv.html

PR: 274810
Reviewed by: mhorne, emaste, delphij, whu
Diagnosed by: mhorne
Fixes: e7a9817b8d32 Hyper-V: vmbus: implementat bus_get_dma_tag in vmbus
Insta-MFC approved by: re (delphij) for 14.0-RC4
Differential Revision: https://reviews.freebsd.org/D42414

sys/dev/hyperv/vmbus/vmbus.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=1969d82fcf62f80c2047a53b42f501680b140b0d

commit 1969d82fcf62f80c2047a53b42f501680b140b0d
Author: Zhenlei Huang <zlei@FreeBSD.org>
AuthorDate: 2023-11-02 09:07:11 +0000
Commit: Zhenlei Huang <zlei@FreeBSD.org>
CommitDate: 2023-11-02 09:10:03 +0000

Hyper-V: vmbus: Add NULL check for vmbus_res

QEMU emulates Hyper-V [1] but lacks the emulation for vmbus_res, thus no coherence information is available. Add NULL check for it and fallback to no coherence.

This will prevent FreeBSD guests from panic on QEMU with the Hyper-V enlightenment hv-synic enabled.

For real Hyper-V, both gen1 and gen2 have vmbus_res then they are not affected by this change.

1. https://www.qemu.org/docs/master/system/i386/hyperv.html

PR: 274810
Reviewed by: mhorne, emaste, delphij, whu
Diagnosed by: mhorne
Fixes: e7a9817b8d32 Hyper-V: vmbus: implementat bus_get_dma_tag in vmbus
Insta-MFC approved by: re (delphij) for 14.0-RC4
Differential Revision: https://reviews.freebsd.org/D42414

(cherry picked from commit 63bf943d4af17799cef21e2bb78dd28003ce1ce5)

sys/dev/hyperv/vmbus/vmbus.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=52dbe7401fba923bc18124190029e65b491a756e

commit 52dbe7401fba923bc18124190029e65b491a756e
Author: Zhenlei Huang <zlei@FreeBSD.org>
AuthorDate: 2023-11-02 09:07:11 +0000
Commit: Zhenlei Huang <zlei@FreeBSD.org>
CommitDate: 2023-11-02 09:13:18 +0000

Hyper-V: vmbus: Add NULL check for vmbus_res

QEMU emulates Hyper-V [1] but lacks the emulation for vmbus_res, thus no coherence information is available. Add NULL check for it and fallback to no coherence.

This will prevent FreeBSD guests from panic on QEMU with the Hyper-V enlightenment hv-synic enabled.

For real Hyper-V, both gen1 and gen2 have vmbus_res then they are not affected by this change.

1. https://www.qemu.org/docs/master/system/i386/hyperv.html

PR: 274810
Reviewed by: mhorne, emaste, delphij, whu
Approved by: re (gjb)
Diagnosed by: mhorne
Fixes: e7a9817b8d32 Hyper-V: vmbus: implementat bus_get_dma_tag in vmbus
Insta-MFC approved by: re (delphij) for 14.0-RC4
Differential Revision: https://reviews.freebsd.org/D42414

(cherry picked from commit 63bf943d4af17799cef21e2bb78dd28003ce1ce5)
(cherry picked from commit 1969d82fcf62f80c2047a53b42f501680b140b0d)

sys/dev/hyperv/vmbus/vmbus.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
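The "fallback to no coherence" behaviour described in the commit message can be sketched as a small decision function. This is a hypothetical userland model, not the code from D42414: `struct model_vmbus_res` and its fields are illustrative stand-ins for whatever coherence information the real driver obtains through the vmbus_res0 ACPI handle.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the coherence information that vmbus_res0 can
 * provide; the real driver queries ACPI, which is not modelled here.
 */
struct model_vmbus_res {
	bool	has_attr;	/* whether the coherence attribute could be read */
	int	attr;		/* its value when present (non-zero = coherent) */
};

/*
 * Returns 1 if the bus should be treated as cache-coherent, 0 otherwise.
 * A NULL res models a missing vmbus_res0 device (as on QEMU with hv-synic):
 * instead of dereferencing it, fall back to "not coherent".
 */
int
model_vmbus_coherent(const struct model_vmbus_res *res)
{
	int coherent = 0;	/* fallback: no coherence */

	if (res != NULL && res->has_attr)
		coherent = (res->attr != 0);
	return (coherent);
}
```

The design choice mirrored here is that a missing device is treated the same as a missing attribute: the safe default is applied and boot continues rather than panicking.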
The fix will be in 14.0-RC4.