Created attachment 255299 [details] kdb backtrace
Hello, FreeBSD 14.2 (both the BETA3 and the BETA2 builds) panics at boot and then fails to reboot automatically. This also affects FreeBSD 14.1-RELEASE. Tested by booting the install image from USB, in both UEFI and legacy mode, on a computer with the following hardware:
ASUS PRIME X299-A motherboard (first revision, BIOS release number 4001)
Intel i9-7980XE CPU, no onboard graphics/IGP
NVIDIA GeForce RTX 3060 GPU
Attached is a picture of the KDB backtrace.
Does any version of FreeBSD boot successfully on this hardware? Are you able to boot if you enable "safe mode" from the loader menu?
(In reply to Mark Johnston from comment #1) Hello, I have never tried this hardware with FreeBSD before. Tomorrow I will try booting 14.2 in safe mode, as well as FreeBSD 13, 12 and 11.
Created attachment 255308 [details] freebsd 12 panic screen
Created attachment 255309 [details] freebsd 13 panic screen
Created attachment 255310 [details] freebsd 11 boot (no root dev)
Created attachment 255311 [details] freebsd 15 current kdb
Created attachment 255312 [details] netbsd10 boot
Created attachment 255313 [details] openbsd 7.6 boot
FreeBSD 12.0 through 14.2-BETA3 all crash in roughly the same way (panic screens attached), both with the default boot options and with safe mode active.
FreeBSD 11.0 does not panic, but it is unable to mount the root file system over USB, whether the drive is attached to a USB3 or a USB2 port.
FreeBSD 15.0-CURRENT (20241115-79af8f72b3af-273651) crashes in a different way, but the kernel debugger prompt does not accept keyboard input, so I'm unable to have it print more information.
As a side note, Linux (Debian 12 with kernel 6.1, *EL 9.4, and more recent distributions such as Fedora 41 and Ubuntu 24.10), OpenIndiana, NetBSD 10 and OpenBSD 7.6 all boot successfully, but full ACPI support with graphical acceleration seems to be available and working only on Linux at the time of writing.
While FreeBSD 15-CURRENT does not allow me to interact with the debugger through the USB keyboard, the program counter value at which it fails seems to be the same one at which FreeBSD 14.2-BETA3 panics: 0x4e6134ec. FreeBSD 15 prints out that the faulting instruction is movq %rcx,%rax. I don’t know whether it is fair to assume that FreeBSD 14.2 is running the same instruction at that problematic address, but on FreeBSD 14.2 the CPU registers are also printed out, and the destination register rax is set to 0000000000000000.
The 15-CURRENT boot is with bootverbose enabled and in safe mode, I believe; could you please also try it with debug.verbose_sysinit=1 set from the loader prompt? I'm wondering which SYSINIT is triggering the problem. Does the 15-CURRENT kernel always crash at the same point?
(In reply to Mark Johnston from comment #11)
Yes, it still crashes at the same point (it seems to always be the same). With debug.verbose_sysinit=1 it reports:
subsystem 3100000
configure_first(0)... done.
module_register_init(&cam_moduledata)... done.
fbd_evh_init(0)... done.
configure(0)... [here it crashes]
[ thread pid 0 tid 100000 ]
Stopped at 0x4e6134ec: movq %rcx,%rax
(In reply to keivan from comment #12)
I think we're crashing while probing ISA bus devices. Could you please try booting with
hint.isa.0.disabled="1"
hint.isab.0.disabled="1"
set from the loader?
(In reply to Mark Johnston from comment #13) Done; disabling ISA probing has no effect.
(In reply to keivan from comment #14) It still crashes right after printing "configure(0)..."? Are you able to build a new kernel that can be booted on this system? If so, it would also be useful to try a GENERIC-KASAN kernel.
(In reply to Mark Johnston from comment #15) I can try. Which branch do you suggest to checkout for this build?
(In reply to keivan from comment #16) I would suggest trying "main", the default branch.
(In reply to Mark Johnston from comment #17)
Ok, I will try to build a kernel with KASAN from main and then the memstick.img target (or can the newly built kernel simply replace the one on the already written FreeBSD 15 install media?).
Also, I forgot to add: yes, in the previous test it was still crashing on the same instruction/PC right after configure(), even with:
set debug.verbose_sysinit=1
set hint.isa.0.disabled="1"
set hint.isab.0.disabled="1"
or
set debug.verbose_sysinit=1
set hint.isa.0.disabled=1
set hint.isab.0.disabled=1
Created attachment 255329 [details] FreeBSD15-CURRENT GENERIC-KASAN kernel
The FreeBSD 15-CURRENT kernel built with 'make buildkernel KERNCONF=GENERIC-KASAN' does not print any additional warnings, and it still panics at the same program counter address as before: 0x4e6134ec. Maybe the bug is in assembly platform code that is not covered by KASAN?
(In reply to keivan from comment #20) No difference with a GENERIC-KMSAN build either, except for a warning about the WITNESS option being enabled.
Created attachment 255330 [details] debug prints (In reply to keivan from comment #21) Ok, thanks for your patience thus far. I guess one of the driver identify routines is triggering the crash, somehow. I attached a small patch - could you try booting a GENERIC kernel built from main with that patch applied? It should tell us which driver is at fault.
Created attachment 255331 [details] FreeBSD15-CURRENT printf patch Here it is
Created attachment 255332 [details] more debug prints Ok, let's add some more printf()s. Please recompile the kernel with WITH_CLEAN=, i.e., "make buildkernel WITH_CLEAN=". Please also keep booting with debug.verbose_sysinit=1 set from the loader.
Created attachment 255333 [details] new freebsd15 debug output here is the new output, with set debug.verbose_sysinit=1 and the new patch
So perhaps there is something strange going on in the efirtc driver. Could you please remove the "device efirtc" line from GENERIC and try building a new kernel (again with WITH_CLEAN=)? I don't think setting hint.efirtc.0.disabled=1 will work, as that doesn't stop the driver from probing.
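A minimal sketch of one way to build such a kernel without editing GENERIC in place: a small custom config that includes GENERIC and then drops the driver with "nodevice". The config name NOEFIRTC, the helper function name and the /usr/src path are assumptions made up for this illustration, not something taken from this report.

```shell
#!/bin/sh
# Sketch: write a custom kernel config that is GENERIC minus efirtc.
# NOEFIRTC and write_noefirtc_conf are hypothetical names for this example.

write_noefirtc_conf() {
    # $1: directory holding the kernel configs,
    #     e.g. /usr/src/sys/amd64/conf (assumed source tree location)
    confdir=$1
    printf '%s\n' \
        'include GENERIC' \
        'ident   NOEFIRTC' \
        'nodevice efirtc' > "$confdir/NOEFIRTC"
}

# Typical use on a source tree checked out in /usr/src (assumed path):
#   write_noefirtc_conf /usr/src/sys/amd64/conf
#   cd /usr/src && make buildkernel KERNCONF=NOEFIRTC WITH_CLEAN=
```

Keeping the change in a separate config file leaves GENERIC untouched, so switching back to the stock kernel only requires changing KERNCONF.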
Hmm, there is a hack in efi_init() which looks related:
#if defined(__aarch64__) || defined(__amd64__)
	/*
	 * Some UEFI implementations have multiple implementations of the
	 * RS->GetTime function. They switch from one we can only use early
	 * in the boot process to one valid as a RunTime service only when we
	 * call RS->SetVirtualAddressMap. As this is not always the case, e.g.
	 * with an old loader.efi, check if the RS->GetTime function is within
	 * the EFI map, and fail to attach if not.
	 */
	rtdm = (struct efi_rt *)efi_phys_to_kva((uintptr_t)efi_runtime);
	if (rtdm == NULL || !efi_is_in_map(map, ndesc, efihdr->descriptor_size,
	    (vm_offset_t)rtdm->rt_gettime)) {
		if (bootverbose)
			printf(
			 "EFI runtime services table has an invalid pointer\n");
		efi_runtime = NULL;
		efi_destroy_1t1_map();
		return (ENXIO);
	}
#endif
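The check in that hack boils down to asking whether a function pointer lies inside any region of the EFI memory map. As a rough illustration of the idea only (this is not the kernel's efi_is_in_map(); the "start pages" descriptor format, the function name and the sample addresses are all assumptions made up for the example), the test can be sketched as:

```shell
#!/bin/sh
# Sketch of a "is this address inside any map region?" check.
# Not kernel code: descriptors are read from stdin as "start pages" lines,
# a hypothetical format chosen for this example.

EFI_PAGE_SIZE=4096

addr_in_map() {
    # $1: address to test (decimal or 0x-prefixed hex)
    addr=$(($1))
    while read -r start pages; do
        # A region covers [start, start + pages * page size).
        end=$(($start + $pages * EFI_PAGE_SIZE))
        if [ "$addr" -ge "$(($start))" ] && [ "$addr" -lt "$end" ]; then
            return 0
        fi
    done
    return 1
}

# Example with a made-up one-region map covering 0x1000-0x2fff:
if printf '0x1000 2\n' | addr_in_map 0x2abc; then
    echo "pointer is inside the map"
else
    echo "pointer is outside the map"
fi
```

A pointer that passes this kind of test can still lead to code that misbehaves when called, which is consistent with the crash persisting despite the check.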
It does boot successfully without "device efirtc"
Created attachment 255337 [details] poweroff crash without efirtc
However, it still crashes after 'poweroff' is issued from the shell, even without the efirtc driver compiled in.
Try this https://reviews.freebsd.org/D47694
Created attachment 255354 [details] unrel mutex warnings
The results of testing the D47694 patch are:
- booting with the patch and efirtc compiled into the kernel: panic at 0x4e6134ec: movq %rcx,%rax during configure(0)
- booting with the patch and efirtc not compiled in: the system boots. Issuing a shutdown causes a panic at 0x4e6134ec: movq %rcx,%rax
Pictures are not attached, as the error is always the same as in the original report.
Attached is another warning, about a mutex not working as expected, that shows up when booting without efirtc (not related to this report, I think).
is there any step I can try to identify the offending line of code for instruction pointer 0x4e6134ec when successfully booting the system without efirtc?
(In reply to keivan from comment #32)
The source code for the faulting instruction is stored somewhere at the BIOS vendor's office. It is an EFI RT call into the BIOS that is causing the #BP exception during execution.
I updated the patch in the review with some debugging info, which should allow me to understand why the check did not detect the exception occurring during the RT call. I need the panic info with the patch applied.
Created attachment 255357 [details] FreeBSD15-CURRENT debug BP pflags output Attached is the new debug output
I wonder if Linux's efirt time works properly on this system? That is, is there a rtc-efi clock visible from procfs?
(In reply to Mark Johnston from comment #35)
Debian 12 (linux 6.1) seems to be using the rtc-cmos driver instead of efi-rtc

ls -l /dev/rtc0
crw------- 1 root root 251, 0 21 nov 13.00 /dev/rtc0

cat /sys/dev/char/251:0/name
rtc_cmos rtc_cmos
=======================
cat /proc/driver/rtc
rtc_time : 15:47:28
rtc_date : 2024-11-21
alrm_time : 21:37:57
alrm_date : 2024-11-21
alarm_IRQ : no
alrm_pending : no
update IRQ enabled : no
periodic IRQ enabled : no
periodic IRQ frequency : 1024
max user IRQ frequency : 64
24hr : yes
periodic_IRQ : no
update_IRQ : no
HPET_emulated : no
BCD : yes
DST_enable : no
periodic_freq : 1024
batt_status : okay
========================
dmesg | grep -i rtc
[ 0.579202] platform rtc_cmos: registered platform RTC device (no PNP device found)
[ 0.931525] rtc_cmos rtc_cmos: RTC can wake from S4
[ 0.932252] rtc_cmos rtc_cmos: registered as rtc0
[ 0.932380] rtc_cmos rtc_cmos: setting system clock to 2024-11-21T11:57:52 UTC (1732190272)
[ 0.932401] rtc_cmos rtc_cmos: alarms up to one month, y3k, 114 bytes nvram
========================
lsmod | grep efi
efi_pstore 16384 0
efivarfs 24576 1
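For repeating this check on other Linux machines, the lookup can be wrapped in a small helper that walks /sys/class/rtc instead of going through the character device number. This is only a sketch: the function name is made up, and the optional sysfs root argument exists purely so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Sketch: print each RTC device known to the Linux kernel with its driver name.
# list_rtc_drivers is a hypothetical helper name; $1 optionally overrides the
# sysfs root (default /sys) so the function can be exercised on a fake tree.

list_rtc_drivers() {
    sysroot=${1:-/sys}
    for dev in "$sysroot"/class/rtc/rtc*; do
        # Skip entries without a readable name, including an unmatched glob.
        [ -r "$dev/name" ] || continue
        printf '%s %s\n' "${dev##*/}" "$(cat "$dev/name")"
    done
}

list_rtc_drivers
```

Each output line pairs a device such as rtc0 with the driver name reported by sysfs, so a CMOS-backed clock and an EFI-backed one can be told apart at a glance.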
(In reply to keivan from comment #36) Does the Linux dmesg show anything relating to EFI at all? Are there any BIOS updates available for this motherboard?
(In reply to Mark Johnston from comment #37)
Hello, the motherboard is running the latest BIOS update (that was the first thing I updated when I ran into problems some weeks ago, once I got hold of this system).
EFI-related output in Linux:

dmesg | grep -i efi
[ 0.000000] efi: EFI v2.70 by American Megatrends
[ 0.000000] efi: TPMFinalLog=0x4d1fd000 ACPI=0x4c14e000 ACPI 2.0=0x4c14e014 SMBIOS=0x4e393000 SMBIOS 3.0=0x4e392000 MEMATTR=0x4718b018 ESRT=0x47e89298 MOKvar=0x4e3c1000
[ 0.012687] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.540481] pci 0000:65:00.0: BAR 1: assigned to efifb
[ 0.543068] Registered efivars operations
[ 0.936432] efifb: probing for efifb
[ 0.936448] efifb: framebuffer at 0xc0000000, using 3072k, total 3072k
[ 0.936450] efifb: mode is 1024x768x32, linelength=4096, pages=1
[ 0.936452] efifb: scrolling: redraw
[ 0.936452] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
[ 0.937084] fb0: EFI VGA frame buffer device
[ 1.026018] integrity: Loading X.509 certificate: UEFI:db
[ 1.026442] integrity: Loading X.509 certificate: UEFI:db
[ 1.026837] integrity: Loading X.509 certificate: UEFI:db
[ 1.026883] integrity: Loaded X.509 cert 'Microsoft Corporation UEFI CA 2011: 13adbf4309bd82709c8cd54f316ed522988a1bd4'
[ 1.026886] integrity: Loading X.509 certificate: UEFI:db
[ 1.026925] integrity: Loading X.509 certificate: UEFI:db
[ 1.027336] integrity: Loading X.509 certificate: UEFI:db
[ 1.027371] integrity: Loaded X.509 cert 'Microsoft UEFI CA 2023: 81aa6b3244c935bce0d6628af39827421e32497d'
[ 1.027374] integrity: Loading X.509 certificate: UEFI:db
[ 1.027411] integrity: Loaded X.509 cert 'Microsoft Corporation: Windows UEFI CA 2023: aefc5fbbbe055d8f8daa585473499417ab5a5272'
[ 1.604755] tsc: Refined TSC clocksource calibration: 2592.008 MHz
[ 145.017347] systemd[1]: Starting modprobe@efi_pstore.service - Load Kernel Module efi_pstore...
[ 145.028456] pstore: Registered efi as persistent store backend
[ 145.029412] systemd[1]: modprobe@efi_pstore.service: Deactivated successfully.
[ 145.029525] systemd[1]: Finished modprobe@efi_pstore.service - Load Kernel Module efi_pstore.
Try the updated patch in the review.
Created attachment 255361 [details] with the last revision of the patch the system boots
With the latest revision of the patch to the trap handler, the system boots successfully into the installer with the efirtc driver compiled in. In the end it does not seem to be using the RTC via EFI, but it no longer panics. The kernel panic on shutdown no longer occurs either; the system shuts down normally.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e6ec41fa86d88f80bd663e55455a6844619a9b24

commit e6ec41fa86d88f80bd663e55455a6844619a9b24
Author: Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-11-21 04:57:58 +0000
Commit: Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-11-21 22:05:28 +0000

    amd64 efi rt: handle #BP

    PR: 282860
    Reviewed by: markj
    Sponsored by: The FreeBSD Foundation
    MFC after: 1 week
    Differential revision: https://reviews.freebsd.org/D47694

 sys/amd64/amd64/trap.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
Thank you for your support! I have tested the same patched kernel/installer image on some other systems which exhibited no boot problems before, and the preliminary trap handler patch does not break them. Once the patch is finalized, I'm willing to test it again.
Also, the last time I remember FreeBSD correctly resuming from S3 on some other computer of mine was around FreeBSD 8. On this system, even with the trap handler patch, the video output is not reactivated after resuming from S3 with the vesa driver; setting hw.acpi.reset_video=1 before suspending and running kldload vesa.ko after resuming have no effect. The same happens on another Ryzen PC I tested and on my Intel 12th gen laptop. Since the system this bug report is about has a handy COM/serial header on the motherboard that I can possibly connect to, I'm willing to troubleshoot the ACPI S3 problems as well in the future, with some guidance.
Hello, I cloned main today and it still works, without applying any patches. I think you can close this bug report.
As a side note, I also got the system to resume correctly from S3, but it did not work on any of my systems with the vt_vga driver. All the computers I tested resumed with the screen off, but with ssh working. An Intel laptop running in text mode got S3 resume fixed by using drm-kmod/i915kms. The system of this bug report also got working ACPI S3 resume by loading the nvidia-drm driver (either when working in a console or when suspending in X11), or the nvidia-modeset driver, but only from X11 or when X11 was active somewhere. So maybe there's something off in vt_vga or vt for some moderately modern hardware. (Probably material for a new bug report.)
(In reply to keivan from comment #42)
Thanks for helping us narrow down the problem.
(In reply to keivan from comment #43)
Yes, this ought to be a new bug report. I believe there are some known problems with vt and suspend-to-S3 but I don't know the details.
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5f5b47e37416df46b52e0d43b94d9c2f37d15397

commit 5f5b47e37416df46b52e0d43b94d9c2f37d15397
Author: Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-11-21 04:57:58 +0000
Commit: Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-11-28 12:53:17 +0000

    amd64 efi rt: handle #BP

    PR: 282860

    (cherry picked from commit e6ec41fa86d88f80bd663e55455a6844619a9b24)

 sys/amd64/amd64/trap.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)