Created attachment 255299 kdb backtrace
Hello, FreeBSD 14.2 (both the beta2 and the beta3 builds) hits a kernel panic at boot and then fails to reboot automatically. This also affects FreeBSD 14.1-RELEASE. Tested by booting the install image from USB, in both UEFI and legacy mode, on a computer with the following hardware: ASUS PRIME X299-A motherboard (first revision, BIOS release 4001), Intel i9-7980XE CPU, no onboard graphics/IGP, NVIDIA GeForce RTX 3060 GPU. Attached is a picture of the KDB backtrace.
Does any version of FreeBSD boot successfully on this hardware? Are you able to boot if you enable "safe mode" from the loader menu?
(In reply to Mark Johnston from comment #1) Hello, I have never tried FreeBSD on this hardware before. Tomorrow I will try booting 14.2 in safe mode, and FreeBSD 13, 12 and 11 as well.
Created attachment 255308 freebsd 12 panic screen
Created attachment 255309 freebsd 13 panic screen
Created attachment 255310 freebsd 11 boot (no root dev)
Created attachment 255311 freebsd 15 current kdb
Created attachment 255312 netbsd10 boot
Created attachment 255313 openbsd 7.6 boot
FreeBSD 12.0 through 14.2-BETA3 all crash in about the same way (panic screens attached), both with default boot options and with safe mode enabled. FreeBSD 11.0 does not panic, but it is unable to mount the root filesystem from the USB drive, whether the drive is attached to a USB 3 or a USB 2 port. FreeBSD 15.0-CURRENT (20241115-79af8f72b3af-273651) crashes in a different way, but the kernel debugger prompt does not accept keyboard input, so I'm unable to have it print more information.
As a side note, Linux (Debian 12 with kernel 6.1, *EL 9.4, and more recent distributions such as Fedora 41 and Ubuntu 24.10), OpenIndiana, NetBSD 10 and OpenBSD 7.6 all boot successfully, but full ACPI support with graphical acceleration seems to be available/working only on Linux at the time of writing.
While FreeBSD 15-CURRENT does not allow me to interact with the debugger through the USB keyboard, it seems that the program counter value at which it fails is the same as the one at which FreeBSD 14.2-BETA3 panics: 0x4e6134ec. FreeBSD 15 prints that the faulting instruction is movq %rcx,%rax. I don’t know whether it is fair to assume that FreeBSD 14.2 is running the same instruction at that problematic address, but on FreeBSD 14.2 the CPU registers are also printed, and rax, the destination of that instruction, is 0000000000000000.
The 15-CURRENT boot is with bootverbose enabled and in safe mode, I believe; could you please also try it with debug.verbose_sysinit=1 set from the loader prompt? I'm wondering which SYSINIT is triggering the problem. Does the 15-CURRENT kernel always crash at the same point?
(In reply to Mark Johnston from comment #11) Yes, it still crashes at the same point (it seems to always be the same). With debug.verbose_sysinit=1 it reports:
subsystem 3100000
configure_first(0)... done.
module_register_init(&cam_moduledata)... done.
fbd_evh_init(0)... done.
configure(0)... [here it crashes]
[ thread pid 0 tid 100000 ]
Stopped at 0x4e6134ec: movq %rcx,%rax
(In reply to keivan from comment #12) I think we're crashing while probing ISA bus devices. Could you please try booting with
hint.isa.0.disabled="1"
hint.isab.0.disabled="1"
set from the loader?
(In reply to Mark Johnston from comment #13) Done; disabling ISA probing has no effect.
(In reply to keivan from comment #14) It still crashes right after printing "configure(0)..."? Are you able to build a new kernel that can be booted on this system? If so, it would also be useful to try a GENERIC-KASAN kernel.
(In reply to Mark Johnston from comment #15) I can try. Which branch do you suggest to checkout for this build?
(In reply to keivan from comment #16) I would suggest trying "main", the default branch.
(In reply to Mark Johnston from comment #17) OK, I will try to build a kernel with KASAN from main and then the memstick.img target (or can the newly built kernel simply replace the one on the already written FreeBSD 15 install media?). Also, I forgot to add: yes, in the previous test it was still crashing on the same instruction/PC right after configure(), even with:
set debug.verbose_sysinit=1
set hint.isa.0.disabled="1"
set hint.isab.0.disabled="1"
or
set debug.verbose_sysinit=1
set hint.isa.0.disabled=1
set hint.isab.0.disabled=1
Created attachment 255329 FreeBSD15-CURRENT GENERIC-KASAN kernel
The FreeBSD 15-CURRENT kernel built with 'make buildkernel KERNCONF=GENERIC-KASAN' does not print any additional warnings, and it still panics at the same program counter address as before: 0x4e6134ec. Maybe the bug is in assembly/platform code that is not covered by KASAN?
(In reply to keivan from comment #20) No difference with a GENERIC-KMSAN build either, except for a warning about the WITNESS option being enabled.
Created attachment 255330 debug prints
(In reply to keivan from comment #21) Ok, thanks for your patience thus far. I guess one of the driver identify routines is triggering the crash, somehow. I attached a small patch - could you try booting a GENERIC kernel built from main with that patch applied? It should tell us which driver is at fault.
Created attachment 255331 FreeBSD15-CURRENT printf patch
Here it is.
Created attachment 255332 more debug prints
Ok, let's add some more printf()s. Please recompile the kernel with WITH_CLEAN=, i.e., "make buildkernel WITH_CLEAN=". Please also keep booting with debug.verbose_sysinit=1 set from the loader.
Created attachment 255333 new freebsd15 debug output
Here is the new output, with debug.verbose_sysinit=1 set and the new patch applied.
So perhaps there is something strange going on in the efirtc driver. Could you please remove the "device efirtc" line from GENERIC and try building a new kernel (again with WITH_CLEAN=)? I don't think setting hint.efirtc.0.disabled=1 will work, as that doesn't stop the driver from probing.
Hmm, there is a hack in efi_init() which looks related:
#if defined(__aarch64__) || defined(__amd64__)
	/*
	 * Some UEFI implementations have multiple implementations of the
	 * RS->GetTime function. They switch from one we can only use early
	 * in the boot process to one valid as a RunTime service only when we
	 * call RS->SetVirtualAddressMap. As this is not always the case, e.g.
	 * with an old loader.efi, check if the RS->GetTime function is within
	 * the EFI map, and fail to attach if not.
	 */
	rtdm = (struct efi_rt *)efi_phys_to_kva((uintptr_t)efi_runtime);
	if (rtdm == NULL || !efi_is_in_map(map, ndesc, efihdr->descriptor_size,
	    (vm_offset_t)rtdm->rt_gettime)) {
		if (bootverbose)
			printf(
			    "EFI runtime services table has an invalid pointer\n");
		efi_runtime = NULL;
		efi_destroy_1t1_map();
		return (ENXIO);
	}
#endif
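For readers unfamiliar with the check described in that comment, here is a minimal sketch of the idea: walk the EFI memory map and accept an address only if it falls inside a region flagged for runtime use. The struct, the sketch_ names and the example map are assumptions made for illustration. The real memory map of this machine was not posted in this report, and the kernel's own efi_is_in_map() operates on its own descriptor type with a variable descriptor size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL       /* UEFI memory map pages are 4 KiB */
#define SKETCH_ATTR_RT   (1ULL << 63)  /* EFI_MEMORY_RUNTIME attribute bit */

/* Hypothetical, simplified memory descriptor (not the kernel's type). */
struct sketch_md {
	uint64_t start;  /* first address of the region */
	uint64_t pages;  /* region length in 4 KiB pages */
	uint64_t attr;   /* attribute bits */
};

/* Return true if addr lies inside a region marked for runtime use. */
static bool
sketch_is_in_map(const struct sketch_md *map, size_t ndesc, uint64_t addr)
{
	for (size_t i = 0; i < ndesc; i++) {
		const struct sketch_md *p = &map[i];

		if ((p->attr & SKETCH_ATTR_RT) == 0)
			continue;	/* not usable after boot services exit */
		if (addr >= p->start &&
		    addr < p->start + p->pages * SKETCH_PAGE_SIZE)
			return (true);
	}
	return (false);
}

/* Made-up map for illustration only; NOT taken from the affected machine. */
static const struct sketch_md sketch_map[] = {
	{ 0x00100000ULL, 16, 0 },               /* ordinary memory */
	{ 0x4e600000ULL, 32, SKETCH_ATTR_RT },  /* pretend runtime code */
};

/* Small self-check showing how the lookup behaves on the made-up map. */
static void
sketch_selftest(void)
{
	assert(sketch_is_in_map(sketch_map, 2, 0x4e6134ecULL));
	assert(!sketch_is_in_map(sketch_map, 2, 0x00100800ULL));
}
```

A function pointer that passes this kind of test is only known to point into a runtime region; the check says nothing about whether the firmware code there behaves correctly once called.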
It does boot successfully without "device efirtc"
Created attachment 255337 poweroff crash without efirtc
But it still crashes after issuing 'poweroff' from the shell, even without the efirtc driver compiled in.
Try this https://reviews.freebsd.org/D47694
Created attachment 255354 unrel mutex warnings
The results of testing the D47694 patch are:
- booting with the patch and efirtc compiled into the kernel: panic at 0x4e6134ec: movq %rcx,%rax during configure(0)
- booting with the patch and efirtc not compiled in: the system boots. Issuing a shutdown causes a panic at 0x4e6134ec: movq %rcx,%rax
Pictures are not attached, as the error is always the same as in the original report. Attached is another warning, about a mutex not working as expected, that shows up when booting without efirtc (not related to this report, I think).
Is there any step I can try to identify the offending line of code for instruction pointer 0x4e6134ec when successfully booting the system without efirtc?
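One observation that may help narrow this down, offered as a sketch rather than a diagnosis: on amd64 the kernel is linked at or above KERNBASE (0xffffffff80000000), so a program counter as low as 0x4e6134ec does not fall within kernel text, and a symbol lookup against the kernel image would not be expected to resolve it to a source line. That would be consistent with the fault happening in code outside the kernel, such as firmware code reached through the EFI runtime mapping, but nothing in this report confirms that. The helper name below is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* amd64 KERNBASE: the kernel image is linked at or above this address. */
#define SKETCH_KERNBASE 0xffffffff80000000ULL

/* Return true if pc could belong to kernel text on amd64. */
static bool
sketch_pc_in_kernel_range(uint64_t pc)
{
	return (pc >= SKETCH_KERNBASE);
}
```

Applied to the value from the panic, sketch_pc_in_kernel_range(0x4e6134ec) is false, which is why comparing the address against the EFI memory map is likely to be more informative than looking it up in the kernel's symbol table.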