Bug 282860 - Kernel panic at boot on intel i9-7980XE / asus prime x299-A rev1
Summary: Kernel panic at boot on intel i9-7980XE / asus prime x299-A rev1
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 14.2-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2024-11-19 11:23 UTC by keivan
Modified: 2024-11-21 09:14 UTC

See Also:


Attachments
kdb backtrace (282.58 KB, image/jpeg)
2024-11-19 11:23 UTC, keivan
freebsd 12 panic screen (159.38 KB, image/jpeg)
2024-11-20 00:59 UTC, keivan
freebsd 13 panic screen (163.08 KB, image/jpeg)
2024-11-20 01:00 UTC, keivan
freebsd 11 boot (no root dev) (147.79 KB, image/jpeg)
2024-11-20 01:01 UTC, keivan
freebsd 15 current kdb (202.34 KB, image/jpeg)
2024-11-20 01:01 UTC, keivan
netbsd10 boot (101.48 KB, image/jpeg)
2024-11-20 01:02 UTC, keivan
openbsd 7.6 boot (122.69 KB, image/jpeg)
2024-11-20 01:02 UTC, keivan
FreeBSD15-CURRENT GENERIC-KASAN kernel (177.50 KB, image/jpeg)
2024-11-20 19:19 UTC, keivan
debug prints (423 bytes, patch)
2024-11-20 20:23 UTC, Mark Johnston
FreeBSD15-CURRENT printf patch (140.72 KB, image/jpeg)
2024-11-20 20:40 UTC, keivan
more debug prints (2.73 KB, patch)
2024-11-20 21:22 UTC, Mark Johnston
new freebsd15 debug output (316.88 KB, image/jpeg)
2024-11-20 21:42 UTC, keivan
poweroff crash without efirtc (299.36 KB, image/jpeg)
2024-11-21 00:12 UTC, keivan
unrel mutex warnings (186.07 KB, image/jpeg)
2024-11-21 09:07 UTC, keivan

Description keivan 2024-11-19 11:23:32 UTC
Created attachment 255299 [details]
kdb backtrace

Hello,
FreeBSD 14.2 (both the BETA2 and the BETA3 builds) panics at boot and then fails to reboot automatically. FreeBSD 14.1-RELEASE is affected as well.

Tested by booting the install image from USB, in both UEFI and legacy mode, on a computer with the following hardware:

ASUS PRIME X299-A motherboard (first revision, BIOS release 4001), Intel i9-7980XE CPU, no onboard graphics/IGP, NVIDIA GeForce RTX 3060 GPU.

Attached is a picture of the KDB backtrace.
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2024-11-19 15:16:31 UTC
Does any version of FreeBSD boot successfully on this hardware?

Are you able to boot if you enable "safe mode" from the loader menu?
Comment 2 keivan 2024-11-19 15:57:16 UTC
(In reply to Mark Johnston from comment #1)
Hello,
I have never tried FreeBSD on this hardware before.
Tomorrow I will try to boot 14.2 in safe mode, and FreeBSD 13, 12 and 11 as well.
Comment 3 keivan 2024-11-20 00:59:45 UTC
Created attachment 255308 [details]
freebsd 12 panic screen
Comment 4 keivan 2024-11-20 01:00:45 UTC
Created attachment 255309 [details]
freebsd 13 panic screen
Comment 5 keivan 2024-11-20 01:01:28 UTC
Created attachment 255310 [details]
freebsd 11 boot (no root dev)
Comment 6 keivan 2024-11-20 01:01:58 UTC
Created attachment 255311 [details]
freebsd 15 current kdb
Comment 7 keivan 2024-11-20 01:02:19 UTC
Created attachment 255312 [details]
netbsd10 boot
Comment 8 keivan 2024-11-20 01:02:43 UTC
Created attachment 255313 [details]
openbsd 7.6 boot
Comment 9 keivan 2024-11-20 01:03:28 UTC
FreeBSD 12.0 through 14.2-BETA3 all crash in roughly the same way (panic screens attached), both with the default boot options and with safe mode enabled.

FreeBSD 11.0 does not panic, but it is unable to mount the root file system over USB, whether the drive is attached to a USB 3 port or to a USB 2 port.

FreeBSD 15.0-CURRENT (20241115-79af8f72b3af-273651) crashes in a different way, but the kernel debugger prompt does not accept keyboard input, so I am unable to make it print more information.

As a side note, Linux (Debian 12 with kernel 6.1, *EL 9.4, and more recent distributions such as Fedora 41 and Ubuntu 24.10), OpenIndiana, NetBSD 10 and OpenBSD 7.6 all boot successfully, but full ACPI support with graphics acceleration seems to be available and working only on Linux at the time of writing.
Comment 10 keivan 2024-11-20 10:52:15 UTC
Although FreeBSD 15-CURRENT does not let me interact with the debugger through the USB keyboard, the program counter value at which it fails appears to be the same one at which FreeBSD 14.2-BETA3 panics: 0x4e6134ec.
FreeBSD 15 reports that the faulting instruction is movq %rcx,%rax.

I don't know whether it is fair to assume that FreeBSD 14.2 is executing the same instruction at that address, but FreeBSD 14.2 also prints the CPU registers, and rax (the destination of that instruction) is set to 0000000000000000.
Comment 11 Mark Johnston freebsd_committer freebsd_triage 2024-11-20 16:02:01 UTC
The 15-CURRENT boot is with bootverbose enabled and in safe mode, I believe; could you please also try it with debug.verbose_sysinit=1 set from the loader prompt?  I'm wondering which SYSINIT is triggering the problem.

Does the 15-CURRENT kernel always crash at the same point?
Comment 12 keivan 2024-11-20 16:20:42 UTC
(In reply to Mark Johnston from comment #11)
Yes, it still crashes at the same point (seems to be always the same).

With debug.verbose_sysinit=1 it reports:

subsystem 3100000
configure_first(0)... done.
module_register_init(&cam_moduledata)... done.
fbd_evh_init(0)... done.
configure(0)... [here it crashes]
[ thread pid 0 tid 100000 ]
Stopped at 0x4e6134ec: movq %rcx,%rax
Comment 13 Mark Johnston freebsd_committer freebsd_triage 2024-11-20 16:38:08 UTC
(In reply to keivan from comment #12)
I think we're crashing while probing ISA bus devices.

Could you please try booting with

hint.isa.0.disabled="1"
hint.isab.0.disabled="1"

set from the loader?
Comment 14 keivan 2024-11-20 16:57:50 UTC
(In reply to Mark Johnston from comment #13)
Done; disabling ISA probing has no effect.
Comment 15 Mark Johnston freebsd_committer freebsd_triage 2024-11-20 17:05:58 UTC
(In reply to keivan from comment #14)
It still crashes right after printing "configure(0)..."?

Are you able to build a new kernel that can be booted on this system?  If so, it would also be useful to try a GENERIC-KASAN kernel.
Comment 16 keivan 2024-11-20 17:10:59 UTC
(In reply to Mark Johnston from comment #15)
I can try. Which branch do you suggest to checkout for this build?
Comment 17 Mark Johnston freebsd_committer freebsd_triage 2024-11-20 17:11:38 UTC
(In reply to keivan from comment #16)
I would suggest trying "main", the default branch.
Comment 18 keivan 2024-11-20 17:22:17 UTC
(In reply to Mark Johnston from comment #17)
OK, I will try to build a kernel with KASAN from main and then build the memstick.img target (or can the newly built kernel simply replace the one on the already-written FreeBSD 15 install media?).

Also, I forgot to add: yes, in the previous test it still crashed at the same instruction/PC right after configure(), even with:

set debug.verbose_sysinit=1
set hint.isa.0.disabled="1"
set hint.isab.0.disabled="1"

or 

set debug.verbose_sysinit=1
set hint.isa.0.disabled=1
set hint.isab.0.disabled=1
Comment 19 keivan 2024-11-20 19:19:02 UTC
Created attachment 255329 [details]
FreeBSD15-CURRENT GENERIC-KASAN kernel
Comment 20 keivan 2024-11-20 19:24:08 UTC
The FreeBSD 15-CURRENT kernel built with 'make buildkernel KERNCONF=GENERIC-KASAN' does not print any additional warnings, and it panics at the same program counter address as before: 0x4e6134ec.
Maybe the bug is in assembly platform code that KASAN does not cover?
Comment 21 keivan 2024-11-20 20:02:17 UTC
(In reply to keivan from comment #20)
No difference with a GENERIC-KMSAN build either, except for a warning about the WITNESS option being enabled.
Comment 22 Mark Johnston freebsd_committer freebsd_triage 2024-11-20 20:23:21 UTC
Created attachment 255330 [details]
debug prints

(In reply to keivan from comment #21)
Ok, thanks for your patience thus far.

I guess one of the driver identify routines is triggering the crash, somehow.  I attached a small patch - could you try booting a GENERIC kernel built from main with that patch applied?  It should tell us which driver is at fault.
Comment 23 keivan 2024-11-20 20:40:42 UTC
Created attachment 255331 [details]
FreeBSD15-CURRENT printf patch

Here it is
Comment 24 Mark Johnston freebsd_committer freebsd_triage 2024-11-20 21:22:16 UTC
Created attachment 255332 [details]
more debug prints

Ok, let's add some more printf()s.

Please recompile the kernel with WITH_CLEAN=, i.e., "make buildkernel WITH_CLEAN=".

Please also keep booting with debug.verbose_sysinit=1 set from the loader.
Comment 25 keivan 2024-11-20 21:42:23 UTC
Created attachment 255333 [details]
new freebsd15 debug output

Here is the new output, with debug.verbose_sysinit=1 set and the new patch applied.
Comment 26 Mark Johnston freebsd_committer freebsd_triage 2024-11-20 21:50:27 UTC
So perhaps there is something strange going on in the efirtc driver.  Could you please remove the "device efirtc" line from GENERIC and try building a new kernel (again with WITH_CLEAN=)?

I don't think setting hint.efirtc.0.disabled=1 will work, as that doesn't stop the driver from probing.
Comment 27 Mark Johnston freebsd_committer freebsd_triage 2024-11-20 21:52:19 UTC
Hmm, there is a hack in efi_init() which looks related:

#if defined(__aarch64__) || defined(__amd64__)
        /*
         * Some UEFI implementations have multiple implementations of the
         * RS->GetTime function. They switch from one we can only use early
         * in the boot process to one valid as a RunTime service only when we
         * call RS->SetVirtualAddressMap. As this is not always the case, e.g.
         * with an old loader.efi, check if the RS->GetTime function is within
         * the EFI map, and fail to attach if not.
         */
        rtdm = (struct efi_rt *)efi_phys_to_kva((uintptr_t)efi_runtime);
        if (rtdm == NULL || !efi_is_in_map(map, ndesc, efihdr->descriptor_size,
            (vm_offset_t)rtdm->rt_gettime)) {
                if (bootverbose)
                        printf(
                         "EFI runtime services table has an invalid pointer\n");
                efi_runtime = NULL;
                efi_destroy_1t1_map();
                return (ENXIO);
        }
#endif
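
As a rough illustration of the kind of range check the hack above relies on, here is a minimal user-space sketch in C. It is not the kernel's efi_is_in_map(): the struct sketch_md type, the sketch_is_in_map() name and the runtime flag are hypothetical stand-ins for the real EFI memory descriptor handling, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed EFI page size (4 KiB) */

/* Hypothetical, trimmed-down stand-in for an EFI memory descriptor. */
struct sketch_md {
	uint64_t phys_start;	/* first byte of the region */
	uint64_t pages;		/* region length in 4 KiB pages */
	bool runtime;		/* region is marked for runtime-services use */
};

/*
 * Return true if addr falls inside any runtime region of the map.
 * A runtime-services function pointer that lies outside every such
 * region cannot safely be called after boot services have exited.
 */
static bool
sketch_is_in_map(const struct sketch_md *map, size_t ndesc, uint64_t addr)
{
	assert(map != NULL || ndesc == 0);

	for (size_t i = 0; i < ndesc; i++) {
		uint64_t start = map[i].phys_start;
		uint64_t end = start + map[i].pages * SKETCH_PAGE_SIZE;

		/* Skip regions that are not preserved for runtime use. */
		if (map[i].runtime && addr >= start && addr < end)
			return (true);
	}
	return (false);
}
```

With a made-up runtime region that starts at 0x1000 and is two pages long, sketch_is_in_map() accepts 0x2fff and rejects 0x3000. Note that a check of this kind only covers the one pointer it is given (GetTime in the excerpt above), so it may not catch firmware code that later jumps outside the map.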
Comment 28 keivan 2024-11-20 22:03:18 UTC
It does boot successfully without "device efirtc".
Comment 29 keivan 2024-11-21 00:12:30 UTC
Created attachment 255337 [details]
poweroff crash without efirtc

However, it still crashes after issuing 'poweroff' from the shell, even with the efirtc driver not compiled in.
Comment 30 Konstantin Belousov freebsd_committer freebsd_triage 2024-11-21 05:02:17 UTC
Try this https://reviews.freebsd.org/D47694
Comment 31 keivan 2024-11-21 09:07:19 UTC
Created attachment 255354 [details]
unrel mutex warnings

The results of testing the D47694 patch are:

- Booting with the patch and efirtc compiled into the kernel: panic at 0x4e6134ec: movq %rcx,%rax during configure(0).

- Booting with the patch and efirtc not compiled in: the system boots. Issuing a shutdown causes a panic at 0x4e6134ec: movq %rcx,%rax.

Pictures are not attached because the error is always the same as in the original report. Attached is another warning, about a mutex not working as expected, that shows up when booting without efirtc (I think it is unrelated to this report).
Comment 32 keivan 2024-11-21 09:14:55 UTC
Is there any step I can take to identify the offending line of code for instruction pointer 0x4e6134ec, now that the system boots successfully without efirtc?
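
One possible starting point, offered only as a sketch: 0x4e6134ec lies in the lower half of the amd64 address space, while the FreeBSD kernel is linked at high addresses, so the instruction may not belong to any kernel source line at all. It could instead be firmware code reached through the EFI runtime services mapping, which would fit the efirtc result, but nothing in this report confirms that. The helper below just classifies an address; the names are hypothetical, 48-bit virtual addresses (4-level paging) are assumed, and the kernel base constant is an assumption taken to match amd64 KERNBASE.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 48-bit canonical addresses (4-level paging). */
#define SKETCH_LOW_HALF_END	0x0000800000000000ULL
#define SKETCH_UPPER_HALF_START	0xffff800000000000ULL
/* Assumed amd64 kernel link base (KERNBASE); verify against your tree. */
#define SKETCH_KERNBASE		0xffffffff80000000ULL

static_assert(SKETCH_KERNBASE > SKETCH_UPPER_HALF_START,
    "kernel base must lie in the upper half");

enum sketch_pc_class {
	SKETCH_PC_LOW_HALF,	/* user or 1:1-mapped (e.g. firmware) range */
	SKETCH_PC_NONCANONICAL,	/* not a valid 48-bit virtual address */
	SKETCH_PC_UPPER_HALF,	/* kernel address space below the link base */
	SKETCH_PC_KERNEL_IMAGE	/* at or above the assumed kernel link base */
};

/* Classify a program counter value by the address range it falls in. */
static enum sketch_pc_class
sketch_classify_pc(uint64_t pc)
{
	if (pc >= SKETCH_KERNBASE)
		return (SKETCH_PC_KERNEL_IMAGE);
	if (pc >= SKETCH_UPPER_HALF_START)
		return (SKETCH_PC_UPPER_HALF);
	if (pc < SKETCH_LOW_HALF_END)
		return (SKETCH_PC_LOW_HALF);
	return (SKETCH_PC_NONCANONICAL);
}
```

Under these assumptions sketch_classify_pc(0x4e6134ec) yields SKETCH_PC_LOW_HALF. If that reading is right, comparing the address against the EFI memory map reported by the firmware would be more informative than looking for a matching kernel symbol.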