Bug 273151 - Kernel panic caused by audio driver on cold boot - AMD Ryzen 9 7900
Summary: Kernel panic caused by audio driver on cold boot - AMD Ryzen 9 7900
Status: Closed DUPLICATE of bug 268393
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-15 19:10 UTC by Alex Matei
Modified: 2023-08-30 17:07 UTC
CC List: 2 users

See Also:


Attachments
Shows the DELAY lines inserted in trap.c (1.98 KB, text/plain)
2023-08-15 19:10 UTC, Alex Matei
See the Description (11.74 KB, text/plain)
2023-08-15 19:12 UTC, Alex Matei
See the Description (11.42 KB, text/plain)
2023-08-15 19:13 UTC, Alex Matei
See the Description (60.37 KB, image/jpeg)
2023-08-15 19:15 UTC, Alex Matei
core.txt.6 (78.23 KB, text/plain)
2023-08-20 18:33 UTC, Alex Matei
info.6 (365 bytes, text/plain)
2023-08-20 18:37 UTC, Alex Matei
bounds (2 bytes, text/plain)
2023-08-20 18:38 UTC, Alex Matei
screen messages for the kernel panic (102.87 KB, image/jpeg)
2023-08-20 18:39 UTC, Alex Matei
error when trying to attach vmcore.6.gz (74.97 KB, image/jpeg)
2023-08-20 19:06 UTC, Alex Matei
kgdb information (4.63 KB, text/plain)
2023-08-20 20:54 UTC, Alex Matei

Description Alex Matei 2023-08-15 19:10:04 UTC
Created attachment 244128 [details]
Shows the DELAY lines inserted in trap.c

Attached are the following files:
dmesg.boot_with_audio_driver
dmesg.boot_without_audio_driver
kernel_panic.MOV
kernel_panic_screen_messages.jpg
kernel_panic_with_DELAY.MOV
kernel_without_audio_driver.MOV
trap.c_diff

Desktop components:
cpu: AMD Ryzen 9 7900
motherboard: Gigabyte B650 Aorus Elite AX
memory: Corsair 32 GB (2x16GB)
storage: SSD Gigabyte 1 TB

ZFS is being used with 2GB swap.

For this panic, the kernel crash dump is not created in /var/crash.
I can simulate a kernel crash with sysctl debug.kdb.panic and
the dump is created in /var/crash.

Replacing the Corsair memory with G.Skill and installing FreeBSD
on an HDD instead of the SSD does not make a difference.

When cold booting, the kernel panics as shown in kernel_panic.MOV.
After the panic, the system reboots and then runs normally.

The call stack in /usr/src/sys/amd64/amd64/trap.c:
trap_fatal()   called from line 795
trap_pfault()  called from line 385
trap()         called from line 665
trap_check()   called from /usr/src/sys/amd64/amd64/exception.S, line 290

I changed the file /usr/src/sys/amd64/amd64/trap.c as shown in
trap.c_diff in order to get a better view of the screen messages as
you can see in kernel_panic_with_DELAY.MOV.

The screen messages can also be seen in the file
kernel_panic_screen_messages.jpg and are also shown below:
...
acpi_tz0: <Thermal Zone> on acpi0
cpu0: <ACPI CPU> on acpi0
hwpstate0: <Cool'n'Quiet 2.0> on cpu0
Timecounter "TSC-low" frequency 1846531959 Hz quality 1000

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80972392
stack pointer           = 0x20:0xfffffe01072d5de0
frame pointer           = 0x20:0xfffffe01072d5e00
code segment            = base 0x0, limit 0xfffff, type 0x1b
                          DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq78: hdac1) <------------- AAAAA
trap number             = 12


TRAPFRAME:
   tf_rdi = -8796050178040
   tf_rsi = 0
   tf_rdx = 1
   tf_rcx = -8796054324096
   tf_r8  = -2194564528880
   tf_r9  = -2194607874048
   tf_rax = 1
   tf_rbx = -8796050178048
   tf_rbp = -2194607874560
   tf_r10 = 2000
   tf_r11 = 2146883647
   tf_r12 = 233
   tf_r13 = -8796050178048
   tf_r14 = 0
   tf_r15 = 0
   tf_trapno = 12
   tf_fs = 19
   tf_gs = 40
   tf_addr = 8
   tf_flags = 1
   tf_es = 59
   tf_ds = 59
   tf_err = 0
   tf_rip = -2137578606
   tf_cs = 32
   tf_rflags = 66066
   tf_rsp = -2194607874592
   tf_ss = 40

Workaround:
Based on the line marked with "AAAAA" above and dmesg.boot_with_audio_driver, I
disabled the audio driver by adding the following lines to /boot/device.hints:
hint.pcm.4.disabled="1"
hint.pcm.5.disabled="1"
hint.hdaa.1.disabled="1"
hint.hdacc.1.disabled="1"
hint.hdac.1.disabled="1"
and the kernel no longer panics as shown in kernel_without_audio_driver.MOV.
Comment 1 Alex Matei 2023-08-15 19:12:40 UTC
Created attachment 244129 [details]
See the Description
Comment 2 Alex Matei 2023-08-15 19:13:14 UTC
Created attachment 244130 [details]
See the Description
Comment 3 Alex Matei 2023-08-15 19:15:59 UTC
Created attachment 244131 [details]
See the Description
Comment 4 Alex Matei 2023-08-15 19:31:32 UTC
It does not let me attach the following files:

-rw-r--r--  1 root  wheel  6782462 Aug 15 10:58 kernel_panic.MOV
-rw-r--r--  1 root  wheel  4409731 Aug 14 17:11 kernel_panic_with_DELAY.MOV
-rw-r--r--  1 root  wheel  4902469 Aug 15 10:59 kernel_without_audio_driver.MOV
Comment 5 Alex Matei 2023-08-15 19:35:57 UTC
The output of kldstat is the same with or without the audio driver.
Comment 6 Alex Matei 2023-08-20 18:30:05 UTC
The kernel panic is caused by snd_hda.ko.

In the file /usr/src/sys/amd64/conf/GENERIC, section 'Sound support', I commented out all the lines except one:
...
# Sound support
device          sound                   # Generic sound driver (required)
#device         snd_cmi                 # CMedia CMI8338/CMI8738
#device         snd_csa                 # Crystal Semiconductor CS461x/428x
#device         snd_emu10kx             # Creative SoundBlaster Live! and Audigy
#device         snd_es137x              # Ensoniq AudioPCI ES137x
#device         snd_hda                 # Intel High Definition Audio
#device         snd_ich                 # Intel,NVidia and other ICH AC'97 Audio
#device         snd_via8233             # VIA VT8233x Audio
...
        
swift@eagle:~ cat /boot/device.hints
...
hint.hdac.1.cad0.nid20.config="as=1"
hint.hdac.1.cad0.nid27.config="as=1 seq=15"

With the new kernel, after logging in, I run the following command:

swift@eagles:~ sudo /sbin/kldload -n /boot/kernel/snd_hda.ko

which intermittently triggers the kernel panic, as you can see in
kernelPanicScreenMessge.jpg.
Attached are also the crash files from /var/crash:
core.txt.6
vmcore.6
info.6
bounds
Comment 7 Alex Matei 2023-08-20 18:33:06 UTC
Created attachment 244232 [details]
core.txt.6
Comment 8 Alex Matei 2023-08-20 18:37:46 UTC
Created attachment 244233 [details]
info.6
Comment 9 Alex Matei 2023-08-20 18:38:41 UTC
Created attachment 244234 [details]
bounds
Comment 10 Alex Matei 2023-08-20 18:39:42 UTC
Created attachment 244235 [details]
screen messages for the kernel panic
Comment 11 Alex Matei 2023-08-20 19:06:21 UTC
Created attachment 244236 [details]
error when trying to attach vmcore.6.gz
Comment 12 Alex Matei 2023-08-20 19:08:01 UTC
I cannot attach vmcore.6.gz because of the error shown in bugzillaError.jpg.
Comment 13 Alex Matei 2023-08-20 19:24:08 UTC
bugzilla works like a crap.
Comment 14 Alex Matei 2023-08-20 20:54:58 UTC
Created attachment 244238 [details]
kgdb information
Comment 15 Alex Matei 2023-08-20 20:59:45 UTC
It does not let me attach vmcore.6.gz.

I ran the following command:

$ sudo kgdb /boot/kernel/kernel /var/crash/vmcore.6

and I attach kgdb.out with kgdb data.
Comment 16 John Grafton 2023-08-24 15:39:17 UTC
I'm fairly certain this is the same issue as bug #268393.  Jonathan created a patch in bug #268393, comment #48 that adds a delay in the hda_intr_handler function. It seems to work around the issue for now, but it is not a permanent fix.

If the patch works, this bug should be marked as a duplicate of bug #268393.
Comment 17 Alex Matei 2023-08-25 13:34:14 UTC
The patch works.
The kernel panic no longer occurs. Both the headphone and the monitor
speakers work fine.
Thank you very much!
Comment 18 Mark Linimon freebsd_committer freebsd_triage 2023-08-30 17:07:18 UTC
^Triage: submitter confirms the following patch works:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268393#c48

So even though there is a lot of debugging information here (thanks!) that is not in 268393, I am going to go ahead and mark this as a duplicate of 268393.  I hope this will decrease the confusion of people trying to follow along with these PRs.

*** This bug has been marked as a duplicate of bug 268393 ***