268393 – system always reboots once from a powered off state

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	13.1-STABLE
Hardware:	Any Any

Importance:	--- Affects Some People
Assignee:	Mark Johnston

URL:
Keywords:	crash

Duplicates (3):	264305 272878 273151
Depends on:
Blocks:	14.0r

Reported:	2022-12-15 13:22 UTC by Jonathan Vasquez
Modified:	2023-10-05 15:10 UTC
CC List:	13 users

See Also:	273151 272878 264305 https://reviews.freebsd.org/D41883

Attachments
dmesg (19.07 KB, text/plain) 2022-12-15 13:24 UTC, Jonathan Vasquez
loader.conf (112 bytes, text/plain) 2022-12-15 13:24 UTC, Jonathan Vasquez
pciconf -vl (11.90 KB, text/plain) 2022-12-15 13:24 UTC, Jonathan Vasquez
rc.conf (647 bytes, text/plain) 2022-12-15 13:25 UTC, Jonathan Vasquez
1_with_amdgpu_core.txt (126.02 KB, text/plain) 2023-03-02 00:56 UTC, Jonathan Vasquez
2_clean_core.txt (87.70 KB, text/plain) 2023-03-02 00:57 UTC, Jonathan Vasquez
3_clean_core.txt (90.43 KB, text/plain) 2023-03-02 00:57 UTC, Jonathan Vasquez
debugging printfs for hda driver (933 bytes, patch) 2023-03-07 19:31 UTC, John Grafton
2023-03-12-1625 - cold boot (with print patch applied) (96.52 KB, application/x-troff-man) 2023-03-12 20:25 UTC, Jonathan Vasquez
2023-03-12-1625 - hot boot dmesg (with print patch applied) (223.87 KB, text/plain) 2023-03-12 20:25 UTC, Jonathan Vasquez
2023-03-16-1919-cold-core (94.87 KB, text/plain) 2023-03-16 23:23 UTC, Jonathan Vasquez
2023-03-16-1919-hot-dmesg (35.41 KB, text/plain) 2023-03-16 23:23 UTC, Jonathan Vasquez
lock hdac during init (680 bytes, patch) 2023-03-21 16:45 UTC, John Grafton
cold-2023-04-09-2125.txt (35.80 KB, text/plain) 2023-04-10 01:30 UTC, Jonathan Vasquez
hot-2023-04-09-2125.txt (19.47 KB, text/plain) 2023-04-10 01:30 UTC, Jonathan Vasquez
revert commit that coincides with occurance of problem (661 bytes, patch) 2023-04-13 17:52 UTC, John Grafton
delay driver attach by 10 ms (399 bytes, patch) 2023-04-13 17:52 UTC, John Grafton
Fedora 37 dmesg (2023-04-24) (123.12 KB, text/plain) 2023-04-24 20:17 UTC, Jonathan Vasquez
bad.0.txt (37.11 KB, text/plain) 2023-07-07 01:38 UTC, Jonathan Vasquez
bad.1.txt (35.50 KB, text/plain) 2023-07-07 01:38 UTC, Jonathan Vasquez
good.0.txt (22.03 KB, text/plain) 2023-07-07 01:39 UTC, Jonathan Vasquez
patch (1.18 KB, patch) 2023-09-07 23:03 UTC, Ivan Rozhuk

Description Jonathan Vasquez 2022-12-15 13:22:38 UTC

Hello all,

It seems that there is a weird bug that is affecting my motherboard. Each time that I turn on my computer (cold boot, and not a reboot), the freebsd kernel starts up, I see the first few messages, and then the system will reboot. Upon reboot, the second time it will succeed and there will be no issues. If I do a 'reboot', it will work. However, if I do a 'poweroff' and then manually start the machine up, it will again reboot once upon booting and then continue working.

I tried isolating the issue by commenting out any modules that load (amdgpu, vboxdrv, and if_re. The built in if_re doesn't support my mobo's ethernet chipset (RTL8125 2.5GbE Controller) so I have to use the net/realtek-re-kmod port's version of it).

Attached is some information about my system:

- FreeBSD 13.1-STABLE #4 stable/13-n253282-50f61166f7b9
- Motherboard: ASUS TUF GAMING X670E-PLUS WIFI -> https://www.asus.com/us/motherboards-components/motherboards/tuf-gaming/tuf-gaming-x670e-plus-wifi/

I'll be linking a video demonstrating the issue as well.

Comment 1 Jonathan Vasquez 2022-12-15 13:24:29 UTC

Created attachment 238814 [details]
dmesg

In the video, the system reboots shortly after the following message. As described, this message also shows up when a subsequent reboot works, but there may be something to it on a cold boot:

Firmware Error (ACPI): Could not resolve symbol [\134_SB.PCI0.GPP7.UP00.DP40.UP00.DP68], AE_NOT_FOUND (20201113/dswload2-315)
ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20201113/psobject-372)

Comment 2 Jonathan Vasquez 2022-12-15 13:24:42 UTC

Created attachment 238815 [details]
loader.conf

Comment 3 Jonathan Vasquez 2022-12-15 13:24:55 UTC

Created attachment 238816 [details]
pciconf -vl

Comment 4 Jonathan Vasquez 2022-12-15 13:25:06 UTC

Created attachment 238817 [details]
rc.conf

Comment 5 Jonathan Vasquez 2022-12-15 13:26:23 UTC

Video can be found here: https://xyinn.org/freebsd/bugs/268393/1.mp4

Comment 6 Jonathan Vasquez 2022-12-20 18:20:39 UTC

I compiled a debugging kernel with the options listed below, in an attempt to stop the panic from rebooting the machine and to display more information. This successfully yielded more information :D.

There seems to be a page fault happening in the AMD Raven HDA Controller.


Picture and Video
--------------
https://xyinn.org/freebsd/bugs/268393/268393-2.jpg
https://xyinn.org/freebsd/bugs/268393/268393-2.mp4


Kernel Options
--------------
include GENERIC
ident GENERIC-DEBUG

options KDB
options DDB
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options DIAGNOSTIC


Info
----------
root@leslie:~ # uname -a
FreeBSD leslie 13.1-STABLE FreeBSD 13.1-STABLE #0 stable/13-n253296-384a885111ad: Tue Dec 20 12:46:46 EST 2022     root@leslie:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-DEBUG amd64

Comment 7 Jonathan Vasquez 2022-12-20 18:33:19 UTC

I forgot to also mention that the keyboard isn't functional at this stage. I tried moving my keyboard's USB receiver from my KVM to the front I/O ports (USB 3.0) to see if that worked, but no luck. I could try connecting it to the back ports to see if that helps. I'll need to check again whether this motherboard has any USB 2.0 rear ports left, since sometimes (on Linux, maybe FreeBSD) USB 3.0 ports have issues, especially if IOMMU is disabled in the BIOS and the system tries to remap the memory space to allow those ports to work.

Comment 8 Jonathan Vasquez 2023-01-02 22:23:42 UTC

I experimented a little more today to further narrow it down. The following allows the system to boot from a powered off (cold) state without crashing and rebooting once:

/boot/loader.conf:
hint.hdac.2.disabled=1

where hdac2 is the audio connector directly on the back of my mobo. Once the driver is attached, the Realtek ALC1220 is expanded and used.

jon@leslie:~ $ dmesg | grep hdac
hdac0: <ATI (0xab28) HDA Controller> mem 0xfcb20000-0xfcb23fff at device 0.1 on pci3
hdac1: <ATI (0x1640) HDA Controller> mem 0xfc988000-0xfc98bfff at device 0.1 on pci19
hdac2: <AMD Raven HDA Controller> mem 0xfc980000-0xfc987fff at device 0.6 on pci19
hdacc0: <ATI R6xx HDA CODEC> at cad 0 on hdac0
hdaa0: <ATI R6xx Audio Function Group> at nid 1 on hdacc0
hdacc1: <ATI R6xx HDA CODEC> at cad 0 on hdac1
hdaa1: <ATI R6xx Audio Function Group> at nid 1 on hdacc1
hdacc2: <Realtek ALC1220 HDA CODEC> at cad 0 on hdac2
hdaa2: <Realtek ALC1220 Audio Function Group> at nid 1 on hdacc2

The only problem with this is that I no longer have audio on this machine. I do see audio activity being redirected to my monitor's HDMI connection, but my monitor doesn't have any speakers. I'm just noting this for completeness.

If there is anything I can provide to someone to further debug this page fault, please lmk.

Comment 9 John Grafton 2023-01-31 20:20:21 UTC

Huh, what a strange problem.  Nice work narrowing it down.  The ALC1220 is a very common chipset, I wonder if you've stumbled upon a weird edge case.

Can you compile and boot a MINIMAL config 13 STABLE kernel and then load snd_hda by hand with kldload? (make sure you disable the sysctl that disables hdac2) Does the system still panic?

Comment 10 Jonathan Vasquez 2023-03-02 00:55:14 UTC

Hey John,

Thanks for that. I got some interesting results!

It's been a few months since my last post and since then I've reinstalled FreeBSD; it's currently on 13.2-STABLE (stable/13-n254729-3912f99ecae6/GENERIC). There is nothing in /etc/sysctl.conf at the moment. So let's begin from that perspective.

I first compiled the /usr/src/sys/amd64/conf/MINIMAL kernel and rebooted. The first time I did this the system locked up since it couldn't find my root filesystem, which is on ZFS on an NVMe drive. After some digging, I added a few options (not the minimum options needed, but I cast a wide enough net within reason to allow the system to boot). After I got it booting successfully, I wasn't able to type anything. Makes sense: MINIMAL has no USB support lol. I added those in as well, so I ended up with a MINIMAL config with the following extra lines:

device crypto
device acpi
device nvme
device nvd

options ZSTDIO

device uhci
device ohci
device ehci
device xhci
device usb
device hid

---------

Now that I was in the system successfully, I could see that the system hadn't crashed. I did a 'poweroff' to get the system back to the cold state that causes it to crash on boot (the first time; once it's hot it won't crash). I did the 'kldload snd_hda' and the system immediately crashed, and I noticed some messages regarding 'drm-510-kmod'. I thought, ah! I forgot that I needed to comment out the kld_list in my /etc/rc.conf since I have 'amdgpu vboxdrv' in there. So I was thinking the AMD Radeon RX 6900 XT (sienna_cichlid) and snd_hda might be having a conflict.

I commented out the kld_list line and did a 'poweroff' again. I turned the machine back on immediately and booted up. I did another 'kldload snd_hda' and the system didn't crash! I thought maybe there really was a conflict between those two drivers, but I was skeptical. I decided to do another 'poweroff' and wait 5 seconds before continuing, to give any internal system components time to properly reset themselves, just in case. After the 5 seconds, I turned it back on and booted. I did another 'kldload snd_hda', and the system crashed again! This time with no 'drm-510-kmod' messages, just a clean dump.

So that makes me think that there could potentially be two issues here, or it could just be one underlying issue (the page fault) that shows up in two places.

I also did a final test with 'vboxdrv' loaded. That driver didn't conflict; the system just crashed in the same scenario as just mentioned (without amdgpu loaded, i.e. our clean dump).

I've attached the following crash dumps for inspection:

- 1_with_amdgpu_core.txt
- 2_clean_core.txt
- 3_clean_core.txt (this is the third run, which has vboxdrv loaded but shows the same info as without vboxdrv, so vboxdrv doesn't seem to cause an issue).

Thank you!

Comment 11 Jonathan Vasquez 2023-03-02 00:56:52 UTC

Created attachment 240519 [details]
1_with_amdgpu_core.txt

Comment 12 Jonathan Vasquez 2023-03-02 00:57:06 UTC

Created attachment 240520 [details]
2_clean_core.txt

Comment 13 Jonathan Vasquez 2023-03-02 00:57:19 UTC

Created attachment 240521 [details]
3_clean_core.txt

Comment 14 Jonathan Vasquez 2023-03-02 00:57:49 UTC

I forgot to mention, I also tested this on 13.1-RELEASE and it also happens, so it's not something that was introduced after 13.1-RELEASE and is an existing bug.

Comment 15 John Grafton 2023-03-07 19:31:48 UTC

Created attachment 240647 [details]
debugging printfs for hda driver

Comment 16 John Grafton 2023-03-07 19:36:58 UTC

(In reply to Jonathan Vasquez from comment #14)
Hi Jonathan,

All of the crash dump reports you posted appear to have panicked in the hdac_rirb_flush function.  Specifically dereferencing the `rirb` pointer on line 968 from `sys/dev/sound/pci/hda/hdac.c` 

I'm thinking there *may* be a bug in calculating the `rirb` read pointer in the code just above the dereference.

Would you patch your kernel hda driver (should work with stable 13) with the attached patch and post the resulting output?  It just adds a few debug prints to the hda driver.

You'll also need to update a sysctl variable to ensure the debugs actually print.

So the procedure is:
1) sysctl debug.bootverbose=1
2) patch hda driver and recompile
3) unload snd_hda and reload

Here's an example of what it looks like on a bhyve VM:

root@fbsd-current:~ # sysctl debug.bootverbose=1
root@fbsd-current:~ # kldload snd_hda
pci0: driver added                                                                      
found-> vendor=0x8086, dev=0x27d8, revid=0x00                                           
       domain=0, bus=0, slot=6, func=0                                                 
       class=04-03-00, hdrtype=0x00, mfdev=0                                           
       cmdreg=0x0406, statreg=0x0000, cachelnsz=0 (dwords)                             
       lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)                    
       intpin=a, irq=18                                                                
pci0:0:6:0: reprobing on driver added                                                   
hdac0: <Intel 82801G HDA Controller> mem 0xc0004000-0xc0007fff irq 18 at device 6.0 on pci0  
hdac0: PCI card vendor: 0x0000, device: 0x0000                                          
hdac0: HDA Driver Revision: 20120126_0002                                               
hdac0: Config options: on=0x00000000 off=0x00000000                                     
ioapic0: routing intpin 18 (PCI IRQ 18) to lapic 2 vector 51                            
hdac0: Caps: OSS 4, ISS 4, BSS 0, NSDO 1, 64bit, CORB 256, RIRB 256  
hdac0: rirb_base 0xfffffe01205ff000    
hdac0: rirb_size 256    
hdac0: sc->rirb_rp 1    
hdac0: rirb address 0xfffffe01205ff008 hdac0:  response 00008086  
hdac0: rirb_base 0xfffffe01205ff000    
hdac0: rirb_size 256    
hdac0: sc->rirb_rp 2    
hdac0: rirb address 0xfffffe01205ff010 hdac0:  response 0000ffff  
hdacc0: <Generic (0x8086) HDA CODEC> at cad 0 on hdac0  
hdac0: rirb_base 0xfffffe01205ff000    
hdac0: rirb_size 256    
hdac0: sc->rirb_rp 3    
hdac0: rirb address 0xfffffe01205ff018 hdac0:  response 00010001  
hdac0: rirb_base 0xfffffe01205ff000    
hdac0: rirb_size 256    
hdac0: sc->rirb_rp 4
...
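
For readers following the read-pointer theory, the arithmetic that these debug prints expose can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: it assumes 8-byte response entries (consistent with the addresses above, which advance by 8 per slot), and the names `rirb_entry`, `rirb_next`, `rirb_offset`, and `rirb_at` are hypothetical, not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte response entry, matching the 8-byte stride above. */
struct rirb_entry {
	uint32_t response;
	uint32_t response_ex;
};

/* Advance the read pointer by one slot, wrapping at the ring size. */
static unsigned
rirb_next(unsigned rp, unsigned size)
{
	assert(size > 0);
	return ((rp + 1) % size);
}

/* Byte offset of slot rp from the start of the ring. */
static size_t
rirb_offset(unsigned rp)
{
	return ((size_t)rp * sizeof(struct rirb_entry));
}

/*
 * Return the entry at slot rp, or NULL if the ring has not been
 * allocated yet or rp is out of range. Dereferencing without this
 * kind of check is where a bad base or index would fault.
 */
static const struct rirb_entry *
rirb_at(const struct rirb_entry *base, unsigned size, unsigned rp)
{
	if (base == NULL || rp >= size)
		return (NULL);
	return (&base[rp]);
}
```

With a 256-entry ring, slot 1 sits at offset 0x008 and slot 3 at offset 0x018, which lines up with the `rirb address` values printed in the bhyve output.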

Comment 17 Jonathan Vasquez 2023-03-12 20:24:35 UTC

Hey John,

Thanks for that. I've attached the crash dump with the patch applied to my MINIMAL config for both a cold boot and my dmesg output for a hot boot.

Comment 18 Jonathan Vasquez 2023-03-12 20:25:12 UTC

Created attachment 240800 [details]
2023-03-12-1625 - cold boot (with print patch applied)

Comment 19 Jonathan Vasquez 2023-03-12 20:25:31 UTC

Created attachment 240801 [details]
2023-03-12-1625 - hot boot dmesg (with print patch applied)

Comment 20 John Grafton 2023-03-13 15:46:47 UTC

This bug appears to be a duplicate of https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264305

Comment 21 John Grafton 2023-03-13 18:38:19 UTC

Hi Jonathan,

Is the hot boot dmesg output truncated or was that all of it?  From the output it appears hdac2 isn't properly initialized.  The ring buffer read pointer (rirb_rp) starts at 189 instead of 1 like hdac_attach hasn't been run.  There's no 'hdac2: <AMD Raven HDA Controller> mem 0xfc980000-0xfc987fff at device 0.6 on pci19' line.

Does the hdac2 device work?  Is it assigned an IRQ?

sysctl hw.intrs | grep hdac

Comment 22 Jonathan Vasquez 2023-03-13 19:41:41 UTC

The hot boot logs are truncated from /var/log/messages directly at the point of BOOT. Everything else before that is from a previous boot instance.

The 'dmesg' output with the print patch will immediately overload the dmesg ring buffer so I wouldn't be able to get the output of the beginning of the boot sessions through it after doing 'kldload snd_hda'. 

Regarding your comment about the other bug: yes, it seems to be similar if not identical.

Comment 23 John Grafton 2023-03-14 15:11:18 UTC

(In reply to Jonathan Vasquez from comment #22)

The duplicate comment was just to alert FreeBSD devs that this issue may be affecting others with the same hardware.

No problem about the dmesg buffer, you can increase the buffer size to 1M by adding the following line to /boot/loader.conf:

kern.msgbufsize=1048576

Thanks for the help debugging!

Comment 24 Jonathan Vasquez 2023-03-16 23:22:44 UTC

Hey John,

No, thank you for helping me look into this :).

I've attached a new cold core dump and a hot dmesg output that has an increased kernel buffer size. Now we can see everything. The interesting part starts at line 230 when the kldload snd_hda happened (snd_unit_init() u=0x00ff8000 [512] d=0x00007c00 [32] c=0x000003ff [1024]).

- Jonathan

Comment 25 Jonathan Vasquez 2023-03-16 23:23:06 UTC

Created attachment 240909 [details]
2023-03-16-1919-cold-core

Comment 26 Jonathan Vasquez 2023-03-16 23:23:23 UTC

Created attachment 240910 [details]
2023-03-16-1919-hot-dmesg

Comment 27 John Grafton 2023-03-21 16:45:27 UTC

Created attachment 241040 [details]
lock hdac during init

I have a theory that an interrupt is being triggered before the DMA memory is allocated correctly for hdac2.  I've included a patch that locks the hdac driver initialization sequence, which should block any interrupts (that call hdac_rirb_flush) before the initialization completes.

This patch is just for testing.  You should see some WITNESS debugger complaints about bouncing off the lock. If the system actually boots and doesn't crash, will you send the dmesg?
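
The suspected ordering problem can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the attached patch: `fake_softc`, `fake_attach`, `fake_intr`, and `fake_detach` are hypothetical names, and a plain flag stands in for the driver lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a controller's per-device state. */
struct fake_softc {
	uint32_t *rirb;       /* response ring; NULL until attach allocates it */
	size_t    rirb_size;  /* number of entries in the ring */
	bool      ready;      /* set only after the ring exists */
};

/* Allocate the ring first, then mark the device ready. */
static int
fake_attach(struct fake_softc *sc, size_t entries)
{
	assert(sc != NULL && entries > 0);
	sc->rirb = calloc(entries, sizeof(*sc->rirb));
	if (sc->rirb == NULL)
		return (-1);
	sc->rirb_size = entries;
	sc->ready = true;
	return (0);
}

/*
 * Model of an interrupt handler: returns 1 if it read the ring,
 * 0 if it arrived before attach finished and was ignored.
 */
static int
fake_intr(struct fake_softc *sc)
{
	if (!sc->ready)
		return (0);      /* early interrupt: do not touch sc->rirb */
	(void)sc->rirb[0];       /* safe: the ring is allocated */
	return (1);
}

/* Mark the device not ready before releasing the ring. */
static void
fake_detach(struct fake_softc *sc)
{
	sc->ready = false;
	free(sc->rirb);
	sc->rirb = NULL;
	sc->rirb_size = 0;
}
```

Calling `fake_intr()` on a zeroed `fake_softc` returns 0 instead of dereferencing a NULL ring, which is the behaviour the lock is meant to approximate. In a real kernel driver a bare flag would not be enough on its own; it would need the driver mutex or appropriate memory barriers.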

Comment 28 Jonathan Vasquez 2023-03-22 12:15:10 UTC

Hey John,

That sounds good, will do. Do you want me to run this with the `snd_hda` automatically being loaded as normal, or from the MINIMAL kernel? I'm guessing I'll need to enable WITNESS and other debugging features for this to be displayed? Could you give me a list of your suggested debugging options so I can add them to my MINIMAL kernel config? Thank you!

I also noticed something today (an anomaly, but still interesting). Today when I booted my machine from a cold boot (for the first time this morning), the system didn't crash. I suspect this may be because yesterday when I powered off my machine, FreeBSD didn't actually shut down completely when I ran the `poweroff` command. Usually when I run that command (which is my normal way of shutting down the machine for the most part), it runs its normal shutdown sequence, says "all buffers synced" at the end, and then powers off the machine completely. Yesterday it displayed everything up to "all buffers synced", but it got stuck and didn't shut down (almost as if it did a `halt` instead of a `poweroff`). After that, I hard powered off the machine by holding the power button, and that was the end of that until this morning. I wonder if some type of state remained somehow even though the machine was powered off completely. Doing another `poweroff` this morning correctly shut down the machine, and turning it back on allowed the crash to re-occur. Weird, but worth mentioning.

- Jonathan

Comment 29 John Grafton 2023-03-22 15:06:30 UTC

(In reply to Jonathan Vasquez from comment #28)
Yes, compile using the GENERIC kernel config, it includes the witness debugger.  Then boot from the kernel as normal and see if the system still crashes.

There *may* be a race within the driver which only crashes given specific parameters.  From the data you've already provided, it appears an interrupt handler is called before memory is initialized for it.  Hopefully, locking the initialization functions will give us a clue to what's going on in your system.

Intermittent bugs are the most annoying to debug because they can sometimes be difficult to replicate.  We just have to keep poking at it to see if we can find a solution.  :)

Comment 30 Jonathan Vasquez 2023-04-10 01:30:28 UTC

Hey John,

I just got some time to try out the latest lock patch. I also updated to stable/13-n255040-535fc5f75e20 as well.

Using the lock patch, the system didn't crash at boot up and we got some nice messages:

hdac2: <AMD Raven HDA Controller> mem 0xfc980000-0xfc987fff at device 0.6 on pci19
hdac2: Unexpected unsolicited response from address 0: 00000000
...
pcib20: <ACPI PCI-PCI bridge>hdac2: Unexpected unsolicited response from address 0: 00000000
...
hdac2: Unexpected unsolicited response from address 0: 00000000
 at device 8.3 on pci0
pci20: <ACPI PCI bus> on pcib20
xhci4: <XHCI (generic) USB 3.0 controller> mem 0xfcd00000-0xfcdfffff at device 0.0 on pci20
...

When in this state the system starts up without crashing, but the audio device is missing (which makes sense). Once I do another reboot (hot start), no more error messages show up and the audio devices are available.

I've attached the hot/cold.txt labeled with "2023-04-09-2125".

Thank you!

Jonathan

Comment 31 Jonathan Vasquez 2023-04-10 01:30:47 UTC

Created attachment 241393 [details]
cold-2023-04-09-2125.txt

Comment 32 Jonathan Vasquez 2023-04-10 01:30:58 UTC

Created attachment 241394 [details]
hot-2023-04-09-2125.txt

Comment 33 John Grafton 2023-04-13 17:50:13 UTC

Interesting! Okay, I have a couple of patches for you to try. The first patch reverts a commit to hdac.c that may have caused the problem to occur. The second patch introduces a delay before the driver attaches to the device.

Thanks!

Comment 34 John Grafton 2023-04-13 17:52:17 UTC

Created attachment 241460 [details]
revert commit that coincides with occurance of problem

Comment 35 John Grafton 2023-04-13 17:52:54 UTC

Created attachment 241461 [details]
delay driver attach by 10 ms

Comment 36 Jonathan Vasquez 2023-04-13 17:56:49 UTC

Thanks John! I'll try those patches independently when I get a chance and report back.

Comment 37 Jonathan Vasquez 2023-04-24 11:34:43 UTC

Hey John,

I just tested both patches (separately) on stable/13-n255158 and unfortunately neither had any effect. The first one (revert) didn't work. With the second one I thought there might be something to it, since at first I didn't see the second FreeBSD logo (the first one is the initial power on; the second one shows up because the machine crashed). However, there was a chance it wasn't actually working, since the monitor lost signal twice between booting up, the delay, and the image returning to the screen. So I compiled the kernel a few more times with delay values of 1000, 5000, and 10000. All three showed the second FreeBSD logo, i.e. the machine still crashed.

- Jonathan

Comment 38 John Grafton 2023-04-24 18:18:39 UTC

Hi Jonathan,

You have a common sound chipset that comes with a lot of recent AMD motherboards (within the last couple of years).  Here's a link to systems that have run a BSD hardware probe and have your same chipset: https://bsd-hardware.info/?view=search&vendorid=1022&deviceid=15e3&typeid=sound&page=1#list

I'd expect to see other folks running into the same issue as they upgrade to AMD systems.

Have you booted other OS's on this system?  Like Windows or Linux?  Are there any problems with the sound on them?  Is the BIOS upgraded with the latest revision?

Thanks!
John

Comment 39 Jonathan Vasquez 2023-04-24 19:27:49 UTC

Hey John,

I've just uploaded my hw-probe just in case: https://bsd-hardware.info/?probe=d3373e972b

I believe I ran Linux (and Windows 10) for maybe a few days, but I've been mostly on FreeBSD on this machine. I don't remember having any audio issues or reboot issues at start up, which is what caught me off guard in the beginning.

I'm going to re-check my BIOS firmware now and update/re-test if anything. Will report back soon.

- Jonathan

Comment 40 Jonathan Vasquez 2023-04-24 19:36:01 UTC

(will report back in a bit after the update, but there have been new updates since I bought/built this machine):

Current BIOS Version: 0821 (2022/11/15 reported on computer. Website says 2022/11/25).

Previous BIOSes (I already have these as part of 0821 but listed for the record)

- 0613 (2022/09/26)

No patch notes other than:

Before running the USB BIOS Flashback tool, please rename the BIOS file (TX670ELW.CAP) using BIOSRenamer.

- 0809 (2022/10/25)

"1. Update AGESA version to ComboAM5PI 1.0.0.3 patch A
2. Improve system performance and stability
3. Improve GPU compatibility for GeForce RTX 40 series"

------------

BIOS updates released since (I don't have these yet but will update. I will be applying patch 1409, the latest version that isn't beta).

- 1222 (2023/02/24)

"1. Update AGESA version to ComboAM5PI 1.0.0.5 patch C
2. Improve better performance for AMD new CPUs
Before running the USB BIOS Flashback tool, please rename the BIOS file (TX670ELW.CAP) using BIOSRenamer."

- 1223 (2023/03/20)

"Improve memory compatibility
Before running the USB BIOS Flashback tool, please rename the BIOS file (TX670ELW.CAP) using BIOSRenamer."

- 1406 (2023/04/07)

"1. Update AGESA version to ComboAM5PI 1.0.0.6
2. Please make sure to update to BIOS 1406 for better compatibility with the Ryzen™ 7000X3D series processor.
3. TPM 2.0 security update"

- 1408 (2023/04/13)

"Improve system performance and stabilize AMD Ryzen 7000 X3D series processors.
Before running the USB BIOS Flashback tool, please rename the BIOS file (TX670ELW.CAP) using BIOSRenamer."

- 1409 (2023/04/21)

1. Update AGESA version to ComboAM5PI 1.0.0.6
2. TPM 2.0 security update
3. Recommended for optimum performance with AMD Ryzen™ 7000X3D series processors
Before running the USB BIOS Flashback tool, please rename the BIOS file (TX670ELW.CAP) using BIOSRenamer.

- 1410 (BETA - 2023/04/14)

"Beta BIOS
1. Update AGESA version to ComboAM5PI 1.0.0.6
2. Supports high density DDR5 module
3. TPM 2.0 security update

Please note that this is a beta BIOS version of the motherboard which is still undergoing final testing before its official release. The UEFI, its firmware and all content found on it are provided on an “as is” and “as available” basis. ASUS does not give any warranties, whether express or limited, as to the suitability, compatibility, or usability of the UEFI, its firmware or any of its content. Except as provided in the Product warranty and to the maximum extent permitted by law, ASUS is not responsible for direct, special, incidental or consequential damages resulting from using this beta BIOS.
Before running the USB BIOS Flashback tool, please rename the BIOS file (TX670ELW.CAP) using BIOSRenamer."

Comment 41 Jonathan Vasquez 2023-04-24 20:16:34 UTC

Alright, I'm back and updated to the latest stable BIOS, 1409. Unfortunately, this did not fix the issue. But I have some very interesting information to report.

1. I'm able to reproduce the error by booting from the FreeBSD 13.2 USB from a cold state.
2. Booting from a Fedora 37 USB from a cold boot worked properly.
3. This is the interesting part: if I boot Fedora 37 from a cold boot, then completely power off the machine, and then boot FreeBSD from a cold boot, the machine will _NOT_ crash. I tried this entire cycle twice and both times it worked fine. Booting FreeBSD, shutting it down completely, and then starting it up again will cause the problem to arise.

This makes me think that there is a driver issue in either the FreeBSD start up sequence or, more specifically, the shutdown sequence, where some flags on the hardware are left in an invalid state, which causes FreeBSD to freak out during a subsequent cold boot. The Linux side must be "correcting" this issue by either clearing some flags on the chip, or just overriding the flags already on the chip with the correct ones in a way that doesn't cause a crash.

I've attached the Fedora 37 cold boot dmesg just in case it may provide anything (it is just a normal dmesg, not a verbose one).

Comment 42 Jonathan Vasquez 2023-04-24 20:17:02 UTC

Created attachment 241709 [details]
Fedora 37 dmesg (2023-04-24)

Comment 43 Jonathan Vasquez 2023-04-24 20:20:09 UTC

I vaguely remember booting from the FreeBSD 13.0 (or 13.1) USB on this machine without it crashing, but I wasn't sure I was remembering correctly. The above situation makes me believe that my memory wasn't failing me and that it came down to the sequence in which I did things when I had Linux (or Windows) on the machine and was then installing FreeBSD. If it was a Linux/Windows -> FreeBSD migration, it makes sense that from that state the machine would be "good", so FreeBSD wouldn't freak out. But I could have also been booting from a hot boot if I did something like "boot machine, backup files, reboot to FreeBSD USB". Hard to know at this point lol.

Another thing: the UEFI BIOS firmware file on my UEFI USB for my old BIOS firmware (0821) had a last modified timestamp of 2022/12/05. This is probably the date I downloaded the firmware and upgraded my BIOS from whatever version it came with from the store to this one.

Comment 44 John Grafton 2023-04-25 17:49:12 UTC

(In reply to Jonathan Vasquez from comment #43)

Huh, likely a driver problem then.  Have you tested booting a 13.0 or 12.x kernel on this system?  I'm wondering if the problem was introduced in 13.1.

Comment 45 Jonathan Vasquez 2023-04-25 19:50:52 UTC

It existed throughout all of the 13.X releases I've used on this machine; I haven't tried 12.X on it. But I think I'm good on calling it quits on this one. It's getting close to the half a year mark and I don't want to keep putting more energy into this. Thanks for helping debug this, John, I really appreciated it. Hopefully this ticket can be resolved eventually for some future people. I'll see what I'll do about continued FreeBSD usage on my desktops/laptops.

Comment 46 John Grafton 2023-04-26 14:18:37 UTC

Thanks for the bug report.  Hopefully someone finds it helpful in the future.  Sorry we didn't get it fixed!

Comment 47 Jonathan Vasquez 2023-07-03 02:22:13 UTC

I did a quick search online to see if anyone else was having this issue. It seems I found someone else with a different machine but what appears to be the same driver:

https://www.reddit.com/r/freebsd/comments/vwkx2w/kernel_paniccrash_every_boot_after_a_shutdown/

https://forum.opnsense.org/index.php?topic=29276.msg141378#msg141378

We can see from their screenshot (https://imgur.com/a/0UGkRta) that they also happen to be using an "AMD Raven HDA Controller". Their workaround is to disable the HD Audio Controller in the BIOS. We already tested disabling the device as a workaround: it stops the crashing, but then I have no audio, and given that this is for desktop use and not a server, I can't really go with it. However, I'm thinking it may just be easier to disable it and buy a better supported audio card, avoiding the integrated one completely. I'll need to research better supported audio cards for FreeBSD and see what happens.

Comment 48 Jonathan Vasquez 2023-07-07 01:38:12 UTC

Hey all,

So I spent a few hours today debugging this issue on 13.2-RELEASE and I have interesting stuff to report.

TLDR:

1. There definitely seems to be a race condition somewhere with how either the AMD Raven HDA Controller is being enumerated, or how it's being accessed.

2. I was able to build on John's idea regarding the delays and come up with something that seems to no longer crash my system. I don't think it is an acceptable solution, though, since it would introduce a delay to all "hdac_intr_handler()" calls for any device that uses that function. But I'll keep testing it locally to see if I notice any new types of weirdness (outside of any known ones that I've experienced before this patch), and also because I don't want my system to keep crashing. As a side note, I ordered 2 PCIe sound cards to see if they are FreeBSD compatible, which would help mitigate this issue if anything. Best case scenario, we fix this issue, and I also end up with a better sounding sound card that's not the on-board sound :).

3. We can experience different types of severity levels depending on the length of the delay.

-----

So this is what the patch looks like that allows my system to no longer crash on first boot:

diff --git a/sys/dev/sound/pci/hda/hdac.c b/sys/dev/sound/pci/hda/hdac.c
index 9aa0e4bffdc8..e9d581a422cb 100644
--- a/sys/dev/sound/pci/hda/hdac.c
+++ b/sys/dev/sound/pci/hda/hdac.c
@@ -378,6 +378,11 @@ hdac_one_intr(struct hdac_softc *sc, uint32_t intsts)
 static void
 hdac_intr_handler(void *context)
 {
+       /*
+        * Add slight delay to avoid crashes with AMD Raven HDA Controllers
+        */
+       DELAY(5000);
+
        struct hdac_softc *sc;
        uint32_t intsts;


-----

- If there is no DELAY (the default), the system will crash.
- If there is a DELAY of 1000, the system won't crash, but we will see access errors! Which is revealing.

Example:

hdac2: <AMD Raven HDA Controller> mem 0xfc980000-0xfc987fff at device 0.6 on pci19
hdac2: Unexpected unsolicited response from address 0: 00000000
hdac2: Unexpected unsolicited response from address 0: 00000000
hdac2: Unexpected unsolicited response from address 0: 00000000
hdac2: Unexpected unsolicited response from address 0: 00000000


- If there is a DELAY of 5000, the system won't crash, and we no longer see any errors.

In the situations where I don't use delays (and leading up to this reduced solution), I was able to have the machine stop crashing if I added at least 4 printf statements lol. If I used 3 printf, it would crash. I suppose 4 printf is relatively equal to a DELAY of 5000 for me.

As stated before, with the above patch, the machine no longer crashes for me on a cold boot. I was also able to access and use my pcm8 device immediately and sound worked. This is progress.

I've attached the following files:

- bad.0.txt - Shows the access errors with a delay of 1000 with my previous expanded debug messages.
- good.0.txt - Shows a good cold boot with a delay of 5000 with my previous expanded debug messages.
- bad.1.txt - Shows the access errors with a delay of 1000 (minimal logging).

root@weshly:/usr/src # uname -a
FreeBSD weshly 13.2-RELEASE-p1 FreeBSD 13.2-RELEASE-p1 #23 releng/13.2-n254621-08b87f63a046-dirty: Thu Jul  6 21:22:10 EDT 2023     root@weshly:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

debugging on:

commit 08b87f63a046bd966bd0ed548211ae98ff50e638 (HEAD -> releng/13.2, origin/releng/13.2)
Author: Gordon Tetlow <gordon@FreeBSD.org>
Date:   Tue Jun 20 22:40:02 2023 -0700

    Add UPDATING entries and bump version.
    
    Approved by:    so

Comment 49 Jonathan Vasquez 2023-07-07 01:38:29 UTC

Created attachment 243289 [details]
bad.0.txt

Comment 50 Jonathan Vasquez 2023-07-07 01:38:48 UTC

Created attachment 243290 [details]
bad.1.txt

Comment 51 Jonathan Vasquez 2023-07-07 01:39:01 UTC

Created attachment 243291 [details]
good.0.txt

Comment 52 Alexander Sherikov 2023-08-13 13:18:01 UTC

Just to add some more info and subscribe myself to this issue:
I am having the same problem on 14.0-CURRENT (main-n263810-eb550615eff7) using Tuxedo Pulse 15 Gen1 laptop, https://bsd-hardware.info/?probe=6f779d5170. If I reboot the system from FreeBSD boot menu on the first start, it boots successfully on the second attempt. No issues with Ubuntu 20/22 on the same machine.

Comment 53 Graham Perrin 2023-08-29 06:38:04 UTC

For convenience

(In reply to Jonathan Vasquez from comment #39)

> … my hw-probe …

Three hdac detected lines under <https://bsd-hardware.info/?probe=d3373e972b#pci:1002-1640-1043-8877>. Condensed: 

1002:1640:1043:8877
AMD Rembrandt Radeon High Definition Audio Controller

1002:ab28:1002:ab28
AMD Navi 21/23 HDMI/DP Audio Controller

1022:15e3:1043:886d
AMD Family 17h/19h HD Audio Controller


(In reply to Alexander Sherikov from comment #52)

Two hdac detected lines under <https://bsd-hardware.info/?probe=6f779d5170#pci:1002-1637-1d05-109f>. Condensed: 

1002:1637:1d05:109f
AMD Renoir Radeon High Definition Audio Controller

1022:15e3:1d05:109f
AMD Family 17h/19h HD Audio Controller

Comment 54 Warner Losh freebsd_committer 2023-08-29 13:22:45 UTC

So what happens if you just revert the locking part of the commit that created the problem? E.g., add just the 0xffffffff check?

Comment 55 Mark Linimon freebsd_committer 2023-08-30 17:01:31 UTC

*** Bug 272878 has been marked as a duplicate of this bug. ***

Comment 56 Mark Linimon freebsd_committer 2023-08-30 17:07:18 UTC

*** Bug 273151 has been marked as a duplicate of this bug. ***

Comment 57 John Grafton 2023-08-30 17:26:24 UTC

(In reply to Warner Losh from comment #54)
You mean like this?

diff --git a/sys/dev/sound/pci/hda/hdac.c b/sys/dev/sound/pci/hda/hdac.c
index 79ab71516cd9..78c99db8e813 100644
--- a/sys/dev/sound/pci/hda/hdac.c
+++ b/sys/dev/sound/pci/hda/hdac.c
@@ -393,13 +393,13 @@ hdac_intr_handler(void *context)
         * re-examine GIS then we can leave it set and never get an interrupt
         * again.
         */
-       hdac_lock(sc);
        intsts = HDAC_READ_4(&sc->mem, HDAC_INTSTS);
        while (intsts != 0xffffffff && (intsts & HDAC_INTSTS_GIS) != 0) {
+               hdac_lock(sc);
                hdac_one_intr(sc, intsts);
+               hdac_unlock(sc);
                intsts = HDAC_READ_4(&sc->mem, HDAC_INTSTS);
        }
-       hdac_unlock(sc);
 }

Comment 58 Oleh Hushchenkov 2023-08-30 17:58:28 UTC

(In reply to Warner Losh from comment #54)

I just removed
  intsts != 0xffffffff &&
from
  while (intsts != 0xffffffff && (intsts & HDAC_INTSTS_GIS) != 0) {
to get
  while ((intsts & HDAC_INTSTS_GIS) != 0) {
and no more hangs on cold boot. Sound also works.

Reverting https://reviews.freebsd.org/D34117 should also fix the issue.

Comment 59 John Grafton 2023-08-30 18:15:57 UTC

(In reply to Oleh Hushchenkov from comment #58)
We tested reverting the patch completely in comment #34 and Jonathan reported that the sound didn't work but the system booted.

Removing 'intsts != 0xffffffff &&' will cause my laptop to freeze after waking from suspend!  :)

Comment 60 Oleh Hushchenkov 2023-08-30 18:25:15 UTC

(In reply to John Grafton from comment #59)
> We tested reverting the patch completely in comment #34 and Jonathan reported that the sound didn't work but the system booted.
I see.

> Removing 'intsts != 0xffffffff &&' will cause my laptop to freeze after waking from suspend!  :)
I didn't notice because I don't use suspend. So for me it's a viable workaround.

Comment 61 Jonathan Vasquez 2023-08-30 20:39:02 UTC

To be clear, the DELAY(5000) that I mention in comment 48 (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268393#c48) allowed the system to boot fine and sound worked (without reverting any current code), but I don't know if adding a generic DELAY like that is good, since every interrupt call will have that delay. At the moment we only know that some AMD-specific audio chips would require that workaround. However, if reverting existing code can help fix the issue and doesn't cause issues for others, that would be much better than the DELAY.

Comment 62 Ivan Rozhuk 2023-08-30 23:16:30 UTC

https://reviews.freebsd.org/D34117 is the best candidate :)
It landed after the new year and before my report: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264305
and I do not see other relevant changes in this time frame.

I do not like DELAY() in interrupt handler code; it can probably break something else.

I will test reverting D34117 in the near future.

Comment 63 Ivan Rozhuk 2023-08-31 00:04:30 UTC

Why has no one suggested a patch like this?

diff --git a/sys/dev/sound/pci/hda/hdac.c b/sys/dev/sound/pci/hda/hdac.c
index 82b1baacfa9..f7be436aef6 100644
--- a/sys/dev/sound/pci/hda/hdac.c
+++ b/sys/dev/sound/pci/hda/hdac.c
@@ -956,6 +956,8 @@ hdac_rirb_flush(struct hdac_softc *sc)
 	int ret;
 
 	rirb_base = (struct hdac_rirb *)sc->rirb_dma.dma_vaddr;
+	if (rirb_base == NULL)
+		return (0);
 	rirbwp = HDAC_READ_1(&sc->mem, HDAC_RIRBWP);
 	bus_dmamap_sync(sc->rirb_dma.dma_tag, sc->rirb_dma.dma_map,
 	    BUS_DMASYNC_POSTREAD);
@@ -965,6 +967,8 @@ hdac_rirb_flush(struct hdac_softc *sc)
 		sc->rirb_rp++;
 		sc->rirb_rp %= sc->rirb_size;
 		rirb = &rirb_base[sc->rirb_rp];
+		if (rirb == NULL)
+			break;
 		resp = le32toh(rirb->response);
 		resp_ex = le32toh(rirb->response_ex);
 		cad = HDAC_RIRB_RESPONSE_EX_SDATA_IN(resp_ex);

Comment 64 Oleh Hushchenkov 2023-09-05 08:55:57 UTC

(In reply to Ivan Rozhuk from comment #63)
This one also avoids the crash on first boot.

Comment 65 Ivan Rozhuk 2023-09-07 23:03:02 UTC

Created attachment 244703 [details]
patch

Comment 66 Ivan Rozhuk 2023-09-07 23:04:07 UTC

(In reply to Oleh Hushchenkov from comment #64)

Can you please test the attached patch?
(it uses continue instead of break on a NULL pointer)

Comment 67 Oleh Hushchenkov 2023-09-08 11:11:29 UTC

(In reply to Ivan Rozhuk from comment #66)

> Can you please test the attached patch?
> (it uses continue instead of break on a NULL pointer)
The attached patch also works for me.
No panic on first boot, sound works.

Comment 68 Mark Johnston freebsd_committer 2023-09-10 16:25:06 UTC

(In reply to Jonathan Vasquez from comment #18)
Looking at this log, it seems that we're getting an interrupt before hdac2 has finished initializing itself.  In particular, hdac_attach() allocates an MSI vector prior to allocating the RIRB DMA buffer.

I'm not particularly sure why my commit would introduce a problem there.  Could anyone affected by the problem please test moving the hdac_irq_alloc() call in hdac_attach() to just after the hdac_rirb_init() call?  I can't easily provide a patch at the moment but it should be easy to do.

Comment 69 Oleh Hushchenkov 2023-09-10 17:31:12 UTC

(In reply to Mark Johnston from comment #68)
Just moved hdac_irq_alloc() call in hdac_attach() to be after the hdac_rirb_init() call and rebuilt the kernel.

It fixed the issue for me.

Now I'm wondering which patch I should use:
1. Removing "intsts != 0xffffffff" from "while (intsts != 0xffffffff && (intsts & HDAC_INTSTS_GIS) != 0)" in "hdac_intr_handler()".
2. Adding "rirb_base == NULL" and "rirb == NULL" checks in "hdac_rirb_flush()".
3. Moving "hdac_irq_alloc()" to be after "hdac_rirb_init()" in "hdac_attach()".

Comment 70 Oleh Hushchenkov 2023-09-10 19:23:26 UTC

(In reply to Mark Johnston from comment #68)
> I'm not particularly sure why my commit would introduce a problem there.

I'm not familiar with the code base, but to me it looks like D34117 added "intsts != 0xffffffff" to the "while()" condition and thus changed some timings.

The important part is that removing "intsts != 0xffffffff" from the "while()" condition fixes the issue, but suspend stops working again.

Comment 71 Warner Losh freebsd_committer 2023-09-10 19:33:10 UTC

This and Mark's comment lead me to believe we are getting stray interrupts of various flavors. The 0xffffffff hack will keep things working while we're accessing the device in an ISR with the card unmapped or asleep. I think this also suggests some sloppiness with establishing or tearing down interrupts. Mark's case of getting an interrupt before things are complete is another example.

Comment 72 Ivan Rozhuk 2023-09-11 02:38:37 UTC

(In reply to Oleh Hushchenkov from comment #69)

I suggest committing 2 and 3 together to make sure that we never see this panic again.

Comment 73 Oleh Hushchenkov 2023-09-12 07:26:32 UTC

(In reply to Warner Losh from comment #71)

So how should it be fixed the right way?

The issue affects many different laptops/computers, and it looks like only AMD-based ones. That said, we can't blame buggy hardware/firmware: there are too many different devices from different vendors. At least my laptop is running the latest BIOS/UEFI. At the same time, Linux works well on such hardware.

Comment 74 Mark Johnston freebsd_committer 2023-09-12 08:04:59 UTC

(In reply to Oleh Hushchenkov from comment #69)
The right way is option number 3.  I'll get a patch along those lines reviewed and committed later this week.

Comment 75 Oleh Hushchenkov 2023-09-12 08:16:08 UTC

(In reply to Mark Johnston from comment #74)

I got it. Thank you Mark.

Comment 76 Mark Johnston freebsd_committer 2023-09-16 09:50:46 UTC

https://reviews.freebsd.org/D41883

Any additional testing would be welcome.

Comment 77 Ivan Rozhuk 2023-09-16 10:11:01 UTC

(In reply to Mark Johnston from comment #76)
Why not add additional checks from https://bugs.freebsd.org/bugzilla/attachment.cgi?id=244703&action=diff ?

Comment 78 Mark Johnston freebsd_committer 2023-09-16 11:31:44 UTC

(In reply to Ivan Rozhuk from comment #77)
Because they do not fix the underlying problem, and the second check is incorrect.

Comment 79 Alexander Sherikov 2023-09-18 04:30:47 UTC

(In reply to Mark Johnston from comment #76)
Works for me on Tuxedo laptop (https://bsd-hardware.info/?probe=6f779d5170#pci:1002-1637-1d05-109f), FreeBSD 15.0-CURRENT #2 main-n265359-d643925a79ca-dirty.

Comment 80 commit-hook freebsd_committer 2023-09-27 12:37:06 UTC

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=015daf5221f7588b9258fe0242cee09bde39fe21

commit 015daf5221f7588b9258fe0242cee09bde39fe21
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-09-27 12:23:58 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-09-27 12:28:27 +0000

    hdac: Defer interrupt allocation in hdac_attach()

    hdac_attach() registers an interrupt handler before allocating various
    driver resources which are accessed by the interrupt handler.  On some
    platforms we observe what appear to be spurious interrupts upon a cold
    boot, resulting in panics.

    Partially work around the problem by deferring irq allocation until
    after other resources are allocated.  I think this is not a complete
    solution, but is correct and sufficient to work around the problems
    reported in the PR.

    PR:             268393
    Tested by:      Alexander Sherikov <asherikov@yandex.com>
    Tested by:      Oleh Hushchenkov <o.hushchenkov@gmail.com>
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D41883

 sys/dev/sound/pci/hda/hdac.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comment 81 Mark Johnston freebsd_committer 2023-09-27 12:50:19 UTC

*** Bug 264305 has been marked as a duplicate of this bug. ***

Comment 82 commit-hook freebsd_committer 2023-10-04 13:44:09 UTC

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=32ca3712a8c4295d14770ad9cc55c1c837d834ad

commit 32ca3712a8c4295d14770ad9cc55c1c837d834ad
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-09-27 12:23:58 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-04 13:43:14 +0000

    hdac: Defer interrupt allocation in hdac_attach()

    hdac_attach() registers an interrupt handler before allocating various
    driver resources which are accessed by the interrupt handler.  On some
    platforms we observe what appear to be spurious interrupts upon a cold
    boot, resulting in panics.

    Partially work around the problem by deferring irq allocation until
    after other resources are allocated.  I think this is not a complete
    solution, but is correct and sufficient to work around the problems
    reported in the PR.

    PR:             268393
    Tested by:      Alexander Sherikov <asherikov@yandex.com>
    Tested by:      Oleh Hushchenkov <o.hushchenkov@gmail.com>
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D41883

    (cherry picked from commit 015daf5221f7588b9258fe0242cee09bde39fe21)

 sys/dev/sound/pci/hda/hdac.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comment 83 commit-hook freebsd_committer 2023-10-04 13:44:15 UTC

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=1e8737f4e884fdf4b966056662e4e6003d3379d9

commit 1e8737f4e884fdf4b966056662e4e6003d3379d9
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-09-27 12:23:58 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-04 13:41:52 +0000

    hdac: Defer interrupt allocation in hdac_attach()

    hdac_attach() registers an interrupt handler before allocating various
    driver resources which are accessed by the interrupt handler.  On some
    platforms we observe what appear to be spurious interrupts upon a cold
    boot, resulting in panics.

    Partially work around the problem by deferring irq allocation until
    after other resources are allocated.  I think this is not a complete
    solution, but is correct and sufficient to work around the problems
    reported in the PR.

    PR:             268393
    Tested by:      Alexander Sherikov <asherikov@yandex.com>
    Tested by:      Oleh Hushchenkov <o.hushchenkov@gmail.com>
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D41883

    (cherry picked from commit 015daf5221f7588b9258fe0242cee09bde39fe21)

 sys/dev/sound/pci/hda/hdac.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comment 84 Mark Johnston freebsd_committer 2023-10-04 13:48:23 UTC

The patch will appear in the next 14.0 pre-release build.

Comment 85 Jonathan Vasquez 2023-10-04 14:20:57 UTC

Thanks to everyone for all the hard work on this. Much appreciated!

Comment 86 commit-hook freebsd_committer 2023-10-05 15:10:34 UTC

A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=901d81c3e0f43cb0e4e10bb42ab9f0a71cfcda0a

commit 901d81c3e0f43cb0e4e10bb42ab9f0a71cfcda0a
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-09-27 12:23:58 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-05 14:01:19 +0000

    hdac: Defer interrupt allocation in hdac_attach()

    hdac_attach() registers an interrupt handler before allocating various
    driver resources which are accessed by the interrupt handler.  On some
    platforms we observe what appear to be spurious interrupts upon a cold
    boot, resulting in panics.

    Partially work around the problem by deferring irq allocation until
    after other resources are allocated.  I think this is not a complete
    solution, but is correct and sufficient to work around the problems
    reported in the PR.

    Approved by:    re (gjb)
    PR:             268393
    Tested by:      Alexander Sherikov <asherikov@yandex.com>
    Tested by:      Oleh Hushchenkov <o.hushchenkov@gmail.com>
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D41883

    (cherry picked from commit 015daf5221f7588b9258fe0242cee09bde39fe21)
    (cherry picked from commit 1e8737f4e884fdf4b966056662e4e6003d3379d9)

 sys/dev/sound/pci/hda/hdac.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)