FreeBSD 12.2-RC2 reboots after printing a panic message when booted with hw.acpi.verbose=1 following a warm reboot. HW: Asus X202E laptop. Possibly related bug: #249952 Steps to reproduce: 1. Boot FreeBSD 12.2-RC2 with hw.acpi.verbose=1 set in loader. 2. At user login prompt, invoke a warm reboot with CTRL-ALT-DEL. 3. Boot 12.2-RC2 again with hw.acpi.verbose=1. Kernel will flash an error message (after battery initialization is done), and what looks like a backtrace, then reboot. If you power the laptop off instead using CTRL-ALT-DEL, then there is no problem.
Try to take a video of the boot process and then go frame by frame to find one where the panic message is clearly visible. If you could do that, that would help a lot.
Created attachment 218691 [details] Screenshot of kernel panic message Screenshot of kernel panic message.
Looks like it's possibly related to HDA audio driver. The crash is in hdac_unsol_task.
I think that the problem could be that we enable interrupts a little bit too early, before initializing sc->codecs. And it looks that in this case the hardware posts an event very early.
(In reply to Andriy Gapon from comment #4) Quite possible: If I delay things a bit by doing "gop set 0", I see this after the warm reboot: --- hdacc0: unknown: <VIA VT1802_1 HDA CODEC> at cad 0Unexpected unsolicited response with tag 37: 94000000 on hdac0 ---
(In reply to Rajeev Pillai from comment #5) If you can some (light) code modifications and can build a kernel from sources, then I can try to "throw" a couple of ideas at you. First, in hdac_attach2(), you can try moving a block of code that starts with printing of "Enabling controller interrupt" and goes up to "Scanning HDA codecs" to a place right after the loop that scans for codecs. But I am not sure if that will really work. There are calls to hdac_send_command() in the loop. I am not sure if that function depends on interrupts. Another idea is to move just HDAC_WRITE_4(... HDAC_GCTL_UNSOL) line. Yet another idea is to modify hdac_methods[] array and add device_shutdown method there. It can point to hdac_detach. The idea is to completely reset the hardware _before_ the warm reboot, so it is more predictable afterwards.
(In reply to Andriy Gapon from comment #6) > If you can some (light) code modifications and can build a kernel from sources > ... > The idea is to completely reset the hardware _before_ the warm reboot, so it is > more predictable afterwards. Sure, I can do those code changes. I'll install 12.2-RC2 (was booting from USB all this while) and let you know how those changes go.
Created attachment 218713 [details] patch 1: add "device_shutdown" method hook. Adding a "device_shutdown" method hook works--once you are running the patched kernel. However, going from GENERIC -> Ctrl-Alt-Del -> patched-GENERIC kernel panics--as expected--because the "device_shutdown" method never gets called.
Created attachment 218714 [details] patch 2: enable interrupts after codec initialization. This patch does interrupt enabling _after_ codec initialization. This works just fine in all cases tested; even warm-booting from Linux into FreeBSD.
(In reply to Rajeev Pillai from comment #9) Rajeev, thank you very much! BTW, in the second patch you actually moved only setting of HDAC_GCTL_UNSOL. The interrupts are enabled in the same place as before (the write to HDAC_INTCTL). Apparently, this is sufficient. Alexander, what do you of these patches? Especially, the second one.
(In reply to Andriy Gapon from comment #10) > in the second patch you actually moved only setting of HDAC_GCTL_UNSOL. Yes, of course...sorry if my message was misleading. I should've written "moved unsolicited messages enabling after codec initialization". I didn't move the interrupt enabling bits because you said you weren't sure if hdac_send_command() depended on interrupts. I decided to see if the least intrusive changes would work, first, before doing a more wholesale cut-and-paste. Cheers! RVP
About the first patch adding shutdown handling I think it is an overkill for workaround and wrong approach in general, since there is no reason to detach the device on shutdown. It is designed to flush some caches and do alike things, and I know only one kind of devices detaching on shutdown -- USB, though I don't know why Hans done it that way. The second patch moving enabling UNSOL looks better to me, since I don't see what good for us could those message be before we even probe the codecs, not mentioning attaching them. Theoretically, if we ever implement codec hot-plug (for which I've never had a hardware), it would be required to better handle UNSOL events during the CODEC probe, since it may happen during run time. Meanwhile if this patch works -- great, but generally it looks like a partial case to me, since hdac_unsolq_flush() already checks for sc->codecs[cad].dev to be NULL, that is assigned the last by the probe loop inside hdac_attach2(). So it may just affect some timings, not really fix the issue.
(In reply to Alexander Motin from comment #12) I believe that the problem with too early unsolicited notifications is that dev can be not NULL but it is not attached yet (meaning that there are no driver methods). bus_generic_attach() is called after the codec discovery loop. So, an alternative solution could be for hdac_unsolq_flush() to check both dev != NULL and device_is_attached(dev). Still could be somewhat racy. Regarding the shutdown method -- yes, there is no need to detach HDA driver during shutdown. At the same time, there is no need to keep it attached too. So, I am ambivalent about that.
To me check for device_is_attached(dev) sounds the best possible, if it works.
Created attachment 218726 [details] patch 3: added check: device_is_attached() If the patch above is what you intended, then, yes, it does fix the panic.
(In reply to Rajeev Pillai from comment #15) I think you nailed it. Thank you very much again for the patch and the testing!
(In reply to Andriy Gapon from comment #16) No--You guys nailed it, actually. I've just been a test-monkey following both your directions. Feel free to close this bug as fixed. Many thanks.
A commit references this bug: Author: mav Date: Thu Oct 15 17:40:02 UTC 2020 New revision: 366733 URL: https://svnweb.freebsd.org/changeset/base/366733 Log: Drop unsolicited responses to the still attaching CODECs. It is reported to fix kernel panics when early unsolicited responses delivered to the CODEC device not having driver attached yet. PR: 250248 Reported by: Rajeev Pillai <rajeev_v_pillai@yahoo.com> Reviewed by: avg MFC after: 2 weeks Changes: head/sys/dev/sound/pci/hda/hdac.c
The last patch is committed to head. I'll MFC in couple weeks.
A commit references this bug: Author: mav Date: Thu Oct 29 00:30:38 UTC 2020 New revision: 367124 URL: https://svnweb.freebsd.org/changeset/base/367124 Log: MFC r366733: Drop unsolicited responses to the still attaching CODECs. It is reported to fix kernel panics when early unsolicited responses delivered to the CODEC device not having driver attached yet. PR: 250248 Changes: _U stable/12/ stable/12/sys/dev/sound/pci/hda/hdac.c