Bug 250248 - 12.2-RC2 kernel reboots with hw.acpi.debug=1 after a warm reboot.
Summary: 12.2-RC2 kernel reboots with hw.acpi.debug=1 after a warm reboot.
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: Unspecified
Hardware: amd64 Any
: --- Affects Many People
Assignee: Alexander Motin
URL:
Keywords: panic
Depends on:
Blocks:
 
Reported: 2020-10-10 12:03 UTC by Rajeev Pillai
Modified: 2020-10-15 17:51 UTC (History)
2 users (show)

See Also:


Attachments
Screenshot of kernel panic message (357.22 KB, image/jpeg)
2020-10-12 12:18 UTC, Rajeev Pillai
no flags Details
patch 1: add "device_shutdown" method hook. (407 bytes, patch)
2020-10-13 12:57 UTC, Rajeev Pillai
no flags Details | Diff
patch 2: enable interrupts after codec initialization. (900 bytes, patch)
2020-10-13 13:06 UTC, Rajeev Pillai
no flags Details | Diff
patch 3: added check: device_is_attached() (418 bytes, patch)
2020-10-14 04:40 UTC, Rajeev Pillai
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Rajeev Pillai 2020-10-10 12:03:17 UTC
FreeBSD 12.2-RC2 reboots after printing a panic message when booted with hw.acpi.verbose=1 following a warm reboot.

HW: Asus X202E laptop.

Possibly related bug: #249952

Steps to reproduce:

1. Boot FreeBSD 12.2-RC2 with hw.acpi.verbose=1 set in loader.

2. At user login prompt, invoke a warm reboot with CTRL-ALT-DEL.

3. Boot 12.2-RC2 again with hw.acpi.verbose=1.

Kernel will flash an error message (after battery initialization is done),
and what looks like a backtrace, then reboot. If you power the laptop off
instead using CTRL-ALT-DEL, then there is no problem.
Comment 1 Andriy Gapon freebsd_committer 2020-10-12 08:11:49 UTC
Try to take a video of the boot process and then go frame by frame to find one where the panic message is clearly visible.
If you could do that, that would help a lot.
Comment 2 Rajeev Pillai 2020-10-12 12:18:32 UTC
Created attachment 218691 [details]
Screenshot of kernel panic message

Screenshot of kernel panic message.
Comment 3 Andriy Gapon freebsd_committer 2020-10-12 14:39:27 UTC
Looks like it's possibly related to HDA audio driver.
The crash is in hdac_unsol_task.
Comment 4 Andriy Gapon freebsd_committer 2020-10-12 14:44:14 UTC
I think that the problem could be that we enable interrupts a little bit too early, before initializing sc->codecs.  And it looks that in this case the hardware posts an event very early.
Comment 5 Rajeev Pillai 2020-10-12 21:41:24 UTC
(In reply to Andriy Gapon from comment #4)
Quite possible: If I delay things a bit by doing "gop set 0", I see this
after the warm reboot:
---
 hdacc0: unknown: <VIA VT1802_1 HDA CODEC> at cad 0Unexpected unsolicited response with tag 37: 94000000
 on hdac0
---
Comment 6 Andriy Gapon freebsd_committer 2020-10-13 05:57:03 UTC
(In reply to Rajeev Pillai from comment #5)
If you can some (light) code modifications and can build a kernel from sources, then I can try to "throw" a couple of ideas at you.

First, in hdac_attach2(), you can try moving a block of code that starts with printing of "Enabling controller interrupt" and goes up to "Scanning HDA codecs" to a place right after the loop that scans for codecs.

But I am not sure if that will really work.
There are calls to hdac_send_command() in the loop.  I am not sure if that function depends on interrupts.

Another idea is to move just HDAC_WRITE_4(... HDAC_GCTL_UNSOL) line.

Yet another idea is to modify hdac_methods[] array and add device_shutdown method there.  It can point to hdac_detach.
The idea is to completely reset the hardware _before_ the warm reboot, so it is more predictable afterwards.
Comment 7 Rajeev Pillai 2020-10-13 06:25:02 UTC
(In reply to Andriy Gapon from comment #6)
> If you can some (light) code modifications and can build a kernel from sources
> ...
> The idea is to completely reset the hardware _before_ the warm reboot, so it is
> more predictable afterwards.

Sure, I can do those code changes. I'll install 12.2-RC2 (was booting from USB
all this while) and let you know how those changes go.
Comment 8 Rajeev Pillai 2020-10-13 12:57:55 UTC
Created attachment 218713 [details]
patch 1: add "device_shutdown" method hook.

Adding a "device_shutdown" method hook works--once you are running the patched
kernel. However, going from GENERIC -> Ctrl-Alt-Del -> patched-GENERIC kernel
panics--as expected--because the "device_shutdown" method never gets called.
Comment 9 Rajeev Pillai 2020-10-13 13:06:42 UTC
Created attachment 218714 [details]
patch 2: enable interrupts after codec initialization.

This patch does interrupt enabling _after_ codec initialization.
This works just fine in all cases tested; even warm-booting from Linux into
FreeBSD.
Comment 10 Andriy Gapon freebsd_committer 2020-10-13 14:06:18 UTC
(In reply to Rajeev Pillai from comment #9)

Rajeev, thank you very much!
BTW, in the second patch you actually moved only setting of HDAC_GCTL_UNSOL.
The interrupts are enabled in the same place as before (the write to HDAC_INTCTL).
Apparently, this is sufficient.

Alexander,
what do you of these patches?
Especially, the second one.
Comment 11 Rajeev Pillai 2020-10-13 20:09:04 UTC
(In reply to Andriy Gapon from comment #10)
> in the second patch you actually moved only setting of HDAC_GCTL_UNSOL.

Yes, of course...sorry if my message was misleading. I should've written "moved
unsolicited messages enabling after codec initialization".

I didn't move the interrupt enabling bits because you said you weren't sure if
hdac_send_command() depended on interrupts. I decided to see if the least
intrusive changes would work, first, before doing a more wholesale cut-and-paste.

Cheers!
RVP
Comment 12 Alexander Motin freebsd_committer 2020-10-13 20:44:24 UTC
About the first patch adding shutdown handling I think it is an overkill for workaround and wrong approach in general, since there is no reason to detach the device on shutdown.  It is designed to flush some caches and do alike things, and I know only one kind of devices detaching on shutdown -- USB, though I don't know why Hans done it that way.

The second patch moving enabling UNSOL looks better to me, since I don't see what good for us could those message be before we even probe the codecs, not mentioning attaching them.  Theoretically, if we ever implement codec hot-plug (for which I've never had a hardware), it would be required to better handle UNSOL events during the CODEC probe, since it may happen during run time.  Meanwhile if this patch works -- great, but generally it looks like a partial case to me, since hdac_unsolq_flush() already checks for sc->codecs[cad].dev to be NULL, that is assigned the last by the probe loop inside hdac_attach2().  So it may just affect some timings, not really fix the issue.
Comment 13 Andriy Gapon freebsd_committer 2020-10-13 21:54:29 UTC
(In reply to Alexander Motin from comment #12)
I believe that the problem with too early unsolicited notifications is that dev can be not NULL but it is not attached yet (meaning that there are no driver methods).  bus_generic_attach() is called after the codec discovery loop.

So, an alternative solution could be for hdac_unsolq_flush() to check both dev != NULL and device_is_attached(dev).
Still could be somewhat racy.

Regarding the shutdown method -- yes, there is no need to detach HDA driver during shutdown.
At the same time, there is no need to keep it attached too.
So, I am ambivalent about that.
Comment 14 Alexander Motin freebsd_committer 2020-10-13 22:39:31 UTC
To me check for device_is_attached(dev) sounds the best possible, if it works.
Comment 15 Rajeev Pillai 2020-10-14 04:40:28 UTC
Created attachment 218726 [details]
patch 3: added check: device_is_attached()

If the patch above is what you intended, then, yes, it does fix the panic.
Comment 16 Andriy Gapon freebsd_committer 2020-10-14 08:03:38 UTC
(In reply to Rajeev Pillai from comment #15)
I think you nailed it.
Thank you very much again for the patch and the testing!
Comment 17 Rajeev Pillai 2020-10-14 08:25:13 UTC
(In reply to Andriy Gapon from comment #16)
No--You guys nailed it, actually. I've just been a test-monkey following both your
directions.

Feel free to close this bug as fixed. Many thanks.
Comment 18 commit-hook freebsd_committer 2020-10-15 17:41:00 UTC
A commit references this bug:

Author: mav
Date: Thu Oct 15 17:40:02 UTC 2020
New revision: 366733
URL: https://svnweb.freebsd.org/changeset/base/366733

Log:
  Drop unsolicited responses to the still attaching CODECs.

  It is reported to fix kernel panics when early unsolicited responses
  delivered to the CODEC device not having driver attached yet.

  PR:		250248
  Reported by:	Rajeev Pillai <rajeev_v_pillai@yahoo.com>
  Reviewed by:	avg
  MFC after:	2 weeks

Changes:
  head/sys/dev/sound/pci/hda/hdac.c
Comment 19 Alexander Motin freebsd_committer 2020-10-15 17:51:53 UTC
The last patch is committed to head.  I'll MFC in couple weeks.