Bug 282241 - panic on boot on Parallels arm64
Summary: panic on boot on Parallels arm64
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Andrew Turner
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2024-10-21 10:35 UTC by Edward Tomasz Napierala
Modified: 2024-11-25 05:32 UTC
CC List: 2 users

See Also:
linimon: mfc-stable14?
linimon: mfc-stable13?


Description Edward Tomasz Napierala freebsd_committer freebsd_triage 2024-10-21 10:35:44 UTC
Booting a recent FreeBSD CURRENT on Parallels results in a panic that looks like this:

panic: sleepq_add: td XXX to sleep on wchan XXX with sleeping prohibited

Backtrace suggests ACPI:

panic()
sleepq_add()
_sleep()
AcpiOsAcquireMutex()
AcpiUtAcquireMutex()
AcpiExEnterInterpreter()
AcpiEvaluateObject()
ithread_loop()
Comment 1 Edward Tomasz Napierala freebsd_committer freebsd_triage 2024-10-21 10:37:49 UTC
Correction: this seems to be triggered by devmatch(8) during boot, not by the kernel boot itself.
Comment 2 Edward Tomasz Napierala freebsd_committer freebsd_triage 2024-10-21 10:44:23 UTC
However, it also happens randomly after booting, even with devmatch disabled.
Comment 3 Edward Tomasz Napierala freebsd_committer freebsd_triage 2024-10-24 09:43:43 UTC
I finally figured out how to reproduce it: just pull the power cable from the laptop the VM is running on and then plug it in again.  I can't get a proper backtrace for some reason:

(kgdb) bt
#0  0xffff0000004b5da0 in doadump (textdump=0) at /usr/home/trasz/git/freebsd/sys/kern/kern_shutdown.c:404
#1  0x96be0000000e9bf0 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Comment 4 Edward Tomasz Napierala freebsd_committer freebsd_triage 2024-10-24 10:35:11 UTC
Reverting 9129a1c39172 ("arm64: Switch to ACPI by default") gets rid of the panic.
Comment 5 Zhenlei Huang freebsd_committer freebsd_triage 2024-10-24 15:27:50 UTC
(In reply to Edward Tomasz Napierala from comment #4)
> Reverting 9129a1c39172 ("arm64: Switch to ACPI by default") gets rid of the panic.

No git revision 9129a1c39172 found. Probably that is 33f2cf4ad460 ("arm64: Switch to ACPI by default").

CC the author, Andrew, who may have clues about this panic.
Comment 6 Andrew Turner freebsd_committer freebsd_triage 2024-10-25 08:32:03 UTC
It's probably coming from acpi_ged0 as it has some events that look like they are triggered on battery status change.

You could try setting debug.acpi.ged_defer to 1 to see if that works around the issue.
Comment 7 Edward Tomasz Napierala freebsd_committer freebsd_triage 2024-10-27 12:47:07 UTC
Setting debug.acpi.ged_defer to 1 does indeed work around the panic.  However, it spams the console with "AcpiOsExecute: failed to enqueue task, consider increasing the debug.acpi.max_tasks tunable", with vmstat reporting over 50% system CPU.
Comment 8 Andrew Turner freebsd_committer freebsd_triage 2024-11-08 16:51:19 UTC
Can you try with https://reviews.freebsd.org/D47487? It looks like the interrupt is configured as level-triggered when it should be edge-triggered, so we get an interrupt storm. The resulting lock contention causes the ithread to sleep while waiting for the mutex to be unlocked.
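
A minimal sketch of why the wrong trigger mode can turn into a storm, assuming a simplified model in which the handler never deasserts the line (as it would not if the device expects edge semantics). The names `trigger_mode` and `count_handler_runs` are hypothetical and do not come from the FreeBSD source.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical trigger modes for this sketch; not the kernel's own enum. */
enum trigger_mode { TRIG_EDGE, TRIG_LEVEL };

/*
 * Count how many times a handler would run for a sampled interrupt line.
 *
 * Edge:  the handler runs once per low-to-high transition.
 * Level: the handler runs once per sample while the line is high, because
 *        in this model nothing clears the source between samples.
 */
static unsigned
count_handler_runs(enum trigger_mode mode, const bool *line, size_t nsamples)
{
    unsigned runs = 0;
    bool prev = false;

    for (size_t i = 0; i < nsamples; i++) {
        if (mode == TRIG_LEVEL) {
            if (line[i])
                runs++;
        } else if (line[i] && !prev) {
            runs++;
        }
        prev = line[i];
    }
    return (runs);
}
```

For a line sampled as low, high, high, high, high, low, high, high, the edge model runs the handler twice and the level model runs it six times. On real hardware the level case repeats for as long as the line stays asserted.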
Comment 9 Edward Tomasz Napierala freebsd_committer freebsd_triage 2024-11-09 03:43:20 UTC
I've tested D47487 and it fixes the problem, thank you!  It also avoids the CPU consumption problem when debug.acpi.ged_defer is set to 1.
Comment 10 commit-hook freebsd_committer freebsd_triage 2024-11-19 17:47:31 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a1330a71d20d862eb9d930d87245f23ee4853527

commit a1330a71d20d862eb9d930d87245f23ee4853527
Author:     Andrew Turner <andrew@FreeBSD.org>
AuthorDate: 2024-11-18 15:29:42 +0000
Commit:     Andrew Turner <andrew@FreeBSD.org>
CommitDate: 2024-11-19 17:14:42 +0000

    acpi: Handle multiple interrupts

    When multiple IRQs are specified in a single resource then we only
    check the first. Change this to check all interrupts for the value
    we expect to find.

    Without this we may still enable the interrupt, but it can have the
    wrong polarity or trigger. This can cause an interrupt storm if the
    interrupt was configured with a level trigger when it should have
    been an edge.

    PR:             282241
    Reported by:    trasz
    Sponsored by:   Arm Ltd
    Differential Revision:  https://reviews.freebsd.org/D47487

 sys/dev/acpica/acpi_resource.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)
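
The change described in the commit message can be illustrated roughly as follows. This is a simplified sketch, not the code from sys/dev/acpica/acpi_resource.c: the `irq_resource` structure, `MAX_IRQS`, and both function names are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IRQS 4  /* arbitrary bound chosen for this sketch */

/* Hypothetical, simplified stand-in for an ACPI interrupt resource. */
struct irq_resource {
    size_t   count;            /* number of valid entries in irqs[] */
    uint32_t irqs[MAX_IRQS];   /* IRQ numbers listed in the resource */
    bool     edge_triggered;   /* trigger mode shared by all entries */
    bool     active_low;       /* polarity shared by all entries */
};

/* Behaviour before the fix, as described: only the first IRQ is compared. */
static bool
match_first_only(const struct irq_resource *res, uint32_t irq)
{
    return (res->count > 0 && res->irqs[0] == irq);
}

/* Behaviour after the fix, as described: every listed IRQ is compared. */
static bool
match_any(const struct irq_resource *res, uint32_t irq)
{
    size_t n = res->count < MAX_IRQS ? res->count : MAX_IRQS;

    for (size_t i = 0; i < n; i++) {
        if (res->irqs[i] == irq)
            return (true);
    }
    return (false);
}
```

With a resource listing IRQs 33, 34 and 35, `match_first_only` finds only 33, so a lookup for 35 misses the resource and its trigger and polarity settings are never applied. `match_any` finds all three.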