This only happens every other week or so; I don't know what's the trigger, I believe the VM is idle when it happens. Looks like this: panic: sleepq_add: td 0xffff0000b661e640 to sleep on wchan 0xffffa000c0a0e380 with sleeping prohibited cpuid = 0 time = 1733037443 KDB: stack backtrace: db_trace_self() at db_trace_self db_trace_self_wrapper() at db_trace_self_wrapper+0x38 vpanic() at vpanic+0x1ac panic() at panic+0x48 sleepq_add() at sleepq_add+0x2c0 _sleep() at _sleep+0x1d4 AcpiOsAcquireMutex() at AcpiOsAcquireMutex+0x9c AcpiUtAcquireMutex() at AcpiUtAcquireMutex+0xac AcpiOsAcquireObject() at AcpiOsAcquireObject+0x1c AcpiUtCreateGenericState() at AcpiUtCreateGenericState+0x14 AcpiPsPushScope() at AcpiPsPushScope+0x28 AcpiPsParseLoop() at AcpiPsParseLoop+0x300 AcpiPsParseAml() at AcpiPsParseAml+0xb0 AcpiPsExecuteMethod() at AcpiPsExecuteMethod+0x138 AcpiNsEvaluate() at AcpiNsEvaluate+0x1d8 AcpiEvaluateObject() at AcpiEvaluateObject+0x13c ithread_loop() at ithread_loop+0x288 fork_exit() at fork_exit+0x78 fork_trampoline() at fork_trampoline+0x18 KDB: enter: panic For some reason I can't do backtrace with kgdb(1), but I've seen the same problem previously, so probably unrelated: (kgdb) bt #0 0xffff0000004b68f4 in doadump (textdump=0) at /usr/home/trasz/git/freebsd/sys/kern/kern_shutdown.c:404 #1 0x2cb00000000e9bf0 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) I took a VM snapshot this time with kernel still in ddb(4).
Do you know which driver is causing this? That is, can you dump info about the current thread from ddb, so we know which ithread it is?
Yup, it's: * 79 Thread 100070 (PID=12: intr/gic0,s16: acpi_ged0) 0xffff0000004b5aa8 in doadump (textdump=0) at /usr/home/trasz/git/freebsd/sys/kern/kern_shutdown.c:404 I should have more luck with the backtrace next time this happens.
The problem is the ged driver needs to call AcpiEvaluateObject from the ithread, however as taking the APCI lock can sleep if another thread already holds it the ithread can panic. We could either make the ACPI lock spin, or allow some ithreads to sleep. I have a patch to allow some ithreads to sleep, however am unsure if this is the best option.
Can't we defer it to a separate thread? There's the acpi_ged_defer sysctl; could it help? If so, could we flip it to 1 by default and call it done?
The sysctl was added in https://cgit.freebsd.org/src/commit/?id=be91b4797e2c to get the old behaviour, so that would be a regression.
Hm, you're right. Okay, perhaps a weird idea, but can we make AcpiOsAcquireMutex() return error when it can't take the mutex without blocking? ACPI seems to support it, eg AcpiOsAcquireObject() checks for that error return.
(In reply to Andrew Turner from comment #3) I've been thinking about sleeping ithreads for another reason, sorry for hijacking this bug report. There could be a chip connected, e.g., via I2C. And that chip could also be capable of generating a signal (an interrupt) via a GPIO line. It's not a problem to configure the GPIO interrupt but it may become a problem if the interrupt handler needs to access the chip via a bus that requires waiting / sleeping. I think that at present the described setup can only work if we force the bus to spinning / polling but that may be sub-optimal.
I don't see any particular reason an ithread must not sleep, so long as it is not shared with any other interrupt. I think INTR_EXCL provides the desired semantics there?
I've created https://reviews.freebsd.org/D48283 and https://reviews.freebsd.org/D48284 to allow sleepable ithreads, and use it in the GED driver.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=dbedcc169f70c924a680e02bc86d7419682a70ac commit dbedcc169f70c924a680e02bc86d7419682a70ac Author: Andrew Turner <andrew@FreeBSD.org> AuthorDate: 2025-01-13 05:37:52 +0000 Commit: Andrew Turner <andrew@FreeBSD.org> CommitDate: 2025-01-24 15:08:09 +0000 acpi: Mark the GED ithread as sleepable We need to complete calling the ACPI method before marking the interrupt as complete. If two threads are inspecting the ACPI tables at the same time they may both try to lock the ACPI mutex causing one to sleep. If this is the ithread it will panic the kernel as this is not allowed. Update the ged ithread to allow sleeping as it is expected this lock will be uncommon enough any sleep will be short. PR: 283103 Reviewed by: markj (earlier version) Sponsored by: Arm Ltd Differential Revision: https://reviews.freebsd.org/D48284 sys/dev/acpica/acpi_ged.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)