Hi,
I've upgraded my laptops Dell Vostro V13 and V130 to 11.1-BETA3. While on V13 everything is ok, on V130 it hangs during the boot. The last message is:
acpi_tz0: <Thermal Zone> on acpi0
But booting in safe mode works.
I can provide you more details if needed...
TIA,
Zdenek
(In reply to Zdenek Zavadil from comment #0)
> Hi,
> I've upgraded my laptops Dell Vostro V13 and V130 to 11.1-BETA3. While on
> V13 everything is ok, on V130 it hangs during the boot. The last message is:
>
> acpi_tz0: <Thermal Zone> on acpi0
>
> But booting in safe mode works.
>
Can you boot in verbose mode and provide the output? (Or pictures, if you do not have a serial console.) At the loader prompt, type 'boot -v'.
Created attachment 183819
the last screen of the boot
I am only able to capture the last screen of the boot because the machine locks up completely.
Created attachment 183820
/var/log/messages
And this is the /var/log/messages file from a normal boot of 11.0-RELEASE (for comparison).
I've tried 11.1-RC1; unfortunately the problem persists. I've run some more experiments: only setting kern.smp.disabled=1 made it boot again.
John, could this somehow be related to EARLY_AP_STARTUP?
Possibly. Can you boot with kern.smp.disabled=1 to get the box installed and then build a custom kernel that includes DDB? You can install that kernel as a test kernel (e.g. to /boot/test) and see if Ctrl-Alt-Esc works to break into the db> prompt when it hangs. If so, please get the output of 'ps' and 'tr'.
I'm seeing identical symptoms on HEAD on a Dell Inspiron 5748. I'm not sure when the regression was introduced; for the past several months this laptop has been using a custom kernel config with which I'm able to boot. The config omits EARLY_AP_STARTUP, but I haven't verified whether adding it triggers a hang. I don't appear to be able to break into DDB at this point during boot; the keyboard driver hasn't attached yet.
Created attachment 184014
DDB - "next" until it hangs
Thanks for the comments. I've successfully built the custom kernel, and here are the outputs from DDB. The first one is the result of pressing "next" until it hangs.
Created attachment 184015
DDB - "step" + "tr"
The next one is the result of pressing "step" from just before the hang (the previous screenshot) until it appears to loop forever in the sched_idletd function. The output of "tr" is at the end.
Created attachment 184016
DDB - "ps"
And the last one is the output of "ps" (3 screenshots awkwardly pasted together), taken at the same time as the previous one. Would it be helpful to try building another kernel without EARLY_AP_STARTUP? Thanks!
(In reply to Mark Johnston from comment #7) Mark, I also wasn't able to break into the debugger using Ctrl-Alt-Esc, so I used "boot -d" from the boot loader.
I managed to capture the hang in ddb using boot -d. thread0 is mtx_sleep()ing in AcpiOsAcquireMutex() with a stack that goes through acpi_button_probe(). A taskqueue thread is pause()d with the following stack:
pause_sbt()
AcpiExSystemDoSleep()
AcpiDsExecEndOp()
AcpiPsParseLoop()
AcpiPsParseAml()
AcpiPsExecuteMethod()
AcpiNsEvaluate()
AcpiEvaluateObject()
acpi_cmbat_get_bst()
acpi_cmbat_init_battery()
<taskqueue stuff>
I presume that this thread is holding the ACPI mutex that thread0 is waiting for. So it seems that r310336 wasn't sufficient.
(In reply to Zdenek Zavadil from comment #9) Just to be more specific, it seems to be looping among these functions (in this order):
sched_idletd
cpu_idle
sched_runnable
cpu_idle
(In reply to Zdenek Zavadil from comment #13) That would just be an idle thread running. The bug appears to be a sort of deadlock, and the participating threads are not on-CPU while the hang occurs.
(In reply to Mark Johnston from comment #12) It's probably worth looking at the stacks of all the other threads to find which one owns the lock and then determine why it isn't running. Are there any other threads that are runnable? It might be that we need to have pause() on thread0 still not sleep via callouts, but perhaps it needs to call mi_switch/sched_yield in a loop to give other runnable threads a chance in conjunction with spinning.
(In reply to John Baldwin from comment #15) The taskqueue thread owns the lock. There aren't any other runnable threads. The issue is that the taskqueue thread can't wake up until the eventtimer is configured during SI_SUB_CLOCKS. thread0 is blocked on the lock held by the taskqueue thread and is still in SI_SUB_CONFIGURE.
(In reply to Mark Johnston from comment #16) We could defer handling ACPI tasks until timers are working. That is probably the simplest / shortest solution. More complicated would be to fix pause() to always use sched_yield + spinning for all threads (not just thread0 as I suggested in my previous comment). The first approach can be tried by changing the SYSINIT for acpi_taskq_init in sys/dev/acpica/Osd/OsdSchedule.c:
Index: sys/dev/acpica/Osd/OsdSchedule.c
===================================================================
--- sys/dev/acpica/Osd/OsdSchedule.c	(revision 320674)
+++ sys/dev/acpica/Osd/OsdSchedule.c	(working copy)
@@ -128,7 +128,8 @@
     acpi_taskq_started = 1;
 }
 
-SYSINIT(acpi_taskq, SI_SUB_CONFIGURE, SI_ORDER_SECOND, acpi_taskq_init, NULL);
+SYSINIT(acpi_taskq, SI_SUB_KICK_SCHEDULER, SI_ORDER_SECOND, acpi_taskq_init,
+    NULL);
 
 /*
  * Bounce through this wrapper function since ACPI-CA doesn't understand
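The ordering argument in the two comments above can be checked with a small userspace model. This is only a sketch, not kernel code: the enum, timers_work() and boot_can_deadlock() are hypothetical names invented for illustration, the stage numbers are not the real SI_SUB_* values, and the model assumes EARLY_AP_STARTUP is configured so that a pause()d thread waits on a callout.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Boot stages in the order described in this thread. The numeric values are
 * illustrative only and are NOT the real SI_SUB_* constants.
 */
enum boot_stage {
    STAGE_CONFIGURE = 1,      /* device probe/attach; thread0 runs here */
    STAGE_CLOCKS = 2,         /* eventtimer configured; callouts can fire */
    STAGE_KICK_SCHEDULER = 3  /* stage the patch moves acpi_taskq_init to */
};

/* True once a callout-based sleep such as pause() can be woken up. */
static bool
timers_work(enum boot_stage now)
{
    return (now >= STAGE_CLOCKS);
}

/*
 * Hypothetical model of the hang: ACPI taskqueue threads are created at
 * taskq_start, while thread0 probes devices during STAGE_CONFIGURE and may
 * block on the ACPI mutex. Returns true if the boot can hang.
 */
static bool
boot_can_deadlock(enum boot_stage taskq_start)
{
    const enum boot_stage probe = STAGE_CONFIGURE;

    assert(taskq_start >= STAGE_CONFIGURE &&
        taskq_start <= STAGE_KICK_SCHEDULER);

    /* A task can hold the mutex during probing only if its thread exists. */
    bool task_runs_during_probe = (taskq_start <= probe);
    /* A task that sleeps with the mutex held needs a timer to wake up. */
    bool sleeper_stuck = !timers_work(probe);

    return (task_runs_during_probe && sleeper_stuck);
}
```

In this model boot_can_deadlock(STAGE_CONFIGURE) is true and boot_can_deadlock(STAGE_KICK_SCHEDULER) is false, which matches the hang and the intent of the proposed SYSINIT change.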
(In reply to John Baldwin from comment #17) That seems to do the trick in my case, and is probably the best way to solve this for 11.1.
Can you please commit the fix to HEAD? We defer starting other ACPI threads (such as the ACPI thermal zone threads) until this same time anyway. If we get another glitch with EARLY_AP_STARTUP for 11.1 we will just disable it though.
(In reply to John Baldwin from comment #19) Sure, will do.
A commit references this bug:
Author: markj
Date: Wed Jul 5 17:39:17 UTC 2017
New revision: 320690
URL: https://svnweb.freebsd.org/changeset/base/320690
Log:
  Defer ACPI taskqueue creation to SI_SUB_KICK_SCHEDULER.
  This addresses a deadlock during boot when EARLY_AP_STARTUP is configured: a taskqueue thread may call pause() with an ACPI mutex held, and thread0 may block on this mutex before configuring the eventtimer. In this case the taskqueue thread will sleep forever waiting for its callout to fire.
  PR: 220277
  Submitted by: jhb
  MFC after: 3 days
Changes:
  head/sys/dev/acpica/Osd/OsdSchedule.c
A commit references this bug:
Author: markj
Date: Thu Jul 6 17:20:36 UTC 2017
New revision: 320744
URL: https://svnweb.freebsd.org/changeset/base/320744
Log:
  MFC r320690:
  Defer ACPI taskqueue creation to SI_SUB_KICK_SCHEDULER.
  PR: 220277
Changes:
  _U stable/11/
  stable/11/sys/dev/acpica/Osd/OsdSchedule.c
A commit references this bug:
Author: markj
Date: Thu Jul 6 17:31:39 UTC 2017
New revision: 320746
URL: https://svnweb.freebsd.org/changeset/base/320746
Log:
  MFS r320744:
  MFC r320690:
  Defer ACPI taskqueue creation to SI_SUB_KICK_SCHEDULER.
  PR: 220277
  Approved by: re (gjb)
Changes:
  _U releng/11.1/
  releng/11.1/sys/dev/acpica/Osd/OsdSchedule.c