Beagle Bone fails to boot in HW test lab since this build: https://ci.freebsd.org/hwlab/job/FreeBSD-device-head-beaglebone-test/808/ The build contains changes between r354707 and r354730. Console log: https://ci.freebsd.org/hwlab/job/FreeBSD-device-head--test/808/artifact/device_tests/beaglebone.boot.log/*view*/ FreeBSD 13.0-CURRENT #65 r354730: Fri Nov 15 14:24:51 UTC 2019 root@FreeBSD-head-armv7-build_devtest.jail:/usr/obj/usr/src/arm.armv7/sys/GENERIC arm FreeBSD clang version 9.0.0 (tags/RELEASE_900/final 372316) (based on LLVM 9.0.0) WARNING: WITNESS option enabled, expect reduced performance. panic: acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) pmap @ /usr/src/sys/arm/arm/pmap-v6.c:6473 cpuid = 0 time = 1 KDB: stack backtrace: db_trace_self() at db_trace_self pc = 0xc05d49bc lr = 0xc007a374 (db_trace_self_wrapper+0x30) sp = 0xc0e148f0 fp = 0xc0e14a08 db_trace_self_wrapper() at db_trace_self_wrapper+0x30 pc = 0xc007a374 lr = 0xc02b0e68 (vpanic+0x164) sp = 0xc0e14a10 fp = 0xc0e14a30 r4 = 0x00000100 r5 = 0xc0b1b920 r6 = 0xc06e666a r7 = 0x00000000 vpanic() at vpanic+0x164 pc = 0xc02b0e68 lr = 0xc02b0c34 (doadump) sp = 0xc0e14a38 fp = 0xc0e14a3c r4 = 0x00001949 r5 = 0xc06c0a49 r6 = 0xc06e0a68 r7 = 0x00000000 r8 = 0xc0e8e450 r9 = 0xc071206e r10 = 0xc0b3eda0 doadump() at doadump pc = 0xc02b0c34 lr = 0xc0320d38 (witness_checkorder+0xd0c) sp = 0xc0e14a44 fp = 0xc0e14aa0 r4 = 0xc0b3eda0 r5 = 0xc0e14a3c r6 = 0xc02b0c34 r10 = 0xc0e14a44 witness_checkorder() at witness_checkorder+0xd0c pc = 0xc0320d38 lr = 0xc0294ee4 (__mtx_lock_flags+0xb4) sp = 0xc0e14aa8 fp = 0xc0e14ad0 r4 = 0xc0b3edb0 r5 = 0x00000000 r6 = 0x00000000 r7 = 0x00000000 r8 = 0xc071206e r9 = 0x00001949 r10 = 0xc0b3eda0 __mtx_lock_flags() at __mtx_lock_flags+0xb4 pc = 0xc0294ee4 lr = 0xc05f1e60 (pmap_fault+0x74) sp = 0xc0e14ad8 fp = 0xc0e14af8 r4 = 0x00000005 r5 = 0x000002d4 r6 = 0x00000005 r7 = 0xc0b3eda0 r8 = 0xc0b3edb0 r9 = 0xc0b40d80 r10 = 0x00000000 pmap_fault() at pmap_fault+0x74 pc = 0xc05f1e60 lr = 0xc05f6968 (abort_handler+0x10c) sp = 0xc0e14b00 fp = 0xc0e14b98 r4 = 0x00000005 r5 = 0x00000000 r6 = 0xc0e14ba0 r7 = 0x00000005 r8 = 0x00000013 r9 = 0xc0b1b920 r10 = 0x000002d4 abort_handler() at abort_handler+0x10c pc = 0xc05f6968 lr = 0xc05d72dc (exception_exit) sp = 0xc0e14ba0 fp = 0xc0e14ca8 r4 = 0x00000000 r5 = 0xc0e14c78 r6 = 0xc0e14c88 r7 = 0x00000000 r8 = 0xc0b1b920 r9 = 0x00000000 r10 = 0xc0a9ee98 exception_exit() at exception_exit pc = 0xc05d72dc lr = 0xc0ae231c (0xc0ae231c) sp = 0xc0e14c30 fp = 0xc0e14ca8 r0 = 0x00000000 r1 = 0x00000000 r2 = 0x00000001 r3 = 0xc0b1b920 r4 = 0x00000000 r5 = 0xc0e14c78 r6 = 0xc0e14c88 r7 = 0x00000000 r8 = 0xc0b1b920 r9 = 0x00000000 r10 = 0xc0a9ee98 r12 = 0xc08b4f38 _sx_xlock_hard() at _sx_xlock_hard+0x2d0 pc = 0xc02ba3f0 lr = 0xc02ba080 (_sx_xlock+0x124) sp = 0xc0e14cb0 fp = 0xc0e14cd8 r4 = 0xc06c03ab r5 = 0x00000250 r6 = 0xc0a9ee98 r7 = 0xc0a9ee88 r8 = 0x00000000 r9 = 0xc0ae3408 r10 = 0xc0b31d00 _sx_xlock() at _sx_xlock+0x124 pc = 0xc02ba080 lr = 0xc0285174 (linker_file_foreach+0x2c) sp = 0xc0e14ce0 fp = 0xc0e14cf0 r4 = 0xc0e14d00 r5 = 0xc05f7d20 r6 = 0xc05f3a20 r7 = 0xc0879e3c r8 = 0xc0e8e520 r10 = 0xc0b31d00 linker_file_foreach() at linker_file_foreach+0x2c pc = 0xc0285174 lr = 0xc05f7918 (unwind_stack_one+0x4c) sp = 0xc0e14cf8 fp = 0xc0e14d28 r4 = 0xc0e14d34 r5 = 0x00000000 r6 = 0xc05f3a20 r7 = 0xc0879e3c unwind_stack_one() at unwind_stack_one+0x4c pc = 0xc05f7918 lr = 0xc05f3a64 (stack_save+0x44) sp = 0xc0e14d30 fp = 0xc0e14d90 r4 = 0xc0b0e7b4 r5 = 0xc0e14d34 r6 = 0xc0e8d1a0 r7 = 0xc0b0e7b4 r8 = 0xc0e8e520 r9 = 0xc0ae3408 r10 = 0xc0b31d00 stack_save() at stack_save+0x44 pc = 0xc05f3a64 lr = 0xc0320540 (witness_checkorder+0x514) sp = 0xc0e14d98 fp = 0xc0e14de8 r4 = 0x00000004 r5 = 0x00000034 witness_checkorder() at witness_checkorder+0x514 pc = 0xc0320540 lr = 0xc0294ee4 (__mtx_lock_flags+0xb4) sp = 0xc0e14df0 fp = 0xc0e14e18 r4 = 0xc0b31d10 r5 = 0x00000000 r6 = 0x00000000 r7 = 0x00000000 r8 = 0xc0701b45 r9 = 0x00000339 r10 = 0xc0b31d00 __mtx_lock_flags() at __mtx_lock_flags+0xb4 pc = 0xc0294ee4 lr = 0xc05b0950 (vm_page_startup+0x8fc) sp = 0xc0e14e20 fp = 0xc0e14e50 r4 = 0x87f00000 r5 = 0xc11ff000 r6 = 0x80000000 r7 = 0xc0701b45 r8 = 0xc0b31d10 r9 = 0x00000000 r10 = 0xc0b31c00 vm_page_startup() at vm_page_startup+0x8fc pc = 0xc05b0950 lr = 0xc059df98 (vm_mem_init+0x20) sp = 0xc0e14e58 fp = 0xc0e14e60 r4 = 0xc0b40a44 r5 = 0xc0a9e61c r6 = 0x00800001 r7 = 0x00000000 r8 = 0x01000000 r9 = 0xc0b1b90c r10 = 0xc0b1b8fc vm_mem_init() at vm_mem_init+0x20 pc = 0xc059df98 lr = 0xc024ad64 (mi_startup+0x2a4) sp = 0xc0e14e68 fp = 0xc0e14e90 r4 = 0xc090503c r10 = 0xc0b1b8fc mi_startup() at mi_startup+0x2a4 pc = 0xc024ad64 lr = 0xc00002c4 (_start+0x144) sp = 0xc0e14e98 fp = 0x00000000 r4 = 0xc00003f8 r5 = 0xc0b78000 r6 = 0x9cf1f8c0 r7 = 0x00c52078 r8 = 0xc0d46000 r9 = 0x9cf09380 r10 = 0x00000000 _start() at _start+0x144 pc = 0xc00002c4 lr = 0xc00002c4 (_start+0x144) sp = 0xc0e14e98 fp = 0x00000000 KDB: enter: panic [ thread pid 0 tid 0 ] Stopped at kdb_enter+0x58: ldrb r15, [r15, r15, ror r15]! db>
ian: it seems that only r354709 and 354710 are directly related to sys/arm, could you help to find the root cause?
(In reply to Li-Wen Hsu from comment #1) The fact that stack unwinding and linker_file_foreach appears in the backtrace means I think I know what's happening, and it's my bad. But I don't know offhand how to fix it, I'll start looking into it.
A commit references this bug: Author: ian Date: Sun Nov 24 21:08:57 UTC 2019 New revision: 355069 URL: https://svnweb.freebsd.org/changeset/base/355069 Log: When doing ARM stack unwinding as part of stack_save(9), do not search loaded modules (pass 0/false for the can_lock arg). Searching the unwind info in modules acquires an exclusive sxlock, and the stack(9) functions can be called in a context where unbounded sleeps are forbidden (such as from the witness checkorder code). Just ignoring the existence of modules in stack_save() is not ideal, so I'm looking for a better solution, but this commit will make it possible to boot an ARM kernel with WITNESS enabled again, until I get something better. PR: 242200 Changes: head/sys/arm/arm/stack_machdep.c
I forgot to cite this PR number when I committed a real fix for this problem: Author: ian Date: Sun Dec 15 21:16:35 2019 New Revision: 355780 URL: https://svnweb.freebsd.org/changeset/base/355780 Log: Rewrite arm kernel stack unwind code to work when unwinding through modules. The arm kernel stack unwinder has apparently never been able to unwind when the path of execution leads through a kernel module. There was code that tried to handle modules by looking for the unwind data in them, but it did so by trying to find symbols which have never existed in arm kernel modules. That caused the unwind code to panic, and because part of panic handling calls into the unwind code, that just created a recursion loop. Locating the unwind data in a loaded module requires accessing the Elf section headers to find the SHT_ARM_EXIDX section. For preloaded modules those headers are present in a metadata blob. For dynamically loaded modules, the headers are present only while the loading is in progress; the memory is freed once the module is ready to use. For that reason, there is new code in kern/link_elf.c, wrapped in #ifdef __arm__, to extract the unwind info while the headers are loaded. The values are saved into new fields in the linker_file structure which are also conditional on __arm__. In arm/unwind.c there is new code to locally cache the per-module info needed to find the unwind tables. The local cache is crafted for lockless read access, because the unwind code often needs to run in context where sleeping is not allowed. A large comment block describes the local cache list, so I won't repeat it all here. Modified: head/sys/arm/arm/elf_machdep.c head/sys/arm/arm/unwind.c head/sys/arm/include/stack.h head/sys/kern/kern_linker.c head/sys/kern/link_elf.c head/sys/sys/linker.h
A commit references this bug: Author: ian Date: Fri Jan 17 15:55:17 UTC 2020 New revision: 356833 URL: https://svnweb.freebsd.org/changeset/base/356833 Log: MFC r354709-r354710, r355069, r355439, r355780, r356273, r356472 r354709: Rewrite arm/stack_machdep.c for EABI; add stack(9) support to arm kernels. The old stack_machdep.c code was written for the APCS ABI (aka "oldabi"). When we switched to ARM EABI (back in freebsd 10) this file never got updated, and apparently nobody noticed that until now. The new implementation uses the same stack unwinder code used by the arm implemenation of the db_trace stuff. r354710: Compile in arm/unwind.c if options STACK is in effect; the new arm stack(9) code now uses unwind.c. r355069: When doing ARM stack unwinding as part of stack_save(9), do not search loaded modules (pass 0/false for the can_lock arg). Searching the unwind info in modules acquires an exclusive sxlock, and the stack(9) functions can be called in a context where unbounded sleeps are forbidden (such as from the witness checkorder code). Just ignoring the existence of modules in stack_save() is not ideal, so I'm looking for a better solution, but this commit will make it possible to boot an ARM kernel with WITNESS enabled again, until I get something better. PR: 242200 r355439: Declare the global kernel symbols created by ldscript.arm in arm's machdep.h, and remove a couple scattered local declarations. Most of these aren't referenced in C code (there are some references in asm code), and they also aren't documented anywhere. This helps a bit with the latter. r355780: Rewrite arm kernel stack unwind code to work when unwinding through modules. The arm kernel stack unwinder has apparently never been able to unwind when the path of execution leads through a kernel module. There was code that tried to handle modules by looking for the unwind data in them, but it did so by trying to find symbols which have never existed in arm kernel modules. That caused the unwind code to panic, and because part of panic handling calls into the unwind code, that just created a recursion loop. Locating the unwind data in a loaded module requires accessing the Elf section headers to find the SHT_ARM_EXIDX section. For preloaded modules those headers are present in a metadata blob. For dynamically loaded modules, the headers are present only while the loading is in progress; the memory is freed once the module is ready to use. For that reason, there is new code in kern/link_elf.c, wrapped in #ifdef __arm__, to extract the unwind info while the headers are loaded. The values are saved into new fields in the linker_file structure which are also conditional on __arm__. In arm/unwind.c there is new code to locally cache the per-module info needed to find the unwind tables. The local cache is crafted for lockless read access, because the unwind code often needs to run in context where sleeping is not allowed. A large comment block describes the local cache list, so I won't repeat it all here. r356273: Since arm/unwind.c s conditionally compiled, only call functions in it when one of those conditions is true. Fixes build failure on kernel configs with no debugging options active. r356472: Add #ifdef option-test wrappers around another call to an arm/unwind.c function which is only compiled-in with certain options. Why is it always the most trivial part of a big commit that takes 3 tries to get right? Changes: _U stable/12/ stable/12/sys/arm/arm/elf_machdep.c stable/12/sys/arm/arm/mp_machdep.c stable/12/sys/arm/arm/stack_machdep.c stable/12/sys/arm/arm/unwind.c stable/12/sys/arm/conf/std.armv6 stable/12/sys/arm/conf/std.armv7 stable/12/sys/arm/include/machdep.h stable/12/sys/arm/include/stack.h stable/12/sys/conf/files.arm stable/12/sys/kern/kern_linker.c stable/12/sys/kern/link_elf.c stable/12/sys/sys/linker.h