Bug 242200 - beaglebone fails to boot after a revision between r354707 and r354730
Summary: beaglebone fails to boot after a revision between r354707 and r354730
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: arm Any
: --- Affects Only Me
Assignee: Ian Lepore
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-24 18:13 UTC by Li-Wen Hsu
Modified: 2020-01-17 15:56 UTC (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Li-Wen Hsu freebsd_committer 2019-11-24 18:13:35 UTC
Beagle Bone fails to boot in HW test lab since this build: https://ci.freebsd.org/hwlab/job/FreeBSD-device-head-beaglebone-test/808/

The build contains changes between r354707 and r354730.

Console log:

https://ci.freebsd.org/hwlab/job/FreeBSD-device-head--test/808/artifact/device_tests/beaglebone.boot.log/*view*/

FreeBSD 13.0-CURRENT #65 r354730: Fri Nov 15 14:24:51 UTC 2019
    root@FreeBSD-head-armv7-build_devtest.jail:/usr/obj/usr/src/arm.armv7/sys/GENERIC arm
FreeBSD clang version 9.0.0 (tags/RELEASE_900/final 372316) (based on LLVM 9.0.0)
WARNING: WITNESS option enabled, expect reduced performance.
panic: acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) pmap @ /usr/src/sys/arm/arm/pmap-v6.c:6473
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self
         pc = 0xc05d49bc  lr = 0xc007a374 (db_trace_self_wrapper+0x30)
         sp = 0xc0e148f0  fp = 0xc0e14a08
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
         pc = 0xc007a374  lr = 0xc02b0e68 (vpanic+0x164)
         sp = 0xc0e14a10  fp = 0xc0e14a30
         r4 = 0x00000100  r5 = 0xc0b1b920
         r6 = 0xc06e666a  r7 = 0x00000000
vpanic() at vpanic+0x164
         pc = 0xc02b0e68  lr = 0xc02b0c34 (doadump)
         sp = 0xc0e14a38  fp = 0xc0e14a3c
         r4 = 0x00001949  r5 = 0xc06c0a49
         r6 = 0xc06e0a68  r7 = 0x00000000
         r8 = 0xc0e8e450  r9 = 0xc071206e
        r10 = 0xc0b3eda0
doadump() at doadump
         pc = 0xc02b0c34  lr = 0xc0320d38 (witness_checkorder+0xd0c)
         sp = 0xc0e14a44  fp = 0xc0e14aa0
         r4 = 0xc0b3eda0  r5 = 0xc0e14a3c
         r6 = 0xc02b0c34 r10 = 0xc0e14a44
witness_checkorder() at witness_checkorder+0xd0c
         pc = 0xc0320d38  lr = 0xc0294ee4 (__mtx_lock_flags+0xb4)
         sp = 0xc0e14aa8  fp = 0xc0e14ad0
         r4 = 0xc0b3edb0  r5 = 0x00000000
         r6 = 0x00000000  r7 = 0x00000000
         r8 = 0xc071206e  r9 = 0x00001949
        r10 = 0xc0b3eda0
__mtx_lock_flags() at __mtx_lock_flags+0xb4
         pc = 0xc0294ee4  lr = 0xc05f1e60 (pmap_fault+0x74)
         sp = 0xc0e14ad8  fp = 0xc0e14af8
         r4 = 0x00000005  r5 = 0x000002d4
         r6 = 0x00000005  r7 = 0xc0b3eda0
         r8 = 0xc0b3edb0  r9 = 0xc0b40d80
        r10 = 0x00000000
pmap_fault() at pmap_fault+0x74
         pc = 0xc05f1e60  lr = 0xc05f6968 (abort_handler+0x10c)
         sp = 0xc0e14b00  fp = 0xc0e14b98
         r4 = 0x00000005  r5 = 0x00000000
         r6 = 0xc0e14ba0  r7 = 0x00000005
         r8 = 0x00000013  r9 = 0xc0b1b920
        r10 = 0x000002d4
abort_handler() at abort_handler+0x10c
         pc = 0xc05f6968  lr = 0xc05d72dc (exception_exit)
         sp = 0xc0e14ba0  fp = 0xc0e14ca8
         r4 = 0x00000000  r5 = 0xc0e14c78
         r6 = 0xc0e14c88  r7 = 0x00000000
         r8 = 0xc0b1b920  r9 = 0x00000000
        r10 = 0xc0a9ee98
exception_exit() at exception_exit
         pc = 0xc05d72dc  lr = 0xc0ae231c (0xc0ae231c)
         sp = 0xc0e14c30  fp = 0xc0e14ca8
         r0 = 0x00000000  r1 = 0x00000000
         r2 = 0x00000001  r3 = 0xc0b1b920
         r4 = 0x00000000  r5 = 0xc0e14c78
         r6 = 0xc0e14c88  r7 = 0x00000000
         r8 = 0xc0b1b920  r9 = 0x00000000
        r10 = 0xc0a9ee98 r12 = 0xc08b4f38
_sx_xlock_hard() at _sx_xlock_hard+0x2d0
         pc = 0xc02ba3f0  lr = 0xc02ba080 (_sx_xlock+0x124)
         sp = 0xc0e14cb0  fp = 0xc0e14cd8
         r4 = 0xc06c03ab  r5 = 0x00000250
         r6 = 0xc0a9ee98  r7 = 0xc0a9ee88
         r8 = 0x00000000  r9 = 0xc0ae3408
        r10 = 0xc0b31d00
_sx_xlock() at _sx_xlock+0x124
         pc = 0xc02ba080  lr = 0xc0285174 (linker_file_foreach+0x2c)
         sp = 0xc0e14ce0  fp = 0xc0e14cf0
         r4 = 0xc0e14d00  r5 = 0xc05f7d20
         r6 = 0xc05f3a20  r7 = 0xc0879e3c
         r8 = 0xc0e8e520 r10 = 0xc0b31d00
linker_file_foreach() at linker_file_foreach+0x2c
         pc = 0xc0285174  lr = 0xc05f7918 (unwind_stack_one+0x4c)
         sp = 0xc0e14cf8  fp = 0xc0e14d28
         r4 = 0xc0e14d34  r5 = 0x00000000
         r6 = 0xc05f3a20  r7 = 0xc0879e3c
unwind_stack_one() at unwind_stack_one+0x4c
         pc = 0xc05f7918  lr = 0xc05f3a64 (stack_save+0x44)
         sp = 0xc0e14d30  fp = 0xc0e14d90
         r4 = 0xc0b0e7b4  r5 = 0xc0e14d34
         r6 = 0xc0e8d1a0  r7 = 0xc0b0e7b4
         r8 = 0xc0e8e520  r9 = 0xc0ae3408
        r10 = 0xc0b31d00
stack_save() at stack_save+0x44
         pc = 0xc05f3a64  lr = 0xc0320540 (witness_checkorder+0x514)
         sp = 0xc0e14d98  fp = 0xc0e14de8
         r4 = 0x00000004  r5 = 0x00000034
witness_checkorder() at witness_checkorder+0x514
         pc = 0xc0320540  lr = 0xc0294ee4 (__mtx_lock_flags+0xb4)
         sp = 0xc0e14df0  fp = 0xc0e14e18
         r4 = 0xc0b31d10  r5 = 0x00000000
         r6 = 0x00000000  r7 = 0x00000000
         r8 = 0xc0701b45  r9 = 0x00000339
        r10 = 0xc0b31d00
__mtx_lock_flags() at __mtx_lock_flags+0xb4
         pc = 0xc0294ee4  lr = 0xc05b0950 (vm_page_startup+0x8fc)
         sp = 0xc0e14e20  fp = 0xc0e14e50
         r4 = 0x87f00000  r5 = 0xc11ff000
         r6 = 0x80000000  r7 = 0xc0701b45
         r8 = 0xc0b31d10  r9 = 0x00000000
        r10 = 0xc0b31c00
vm_page_startup() at vm_page_startup+0x8fc
         pc = 0xc05b0950  lr = 0xc059df98 (vm_mem_init+0x20)
         sp = 0xc0e14e58  fp = 0xc0e14e60
         r4 = 0xc0b40a44  r5 = 0xc0a9e61c
         r6 = 0x00800001  r7 = 0x00000000
         r8 = 0x01000000  r9 = 0xc0b1b90c
        r10 = 0xc0b1b8fc
vm_mem_init() at vm_mem_init+0x20
         pc = 0xc059df98  lr = 0xc024ad64 (mi_startup+0x2a4)
         sp = 0xc0e14e68  fp = 0xc0e14e90
         r4 = 0xc090503c r10 = 0xc0b1b8fc
mi_startup() at mi_startup+0x2a4
         pc = 0xc024ad64  lr = 0xc00002c4 (_start+0x144)
         sp = 0xc0e14e98  fp = 0x00000000
         r4 = 0xc00003f8  r5 = 0xc0b78000
         r6 = 0x9cf1f8c0  r7 = 0x00c52078
         r8 = 0xc0d46000  r9 = 0x9cf09380
        r10 = 0x00000000
_start() at _start+0x144
         pc = 0xc00002c4  lr = 0xc00002c4 (_start+0x144)
         sp = 0xc0e14e98  fp = 0x00000000
KDB: enter: panic
[ thread pid 0 tid 0 ]
Stopped at      kdb_enter+0x58: ldrb    r15, [r15, r15, ror r15]!
db>
Comment 1 Li-Wen Hsu freebsd_committer 2019-11-24 18:25:34 UTC
ian: it seems that only r354709 and 354710 are directly related to sys/arm, could you help to find the root cause?
Comment 2 Ian Lepore freebsd_committer 2019-11-24 18:37:35 UTC
(In reply to Li-Wen Hsu from comment #1)
The fact that stack unwinding and linker_file_foreach appears in the backtrace means I think I know what's happening, and it's my bad.  But I don't know offhand how to fix it, I'll start looking into it.
Comment 3 commit-hook freebsd_committer 2019-11-24 21:09:44 UTC
A commit references this bug:

Author: ian
Date: Sun Nov 24 21:08:57 UTC 2019
New revision: 355069
URL: https://svnweb.freebsd.org/changeset/base/355069

Log:
  When doing ARM stack unwinding as part of stack_save(9), do not search
  loaded modules (pass 0/false for the can_lock arg).  Searching the unwind
  info in modules acquires an exclusive sxlock, and the stack(9) functions can
  be called in a context where unbounded sleeps are forbidden (such as from
  the witness checkorder code).

  Just ignoring the existence of modules in stack_save() is not ideal, so I'm
  looking for a better solution, but this commit will make it possible to boot
  an ARM kernel with WITNESS enabled again, until I get something better.

  PR:		242200

Changes:
  head/sys/arm/arm/stack_machdep.c
Comment 4 Ian Lepore freebsd_committer 2019-12-29 18:23:58 UTC
I forgot to cite this PR number when I committed a real fix for this problem:

Author: ian
Date: Sun Dec 15 21:16:35 2019
New Revision: 355780
URL: https://svnweb.freebsd.org/changeset/base/355780

Log:
  Rewrite arm kernel stack unwind code to work when unwinding through modules.
  
  The arm kernel stack unwinder has apparently never been able to unwind when
  the path of execution leads through a kernel module. There was code that
  tried to handle modules by looking for the unwind data in them, but it did
  so by trying to find symbols which have never existed in arm kernel
  modules. That caused the unwind code to panic, and because part of panic
  handling calls into the unwind code, that just created a recursion loop.
  
  Locating the unwind data in a loaded module requires accessing the Elf
  section headers to find the SHT_ARM_EXIDX section. For preloaded modules
  those headers are present in a metadata blob. For dynamically loaded
  modules, the headers are present only while the loading is in progress; the
  memory is freed once the module is ready to use. For that reason, there is
  new code in kern/link_elf.c, wrapped in #ifdef __arm__, to extract the
  unwind info while the headers are loaded. The values are saved into new
  fields in the linker_file structure which are also conditional on __arm__.
  
  In arm/unwind.c there is new code to locally cache the per-module info
  needed to find the unwind tables. The local cache is crafted for lockless
  read access, because the unwind code often needs to run in context where
  sleeping is not allowed.  A large comment block describes the local cache
  list, so I won't repeat it all here.

Modified:
  head/sys/arm/arm/elf_machdep.c
  head/sys/arm/arm/unwind.c
  head/sys/arm/include/stack.h
  head/sys/kern/kern_linker.c
  head/sys/kern/link_elf.c
  head/sys/sys/linker.h
Comment 5 commit-hook freebsd_committer 2020-01-17 15:56:06 UTC
A commit references this bug:

Author: ian
Date: Fri Jan 17 15:55:17 UTC 2020
New revision: 356833
URL: https://svnweb.freebsd.org/changeset/base/356833

Log:
  MFC r354709-r354710, r355069, r355439, r355780, r356273, r356472

  r354709:
  Rewrite arm/stack_machdep.c for EABI; add stack(9) support to arm kernels.

  The old stack_machdep.c code was written for the APCS ABI (aka "oldabi").
  When we switched to ARM EABI (back in freebsd 10) this file never got
  updated, and apparently nobody noticed that until now.

  The new implementation uses the same stack unwinder code used by the
  arm implemenation of the db_trace stuff.

  r354710:
  Compile in arm/unwind.c if options STACK is in effect; the new arm stack(9)
  code now uses unwind.c.

  r355069:
  When doing ARM stack unwinding as part of stack_save(9), do not search
  loaded modules (pass 0/false for the can_lock arg).  Searching the unwind
  info in modules acquires an exclusive sxlock, and the stack(9) functions can
  be called in a context where unbounded sleeps are forbidden (such as from
  the witness checkorder code).

  Just ignoring the existence of modules in stack_save() is not ideal, so I'm
  looking for a better solution, but this commit will make it possible to boot
  an ARM kernel with WITNESS enabled again, until I get something better.

  PR:		242200

  r355439:
  Declare the global kernel symbols created by ldscript.arm in arm's machdep.h,
  and remove a couple scattered local declarations.

  Most of these aren't referenced in C code (there are some references in
  asm code), and they also aren't documented anywhere.  This helps a bit
  with the latter.

  r355780:
  Rewrite arm kernel stack unwind code to work when unwinding through modules.

  The arm kernel stack unwinder has apparently never been able to unwind when
  the path of execution leads through a kernel module. There was code that
  tried to handle modules by looking for the unwind data in them, but it did
  so by trying to find symbols which have never existed in arm kernel
  modules. That caused the unwind code to panic, and because part of panic
  handling calls into the unwind code, that just created a recursion loop.

  Locating the unwind data in a loaded module requires accessing the Elf
  section headers to find the SHT_ARM_EXIDX section. For preloaded modules
  those headers are present in a metadata blob. For dynamically loaded
  modules, the headers are present only while the loading is in progress; the
  memory is freed once the module is ready to use. For that reason, there is
  new code in kern/link_elf.c, wrapped in #ifdef __arm__, to extract the
  unwind info while the headers are loaded. The values are saved into new
  fields in the linker_file structure which are also conditional on __arm__.

  In arm/unwind.c there is new code to locally cache the per-module info
  needed to find the unwind tables. The local cache is crafted for lockless
  read access, because the unwind code often needs to run in context where
  sleeping is not allowed.  A large comment block describes the local cache
  list, so I won't repeat it all here.

  r356273:
  Since arm/unwind.c s conditionally compiled, only call functions in it
  when one of those conditions is true.  Fixes build failure on kernel
  configs with no debugging options active.

  r356472:
  Add #ifdef option-test wrappers around another call to an arm/unwind.c
  function which is only compiled-in with certain options.

  Why is it always the most trivial part of a big commit that takes 3 tries
  to get right?

Changes:
_U  stable/12/
  stable/12/sys/arm/arm/elf_machdep.c
  stable/12/sys/arm/arm/mp_machdep.c
  stable/12/sys/arm/arm/stack_machdep.c
  stable/12/sys/arm/arm/unwind.c
  stable/12/sys/arm/conf/std.armv6
  stable/12/sys/arm/conf/std.armv7
  stable/12/sys/arm/include/machdep.h
  stable/12/sys/arm/include/stack.h
  stable/12/sys/conf/files.arm
  stable/12/sys/kern/kern_linker.c
  stable/12/sys/kern/link_elf.c
  stable/12/sys/sys/linker.h