Created attachment 220079 [details] Write SP, LR and r0-r12 registers in addition to the PC one There's a kernel panic on BeagleBone Black with a custom kernel compiled with CPSW_ETHERSWITCH option to enable etherswitch(4) support in sys/arm/ti/cpsw/if_cpsw.c. A simple call to "etherswitchcfg" causes a kernel panic with this stack backtrace printed with ddb: root@etherknife-debug:~ # etherswitchcfg ... KDB: stack backtrace: ... db_trace_self() at db_trace_self db_trace_self_wrapper() at db_trace_self_wrapper+0x30 vpanic() at vpanic+0x170 doadump() at doadump abort_align() at abort_align abort_handler() at abort_handler+0x330 exception_exit() at exception_exit device_get_softc() at device_get_softc pc = 0xc0307b28 lr = 0xc0661664 (cpsw_getport+0x40) sp = 0xd7307b80 fp = 0xd7307b90 cpsw_getport() at cpsw_getport+0x40 etherswitchioctl() at etherswitchioctl+0x62c devfs_ioctl() at devfs_ioctl+0xcc vn_ioctl() at vn_ioctl+0x12c devfs_ioctl_f() at devfs_ioctl_f+0x2c kern_ioctl() at kern_ioctl+0x324 sys_ioctl() at sys_ioctl+0xf8 swi_handler() at swi_handler+0x15c swi_exit() at swi_exit However, if I switch to gdb stub from ddb and connect kgdb to this target, the only stack backtrace which will be reported is: (kgdb) bt #0 kdb_enter (why=0xc073d1d2 "panic", msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:486 #1 0xc02cbe2c in vpanic (fmt=0xc07254d3 "Fatal abort", ap=...) at /usr/src/sys/kern/kern_shutdown.c:907 #2 0xc02cbb8c in panic (fmt=0xc0a87348 <kdb_why> "\322\321s\300\024v0\327\322\321s\300\034z0\327\323Tr\300") at /usr/src/sys/kern/kern_shutdown.c:843 #3 0xc05b3a80 in abort_fatal (...) at /usr/src/sys/arm/arm/trap-v6.c:616 #4 0xc05b3600 in abort_handler (...) at /usr/src/sys/arm/arm/trap-v6.c:520 #5 <signal handler called> #6 device_get_softc (dev=0x0) at /usr/src/sys/kern/subr_bus.c:2507 #7 0x00023070 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) I switched to the 6th frame at this point, discovered that the link register had 0x23070 instead of 0xc0661664 reported by ddb and tried to update it from kgdb with: (kgdb) set $lr=0xc0661664 without luck because the only register which can be updated by the gdb stub was PC. This patch adds an ability to write SP, LR and general purpose registers r0-r12.
This is how a stack backtrace looks like with the LR register updated: (kgdb) bt #0 kdb_enter (...) at /usr/src/sys/kern/subr_kdb.c:486 #1 0xc02cbe2c in vpanic (...) at /usr/src/sys/kern/kern_shutdown.c:907 #2 0xc02cbb8c in panic (...) at /usr/src/sys/kern/kern_shutdown.c:843 #3 0xc05b3a80 in abort_fatal (...) at /usr/src/sys/arm/arm/trap-v6.c:616 #4 0xc05b3600 in abort_handler (...) at /usr/src/sys/arm/arm/trap-v6.c:520 #5 <signal handler called> #6 device_get_softc (dev=0x0) at /usr/src/sys/kern/subr_bus.c:2507 #7 0xc0661664 in cpsw_getport (dev=<optimized out>, p=0xd7307d28) at /usr/src/sys/arm/ti/cpsw/if_cpsw.c:2841 #8 0xc008e6f0 in etherswitchioctl (cdev=<optimized out>, cmd=<optimized out>, data=<optimized out>, flags=<optimized out>, td=0xd1b8a780) at ./etherswitch_if.h:166 #9 0xc01ce0fc in devfs_ioctl (ap=0xd7307bf8) at /usr/src/sys/fs/devfs/devfs_vnops.c:944 #10 0xc03c6540 in vn_ioctl (fp=0xd19c90c0, com=<optimized out>, data=0xd7307d28, active_cred=<optimized out>, td=0xd1b8a780) at /usr/src/sys/kern/vfs_vnops.c:1616 #11 0xc01ce758 in devfs_ioctl_f (fp=0x0, com=2, data=0xc0661624 <cpsw_getport>, cred=0xc0843414 <etherswitch_getport_desc>, td=0xd1b8a780) at /usr/src/sys/fs/devfs/devfs_vnops.c:875 #12 0xc0343688 in fo_ioctl (fp=0xd19c90c0, com=3225708804, data=0xc0661624 <cpsw_getport>, active_cred=0xc0843414 <etherswitch_getport_desc>, td=0xd1b8a780) at /usr/src/sys/sys/file.h:343 #13 kern_ioctl (td=0xd1b8a780, fd=<optimized out>, com=3225708804, data=0xc0661624 <cpsw_getport> "\360H-\351\020\260\215\342\001@\240\341") at /usr/src/sys/kern/sys_generic.c:801 #14 0xc0343318 in sys_ioctl (td=0xd1b8a780, uap=0xd1b8aa30) at /usr/src/sys/kern/sys_generic.c:709 #15 0xc05b2e20 in syscallenter (td=0xd1b8a780) at /usr/src/sys/arm/arm/../../kern/subr_syscall.c:189 #16 syscall (td=0xd1b8a780, frame=<optimized out>) at /usr/src/sys/arm/arm/syscall.c:144 #17 swi_handler (frame=<optimized out>) at /usr/src/sys/arm/arm/syscall.c:169 #18 <signal handler called> #19 0x201f8ba8 in ?? () #20 0x00023070 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) It's much better as far as I can tell!
Hmm, I think kgdb is suppose to be able to handle this without intervention, i.e., you are really hitting a kgdb bug. kgdb uses trapframe "sniffers" to unwind through exception frames, and there is one for 32-bit ARM kernels: https://github.com/freebsd/freebsd-ports/blob/master/devel/gdb/files/kgdb/arm-fbsd-kern.c Perhaps arm_fbsd_trapframe_sniffer() needs an update? I have no problem with the patch - it adds some handy functionality to the gdb stub - but it seems better to fix kgdb. Do you have any idea what's going wrong there?
(In reply to Mark Johnston from comment #2) No, unfortunately I've no idea what's going on there, in kgdb. I could take a look at arm_fbsd_trapframe_sniffer() (within a separate PR perhaps?) to fix it there. My idea is to have it in the gdb stub just to protect us from any possible future bugs of kgdb. It might take some time for me to provide a proper fix anyway :)
(In reply to Mark Johnston from comment #2) It looks like it isn't the first time gdb got a problem with recognizing a stack trace: https://fragglet.livejournal.com/19646.html.
I don't think it hurts to permit writes, but indeed the unwinder should be able to unwind the function. I'm also not sure how setting $lr "fixes" it. Setting $lr changes the value of $lr in the top-level frame, not in frame 6? Debugging the unwinder is a bit of work. The trapframe unwinder does unwind 'lr' from the trapframe. If the layout the unwinder assumes is wrong that might manifest as this though. One way you can check is if the register values at frame 6 (info registers) match the value of the trapframe (which you can find via 'p *frame' up in frames 3 or 4.
(In reply to John Baldwin from comment #5) I wouldn't say that setting $lr fixes this issue with kgdb, but it just helps me to update LR to a value printed by ddb right after a switch to frame 6. Please, take a look at the stack backtrace printed by ddb and backtraces and registers printed by kgdb I attached. I could create a new PR with a patch for kgdb actually to separate work. What do you think?
Created attachment 220115 [details] Stack backtrace and registers printed by DDB
Created attachment 220116 [details] This is what I did to restore a correct value of LR in kgdb
Any comments?
The patch looks ok to me. One cosmetic nit: I think the check for curthread == kdb_thread can be lifted out of the switch statement, i.e., the function should just return immediately if curthread != kdb_thread.
(In reply to Mark Johnston from comment #10) I also thought about it. No problem, will do!
Created attachment 220181 [details] Write SP, LR and r0-r12 registers in addition to the PC one
(In reply to Dmitry Salychev from comment #12) Thanks, committed.
A commit references this bug: Author: markj Date: Mon Dec 7 15:09:29 UTC 2020 New revision: 368414 URL: https://svnweb.freebsd.org/changeset/base/368414 Log: arm: Let the GDB stub write to SP, LR and GP registers This can be handy if gdb's stack unwinder fails, for example because of a bug in kgdb's trap frame unwinder. PR: 251463 Submitted by: Dmitry Salychev <dsl@mcusim.org> MFC after: 1 week Changes: head/sys/arm/arm/gdb_machdep.c head/sys/arm/include/gdb_machdep.h
(In reply to Dmitry Salychev from comment #6) Unfortunately, you didn't quite give me the thing I asked for (the contents of the 'struct trapframe'), but I think I see the issue anyway. The arm trapframe is kind of odd in that it has two separate sp/lr register sets. One is 'tf_usr_*' and the other is 'tf_svc_*'. No other architecture does this. All other architectures save the original SP/LR registers (or equivalents) in the same place in a trapframe regardless of whether the fault was from userland or a nested fault in the kernel. Currently, kgdb always uses the 'tf_usr_*' pair, and the problem here is that the values you need are apparently in the 'tf_svc_*' pair.
Created attachment 220356 [details] arm_trapframe.patch
(In reply to John Baldwin from comment #15) That's because I didn't have time to do so (and I offered a new PR to be created to work on this). Btw, sp/lr are banked registers and may have different values depending on a mode (usr, fiq, irq, svc, etc.) It looks like modes other than usr and svc are ignored by kgdb. Why is it so?
This mirrors the code in trap-v6.c that prints out the values: static int abort_fatal(struct trapframe *tf, u_int idx, u_int fsr, u_int far, u_int prefetch, struct thread *td, struct ksig *ksig) { bool usermode; ... usermode = TRAPF_USERMODE(tf); ... if (usermode) printf("usp=%08x, ulr=%08x", tf->tf_usr_sp, tf->tf_usr_lr); else printf("ssp=%08x, slr=%08x", tf->tf_svc_sp, tf->tf_svc_lr); printf(", pc =%08x\n\n", tf->tf_pc); ... } Where TRAPF_USERMODE is: #define TRAPF_USERMODE(frame) ((frame->tf_spsr & PSR_MODE) == PSR_USR32_MODE) There are some comments about this in sys/arm/arm/exception.S (see PUSHFRAME vs PUSHFRAMEINSVC) though they don't fully explain the reason for the two banks as both macros leave one of the banks uninitialized and only initialize the other by my reading.
Were you able to test the patch btw? I believe if you apply it in /usr/ports/devel/gdb/files/kgdb and rebuild the gdb port it should fix your stack trace.
A commit references this bug: Author: markj Date: Mon Dec 14 14:54:20 UTC 2020 New revision: 368636 URL: https://svnweb.freebsd.org/changeset/base/368636 Log: MFC r368414: arm: Let the GDB stub write to SP, LR and GP registers PR: 251463 Changes: _U stable/12/ stable/12/sys/arm/arm/gdb_machdep.c stable/12/sys/arm/include/gdb_machdep.h
I will probably include the arm trapframe patch in the next update to kgdb. It would be nice if you were able to test it on your vmcore.
A commit references this bug: Author: jhb Date: Fri Jan 8 20:14:50 UTC 2021 New revision: 560808 URL: https://svnweb.freebsd.org/changeset/ports/560808 Log: Fix a couple of issues in kgdb. - Properly unwind across in-kernel exceptions on arm. - Enumerate processes via the pid hash table for kernels without the zombproc list. PR: 251463 (1) Reviewed by: pizzamig (maintainer) Differential Revision: https://reviews.freebsd.org/D27849 Changes: head/devel/gdb/Makefile head/devel/gdb/files/kgdb/arm-fbsd-kern.c head/devel/gdb/files/kgdb/fbsd-kthr.c