Bug 251463 - Write additional ARM registers in the gdb stub
Summary: Write additional ARM registers in the gdb stub
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: arm (show other bugs)
Version: CURRENT
Hardware: arm Any
: --- Affects Only Me
Assignee: Mark Johnston
URL:
Keywords: patch
Depends on: 251022
Blocks:
  Show dependency treegraph
 
Reported: 2020-11-29 19:44 UTC by Dmitry Salychev
Modified: 2021-01-09 16:33 UTC (History)
3 users (show)

See Also:


Attachments
Write SP, LR and r0-r12 registers in addition to the PC one (1.21 KB, patch)
2020-11-29 19:44 UTC, Dmitry Salychev
no flags Details | Diff
Stack backtrace and registers printed by DDB (4.29 KB, text/plain)
2020-11-30 20:59 UTC, Dmitry Salychev
no flags Details
This is what I did to restore a correct value of LR in kgdb (6.57 KB, text/plain)
2020-11-30 21:01 UTC, Dmitry Salychev
no flags Details
Write SP, LR and r0-r12 registers in addition to the PC one (1.21 KB, patch)
2020-12-02 18:59 UTC, Dmitry Salychev
no flags Details | Diff
arm_trapframe.patch (2.01 KB, patch)
2020-12-07 20:31 UTC, John Baldwin
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dmitry Salychev 2020-11-29 19:44:11 UTC
Created attachment 220079 [details]
Write SP, LR and r0-r12 registers in addition to the PC one

There's a kernel panic on BeagleBone Black with a custom kernel
compiled with CPSW_ETHERSWITCH option to enable etherswitch(4) support
in sys/arm/ti/cpsw/if_cpsw.c.

A simple call to "etherswitchcfg" causes a kernel panic with this
stack backtrace printed with ddb:

	root@etherknife-debug:~ # etherswitchcfg
	...
	KDB: stack backtrace:
	...
	db_trace_self() at db_trace_self
	db_trace_self_wrapper() at db_trace_self_wrapper+0x30
	vpanic() at vpanic+0x170
	doadump() at doadump
	abort_align() at abort_align
	abort_handler() at abort_handler+0x330
	exception_exit() at exception_exit
	device_get_softc() at device_get_softc
		pc = 0xc0307b28  lr = 0xc0661664 (cpsw_getport+0x40)
		sp = 0xd7307b80  fp = 0xd7307b90
	cpsw_getport() at cpsw_getport+0x40
	etherswitchioctl() at etherswitchioctl+0x62c
	devfs_ioctl() at devfs_ioctl+0xcc
	vn_ioctl() at vn_ioctl+0x12c
	devfs_ioctl_f() at devfs_ioctl_f+0x2c
	kern_ioctl() at kern_ioctl+0x324
	sys_ioctl() at sys_ioctl+0xf8
	swi_handler() at swi_handler+0x15c
	swi_exit() at swi_exit

However, if I switch to gdb stub from ddb and connect kgdb to this
target, the only stack backtrace which will be reported is:

	(kgdb) bt
	#0  kdb_enter (why=0xc073d1d2 "panic", msg=<optimized out>) at /usr/src/sys/kern/subr_kdb.c:486
	#1  0xc02cbe2c in vpanic (fmt=0xc07254d3 "Fatal abort", ap=...) at /usr/src/sys/kern/kern_shutdown.c:907
	#2  0xc02cbb8c in panic (fmt=0xc0a87348 <kdb_why> "\322\321s\300\024v0\327\322\321s\300\034z0\327\323Tr\300") at /usr/src/sys/kern/kern_shutdown.c:843
	#3  0xc05b3a80 in abort_fatal (...) at /usr/src/sys/arm/arm/trap-v6.c:616
	#4  0xc05b3600 in abort_handler (...) at /usr/src/sys/arm/arm/trap-v6.c:520
	#5  <signal handler called>
	#6  device_get_softc (dev=0x0) at /usr/src/sys/kern/subr_bus.c:2507
	#7  0x00023070 in ?? ()
	Backtrace stopped: previous frame identical to this frame (corrupt stack?)

I switched to the 6th frame at this point, discovered that the link
register had 0x23070 instead of 0xc0661664 reported by ddb and
tried to update it from kgdb with:

	(kgdb) set $lr=0xc0661664

without luck because the only register which can be updated by the
gdb stub was PC. This patch adds an ability to write SP, LR and
general purpose registers r0-r12.
Comment 1 Dmitry Salychev 2020-11-29 19:51:45 UTC
This is how a stack backtrace looks like with the LR register updated:

(kgdb) bt
#0  kdb_enter (...) at /usr/src/sys/kern/subr_kdb.c:486
#1  0xc02cbe2c in vpanic (...) at /usr/src/sys/kern/kern_shutdown.c:907
#2  0xc02cbb8c in panic (...) at /usr/src/sys/kern/kern_shutdown.c:843
#3  0xc05b3a80 in abort_fatal (...) at /usr/src/sys/arm/arm/trap-v6.c:616
#4  0xc05b3600 in abort_handler (...) at /usr/src/sys/arm/arm/trap-v6.c:520
#5  <signal handler called>
#6  device_get_softc (dev=0x0) at /usr/src/sys/kern/subr_bus.c:2507
#7  0xc0661664 in cpsw_getport (dev=<optimized out>, p=0xd7307d28) at /usr/src/sys/arm/ti/cpsw/if_cpsw.c:2841
#8  0xc008e6f0 in etherswitchioctl (cdev=<optimized out>, cmd=<optimized out>, data=<optimized out>, flags=<optimized out>, td=0xd1b8a780) at ./etherswitch_if.h:166
#9  0xc01ce0fc in devfs_ioctl (ap=0xd7307bf8) at /usr/src/sys/fs/devfs/devfs_vnops.c:944
#10 0xc03c6540 in vn_ioctl (fp=0xd19c90c0, com=<optimized out>, data=0xd7307d28, active_cred=<optimized out>, td=0xd1b8a780) at /usr/src/sys/kern/vfs_vnops.c:1616
#11 0xc01ce758 in devfs_ioctl_f (fp=0x0, com=2, data=0xc0661624 <cpsw_getport>, cred=0xc0843414 <etherswitch_getport_desc>, td=0xd1b8a780) at /usr/src/sys/fs/devfs/devfs_vnops.c:875
#12 0xc0343688 in fo_ioctl (fp=0xd19c90c0, com=3225708804, data=0xc0661624 <cpsw_getport>, active_cred=0xc0843414 <etherswitch_getport_desc>, td=0xd1b8a780) at /usr/src/sys/sys/file.h:343
#13 kern_ioctl (td=0xd1b8a780, fd=<optimized out>, com=3225708804, data=0xc0661624 <cpsw_getport> "\360H-\351\020\260\215\342\001@\240\341") at /usr/src/sys/kern/sys_generic.c:801
#14 0xc0343318 in sys_ioctl (td=0xd1b8a780, uap=0xd1b8aa30) at /usr/src/sys/kern/sys_generic.c:709
#15 0xc05b2e20 in syscallenter (td=0xd1b8a780) at /usr/src/sys/arm/arm/../../kern/subr_syscall.c:189
#16 syscall (td=0xd1b8a780, frame=<optimized out>) at /usr/src/sys/arm/arm/syscall.c:144
#17 swi_handler (frame=<optimized out>) at /usr/src/sys/arm/arm/syscall.c:169
#18 <signal handler called>
#19 0x201f8ba8 in ?? ()
#20 0x00023070 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

It's much better as far as I can tell!
Comment 2 Mark Johnston freebsd_committer 2020-11-30 16:46:44 UTC
Hmm, I think kgdb is suppose to be able to handle this without intervention, i.e., you are really hitting a kgdb bug.  kgdb uses trapframe "sniffers" to unwind through exception frames, and there is one for 32-bit ARM kernels: https://github.com/freebsd/freebsd-ports/blob/master/devel/gdb/files/kgdb/arm-fbsd-kern.c

Perhaps arm_fbsd_trapframe_sniffer() needs an update?

I have no problem with the patch - it adds some handy functionality to the gdb stub - but it seems better to fix kgdb.  Do you have any idea what's going wrong there?
Comment 3 Dmitry Salychev 2020-11-30 17:29:57 UTC
(In reply to Mark Johnston from comment #2)

No, unfortunately I've no idea what's going on there, in kgdb. I could take a look at arm_fbsd_trapframe_sniffer() (within a separate PR perhaps?) to fix it there.

My idea is to have it in the gdb stub just to protect us from any possible future bugs of kgdb. It might take some time for me to provide a proper fix anyway :)
Comment 4 Dmitry Salychev 2020-11-30 17:42:46 UTC
(In reply to Mark Johnston from comment #2)

It looks like it isn't the first time gdb got a problem with recognizing a stack trace: https://fragglet.livejournal.com/19646.html.
Comment 5 John Baldwin freebsd_committer freebsd_triage 2020-11-30 18:16:51 UTC
I don't think it hurts to permit writes, but indeed the unwinder should be able to unwind the function.  I'm also not sure how setting $lr "fixes" it.  Setting $lr changes the value of $lr in the top-level frame, not in frame 6?

Debugging the unwinder is a bit of work.  The trapframe unwinder does unwind 'lr' from the trapframe.  If the layout the unwinder assumes is wrong that might manifest as this though.  One way you can check is if the register values at frame 6 (info registers) match the value of the trapframe (which you can find via 'p *frame' up in frames 3 or 4.
Comment 6 Dmitry Salychev 2020-11-30 20:58:58 UTC
(In reply to John Baldwin from comment #5)

I wouldn't say that setting $lr fixes this issue with kgdb, but it just helps me to update LR to a value printed by ddb right after a switch to frame 6. Please, take a look at the stack backtrace printed by ddb and backtraces and registers printed by kgdb I attached.

I could create a new PR with a patch for kgdb actually to separate work. What do you think?
Comment 7 Dmitry Salychev 2020-11-30 20:59:58 UTC
Created attachment 220115 [details]
Stack backtrace and registers printed by DDB
Comment 8 Dmitry Salychev 2020-11-30 21:01:13 UTC
Created attachment 220116 [details]
This is what I did to restore a correct value of LR in kgdb
Comment 9 Dmitry Salychev 2020-12-02 11:07:14 UTC
Any comments?
Comment 10 Mark Johnston freebsd_committer 2020-12-02 15:52:36 UTC
The patch looks ok to me.  One cosmetic nit: I think the check for curthread == kdb_thread can be lifted out of the switch statement, i.e., the function should just return immediately if curthread != kdb_thread.
Comment 11 Dmitry Salychev 2020-12-02 16:47:04 UTC
(In reply to Mark Johnston from comment #10)

I also thought about it. No problem, will do!
Comment 12 Dmitry Salychev 2020-12-02 18:59:09 UTC
Created attachment 220181 [details]
Write SP, LR and r0-r12 registers in addition to the PC one
Comment 13 Mark Johnston freebsd_committer 2020-12-07 15:09:11 UTC
(In reply to Dmitry Salychev from comment #12)
Thanks, committed.
Comment 14 commit-hook freebsd_committer 2020-12-07 15:09:56 UTC
A commit references this bug:

Author: markj
Date: Mon Dec  7 15:09:29 UTC 2020
New revision: 368414
URL: https://svnweb.freebsd.org/changeset/base/368414

Log:
  arm: Let the GDB stub write to SP, LR and GP registers

  This can be handy if gdb's stack unwinder fails, for example because of
  a bug in kgdb's trap frame unwinder.

  PR:		251463
  Submitted by:	Dmitry Salychev <dsl@mcusim.org>
  MFC after:	1 week

Changes:
  head/sys/arm/arm/gdb_machdep.c
  head/sys/arm/include/gdb_machdep.h
Comment 15 John Baldwin freebsd_committer freebsd_triage 2020-12-07 20:30:53 UTC
(In reply to Dmitry Salychev from comment #6)
Unfortunately, you didn't quite give me the thing I asked for (the contents of the 'struct trapframe'), but I think I see the issue anyway.

The arm trapframe is kind of odd in that it has two separate sp/lr register sets.  One is 'tf_usr_*' and the other is 'tf_svc_*'.  No other architecture does this.  All other architectures save the original SP/LR registers (or equivalents) in the same place in a trapframe regardless of whether the fault was from userland or a nested fault in the kernel.  Currently, kgdb always uses the 'tf_usr_*' pair, and the problem here is that the values you need are apparently in the 'tf_svc_*' pair.
Comment 16 John Baldwin freebsd_committer freebsd_triage 2020-12-07 20:31:46 UTC
Created attachment 220356 [details]
arm_trapframe.patch
Comment 17 Dmitry Salychev 2020-12-07 21:14:13 UTC
(In reply to John Baldwin from comment #15)

That's because I didn't have time to do so (and I offered a new PR to be created to work on this).

Btw, sp/lr are banked registers and may have different values depending on a mode (usr, fiq, irq, svc, etc.) It looks like modes other than usr and svc are ignored by kgdb. Why is it so?
Comment 18 John Baldwin freebsd_committer freebsd_triage 2020-12-07 22:32:37 UTC
This mirrors the code in trap-v6.c that prints out the values:

static int
abort_fatal(struct trapframe *tf, u_int idx, u_int fsr, u_int far,
    u_int prefetch, struct thread *td, struct ksig *ksig)
{
	bool usermode;
...
	usermode = TRAPF_USERMODE(tf);
...
	if (usermode)
		printf("usp=%08x, ulr=%08x",
		    tf->tf_usr_sp, tf->tf_usr_lr);
	else
		printf("ssp=%08x, slr=%08x",
		    tf->tf_svc_sp, tf->tf_svc_lr);
	printf(", pc =%08x\n\n", tf->tf_pc);
...
}

Where TRAPF_USERMODE is:

#define TRAPF_USERMODE(frame)	((frame->tf_spsr & PSR_MODE) == PSR_USR32_MODE)

There are some comments about this in sys/arm/arm/exception.S (see PUSHFRAME vs PUSHFRAMEINSVC) though they don't fully explain the reason for the two banks as both macros leave one of the banks uninitialized and only initialize the other by my reading.
Comment 19 John Baldwin freebsd_committer freebsd_triage 2020-12-07 22:34:03 UTC
Were you able to test the patch btw?  I believe if you apply it in /usr/ports/devel/gdb/files/kgdb and rebuild the gdb port it should fix your stack trace.
Comment 20 commit-hook freebsd_committer 2020-12-14 14:55:19 UTC
A commit references this bug:

Author: markj
Date: Mon Dec 14 14:54:20 UTC 2020
New revision: 368636
URL: https://svnweb.freebsd.org/changeset/base/368636

Log:
  MFC r368414:
  arm: Let the GDB stub write to SP, LR and GP registers

  PR:	251463

Changes:
_U  stable/12/
  stable/12/sys/arm/arm/gdb_machdep.c
  stable/12/sys/arm/include/gdb_machdep.h
Comment 21 John Baldwin freebsd_committer freebsd_triage 2020-12-29 23:46:01 UTC
I will probably include the arm trapframe patch in the next update to kgdb.  It would be nice if you were able to test it on your vmcore.
Comment 22 commit-hook freebsd_committer 2021-01-08 20:14:53 UTC
A commit references this bug:

Author: jhb
Date: Fri Jan  8 20:14:50 UTC 2021
New revision: 560808
URL: https://svnweb.freebsd.org/changeset/ports/560808

Log:
  Fix a couple of issues in kgdb.

  - Properly unwind across in-kernel exceptions on arm.

  - Enumerate processes via the pid hash table for kernels without the
    zombproc list.

  PR:		251463 (1)
  Reviewed by:	pizzamig (maintainer)
  Differential Revision:	https://reviews.freebsd.org/D27849

Changes:
  head/devel/gdb/Makefile
  head/devel/gdb/files/kgdb/arm-fbsd-kern.c
  head/devel/gdb/files/kgdb/fbsd-kthr.c