Summary: | dtrace/powerpc64: System crash | ||||||
---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Breno Leitao <breno.leitao> | ||||
Component: | kern | Assignee: | freebsd-ppc (Nobody) <ppc> | ||||
Status: | New --- | ||||||
Severity: | Affects Only Me | CC: | emaste, jhibbits, kbowling, linimon | ||||
Priority: | --- | Keywords: | patch | ||||
Version: | CURRENT | ||||||
Hardware: | powerpc | ||||||
OS: | Any | ||||||
Attachments: |
|
Description
Breno Leitao
2018-01-19 17:51:02 UTC
This looks like it's trying to read a user address (0xffffffffffff9b90). This shouldn't happen, so there must be an incorrect check in the dtrace arch code (sys/cddl/dev/dtrace/powerpc/dtrace_isa.c) Hi Justin, We are debugging this one in background, and the problem seems to be a wrong stack pointer (r1). Mainly after restoring it at sys/powerpc/aim/trap_subr64.S: ASENTRY_NOPROF(breakpoint) .... dbtrap: /* Write the trap vector to SPRG3 by computing LR & 0xff00 */ mflr %r1 andi. %r1,%r1,0xff00 mtsprg3 %r1 ld %r1,TRAP_TOCBASE(0) /* get new SP */ ld %r1,TOC_REF(tmpstk)(%r1) addi %r1,%r1,(TMPSTKSZ-48) We have a working patch already, and I would suggest the authors to share it. I did some further debugs, and this is what I found now: 1) The very first problem is that we are not able to leave kdb, if we just get in into kdb, and 'c' continue, we get: db> c fatal kernel trap: exception = 0x400 (instruction storage interrupt) virtual address = 0x426f6f7420666c60 srr0 = 0x426f6f7420666c60 (0x426f6f7420666c60) srr1 = 0x8000000040001032 lr = 0x426f6f7420666c61 (0x426f6f7420666c61) curthread = 0x1441460 pid = 0, comm = We are going to an address that means "boot fla", which is clearly wrong. Debugging the code, I found exactly what is happening. On dbtrap->dbleave, we call FRAME_LEAVE(), which does leaves SRR0 = 0x6616r8 and SRR1 = 0x8000000000001032 and then call rfid. SRR0 and SRR1 are sane. SRR0 is part of the kdb_enter function, as showed: 0x0000000000661658 in kdb_enter (why=0xf68968 "bootflags", msg=0x80 "") The kdb_enter() function follows: void kdb_enter(const char *why, const char *msg) { if (kdb_dbbe != NULL && kdb_active == 0) { if (msg != NULL) printf("KDB: enter: %s\n", msg); kdb_why = why; breakpoint(); kdb_why = KDB_WHY_UNSET; } } After the rfid we get into 0xf68968 that executes: 0x0000000000661668 <+112>: ld r0,16(r1) 0x000000000066166c <+116>: mtlr r0 ... 0x000000000066167c <+132>: blr In this case, lr has 0x426f6f7420666c61, which is clearly wrong and causes the trap. It appears that r1 does not point to the proper stack, since it contains: r1 = 0xf68958 0xf68958: 0x6b65726e 0x656c6e61 0x6d650000 0x00000000 0xf68968: 0x626f6f74 0x666c6167 0x73000000 0x00000000 0xf68978: 0x426f6f74 0x20666c61 Anyway, clearly the kdb_enter() has a wrong stack pointer (r1) after calling breakpoint. Hi, I created a patch that I think it makes sense. I think that the problem we are not using the temporary stack properly, since it is defined on the bottom of the stack, as showed: #define TMPSTKSZ 16384 /* 16K temporary stack */ GLOBAL(tmpstk) .space TMPSTKSZ TOC_ENTRY(tmpstk) So, the tmpstk is pointing to the bottom of the stack (higher address) in the TOC. Later, in the dbtrap: section, we get tmpstk and increase TMPSTKSZ, which is going to get in a different area. If the above is correct, we have two solutions: 1) Move the TOC_ENTRY to the TOP of the stack, something as (not validated): #define TMPSTKSZ 16384 /* 16K temporary stack */ GLOBAL(tmpstk) TOC_ENTRY(tmpstk) .space TMPSTKSZ 2) Does not increment the tmpstk pointer, since we are on the bottom of the stack already: index 7ef41d90ffe..2307ec4307c 100644 --- a/sys/powerpc/aim/trap_subr64.S +++ b/sys/powerpc/aim/trap_subr64.S @@ -805,9 +805,9 @@ dbtrap: andi. %r1,%r1,0xff00 mtsprg3 %r1 ld %r1,TRAP_TOCBASE(0) /* get new SP */ ld %r1,TOC_REF(tmpstk)(%r1) - addi %r1,%r1,(TMPSTKSZ-48) + addi %r1,%r1,-48 Does it make any sense? Looking further, it and the stack is properly set, i.e, the tmpstk is set at the correct place, and the patch above, although not crashing, is far to be correct. This is my new discovery: 1) The tmpstk is being used by powerpc_init(). 2) Powerpc_init calls kdb_enter(), which calls breakpoint(). 3) Breakpoint goes to "dbtrap" label, which reset the stack to tmpstk, overwriting it since the beginning. 4) when dbtrap/breakpoint exits, the stack is a mess, causing this problem. To prove that, I created a stack just for the trap, and this seems to fix the problem. diff --git a/sys/powerpc/aim/locore64.S b/sys/powerpc/aim/locore64.S index f283ba6489b..3dd1ca4092d 100644 --- a/sys/powerpc/aim/locore64.S +++ b/sys/powerpc/aim/locore64.S @@ -58,9 +58,13 @@ GLOBAL(__endkernel) #define TMPSTKSZ 16384 /* 16K temporary stack */ GLOBAL(tmpstk) .space TMPSTKSZ - TOC_ENTRY(tmpstk) +#define TRAPSTKSZ 4096 +GLOBAL(trapstk) + .space TRAPSTKSZ +TOC_ENTRY(trapstk) + /* * Entry point for bootloaders that do not fully implement ELF and start * at the beginning of the image (kexec, notably). In its own section so diff --git a/sys/powerpc/aim/trap_subr64.S b/sys/powerpc/aim/trap_subr64.S index 7ef41d90ffe..ffae46fa3f6 100644 --- a/sys/powerpc/aim/trap_subr64.S +++ b/sys/powerpc/aim/trap_subr64.S @@ -805,9 +805,9 @@ dbtrap: andi. %r1,%r1,0xff00 mtsprg3 %r1 - ld %r1,TRAP_TOCBASE(0) /* get new SP */ - ld %r1,TOC_REF(tmpstk)(%r1) - addi %r1,%r1,(TMPSTKSZ-48) + GET_TOCBASE(%r1) + ld %r1,TOC_REF(trapstk)(%r1) + addi %r1,%r1,(TRAPSTKSZ-48) FRAME_SETUP(PC_DBSAVE) /* Call C trap code: */ Created attachment 190876 [details]
Create a trapstack to avoid tmpstack corruption
This is not tested on powerpc32 bits platform yet.
|