237590 – powerpc64 PowerMac11,2 and 7,2 context, -r330614 and later (including -r345758): "ofwdump -ap" crashes the system; probable -r330610 "cause"

Bug 237590 - powerpc64 PowerMac11,2 and 7,2 context, -r330614 and later (including -r345758): "ofwdump -ap" crashes the system; probable -r330610 "cause"

Summary: powerpc64 PowerMac11,2 and 7,2 context, -r330614 and later (including -r34575...

Status:	New

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern (show other bugs)
Version:	CURRENT
Hardware:	powerpc Any

Importance:	--- Affects Some People
Assignee:	freebsd-bugs (Nobody)

URL:
Keywords:

Depends on:
Blocks:

Reported:	2019-04-26 20:53 UTC by Mark Millard
Modified:	2020-05-29 04:16 UTC (History)
CC List:	1 user (show)

See Also:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Mark Millard 2019-04-26 20:53:36 UTC

The biggest issue may be the inability to sufficiently sleep CPUs
on powerpc64 in general, with ofwdump on old PowerMac G5's just
being a good way to test that.

This was originally observed on head -r345758.

But "bisecting" based on:

https://artifact.ci.freebsd.org/snapshot/head/r*

I found that for the likes of "ofwdump -ap > /dev/null" :

-r330572: does not crash the system (or program).
-r330614: crashes the system: timeout trying to sleep cpus.

There are no other https://artifact.ci.freebsd.org/snapshot/head/r*
between -r330572 and -r330614 with powerpc64 present. So I stopped
at this range.

Turns out that between those two versions is:

Revision 330610 . . .
Modified Wed Mar 7 17:08:07 2018 UTC . . . by nwhitehorn 
. . .
Move the powerpc64 direct map base address from zero to high memory. This
accomplishes a few things:
- Makes NULL an invalid address in the kernel, which is useful for catching
 bugs.
. . .

(It may be that -r330610 exposed another problem that was
accidentally avoided before that.)


So far it appears likely that every bootable/usable build from
-r330610 on crashes for the likes of "ofwdump -ap" when openfirmware
is used live and, so, the cpus need to avoid multi-threading the
openfirmware use. (So, for PowerMacs, when booted without usefdt mode
for  sufficiently recent builds).

I've not seen the problem with 32-bit powerpc FreeBSD on old PowerMacs,
including when used to boot the same machines. Only powerpc64 FreeBSD.

The only powerpc64 test contexts that I have access to sometimes are
old G5 PowerMacs: 2-socket/2-core-each PowerMac11,2 and
2-socket/1-core-each PowerMac7,2. So I've no direct evidence for
any other powerpc64 context.

Comment 1 Mark Millard 2019-04-26 21:35:19 UTC

(In reply to Mark Millard from comment #0)

I suppose that a possibility is that:

A) It may be ddb related code that can not sleep some
   CPU(s) (because some already are sleeping?).

B) It may be openfirmware tried to use an address that
   is invalid in the kernel on or after -r330610 .

(A) may prevent seeing a notice that would point to (B)
as a possibility, thus hiding the true cause.

Comment 2 Mark Millard 2019-04-27 02:32:25 UTC

(In reply to Mark Millard from comment #1)

From looking at the code the message about timeout
stopping cpus is from kdbtrap and does not of itself
seem to be part of the original problem.

Attempting a bt at the ddb> prompt gets its own
fatal kernel trap but some of the exception information
is possibly of interest. Typed from a screen picture
(this is from -r330614):

exception      = 0x300 (data storage interrupt)
virtual address= 0xffffffffffb7f400
dsisr          = 0x40000000
srr0           = 0x9b2194 (0x9b2194)
srr1           = 0x9000000000001032
lr             = 0x9b254c (0x9b254c)
curthread      = 0xc0000000024fb000
       pid = 12, comm = swi4: clock (0)

The stack backtrace itself listed:

0x0000000f95460: kernel DSI read trap @ 0xffffffffffb7f400
by .db_backtrace+0x40

[Multiple attempts got the same as the above.]

That matches of 0x9b2194 for srr0:

00000000009b2154 <.db_backtrace> mflr    r0
00000000009b2158 <.db_backtrace+0x4> std     r28,-32(r1)
00000000009b215c <.db_backtrace+0x8> std     r29,-24(r1)
00000000009b2160 <.db_backtrace+0xc> std     r30,-16(r1)
00000000009b2164 <.db_backtrace+0x10> std     r31,-8(r1)
00000000009b2168 <.db_backtrace+0x14> std     r0,16(r1)
00000000009b216c <.db_backtrace+0x18> stdu    r1,-144(r1)
00000000009b2170 <.db_backtrace+0x1c> mr      r31,r1
00000000009b2174 <.db_backtrace+0x20> mr      r28,r5
00000000009b2178 <.db_backtrace+0x24> ld      r9,-20736(r2)
00000000009b217c <.db_backtrace+0x28> lwz     r0,0(r9)
00000000009b2180 <.db_backtrace+0x2c> cmpwi   cr7,r0,0
00000000009b2184 <.db_backtrace+0x30> bne     cr7,00000000009b24e0 <.db_backtrace+0x38c>
00000000009b2188 <.db_backtrace+0x34> cmpldi  cr7,r4,4095
00000000009b218c <.db_backtrace+0x38> ble     cr7,00000000009b24e0 <.db_backtrace+0x38c>
00000000009b2190 <.db_backtrace+0x3c> mr      r30,r4
00000000009b2194 <.db_backtrace+0x40> ld      r30,0(r30)
. . .

Which is from:

static int
db_backtrace(struct thread *td, db_addr_t fp, int count)
{
	db_addr_t stackframe, lr, *args;
	boolean_t kernel_only = TRUE;
	boolean_t full = FALSE;

#if 0
. . .
#endif

	stackframe = fp;

	while (!db_pager_quit) {
		if (stackframe < PAGE_SIZE)
			break;

		/*
		 * Locate the next frame by grabbing the backchain ptr
		 * from frame[0]
		 */
		stackframe = *(db_addr_t *)stackframe;
. . .

That establishes that the 0xffffffffffb7f400 value is
from what the code expects was a stack location holding
a backchain pointer.

Comment 3 Mark Millard 2019-05-18 20:13:24 UTC

[The following is from a head -r347549 based
context.]

I have isolated the call into openfirmware that
leads to the crash. Running

( truss ofwdump -ap ) 2>&1

on the console left me able to take a picture at
the end. Then under (a patched-up) usefdt mode I
was able to what would normally be next.

The crash was shown after (from the usefdt
mode context were I could save the output):

      name:
write(1,"      name:\n",12)                      = 12 (0xc)
        6f 62 70 2d 74 66 74 70 00 
write(1,"        6f 62 70 2d 74 66 74 70 "...,36) = 36 (0x24)
        'obp-tftp'
write(1,"        'obp-tftp'\n",19)               = 19 (0x13)
ioctl(3,OFIOCNEXTPROP,0x3fffffffffffd648)        = 0 (0x0)
ioctl(3,OFIOCGETPROPLEN,0x3fffffffffffd528)      = 0 (0x0)
ioctl(3,OFIOCGET,0x3fffffffffffd508)             = 0 (0x0)
      stats:
write(1,"      stats:\n",13)                     = 13 (0xd)
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
write(1,"        00 00 00 00 00 00 00 00 "...,69) = 69 (0x45)
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
write(1,"        00 00 00 00 00 00 00 00 "...,69) = 69 (0x45)
ioctl(3,OFIOCNEXTPROP,0x3fffffffffffd648)        = 0 (0x0)
ioctl(3,OFIOCGETPROPLEN,0x3fffffffffffd528)      = 0 (0x0)

Then was:

timeput stopping cpus
[ thread pid 11 tid 100003 ]
Stopped at k_trap_0x28: stdu r31,r2,0xfd50
db>

(I'll note that the last two addresses in the last two
ioctl's shows addresses ending in fd678 and fd558 in the
crashing context [non-usefdt mode]. In general the
address are not exact matches for the two boot modes.)

But note that truss reports commands after they finish.
(See the writes above.) So the processor was likely
working on the command that followed.

That was the OFIOCGET for the fairly large log property
(text from non-usefdt mode output, no crash):

ioctl(3,OFIOCGET,0x3fffffffffffd508)             = 0 (0x0)
      log:
write(1,"      log:\n",11)                       = 11 (0xb)
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
write(1,"        00 00 00 00 00 00 00 00 "...,69) = 69 (0x45)
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
. . .

(I'll not list the large block of output for reporting
the log area.)

Comment 4 Mark Millard 2019-05-18 20:51:00 UTC

(In reply to Mark Millard from comment #3)

Based on register values values it looks like the
openfirmware code got something like, say, a Data
Storage Exception, that in turn went to generictrap,
that in turn went to k_trap:

<k_trap> mfsprg  r31,0
<k_trap+0x4> mfsrr0  r30
<k_trap+0x8> std     r30,288(r31)
<k_trap+0xc> mfsrr1  r30
<k_trap+0x10> std     r30,296(r31)
<k_trap+0x14> mfsprg  r31,1
<k_trap+0x18> mfmsr   r30
<k_trap+0x1c> ori     r30,r30,50
<k_trap+0x20> mtmsr   r30
<k_trap+0x24> isync
<k_trap+0x28> stdu    r31,-688(r1)

It failed at the last because the value r1 held
0xffff'ffff'ffb7'f400, apparently an openfirmware
stack address being handled by a FreeBSD exception
handler under the FreeBSD memory management.

(So far, I can not be sure of what, specifically,
initiated getting to k_trap.)

Comment 5 Mark Millard 2019-07-20 07:37:32 UTC

(In reply to Mark Millard from comment #3)

The ofwdump command that directly tries to report
the log information that leads to the crash is:

ofwdump -Plog /packages/obp-tftp

(This is easier to follow than the prior truss
output showing the matching OFIOCGET activity.

Comment 6 Mark Millard 2020-03-11 00:23:58 UTC

The conflict between the new 64-bit direct map and
use of PowerMac G5 openfirmware during:

ofwdump -Plog /packages/obp-tftp

continues.

A usefdt mode patched up sufficiently to work
with the PowerMac G5's (and more PowerMacs) can
avoid this but needs to deal with some PowerMac
openfirmware oddities (violations of some rules),
as well as having some fixes to FreeBSD code. The
PowerMac's even have at least one example of
(extracted from a patch):

+// Some Macintoshes have identical package-to-pathname results for
+// multiple nodes of the same type and unit under the parent node.
+// Avoid blocking this for fdt.

(Unfortuantely the (textually small) change is in
contrib code. It was the only part of my
investigatory code that was.)

(One oddity is that the unpatched conversion
to fdt code reverses the order of  the sibling
lists but the FreeBSD code does not work in an
order independent manor for uisng the sibling
lists. In my investigatory patching, I made the
conversion preserve order instead of changing a
bunch of code that used the lists.)

Comment 7 Justin Hibbits freebsd_committer

2020-05-29 01:05:39 UTC

Is this still a problem?  I don't have a G5 to test HEAD with.


(In reply to Mark Millard from comment #6)

Can you share your patch to make ofwfdt recreate the device tree in the correct order?  I think we would really like to get away from calling into OFW at runtime, so anything that helps that is welcome.

Comment 8 Mark Millard 2020-05-29 02:12:27 UTC

(In reply to Justin Hibbits from comment #7)

Quick summary: look at my attachments in . . .

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233863

I put a fair amount of notes in the code with more details.
Those are probably essential.

Some of my bugzilla comments have notes about the patches,
starting in comment #7 from what I see. If needed I could
go through and list the comment numbers tied to the patches.
There is a history structure, as I progressed in what I was
learning and doing, unfortunately.


More overall context:

Why the patches are there: I could not boot and use the
workarounds indicated for that defect at the time. The
patches were from my effort to be able to get enough
working to use the workarounds for that other defect,
or at least that is how they started. I also tied to
make the patches okay to use on all the powerpc hardware
I had a chance to boot with FreeBSD. Even the iMac G3
contributed to what worked vs did not work in general,
so I avoided bad assumptions by testing in contexts
where some assumptions were demonstrated to not always
hold.

I ended up with more covered than what I started out to
try to do. (That is a good thing.)

But the patches do not cover the cross-cpu time problem.
At the time I used a much more hackish workaround for
that issue, not a workaround that I ever submitted anyplace.
After getting things working that are covered by the
attachments, I then worked on the cross-cpu time problem
(again) and produced what I eventually separately put out
the review for.

Back then I had weeks of dedicated time that I applied to
these issues, learning unfamiliar things as I went. It has
been a long time since I've dealt with any of the code in
any detail but I keep the patches in my environment (head
-r360311 currently). The patches in the old defect report
were last synchronized to -r346132 from what I see in the
comments.

If needed I could probably produce svndiff's for a more
modern /usr/src snapshot than I currently run. But I'm
waiting for the fixes to the powerpc crashes exposed by
the jemalloc update before I progress my environments
(lacking vm_page_dirty in pmap_protect and whatever for
moea64_protect).

I do not remember if there were any source files common
between these patches and the cross-cpu time material.
If there were, I may have to deal with keeping them
separate. (My running environment has all the patches
applied.)

Comment 9 Mark Millard 2020-05-29 04:00:22 UTC

(In reply to Justin Hibbits from comment #7)

So I reviewed 233863 's comments and the 3 attachments
and this should make your exploration much easier . . .

What comments to read for which attachments in 233863 ?

Two of the attachments have essential information, one
has extra information and dealing with it is optional
for the purpose (given one part of the essential
patches is used).

Essential: attachments 203683 and 204369

203683: read comments 15,23,49
(plus read any code comments in the diff)

204369: read comments 7,12,13,22,23,27,30,32,51
(plus read any code comments in the diff)

Note: comment 23 relates to the later-below
extra/optional patch as well: what part of
204369 makes the below optional.

You might want to only pay attention to the above
2 attachments and their comments.


Extra/optional: 204368

204368: read comments 19,20,23,50
(plus read any code comments in the diff)

Note: comment 23 has the explanation of why the
attachment is extra/optional (if one part of
the essential patches is used).

Comment 10 Mark Millard 2020-05-29 04:16:14 UTC

(In reply to Justin Hibbits from comment #7)

"Is this still a problem?": Unsure but I expect:
high probability for yes for without usefdt .
(And, as things are, usefdt mode does not work
well.)

I seem to remember that changes in relation to the
direct map have occurred since head -r360311 (that
I've been running for some time). Definitely broke
for without usefdt in what I'm running.

But I'm not progressing until vm_page_dirty in
pmap_protect and whatever for moea64_protect are
updated and I can verify the result is well
behaved in my contexts.