Bug 205458 - 11.0-CURRENT/10-STABLE powerpc64: a PowerMac G5 specific sys/powerpc/ofw/ofw_machdep.c change for reliable PowerMac G5 booting (with lots of RAM)
Summary: 11.0-CURRENT/10-STABLE powerpc64: a PowerMac G5 specific sys/powerpc/ofw/ofw_...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: powerpc Any
: --- Affects Only Me
Assignee: Justin Hibbits
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2015-12-20 10:45 UTC by Mark Millard
Modified: 2017-05-27 07:11 UTC (History)
5 users (show)

See Also:


Attachments
Test patch to preserve SPRG[1-3] (1.23 KB, patch)
2017-01-23 05:16 UTC, Justin Hibbits
no flags Details | Diff
Corrected test patch (1.23 KB, patch)
2017-01-23 15:40 UTC, Justin Hibbits
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Millard 2015-12-20 10:45:25 UTC
The standard GENERIC64 powerpc64 FreeBSD builds do not reliably boot PowerMac G5's: hang ups in the very early boot activity happen frequently but randomly. 11.0, 10.x, and far into the past are all this way as I understand. I have experience with 10.0-RELEASE to 11.0-CURRENT.

I have been testing a PowerMac G5 specific change since late 2015-Feb or so and have reliable booting with the change and horribly unreliable booting without it. The G5's in question happen to have lots of RAM for their vintage (8GB, 12GB, 16GB). (It took months and thousands of reboot attempts to isolate the change, the more RAM the worse the failure rate for booting in my testing.)

The change is in ofw_sprg_prepare of sys/powerpc/ofw/ofw_machdep.c and could look something like (presented in a form to show new/PowerMacG5-Specific code and old general code):

#ifdef POWERMAC_G5_SPECIFIC_BUILD
	__asm __volatile("mfsprg0 %0\n\t"
			 "mtsprg1 %1\n\t"
			 "mtsprg2 %2\n\t"
			 "mtsprg3 %3\n\t"
			 : "=&r"(ofw_sprg0_save)
			 : "r"(ofmsr[2]),
			 "r"(ofmsr[3]),
			 "r"(ofmsr[4]));
#else
// The historical code:
	__asm __volatile("mfsprg0 %0\n\t"
			 "mtsprg0 %1\n\t"
			 "mtsprg1 %2\n\t"
			 "mtsprg2 %3\n\t"
			 "mtsprg3 %4\n\t"
			 : "=&r"(ofw_sprg0_save)
			 : "r"(ofmsr[1]),
			 "r"(ofmsr[2]),
			 "r"(ofmsr[3]),
			 "r"(ofmsr[4]));
#endif

In other words: for PowerMac G5's omit the mtsprg0 from ofmsr[1]: leave the register as it already is instead of resetting it. The value in ofmsr[1] is inappropriate to the context. I deliberately kept the change minimal and left in all other code related to the register.

Unfortunately I do not know how to accurately detect a PowerMac G5 context while avoiding calling into openfirmware. So I do not know how to make this have an auto-adjusting/dynamic check for a PowerMac G5. Entering openfirmware first would likely recreate the problem (or other problems) from what I can tell.

If appropriate FreeBSD folks consider PowerMac G5's still to be worth-the-effort as tier2 examples to support a predefined, systematic way to buildkernel for reliable PowerMac G5 booting then some sort of publically-documented technique of configuring a build to control the distinction would be appropriate. But it might be that there is a policy of "no such platform-specific source code for desktop machines". Understandable if so. There might be conventions for handling such contexts that I'm not aware of.

I've been testing/using various vintages of 11.0-CURRENT, 10.x-RELENG, and 10-STABLE, progressing over time. All have been reliably booting with the change on all 3 PowerMac G5's that I have access to. (I have no other powerpc64 context.)
Comment 1 Mark Millard 2016-09-11 21:22:20 UTC
I should have noted the specifics of the types of PowerMac G5's that my activity with this hack has been in use on:

2 PowerMac7,11 examples, one with 16 GBytes RAM and one with 12 GBytes RAM, each with 2 dual-core processors ("Quad core G5" examples).

1 PowerMac7,2 with 8 GBytes RAM, 2 single-core processors.


One of the worries about the hack is that other PowerMac/iMac G5 variants than I have tested on might behave differently.

I've no evidence for any other type of PowerMac G5 variant.
Comment 2 Krzysztof Parzyszek 2016-09-12 19:22:37 UTC
It works great with 10.3 on PowerMac11,2, quad core with 16GB of memory.
Comment 3 Mark Millard 2016-09-12 19:32:16 UTC
(In reply to Mark Millard from comment #1)

My PowerMac7,11 reference for the Quad Core (copied from an old note) must be wrong: PowerMac11,2 would be right.

[It is still weeks before I'll have access to directly look.]
Comment 4 Nathan Whitehorn freebsd_committer 2016-09-12 21:01:18 UTC
I've finally understood why this patch does anything useful. Open Firmware runs in virtual mode on the Powermac G5. This runs inside the kernel page table, which preserves all address translations made by OF before the kernel starts; as a result, the kernel address space is a strict superset of OF's.

Where this explodes is if OF uses an unmapped SLB entry. The SLB fault handler runs in real mode and refers to the PCPU pointer in SPRG0, which blows up the kernel. Having a value of SPRG0 that works for the kernel is less fatal than preserving OF's value in this case.

I believe that OF's SPRG0 is maintained only for compatibility with some G4 Apple hardware, the eMac in particular, but will check and we can move on with this. I think it should be safe to wrap this in an #ifdef __powerpc64__.
Comment 5 Mark Millard 2016-09-13 06:31:05 UTC
Nathan Whitehorn has put out a kernel patch that he is asking for help with testing on PowerMac/iMac G5 variations. See:

https://lists.freebsd.org/pipermail/freebsd-ppc/2016-September/008413.html

The patch is not just using __powerpc64__ instead of POWERMAC_G5_SPECIFIC_BUILD in my presentation of my hack.

Nathan has already tested a PowerMac11,2.

If anyone reading this has any other type(s) of Apple G5's and is willing/able it would be good to get in some testing before the change is made to the kernel.

I expect that it does not mater much which version/variation of 10.x, 11.0, or 12 is used if the original boot problems can sometimes be observed before the change. Although I'm not sure of the patch automatically applies nicely to all possible vintages of those.


As for me. . .

It will be a few weeks before I'll again have access in order to test a PowerMac7,2 (or to independently repeat PowerMac11,2 testing). I'll not have access to other Apple G5's. So my range of testing will be rather limited and will not be soon.
Comment 6 Mark Millard 2016-09-13 17:15:00 UTC
(In reply to Mark Millard from comment #5)

Jukka A. Ukkonen has reported Nathan Whithorns patch to have failed to work but my hack to have worked:

In https://lists.freebsd.org/pipermail/freebsd-ppc/2016-September/008416.html he reports the failure of Nathan's patch. . .

I just tried the patch on a PowerMac G5 early 2005
model ...

cpu0: IBM PowerPC 970 revision 2.2, 2000.19 MHz
cpu0: Features dc000000<PPC32,PPC64,ALTIVEC,FPU,MMU>
cpu0: HID0 511081<NAP,DPM,NHR,TBEN,ENATTN>

which I think is a PowerMac7,3. It still panics right
after it has reported VT(ofwfb).

--jau


But we had an E-mail exchange about him manually applying a simple edit to get my patch and he reports in the end. . .

I rebooted the box only a few minutes ago, and this
time it booted just fine.
Well, the tmpfs has started failing as follows...

# mount /tmp
mount: tmpfs: Operation not supported by device

but supposedly this is an unrelated issue. At least I hope so. ;-)

Now the function reads as shown below, and I guess this is
exactly what you had in mind.

static __inline void
ofw_sprg_prepare(void)
{
       if (ofw_real_mode)
               return;

       /*
        * Assume that interrupt are disabled at this point, or
        * SPRG1-3 could be trashed
        */
       __asm __volatile("mfsprg0 %0\n\t"
                        "mtsprg1 %1\n\t"
                        "mtsprg2 %2\n\t"
                        "mtsprg3 %3\n\t"
                        : "=&r"(ofw_sprg0_save)
                        :
                        "r"(ofmsr[2]),
                        "r"(ofmsr[3]),
                        "r"(ofmsr[4]));
}

So, from my point of view the PowerMac7,3 seems to be back to
relative health with this change. I hope it works for all the
other G5s as well.

--jau
Comment 7 Mark Millard 2016-09-13 17:36:19 UTC
(In reply to Mark Millard from comment #6)

That test code does not have the #ifdef __powerpc64__ logic that an official update would need to also G4's and the like: it just serves as a test of the alternative's operation on Jukka's PowerMac7,3.
Comment 8 Mark Millard 2016-09-13 17:38:58 UTC
(In reply to Mark Millard from comment #7)

Also: The "simple edit" involved was to the official svn source that does not have Nathan's proposed patch. My original wording did not make that explicit.
Comment 9 Mark Millard 2017-01-10 10:04:36 UTC
(In reply to Nathan Whitehorn from comment #4)

There seems to have been no activity for this since comment #6
reported the patch failed on a PowerMac 7,3 . So it has been
sitting in the In Progress state for about 4 months (2016-Sep-13
to 2017-Jan-10) as I write this.

Comment 6 basically reports that some of the special purpose
register(s) do need to be saved and restored, just not all
of them. Nathan's patch had completely disabled
ofw_sprg_prepare and its matching restore for __powerpc64__ .

Unfortunately for now I've only access to a PowerMac 11,2 (a
so-called "Quad Core" PowerMac G5) so for now I could only
test the context that Nathan found to be working for his
2016-Sep patch. I should eventually have access to a
PowerMac 7,2 form of G5 again.

Still I wonder about 205458 showing "In Progress" for long periods.
It might get lost with folks thinking it is being worked on when
other higher priority things may have displaced its activity
for a long time (even starting from now).

However, closing the defect seems wrong as well. Going back to
new would seem better. But as it shows to me such is not an
option: closed is the only alternative to In Progress.
Comment 10 Justin Hibbits freebsd_committer 2017-01-23 05:16:01 UTC
Created attachment 179238 [details]
Test patch to preserve SPRG[1-3]

This is an untested patch, combining Nathan's work and Mark's work.  Everyone please test this on all varieties of G5's (7,x and 11,x).
Comment 11 Justin Hibbits freebsd_committer 2017-01-23 15:40:30 UTC
Created attachment 179258 [details]
Corrected test patch

The previous patch was obviously wrong, and I didn't test compile it (wrong asm arg indices).  This one's been compile tested.
Comment 12 Mark Millard 2017-01-25 00:49:18 UTC
Comment on attachment 179258 [details]
Corrected test patch

As for the result of the corrected test patch. . .

It appears that:

static __inline void
ofw_sprg_prepare(void)
. . .
static __inline void
ofw_sprg_restore(void)
. . .

are only defined under an earlier:

#ifdef AIM
extern register_t ofmsr[5];
extern void     *openfirmware_entry;
char            save_trap_init[0x2f00];          /* EXC_LAST */
char            save_trap_of[0x2f00];            /* EXC_LAST */
                
int             ofwcall(void *);
static int      openfirmware(void *args);
. . . (ofw_sprg_<?>'s defined in here) . . .
#ifndef __powerpc64__
        __asm __volatile("mtsprg0 %0" :: "r"(ofw_sprg0_save));
#endif
}
#endif

(Matching the ifdef AIM if I checked right.)

But of following parts are outside any AIM
definition requirement:

static int
openfirmware_core(void *args)
{
. . .
        ofw_sprg_prepare();
. . .
        ofw_sprg_restore();
. . .
}

So without AIM defined the code looks like
it would not compile. (I have AIM defined
for my context.)
Comment 13 Justin Hibbits freebsd_committer 2017-01-25 04:25:44 UTC
(In reply to Mark Millard from comment #12)

openfirmware_core() is only defined behind a #ifdef AIM (#ifdef is right over ofw_quiesce()).
Comment 14 Mark Millard 2017-01-25 04:35:03 UTC
(In reply to Justin Hibbits from comment #13)

That makes the following code in openfirmware_core odd:

#if defined(AIM) && !defined(__powerpc64__)
        /*
         * Clear battable[] translations
         */
        if (!(cpu_features & PPC_FEATURE_64))
                __asm __volatile("mtdbatu 2, %0\n"
                                 "mtdbatu 3, %0" : : "r" (0));
        isync();
#endif

No need to test AIM.

I assumed too much from seeing that AIM test.

Sorry for the noice.
Comment 15 Mark Millard 2017-02-19 03:02:51 UTC
(In reply to Justin Hibbits from comment #11)

Context: PowerMac G5 so-called "Quad Core",
full 16 GByte of RAM.
(Not the most interesting but the only powerpc64
context I currently have access to.)

I've had no problems with the classic boot
hangups and crashes since I switched to your
code (replacing my hack).

That even includes during the kernel version
bisection effort that eventually found recent
locking changes in the kernel (after -r313266)
lead to randomly occurring panics and hangups
after booting.

It is still unknown when I'll again get access
to the other, older type of PowerMac G5 (a
dual processor, single core per processor
G5).
Comment 16 Andreas Tobler freebsd_committer 2017-03-07 20:07:36 UTC
The attached patch (2017-01-23) works for me so far. Quad 14GB.
Comment 17 commit-hook freebsd_committer 2017-03-07 22:12:34 UTC
A commit references this bug:

Author: jhibbits
Date: Tue Mar  7 22:11:58 UTC 2017
New revision: 314885
URL: https://svnweb.freebsd.org/changeset/base/314885

Log:
  Fix booting with >4GB RAM on PowerMac G5 hardware

  ===
  From Nathan Whitehorn:

  Open Firmware runs in virtual mode on the Powermac G5. This runs inside the
  kernel page table, which preserves all address translations made by OF before
  the kernel starts; as a result, the kernel address space is a strict superset of
  OF's.

  Where this explodes is if OF uses an unmapped SLB entry. The SLB fault handler
  runs in real mode and refers to the PCPU pointer in SPRG0, which blows up the
  kernel. Having a value of SPRG0 that works for the kernel is less fatal than
  preserving OF's value in this case.

  ===

  The result of this is seemingly random panics from NULL dereferences, or hangs
  immediately upon boot.  By not restoring SPRG0 for Open Firmware entry the
  kernel PCPU pointer is preserved and SLB faults are successful, resulting in a
  stable kernel.

  PR:		205458
  Reported by:	several (over bugzilla, lists, IRC)
  Reviewed by:	andreast
  Tested by:	many (various forms)
  MFC after:	2 weeks

Changes:
  head/sys/powerpc/ofw/ofw_machdep.c
Comment 18 commit-hook freebsd_committer 2017-04-01 18:53:17 UTC
A commit references this bug:

Author: jhibbits
Date: Sat Apr  1 18:52:49 UTC 2017
New revision: 316367
URL: https://svnweb.freebsd.org/changeset/base/316367

Log:
  MFC r314885:

    Fix booting with >4GB RAM on PowerMac G5 hardware

    ===
    From Nathan Whitehorn:

    Open Firmware runs in virtual mode on the Powermac G5. This runs inside
    the
    kernel page table, which preserves all address translations made by OF
    before
    the kernel starts; as a result, the kernel address space is a strict
    superset of
    OF's.

    Where this explodes is if OF uses an unmapped SLB entry. The SLB
    fault handler
    runs in real mode and refers to the PCPU pointer in SPRG0,
    which blows up the
    kernel. Having a value of SPRG0 that works for the kernel is
    less fatal than
    preserving OF's value in this case.

    ===

    The result of this is seemingly random panics from
    NULL dereferences, or hangs
    immediately upon boot.  By not restoring SPRG0 for
    Open Firmware entry the
    kernel PCPU pointer is preserved and SLB faults
    are successful, resulting in a
    stable kernel.

  PR:		205458

Changes:
_U  stable/11/
  stable/11/sys/powerpc/ofw/ofw_machdep.c
Comment 19 Justin Hibbits freebsd_committer 2017-04-01 19:48:21 UTC
MFC'd to stable/11 r316367.  This patch doesn't apply directly to stable/10, so unless there's sufficient interest, I don't plan to MFC back to stable/10.
Comment 20 Mark Millard 2017-05-27 07:11:43 UTC
(In reply to Justin Hibbits from comment #19)

It turns out that if TARGET_ARCH=powerpc is
to work on powerpc64 PowerMac's reliably then
it needs to detect the context and also avoid
the sprg0 update.

It turns out that the problems and evidence
that I've been reporting are from the code
restoring the openfirmware sprg0 lile the
TARGET_ARCH=powerpc64 code used to do.

I'll submit a separate, new report. But I
figured the relationship would be good to
note here.