Bug 216123 - ofwfb: r269278 broke booting on Power Mac G4
Summary: ofwfb: r269278 broke booting on Power Mac G4
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-STABLE
Hardware: powerpc Any
Importance: --- Affects Some People
Assignee: Nathan Whitehorn
URL:
Keywords: patch, regression
Depends on:
Blocks:
 
Reported: 2017-01-15 21:08 UTC by Tom Lane
Modified: 2018-08-29 20:06 UTC
CC List: 7 users

See Also:


Attachments
Revert r269278, in a way that applies to CURRENT (7.09 KB, patch)
2017-01-17 00:00 UTC, Tom Lane
ofwdump -ap output (91.96 KB, text/plain)
2017-01-17 04:23 UTC, Tom Lane
Minimum part of reverting r269278 to get CURRENT to boot (1.41 KB, patch)
2017-01-19 13:51 UTC, Tom Lane

Description Tom Lane 2017-01-15 21:08:55 UTC
I've been trying to install FreeBSD on a Power Mac G4 ("PowerMac 3,6",
a/k/a Mirrored Drive Doors 2003).  10.3 works fine, but neither
11-STABLE nor CURRENT is able to boot.  I've bisected the problem
and can state confidently that this commit broke it:

  r269278 | nwhitehorn | 2014-07-29 19:11:05 -0400 (Tue, 29 Jul 2014) | 6 lines

  Make mmap() of the console device when using ofwfb work like other supported
  framebuffer drivers. This lets ofwfb work with xf86-video-scfb and makes
  the driver much more generic and less PCI-centric. This changes some
  user-visible behavior and will require updates to the xorg-server port
  on PowerPC when using ATI graphics cards.

The immediately preceding commit boots fine, with or without kernel debug
options.  (Well, I have to back-patch r269365 and r305901, but then it
boots fine.)  With r269278, the behavior is one of these:

* kernel debug options enabled: seems to freeze immediately at boot.
After the boot loader says it's booting the kernel, the screen goes
black as expected, but no kernel output text ever appears.

* kernel debug options disabled: fatal kernel trap after a few dozen
lines of dmesg output.  I've transcribed a screen shot:

...
gem0: 10kB RX FIFO, 4kB TX FIFO
gem0: Ethernet address: (machine's MAC address here)
cryptosoft0: <software crypto> on nexus0

fatal kernel trap:

   exception  = 0xd0004840 (unknown)
   srr0       = 0x0
   srr1       = 0x0
   lr         = 0x861104
   curthread  = 0

panic: unknown trap
cpuid = 0
KDB: stack backtrace:
#0 0x4a93ac at panic+0x16c
#1 0x861308 at trap_fatal+0x1bc
#2 0x8622e4 at trap+0xfb4
#3 0x84fef0 at powerpc_interrupt+0x170
Uptime: 1s
(reboots)


Furthermore, on the way to isolating when the problem was introduced,
I discovered that the non-debug behavior changed from a trap to a silent
freeze at r279750 ("Make 32-bit PowerPC kernels, like 64-bit PowerPC
kernels, position-independent executables").  It seems to occur at the
same place though, right after reporting cryptosoft0, so I suppose that
it's the same bug manifesting differently.

These behaviors of freeze at boot with debug options, or freeze after
"cryptosoft0" without, are still present in CURRENT and 11-STABLE.
I gather from recent reports on the freebsd-ppc list that a number of
other people see these same behaviors, but not everyone does.  I am unable
to offer a theory as to just what the triggering difference is, or how the
bug might be fixed.  But I'd be happy to run further testing if given some
guidance.
Comment 1 Tom Lane 2017-01-17 00:00:57 UTC
Created attachment 178973
Revert r269278, in a way that applies to CURRENT

Just for fun, I tried reverting r269278 against CURRENT (r312314 to be precise).
The result boots, confirming my theory that that's what broke it.

I've attached the reversion patch in case anyone really needs it, but to be clear,
I am not proposing that this be committed.
Comment 2 Kevin Bowling freebsd_committer 2017-01-17 00:26:42 UTC
Can you set 'sysctl debug.debugger_on_panic=1' and see if you can get a better idea where it dies from the db> prompt or by switching to kgdb from that?

I don't have any G4 hardware, so a dump of the OFW tree would help as well.
Comment 3 Tom Lane 2017-01-17 04:23:15 UTC
Created attachment 178976
ofwdump -ap output

> I don't have any G4 hw so maybe a dump of OFW tree as well.

Here's ofwdump -ap output from the FreeBSD 10.3 installation on the machine;
is that what you wanted?

I'll look into the other part tomorrow.
Comment 4 Tom Lane 2017-01-17 16:34:31 UTC
(In reply to Kevin Bowling from comment #2)
> Can you set 'sysctl debug.debugger_on_panic=1' and see if you can get a better idea
> where it dies from the db> prompt or by switching to kgdb from that?

I couldn't figure out how to do that using the Open Firmware boot loader --- it doesn't understand sysctl, AFAICT.  I tried the -d kernel switch but it didn't change anything.

This machine lacks a serial port, which makes traditional kernel debugging difficult.  I noticed that its Open Firmware version does support telnet connections, though.  Is there a way to use that facility for a kernel debug console?
Comment 5 Kevin Bowling freebsd_committer 2017-01-17 20:15:54 UTC
Yeah, the sysctl suggestion was to enable the in-kernel debugger.  There's a rather esoteric feature to use firewire as a console (https://wiki.freebsd.org/DebugWithDcons), but you'd need another firewire machine :)
Comment 6 Tom Lane 2017-01-17 20:39:18 UTC
(In reply to Kevin Bowling from comment #5)
Hmm ... I lack a firewire cable, but that could be remedied.  The only other firewire machine I have is a G4 laptop, which I'm pretty sure won't boot 11 or 12 for the same reason this one won't.  I do have 10.3 installed on it though.  Does the debug host need to run the same FreeBSD version?
Comment 7 Kevin Bowling freebsd_committer 2017-01-17 23:48:15 UTC
I don't think they need to be the same version, but you could revert the one commit on the display system.  I've never used this though, so I can't confirm that it still works.  But it seems worth trying, to see whether there is a useful panic string or similar at the point where the screen goes blank with debugging enabled, and you can also run kgdb remotely over it.

Otherwise maybe Nathan has enough context with your report now to take a look.
Comment 8 Tom Lane 2017-01-19 13:51:59 UTC
Created attachment 179069
Minimum part of reverting r269278 to get CURRENT to boot

I've not had any luck bringing a kernel debugger to bear, but I've experimented with the r269278 patch some more, and I can report that the portions attached to this comment are the minimum needed to make it work.  Without the seemingly-superfluous OF_open call in ofwfb_initialize, it freezes during boot in the way previously described.  If the set-depth method call isn't removed from ofwfb_init, it boots but the screen display is completely messed up --- looks like it has the wrong idea about the framebuffer stride.

I'm not sure what to make of this.  I do not understand the division of labor between ofwfb_initialize and ofwfb_init, nor what the intended call sequence is, but I sort of suspect that the actual call sequence is different from what the code's author expected.

Obviously, this is still not a committable patch; the "extra" OF_open call might be safe enough, but removing the set-depth call would presumably represent a loss on better hardware.  But perhaps this will give somebody an idea of what to pursue.
Comment 9 Mark Linimon freebsd_committer freebsd_triage 2017-01-21 23:58:22 UTC
Assign to committer of r269278.
Comment 10 Justin Hibbits freebsd_committer 2017-07-18 03:15:48 UTC
The OF_open() resets the video device, which can help explain things.
Comment 11 Tom Lane 2017-07-20 18:57:10 UTC
I spent some more time looking at this.  I have no new info as to why the re-open in ofwfb_initialize() seems to be required, though jhibbits' theory that it causes a device reset seems plausible.  But what I have found out is that the blithe assumption that the "set-depth" method call will fail without side-effects is wrong.  My machine's Open Firmware will execute that, but it leaves the display unreadable, apparently because the hardware doesn't really know it's being fed a 32-bit framebuffer. (Interestingly, when I do either "32 set-depth" or "8 set-depth" manually at the OF prompt, it seems to take two or three seconds to do.  Wonder why.)

When I boot the machine into OS X, the display is shown as 32-bit.  So it's not that the hardware capability is lacking; apparently the "set-depth" call by itself isn't enough to turn it on.

So one route to a fix is to figure out what's missing to get this hardware into 32-bit mode; the other is to figure out a way to not try to switch it into 32-bit mode, without breaking the hardware (that I assume exists) for which the current code is good enough.
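
As a rough illustration of the second route, one could skip the "set-depth" call for display nodes known to mishandle it.  The sketch below is only an assumption about how such a check might look: the function name skip_set_depth and the deny list are hypothetical and not part of ofwfb, and the single entry is the "compatible" value this machine's screen device reports.  Whether other, working machines report the same string is not known from this report.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical deny list: "compatible" strings of display nodes whose
 * firmware accepts "set-depth" but leaves the screen unreadable.
 * The one entry is the value reported by the machine in this bug.
 */
const char *const set_depth_denylist[] = {
	"ATY,Pheonix",
};

/*
 * Return true if the "set-depth" call should be skipped for a display
 * node with the given "compatible" string.  A missing string is treated
 * as "not listed", so behavior on other hardware is unchanged.
 */
bool
skip_set_depth(const char *compatible)
{
	size_t i;
	size_t n = sizeof(set_depth_denylist) / sizeof(set_depth_denylist[0]);

	if (compatible == NULL)
		return (false);
	for (i = 0; i < n; i++) {
		if (strcmp(compatible, set_depth_denylist[i]) == 0)
			return (true);
	}
	return (false);
}
```

A caller would consult this before invoking the method and keep the firmware's current 8-bit mode when it returns true.  A list like this only hides the symptom; it does not explain why the hardware fails to enter 32-bit mode.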

Again, this is a 2003 "Mirrored Drive Doors" G4.  It has an ATI Radeon 9000 Pro GPU, and this is what I can see in OF's .properties for the screen device:

name                    ATY,Pheonix_A
compatible              ATY,Pheonix
width                   00000780 
height                  00000438 
linebytes               00000800 
depth                   00000008 
display-type            4c434400 
device_type             display
character-set           ISO8859-1
reg                     00000000  
iso6429-1983-colors     
driver,AAPL,MacOS,PowerPC 4a6f7921 70656666 70777063 00000001 ba4b450b 
                        ... 0001138e bytes total
EDID                    00ffffff ffffff00 5a6325cb 01010101 1d150103 80301b78 
                        2e50f5a4 58529728 135054bf ef80b300 a940a9c0 95009040 
                        81808100 714f023a 80187138 2d40582c 4500dd0c 1100001e 
                        000000ff 00525753 31313239 33303536 300a0000 00fd0032 
                        4b185215 000a2020 20202020 000000fc 00565832 32353020 
                        53455249 4553000f 
address                 9c008000 
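
For reference, the geometry values above decode to a 1920x1080 display with a 2048-byte stride at 8 bits per pixel.  The sketch below, using hypothetical helper names (fb_min_linebytes, fb_stride_fits, mdd_g4), checks whether that stride could hold a row at a given depth.  It assumes "linebytes" is the byte distance between row starts and that pixels are packed in whole bytes; it says nothing about what the firmware actually does after "set-depth".

```c
#include <stdbool.h>
#include <stdint.h>

/* Framebuffer geometry as read from the Open Firmware display node. */
struct fb_geometry {
	uint32_t width;     /* pixels per row ("width") */
	uint32_t height;    /* rows ("height") */
	uint32_t linebytes; /* bytes between row starts ("linebytes") */
	uint32_t depth;     /* bits per pixel ("depth") */
};

/* Values copied from the property dump above. */
const struct fb_geometry mdd_g4 = {
	.width = 0x780,     /* 1920 */
	.height = 0x438,    /* 1080 */
	.linebytes = 0x800, /* 2048 */
	.depth = 8,
};

/* Smallest stride, in bytes, that can hold one row at the given depth. */
uint32_t
fb_min_linebytes(uint32_t width, uint32_t depth_bits)
{
	return (width * ((depth_bits + 7) / 8));
}

/* True if the advertised stride can hold a full row at depth_bits. */
bool
fb_stride_fits(const struct fb_geometry *g, uint32_t depth_bits)
{
	return (g->linebytes >= fb_min_linebytes(g->width, depth_bits));
}
```

With these values the 2048-byte stride fits an 8-bit row (1920 bytes) but not a 32-bit row (7680 bytes), so a driver that switches to 32-bit pixels while still using the old "linebytes" would draw with the wrong stride.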

Don't really have any further info, but am happy to do experiments if anyone can suggest some.
Comment 12 Teemu Toivola 2018-02-22 20:11:55 UTC
I believe this is the same issue I found when installing FreeBSD on a Mac Mini G4 using FreeBSD-11.1-RELEASE-powerpc-bootonly.iso. The boot using that iso (from a cd) works fine but the system installed as a result gets stuck during first boot right after "cryptosoft0: <software crypto> on nexus0" gets printed.

I then figured out how to boot directly from a usb stick and tried using FreeBSD-11.1-STABLE-powerpc-20180215-r329320-mini-memstick.img.xz. With that image, even the boot of the installer itself gets stuck the same way as earlier with the installed system. I finally tried to install with FreeBSD-12.0-CURRENT-powerpc-20180215-r329338-mini-memstick.img.xz. That one didn't result in the boot process getting stuck during the install nor during the first boot of the system itself and resulted in a usable system.