Bug 226974 - kernel DSI read trap at boot
Summary: kernel DSI read trap at boot
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: powerpc Any
: --- Affects Only Me
Assignee: freebsd-ppc (Nobody)
Depends on:
Reported: 2018-03-27 13:16 UTC by Leandro Lupori
Modified: 2018-05-27 02:02 UTC (History)
5 users (show)

See Also:


Note You need to log in before you can comment on or make changes to this bug.
Description Leandro Lupori 2018-03-27 13:16:52 UTC
Latest kernel and bootloader are not able to boot in powerpc64 anymore:

Stopped at      .trap+0x40:     stdu    r1, r1, 0xfe40
db> bt
Tracing pid 0 tid 0 td 0x14c45e0
0xc000000000001190: kernel DSI read trap @ 0xdeadc0dedeadc0de by .pmap_decode_kernel_ptr+0x38: srr1=0x8000000000001032
            r1=0xc000000000001440 cr=0x20001032 xer=0x20000000 ctr=0 r2=0x137bc08 sr=0x40000000
saved LR(0x7f83e3784bfffccd) is invalid.

Using git bisect, I was able to find the commit that introduced the issue:

commit 47a0fb0f642831283509dab8b3ca1c63f3d7cda6
Author: nwhitehorn <nwhitehorn@FreeBSD.org>
Date: Sun Mar 4 04:49:09 2018 +0000

Where we can, pass the kernel an FDT facsimile of the OF device tree rather
    than a pointer to Open Firmware by default. This eliminates a number of
    potentially unsafe calls to firmware from the kernel and provides better

But this commit only sets usefdt to 1 in the bootloader. Older kernels didn't break with this option. So after another round of git bisect I found when the kernel stopped working with the new bootloader:

commit e3be9f8fb6d01fd2604b306b6f8ab52afb2d1173
Author: jeff <jeff@FreeBSD.org>
Date: Tue Feb 20 00:06:07 2018 +0000

Further parallelize the buffer cache.

Tracking down the issue, this is what I was able to find out so far:
- during one of the calls to keg_fetch_slab(), reached from malloc_init(), when in mi_startup(), mmu_obj is overwritten with 0xdeadc0dedeadc0de.
- then, the next time pmap_decode_kernel_ptr is called the trap happens when it tries to use the now invalid mmu_obj
Comment 1 Justin Hibbits freebsd_committer 2018-03-27 14:26:23 UTC
Adding Jeff, since he did the initial commit, so might have some insight.
Comment 2 Justin Hibbits freebsd_committer 2018-03-27 18:07:30 UTC
Jeff, this (r329612) has also reportedly caused issues with loading kernel modules on powerpc64.  See https://lists.freebsd.org/pipermail/freebsd-ppc/2018-March/009389.html
Comment 3 Leandro Lupori 2018-03-27 21:23:51 UTC
After some more investigation, I found that the part of the commit that "broke" the boot on powerpc64 with FDT (that is the default now) was the bdomain[BUF_DOMAINS] array.

By simply adding this array in vfs_bio.c in the previous commit I'm able to reproduce the issue.

On powerpc64 this array has a size of about 520KB, mainly because the default MAXCPU value for it is 256, which makes the following bufdomain field take a lot of space:
struct bufqueue bd_subq[MAXCPU + 1]; /* Per-cpu sub queues + global */

If I change bd_subq size to 17, for instance, then I'm able to go further, but then I get stuck after this message:
usb_needs_explore_all: no devclass
I don't know if this is related to same issue or not.

However, it is still not clear to me why this extra KBs are a problem for powerpc64. It is probably something related to FDT, because unsetting "usefdt" in the bootloader makes the system bootable again, even without the change above.
Comment 4 Mark Millard 2018-03-28 02:29:49 UTC
What is the relationship of mp_ncpus vs. mp_maxid
as used in the code from vfs_bio.c :

static void
bd_init(struct bufdomain *bd)
. . .
        bd->bd_cleanq = &bd->bd_subq[mp_ncpus];
        bq_init(bd->bd_cleanq, QUEUE_CLEAN, -1, "bufq clean lock");
        for (i = 0; i <= mp_maxid; i++)
                bq_init(&bd->bd_subq[i], QUEUE_CLEAN, i,
                    "bufq clean subqueue lock");
. . .

Is mp_maxid<mp_ncpus always so that the loop never
replaces bd->bd_subq[mp_ncpus]? Note that the
loop goes over 0..mp_maxid (inclusive of both ends),
which has mp_maxid+1 values in the range.

If the numbering can be sparse, might mp_maxid+1
be a better pick than mp_ncpus (assuming mp_ncpus
does not count/include "missing" id's)?
Comment 5 Mark Millard 2018-03-28 03:05:24 UTC
(In reply to Mark Millard from comment #4)

Another possible type of oddity:

static int
bd_flushall(struct bufdomain *bd)
. . .
        for (i = 0; i < mp_maxid; i++) {
                bq = &bd->bd_subq[i];

seems to exclude mp_maxid from its loop, unlike
bd_init's loop. And it has nothing for mp_ncpus
(or mp_maxid+1 if that is more appropriate).
(This last might be expected?)

DB_SHOW_COMMAND(bufqueues, bufqueues)
. . .
                for (j = 0; j < mp_maxid + 1; j++)
                        db_printf("%d, ", bd->bd_subq[j].bq_len);

has mp_maxid included (via a different technique),
but has nothing for mp_ncpu (or mp_maxid+1) compared
to bd_init. (This might be expected?)
Comment 6 Leandro Lupori 2018-03-28 16:20:56 UTC
Looking at the latest version of vfs_bio.c, it seems all these parts are now using only mp_maxid.
bd_flushall loop has also been changed to go up to i <= mp_maxid, instead of i < mp_maxid.
Still, the issue persists.

Also, by adding only the bdomain array (and corresponding structs' declarations) in the previous commit, that was working, the issue happens.

The two triggers to this issue seems to be:
- set usefdt=1
- change kernel memory layout
Comment 7 Justin Hibbits freebsd_committer 2018-03-28 16:24:01 UTC
(In reply to Leandro Lupori from comment #6)

Does it boot fine if you force usefdt to 0?
Comment 8 Leandro Lupori 2018-03-28 16:27:00 UTC
(In reply to Justin Hibbits from comment #7)

Setting usefdt to 0 doesn't change the behavior.
But if I unset usefdt in the bootloader, then it boots fine.
Comment 9 Mark Millard 2018-03-28 17:31:17 UTC
(In reply to Leandro Lupori from comment #6)

Clearly I did not manage to look at the latest source:
the fix for what I reported goes back to -r329943 (Feb
25 UTC). Sorry for the noise.
Comment 10 Justin Hibbits freebsd_committer 2018-03-28 19:27:25 UTC
Just as a basic test, could you try changing MAXCPU to your CPU count?  MAXCPU is 256 by default, but if you only have, say, 4 or 8 cores, cutting back to that might lower the memory usage.  If it boots with a smaller MAXCPU, it is likely a memory bounds issue.
Comment 11 Leandro Lupori 2018-03-29 16:28:19 UTC
(In reply to Justin Hibbits from comment #10)

Changing MAXCPU to 1 makes the system bootable, even with usefdt=1.

With usefdt=1, this is what is observed when changing MAXCPU:
256: crashes right after the copyright message
16: hangs on USB discovery
1: boots fine

I'm always able to boot when unsetting usefdt, no matter the MAXCPU value.
Comment 12 Leandro Lupori 2018-04-16 21:24:24 UTC
I have an update on this. I think I've found what is causing this issue.

When ofw_mem_regions() is called during initialization, it looks for /chosen/fdtmemreserv property in the device tree. When FDT is enabled, this property will always exist, as it's a "fake" property returned by OFW FDT getprop itself.

Then, if the fdtmemreserv property exists, excise_fdt_reserved() is called, and it does 2 main things:
1- Excludes reserved memory from available memory regions
2- Excludes FDT memory region itself from available memory

In my case, it's the second item that causes a problem later, at moea64_early_bootstrap().
There is one part of this function that checks for overlaps between available memory regions and the region where the kernel was loaded. When this overlap is found, the available memory list is supposed to be adjusted to exclude kernel memory.
But this code doesn't work correctly for some available memory regions, like the one that is produced by excise_fdt_reserved() when there is an overlap between FDT and kernel end.

This is the relevant part of the available regions in my case:
0x00004000 - 0x01800074 (~24MB)
0x0180b04c - 0x02c00000 (~20MB)
0x0344d3f0 - 0x7dbe0000 (~2GB)

kernelstart = 0x0100100
kernelend   = 0x1b60000
fdt         = 0x1b5b000 - 0x1b5f000 (upper bound rounded to page size)

When FDT is not used, moea64_early_bootstrap() will adjust the regions like this:

0x00004000 - 0x01000000
0x01b60000 - 0x02c00000
0x0344d3f0 - 0x7dbe0000

And everything works fine.
However, when FDT is used, excise_fdt_reserved() will change the available regions to this:

0x00004000 - 0x01800074
0x0180b04c - 0x01b5b000
0x01b5f000 - 0x02c00000
0x0344d3f0 - 0x7dbe0000

And then moea64_early_bootstrap() will adjust the regions and produce this:

0x00004000 - 0x01000000
0x0180b04c - 0x01b5b000
0x01b61000 - 0x02c00000
0x0344d3f0 - 0x7dbe0000

Which turns a large part of the kernel memory (0x0180b04c - 0x01b5b000) into available memory, causing the issue when this region starts being used as such.

I'm not sure about the best way to fix this. Maybe one (or more) of the following:
1- Change moea64_early_bootstrap() to make it able to detect and adjust regions such as (0x0180b04c - 0x01b5b000), in which case the whole region should be removed from available list.
2- Change the boot loader to place the FDT after kernelend.

And one last thing that I noticed and seems odd to me is that region (0x01800074 - 0x0180b04c) is not reported as available, but even so the kernel is loaded at (0x0100100 - 0x1b60000), which includes the not available region.
Comment 13 Leandro Lupori 2018-04-19 14:07:35 UTC
The fix is under review, here: https://reviews.freebsd.org/D15121
Comment 14 Steve Wills freebsd_committer 2018-04-19 18:02:36 UTC
(In reply to Leandro Lupori from comment #13)
This fixed it for me.
Comment 15 commit-hook freebsd_committer 2018-04-19 18:34:47 UTC
A commit references this bug:

Author: nwhitehorn
Date: Thu Apr 19 18:34:38 UTC 2018
New revision: 332788
URL: https://svnweb.freebsd.org/changeset/base/332788

  Fix detection of memory overlap with the kernel in the case where a memory
  region marked "available" by firmware is contained entirely in the kernel.

  This had a tendency to happen with FDTs passed by loader, though could for
  other reasons as well, and would result in the kernel slowly cannibalizing
  itself for other purposes, eventually resulting in a crash.

  A similar fix is needed for mmu_oea.c and should probably just be rolled
  at that point into some generic code in platform.c for taking a mem_region
  list and removing chunks.

  PR:		226974
  Submitted by:	leandro.lupori@gmail.com
  Reviewed by:	jhibbits
  Differential Revision:	D15121

Comment 16 Mark Millard 2018-05-27 02:02:10 UTC
(In reply to commit-hook from comment #15)

Is the evidence over the month+ such that this bugzilla
entry can progress past the "New" status?