The latest kernel and bootloader are no longer able to boot on powerpc64:

    Stopped at .trap+0x40: stdu r1, r1, 0xfe40
    db> bt
    Tracing pid 0 tid 0 td 0x14c45e0
    0xc000000000001190: kernel DSI read trap @ 0xdeadc0dedeadc0de by .pmap_decode_kernel_ptr+0x38: srr1=0x8000000000001032
        r1=0xc000000000001440 cr=0x20001032 xer=0x20000000 ctr=0 r2=0x137bc08 sr=0x40000000
    saved LR(0x7f83e3784bfffccd) is invalid.

Using git bisect, I found the commit that introduced the issue:

    commit 47a0fb0f642831283509dab8b3ca1c63f3d7cda6
    Author: nwhitehorn <nwhitehorn@FreeBSD.org>
    Date:   Sun Mar 4 04:49:09 2018 +0000

        Where we can, pass the kernel an FDT facsimile of the OF device tree
        rather than a pointer to Open Firmware by default. This eliminates a
        number of potentially unsafe calls to firmware from the kernel and
        provides better performance.

However, this commit only sets usefdt to 1 in the bootloader, and older kernels didn't break with this option. So after another round of git bisect, I found when the kernel stopped working with the new bootloader:

    commit e3be9f8fb6d01fd2604b306b6f8ab52afb2d1173
    Author: jeff <jeff@FreeBSD.org>
    Date:   Tue Feb 20 00:06:07 2018 +0000

        Further parallelize the buffer cache.

Tracking down the issue, this is what I have found out so far:
- During one of the calls to keg_fetch_slab(), reached from malloc_init() in mi_startup(), mmu_obj is overwritten with 0xdeadc0dedeadc0de.
- The next time pmap_decode_kernel_ptr() is called, the trap happens when it tries to use the now-invalid mmu_obj.
Adding Jeff, since he made the initial commit and might have some insight.
Jeff, this (r329612) has also reportedly caused issues with loading kernel modules on powerpc64. See https://lists.freebsd.org/pipermail/freebsd-ppc/2018-March/009389.html
After some more investigation, I found that the part of the commit that "broke" booting on powerpc64 with FDT (now the default) was the bdomain[BUF_DOMAINS] array. Simply adding this array to vfs_bio.c at the previous commit is enough to reproduce the issue.

On powerpc64 this array is about 520KB, mainly because the default MAXCPU value there is 256, which makes the following bufdomain field take a lot of space:

    struct bufqueue bd_subq[MAXCPU + 1]; /* Per-cpu sub queues + global */

If I change the bd_subq size to 17, for instance, I get further, but then I get stuck after this message:

    usb_needs_explore_all: no devclass

I don't know whether this is related to the same issue. It is also still not clear to me why these extra KBs are a problem for powerpc64. It is probably related to FDT, because unsetting "usefdt" in the bootloader makes the system bootable again, even without the change above.
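A rough way to see why MAXCPU dominates: the cost of bd_subq grows linearly with MAXCPU + 1, and the array multiplies that by the number of domains. The sketch below uses a made-up, simplified bufqueue layout (the 128-byte lock padding is an assumption) and a `domains` parameter in place of the real BUF_DOMAINS value, so its absolute numbers will not match the ~520KB figure; only the scaling is meant to carry over.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for struct bufqueue. The real structure
 * in vfs_bio.c holds a lock, a list head and some counters; the 128-byte
 * lock padding here is an assumption meant to mimic a cache-line-aligned
 * mutex, not the real layout.
 */
struct bufqueue_sketch {
	char	lock[128];	/* assumed padded lock */
	void	*head[2];	/* TAILQ_HEAD-like first/last pointers */
	int	len;		/* like bq_len */
	int	index;		/* queue index */
	int	subq;		/* subqueue number */
};

/* Bytes used by one bd_subq[maxcpu + 1] array. */
size_t
subq_bytes(size_t maxcpu)
{
	return ((maxcpu + 1) * sizeof(struct bufqueue_sketch));
}

/* Bytes used by `domains` bufdomains, counting only their bd_subq arrays. */
size_t
bdomain_bytes(size_t domains, size_t maxcpu)
{
	assert(domains > 0);
	return (domains * subq_bytes(maxcpu));
}
```

Whatever the real entry size is, dropping MAXCPU from 256 to 16 shrinks bd_subq by a factor of 257/17, roughly 15x, which lines up with the boot getting further at smaller sizes.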
What is the relationship between mp_ncpus and mp_maxid as used in this code from vfs_bio.c:

    static void
    bd_init(struct bufdomain *bd)
    {
    . . .
            bd->bd_cleanq = &bd->bd_subq[mp_ncpus];
            bq_init(bd->bd_cleanq, QUEUE_CLEAN, -1, "bufq clean lock");
            for (i = 0; i <= mp_maxid; i++)
                    bq_init(&bd->bd_subq[i], QUEUE_CLEAN, i,
                        "bufq clean subqueue lock");
    . . .

Is mp_maxid < mp_ncpus always true, so that the loop never overwrites bd->bd_subq[mp_ncpus]? Note that the loop covers 0..mp_maxid (inclusive at both ends), which is mp_maxid+1 values. If the numbering can be sparse, might mp_maxid+1 be a better choice than mp_ncpus (assuming mp_ncpus does not count the "missing" IDs)?
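The concern can be made concrete with a hypothetical sparse numbering. In the sketch below, CPUs 0, 1, 4 and 5 are present, so the count of present CPUs (standing in for mp_ncpus) is 4 while the highest ID (standing in for mp_maxid) is 5. Slot bd_subq[4] would then serve both as the clean queue and as CPU 4's subqueue, and the init loop would re-initialize it; an index of mp_maxid + 1 stays outside the loop's range. The names and the CPU map are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sparse CPU map: CPUs 0, 1, 4 and 5 are present. The highest
 * ID (SKETCH_MAXID) plays the role of mp_maxid; the number of present CPUs
 * plays the role of mp_ncpus. Both values are made up for illustration.
 */
#define	SKETCH_MAXID	5

static const bool cpu_present[SKETCH_MAXID + 1] = {
	true, true, false, false, true, true
};

/* Count present CPUs (the mp_ncpus stand-in). */
int
count_present(void)
{
	int n = 0;

	for (int i = 0; i <= SKETCH_MAXID; i++)
		if (cpu_present[i])
			n++;
	return (n);
}

/*
 * The init loop walks every slot 0..SKETCH_MAXID, so a clean-queue index in
 * that range is also some CPU's subqueue and gets re-initialized by the loop.
 */
bool
clean_slot_collides(int clean_index)
{
	assert(clean_index >= 0);
	return (clean_index <= SKETCH_MAXID);
}
```

With a dense map (every CPU present) the count equals SKETCH_MAXID + 1 and no collision occurs, which may be why the mp_ncpus form works on most machines.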
(In reply to Mark Millard from comment #4) Another possible type of oddity:

    static int
    bd_flushall(struct bufdomain *bd)
    {
    . . .
            for (i = 0; i < mp_maxid; i++) {
                    bq = &bd->bd_subq[i];

This seems to exclude mp_maxid from its loop, unlike bd_init's loop. And it has nothing for mp_ncpus (or mp_maxid+1 if that is more appropriate). (This last might be expected?)

    DB_SHOW_COMMAND(bufqueues, bufqueues)
    {
    . . .
                    for (j = 0; j < mp_maxid + 1; j++)
                            db_printf("%d, ", bd->bd_subq[j].bq_len);

This includes mp_maxid (via a different technique), but has nothing for mp_ncpus (or mp_maxid+1) compared to bd_init. (This might be expected?)
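The difference between the two loop forms is easy to miss. This small hypothetical helper counts how many subqueue slots 0..maxid each form visits; with a single CPU (maxid == 0) the exclusive form visits none, so a flush written that way would skip the only per-CPU subqueue.

```c
#include <assert.h>

/*
 * Count how many per-CPU subqueue slots 0..maxid a loop visits.
 * inclusive != 0 models "for (i = 0; i <= mp_maxid; i++)" (bd_init);
 * inclusive == 0 models "for (i = 0; i < mp_maxid; i++)" (the quoted
 * bd_flushall loop).
 */
int
slots_visited(int maxid, int inclusive)
{
	int n = 0;

	assert(maxid >= 0);
	for (int i = 0; inclusive ? i <= maxid : i < maxid; i++)
		n++;
	return (n);
}
```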
Looking at the latest version of vfs_bio.c, it seems all these parts now use only mp_maxid. The bd_flushall loop has also been changed to go up to i <= mp_maxid, instead of i < mp_maxid. Still, the issue persists. Also, adding only the bdomain array (and the corresponding struct declarations) to the previous commit, which was working, is enough to make the issue happen. The two triggers for this issue seem to be:
- setting usefdt=1
- changing the kernel memory layout
(In reply to Leandro Lupori from comment #6) Does it boot fine if you force usefdt to 0?
(In reply to Justin Hibbits from comment #7) Setting usefdt to 0 doesn't change the behavior. But if I unset usefdt in the bootloader, then it boots fine.
(In reply to Leandro Lupori from comment #6) Clearly I did not manage to look at the latest source: the fix for what I reported goes back to -r329943 (Feb 25 UTC). Sorry for the noise.
Just as a basic test, could you try changing MAXCPU to your CPU count? MAXCPU is 256 by default, but if you only have, say, 4 or 8 cores, cutting back to that might lower the memory usage. If it boots with a smaller MAXCPU, it is likely a memory bounds issue.
(In reply to Justin Hibbits from comment #10) Changing MAXCPU to 1 makes the system bootable, even with usefdt=1. With usefdt=1, this is what I observe when changing MAXCPU:
- 256: crashes right after the copyright message
- 16: hangs on USB discovery
- 1: boots fine

I'm always able to boot when unsetting usefdt, no matter the MAXCPU value.
I have an update on this. I think I've found what is causing this issue.

When ofw_mem_regions() is called during initialization, it looks for the /chosen/fdtmemreserv property in the device tree. When FDT is enabled, this property always exists, as it's a "fake" property returned by the OFW FDT getprop itself. If the fdtmemreserv property exists, excise_fdt_reserved() is called, and it does 2 main things:
1- Excludes reserved memory from the available memory regions
2- Excludes the FDT memory region itself from available memory

In my case, it's the second item that causes a problem later, in moea64_early_bootstrap(). One part of this function checks for overlaps between the available memory regions and the region where the kernel was loaded. When an overlap is found, the available memory list is supposed to be adjusted to exclude kernel memory. But this code doesn't work correctly for some available memory regions, such as the one produced by excise_fdt_reserved() when the FDT overlaps the end of the kernel.

This is the relevant part of the available regions in my case:

    0x00004000 - 0x01800074 (~24MB)
    0x0180b04c - 0x02c00000 (~20MB)
    0x0344d3f0 - 0x7dbe0000 (~2GB)

    kernelstart = 0x0100100
    kernelend   = 0x1b60000
    fdt         = 0x1b5b000 - 0x1b5f000 (upper bound rounded to page size)

When FDT is not used, moea64_early_bootstrap() adjusts the regions like this:

    0x00004000 - 0x01000000
    0x01b60000 - 0x02c00000
    0x0344d3f0 - 0x7dbe0000

And everything works fine.
However, when FDT is used, excise_fdt_reserved() changes the available regions to this:

    0x00004000 - 0x01800074
    0x0180b04c - 0x01b5b000
    0x01b5f000 - 0x02c00000
    0x0344d3f0 - 0x7dbe0000

And then moea64_early_bootstrap() adjusts the regions and produces this:

    0x00004000 - 0x01000000
    0x0180b04c - 0x01b5b000
    0x01b61000 - 0x02c00000
    0x0344d3f0 - 0x7dbe0000

This turns a large part of the kernel memory (0x0180b04c - 0x01b5b000) into available memory, causing the issue once this region starts being used as such.

I'm not sure about the best way to fix this. Maybe one (or more) of the following:
1- Change moea64_early_bootstrap() so that it detects regions such as (0x0180b04c - 0x01b5b000), which lie entirely inside the kernel; in that case the whole region should be removed from the available list.
2- Change the boot loader to place the FDT after kernelend.

One last thing I noticed that seems odd to me: the region (0x01800074 - 0x0180b04c) is not reported as available, yet the kernel is loaded at (0x0100100 - 0x1b60000), which includes that unavailable region.
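Option 1 amounts to handling every way an available region can overlap the kernel image. A hypothetical classifier for the five cases is sketched below (it is not the code in mmu_oea64.c, and the names are made up); OV_INSIDE is the case the analysis above shows being mishandled. It treats ranges as half-open, and with an assumed page-rounded kernel range of 0x01000000 - 0x01b60000 the four post-excise regions classify as HEAD, INSIDE, TAIL and NONE respectively.

```c
#include <assert.h>
#include <stdint.h>

/* How an available region [s, e) relates to the kernel image [ks, ke). */
enum overlap {
	OV_NONE,	/* disjoint: keep as-is */
	OV_HEAD,	/* region overlaps the start of the kernel: trim its end */
	OV_TAIL,	/* region overlaps the end of the kernel: trim its start */
	OV_CONTAINS,	/* region surrounds the kernel: split it in two */
	OV_INSIDE	/* region lies entirely inside the kernel: drop it */
};

enum overlap
classify(uint64_t s, uint64_t e, uint64_t ks, uint64_t ke)
{
	assert(s < e && ks < ke);
	if (e <= ks || s >= ke)
		return (OV_NONE);
	if (s >= ks && e <= ke)
		return (OV_INSIDE);
	if (s < ks && e > ke)
		return (OV_CONTAINS);
	if (s < ks)
		return (OV_HEAD);
	return (OV_TAIL);
}
```

Checking OV_INSIDE before the partial-overlap cases matters: a check that only asks "does the region start before the kernel end?" would trim (0x0180b04c - 0x01b5b000) instead of dropping it.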
The fix is under review, here: https://reviews.freebsd.org/D15121
(In reply to Leandro Lupori from comment #13) This fixed it for me.
A commit references this bug:

Author: nwhitehorn
Date: Thu Apr 19 18:34:38 UTC 2018
New revision: 332788
URL: https://svnweb.freebsd.org/changeset/base/332788

Log:
  Fix detection of memory overlap with the kernel in the case where a memory
  region marked "available" by firmware is contained entirely in the kernel.
  This had a tendency to happen with FDTs passed by loader, though could for
  other reasons as well, and would result in the kernel slowly cannibalizing
  itself for other purposes, eventually resulting in a crash.

  A similar fix is needed for mmu_oea.c and should probably just be rolled at
  that point into some generic code in platform.c for taking a mem_region list
  and removing chunks.

  PR:		226974
  Submitted by:	leandro.lupori@gmail.com
  Reviewed by:	jhibbits
  Differential Revision:	D15121

Changes:
  head/sys/powerpc/aim/mmu_oea64.c
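The generic helper the log suggests (taking a region list and removing chunks) could look roughly like the sketch below. It is hypothetical: it uses its own half-open `struct region` instead of the kernel's struct mem_region, and the name region_subtract is made up. Applied twice, first with the FDT range and then with the kernel range, it covers both the excise_fdt_reserved() step and the kernel-overlap adjustment from the analysis above, including a region lying entirely inside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region type, half-open [start, end). */
struct region {
	uint64_t start;
	uint64_t end;
};

/*
 * Remove [cut_start, cut_end) from every region in in[0..n), writing the
 * surviving pieces to out[] (a separate buffer). out must have room for
 * n + 1 entries: the input regions are disjoint and the cut is contiguous,
 * so at most one region can be split in two. Returns the entries written.
 */
size_t
region_subtract(const struct region *in, size_t n, uint64_t cut_start,
    uint64_t cut_end, struct region *out)
{
	size_t m = 0;

	assert(cut_start < cut_end);
	for (size_t i = 0; i < n; i++) {
		uint64_t s = in[i].start, e = in[i].end;

		if (e <= cut_start || s >= cut_end) {
			out[m++] = in[i];	/* disjoint: keep unchanged */
			continue;
		}
		if (s < cut_start)		/* piece below the cut */
			out[m++] = (struct region){ s, cut_start };
		if (e > cut_end)		/* piece above the cut */
			out[m++] = (struct region){ cut_end, e };
		/* A region entirely inside the cut contributes nothing. */
	}
	return (m);
}
```

Using the regions reported above with the FDT range 0x1b5b000 - 0x1b5f000 and an assumed page-rounded kernel range of 0x01000000 - 0x01b60000, the two passes leave 0x00004000 - 0x01000000, 0x01b60000 - 0x02c00000 and 0x0344d3f0 - 0x7dbe0000, the same list as the non-FDT case.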
(In reply to commit-hook from comment #15) Is the evidence over the month+ such that this bugzilla entry can progress past the "New" status?
This seems to have been fixed. Please reopen if I'm wrong.