Bug 270044 - Raspberry Pi 4B does not reboot after addition of ram0 driver
Summary: Raspberry Pi 4B does not reboot after addition of ram0 driver
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: arm (show other bugs)
Version: CURRENT
Hardware: arm64 Any
: --- Affects Some People
Assignee: Mitchell Horne
URL: https://reviews.freebsd.org/D39003
Keywords:
Depends on:
Blocks:
 
Reported: 2023-03-08 16:12 UTC by Mike Karels
Modified: 2023-03-24 13:17 UTC (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mike Karels freebsd_committer freebsd_triage 2023-03-08 16:12:44 UTC
Since the addition of the ram0 driver in e6cf1a0826c9, the Raspberry Pi 4B does not reboot.  It hangs after printing the Uptime, and before printing "Resetting system ..."  Placing hints.ram.0.disabled="1" in /boot/device.hints works around the problem, but this should not be necessary.

There was some discussion of this issue on freebsd-arm.  Mark Millard pointed out that the memory map looked strange with the ram0 driver enabled.  See https://lists.freebsd.org/archives/freebsd-arm/2023-February/002331.html.

I can assist with debugging / running tests.
Comment 1 Kyle Evans freebsd_committer freebsd_triage 2023-03-08 16:54:46 UTC
This is wrong but I don't think will break anything:

>  0x3ee5c000 - 0x3ee5cfff,     0 MB (      1 pages) NoAlloc NoDump
>  0x3ee5c000 - 0x3ee5cfff,     0 MB (      1 pages) NoAlloc 

I guess this means we grabbed exclusions both from the memory map and FDT, but we didn't coalesce them because the flags didn't match exactly. This patch should fix that rather than avoiding processing one or the other, in case we end up with slightly more information in one than the other: https://people.freebsd.org/~kevans/physmem.diff -- I don't think this would break resetting off-hand, though.
Comment 2 Mark Millard 2023-03-08 17:03:50 UTC
One of the things I'm unclear about for what this driver
does is the distinction between, for example, PCIe
addresses that are local to the PCIe device or segment
(right word?) that do not share an address space that
contains the system RAM. By contrast there are  the
inbound PCIe addresses that do share the address space
that also contains the system RAM.

The devinfo -u output (for example) does not make
the distinction up front but shows examples of
both. For example, using a RPi4B for illustration:

PCIe Memory:
   0xc0000000-0xc00fffff (pcib1)
   0xc0100000-0xffffffff (root0)
. . .
pcib1 memory window:
   0x600000000-0x600000fff (bcm_xhci0)
   0xc0001000-0xc00fffff (root0)

The "0x600000000-0x600000fff (bcm_xhci0)" is inbound
addressing having the address space that also includes
system RAM.

The other 3 do not. For these 3 I do not see how the
lack of a conflicts with ram0 is supposed to be handled.


Another issue is that there is no "verbose shutdown" to
be able to get evidence about where the hangup during
shutdown happens or what low level activity FreeBSD is
trying to do at the time.
Comment 3 Mike Karels freebsd_committer freebsd_triage 2023-03-08 19:47:10 UTC
(In reply to Kyle Evans from comment #1)
I tried your patch; it doesn't change reboot, but changes the Excluded map:

*** Excluded.standard   Wed Mar  8 13:33:58 2023
--- Excluded.patch      Wed Mar  8 13:34:28 2023
***************
*** 1,13 ****
  Excluded memory regions:
    0x00000000 - 0x00001fff,     0 MB (      2 pages) NoAlloc 
    0x07ef0000 - 0x07f0ffff,     0 MB (     32 pages) NoAlloc 
!   0x32000000 - 0x335b3fff,    21 MB (   5556 pages) NoAlloc 
    0x39f2a000 - 0x39f2afff,     0 MB (      1 pages) NoAlloc 
    0x39f2e000 - 0x39f2efff,     0 MB (      1 pages) NoAlloc 
    0x39f30000 - 0x39f31fff,     0 MB (      2 pages) NoAlloc 
    0x39f33000 - 0x39f36fff,     0 MB (      4 pages) NoAlloc 
    0x3b350000 - 0x3b35ffff,     0 MB (     16 pages) NoAlloc 
    0x3eaf5000 - 0x3ebf7fff,     1 MB (    259 pages) NoAlloc 
    0x3ef5b000 - 0x3ef5bfff,     0 MB (      1 pages) NoAlloc NoDump
-   0x3ef5b000 - 0x3ef5bfff,     0 MB (      1 pages) NoAlloc 
    0xfe100000 - 0xfe100fff,     0 MB (      1 pages) NoAlloc 
--- 1,12 ----
  Excluded memory regions:
    0x00000000 - 0x00001fff,     0 MB (      2 pages) NoAlloc 
    0x07ef0000 - 0x07f0ffff,     0 MB (     32 pages) NoAlloc 
!   0x32000000 - 0x335b2fff,    21 MB (   5555 pages) NoAlloc 
    0x39f2a000 - 0x39f2afff,     0 MB (      1 pages) NoAlloc 
    0x39f2e000 - 0x39f2efff,     0 MB (      1 pages) NoAlloc 
    0x39f30000 - 0x39f31fff,     0 MB (      2 pages) NoAlloc 
    0x39f33000 - 0x39f36fff,     0 MB (      4 pages) NoAlloc 
    0x3b350000 - 0x3b35ffff,     0 MB (     16 pages) NoAlloc 
    0x3eaf5000 - 0x3ebf7fff,     1 MB (    259 pages) NoAlloc 
    0x3ef5b000 - 0x3ef5bfff,     0 MB (      1 pages) NoAlloc NoDump
    0xfe100000 - 0xfe100fff,     0 MB (      1 pages) NoAlloc 

This may be a different vintage RPi4; it's the original version with 4 GB.
Comment 4 Mitchell Horne freebsd_committer freebsd_triage 2023-03-09 17:40:05 UTC
Hi, thanks for the report. I did not see the ML post, despite being CC'd.

On the RPI4B, reboot is handled by the BCM watchdog device. Looks like ram0 is preventing the bcmwd0 device from attaching 
(dmesg output from my own RPI4):

> bcmwd0: <BCM2708/2835 Watchdog> mem 0x7e100000-0x7e100113,0x7e00a000-0x7e00a023,0x7ec11000-0x7ec1101f on simplebus0                                                                           
> bcmwd0: could not allocate memory resource                                                     
> device_attach: bcmwd0 attach returned 6

Clearly, we need to be slightly more selective in identifying excluded ranges that correspond to physical memory rather than just bus space. This may be a shortcoming in the discovery mechanism (device tree). Ultimately this is somewhat expected fallout from e6cf1a0826c9, and I am looking into it.
Comment 5 Mitchell Horne freebsd_committer freebsd_triage 2023-03-09 19:38:35 UTC
(In reply to Kyle Evans from comment #1)

Evidently that wasn't the problem at hand, but this fix is still good, IMO. Consider it Reviewed by: mhorne.
Comment 6 Mark Millard 2023-03-09 23:56:33 UTC
(In reply to Kyle Evans from comment #1)

When looking at the upstream status of the U-Boot 2023.01
break to some contexts, including 8 GiByte RPi4B/CM4
contexts, I found:

lmb: Treat a region which is a subset as equal :

https://source.denx.de/u-boot/u-boot/-/commit/0d91c88230fe8bd9f8d39ca2ab69cd6282e9f949

Bump LMB_MAX_REGIONS default to 16 :

https://source.denx.de/u-boot/u-boot/-/commit/2dc16a2c1f924985216b3f1d6710f96d6c4fb1ab

are the commits for the issue.

That first seems to have fixed U-Boot 2023.01 causing
extra, overlapping ranges.

From what I can tell, the fix activity has settled on these
updates.

It may be important that replication efforts for testing
sometimes use the original, broken U-Boot 2023.01 .
Comment 7 Mark Millard 2023-03-10 02:55:57 UTC
(In reply to Mitchell Horne from comment #4)

Looks to me like the:

bcmwd0: <BCM2708/2835 Watchdog> mem 0x7e100000-0x7e100113,0x7e00a000-0x7e00a023,0x7ec11000-0x7ec1101f on simplebus0

overlaps with the UEFI page described with:

         MemoryMappedIO 0000fe100000 0000fe100000 00000001 RUNTIME

and it is in the address space that includes the system
RAM. My guess is that the MemoryMappedIO status is an
indication that using those addresses is okay (for the
purpose of the MemoryMappedIO region exists for): not
an actual conflict for the purpose but the page starting
at 0000fe100000 is not okay for general-use memory
allocations.

So, not the same as what I was curious about in Comment #2 .
Comment 8 Mark Millard 2023-03-10 04:20:58 UTC
(In reply to Mark Millard from comment #7)

For (4 GiByte RPi4B):

Physical memory chunk(s):
  0x00002000 - 0x3b2fffff,   946 MB ( 242430 pages)
  0x40000000 - 0xfbffffff,  3008 MB ( 770048 pages)

should (sub)ranges outside of that never be assigned to ram0?:
already not in available RAM subrange? For example, the
page:

         MemoryMappedIO 0000fe100000 0000fe100000 00000001 RUNTIME

is outside the Physical-chunks-of-RAM ranges.

In other words, no part of:

ram0: reserving excluded region: fe100000-fe100fff

should have happened --because of the complete lack
of overlap with the Physical-chunks-of-RAM ranges?

The same would go for "C0T" 8GiByte RPi4B. There is
no overlap for:

Physical memory chunk(s):
   0x00002000 -  0x3b2fffff,   946 MB ( 242430 pages)
   0x40000000 -  0xfbffffff,  3008 MB ( 770048 pages)
  0x100000000 - 0x1ffffffff,  4096 MB (1048576 pages)
vs.
         MemoryMappedIO 0000fe100000 0000fe100000 00000001 RUNTIME


The same would go for "B0T" 8GiByte RPi4B. There is
no overlap for (same Physical-chunks-of-RAM as
"C0T"):

Physical memory chunk(s):
   0x00002000 -  0x3b2fffff,   946 MB ( 242430 pages)
   0x40000000 -  0xfbffffff,  3008 MB ( 770048 pages)
  0x100000000 - 0x1ffffffff,  4096 MB (1048576 pages)
vs.
         MemoryMappedIO 0000fe100000 0000fe100000 00000001 RUNTIME
Comment 9 Mark Millard 2023-03-10 06:15:13 UTC
Looks like physmem_excluded is never used:

# grep -r 'physmem_excluded' /usr/main-src/sys/ | less
/usr/main-src/sys/sys/physmem.h:bool physmem_excluded(vm_paddr_t pa, vm_size_t sz);
/usr/main-src/sys/kern/subr_physmem.c:physmem_excluded(vm_paddr_t pa, vm_size_t sz)

(And, looking at the code, the testing used does not deal with anything
but fully-contained-in as "excluded", not even reporting that such an
oddity as a partial overlap was present if it was).
Comment 10 commit-hook freebsd_committer freebsd_triage 2023-03-15 15:45:40 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8937bd37d07c5c75995e01457aec00fb0a05c462

commit 8937bd37d07c5c75995e01457aec00fb0a05c462
Author:     Mitchell Horne <mhorne@FreeBSD.org>
AuthorDate: 2023-03-15 15:26:57 +0000
Commit:     Mitchell Horne <mhorne@FreeBSD.org>
CommitDate: 2023-03-15 15:28:35 +0000

    arm64: limit EFI excluded regions to physical memory types

    Consolidate add_efi_map_entry() and exclude_efi_map_entry() into a
    single function, handle_efi_map_entry(), so that the exact set of entry
    types handled is the same in the addition or exclusion cases. Before,
    exclude_efi_map_entry() had a 'default' case that would exclude all
    entry types that were not listed explicitly in the switch statement.

    Logically, we do not need to exclude a range that could not possibly be
    added to physmem, and we do not need to exclude bus ranges that are not
    physical memory, for example EFI_MD_TYPE_IOMEM.

    Since physmem's ram0 device will reserve bus memory resources for its
    owned ranges, this was preventing attachment of the watchdog device on
    the RPI4B. For some reason its region of memory-mapped I/O appeared in
    the EFI memory map (with the aforementioned EFI_MD_TYPE_IOMEM type).
    This change fixes the attachment issue, as we prevent the physmem API
    from messing with this range of bus space.

    PR:             270044
    Reported by:    karels, Mark Millard
    Reviewed by:    andrew, karels, imp
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D39003

 sys/arm64/arm64/machdep.c | 71 ++++++++++++++++++++++++++---------------------
 1 file changed, 39 insertions(+), 32 deletions(-)
Comment 11 commit-hook freebsd_committer freebsd_triage 2023-03-24 13:14:31 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=865b5640f8ef3c3a369f13b2c531737b7fb8fec4

commit 865b5640f8ef3c3a369f13b2c531737b7fb8fec4
Author:     Mitchell Horne <mhorne@FreeBSD.org>
AuthorDate: 2023-03-15 15:26:57 +0000
Commit:     Mitchell Horne <mhorne@FreeBSD.org>
CommitDate: 2023-03-24 13:12:40 +0000

    arm64: limit EFI excluded regions to physical memory types

    Consolidate add_efi_map_entry() and exclude_efi_map_entry() into a
    single function, handle_efi_map_entry(), so that the exact set of entry
    types handled is the same in the addition or exclusion cases. Before,
    exclude_efi_map_entry() had a 'default' case that would exclude all
    entry types that were not listed explicitly in the switch statement.

    Logically, we do not need to exclude a range that could not possibly be
    added to physmem, and we do not need to exclude bus ranges that are not
    physical memory, for example EFI_MD_TYPE_IOMEM.

    Since physmem's ram0 device will reserve bus memory resources for its
    owned ranges, this was preventing attachment of the watchdog device on
    the RPI4B. For some reason its region of memory-mapped I/O appeared in
    the EFI memory map (with the aforementioned EFI_MD_TYPE_IOMEM type).
    This change fixes the attachment issue, as we prevent the physmem API
    from messing with this range of bus space.

    PR:             270044
    Reported by:    karels, Mark Millard
    Reviewed by:    andrew, karels, imp
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D39003

    (cherry picked from commit 8937bd37d07c5c75995e01457aec00fb0a05c462)

 sys/arm64/arm64/machdep.c | 71 ++++++++++++++++++++++++++---------------------
 1 file changed, 39 insertions(+), 32 deletions(-)