Bug 278233 - PHYS_IN_DMAP and VIRT_IN_DMAP macros assume contiguous DMAP memory
Summary: PHYS_IN_DMAP and VIRT_IN_DMAP macros assume contiguous DMAP memory
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: arm (show other bugs)
Version: Unspecified
Hardware: arm64 Any
: --- Affects Some People
Assignee: Andrew Turner
URL: https://reviews.freebsd.org/D44677
Keywords:
Depends on:
Blocks:
 
Reported: 2024-04-07 13:00 UTC by Zeev Zilberman
Modified: 2024-05-13 16:30 UTC (History)
3 users (show)

See Also:


Attachments
bug fix: support non-contiguous memory in PHYS_IN_DMAP and VIRT_IN_DMAP (6.50 KB, text/plain)
2024-04-07 13:00 UTC, Zeev Zilberman
no flags Details
bug fix: support non-contiguous memory in PHYS_IN_DMAP and VIRT_IN_DMAP (3.71 KB, patch)
2024-04-07 13:06 UTC, Zeev Zilberman
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Zeev Zilberman 2024-04-07 13:00:47 UTC
Created attachment 249802 [details]
bug fix: support non-contiguous memory in PHYS_IN_DMAP and VIRT_IN_DMAP

The issue was observed on r8g.metal-48xl which is a coherent Graviton4-based system with 2 SoCs.

ENA driver crashes when trying to remap BAR2 memory with WC attributes:

ena0: Elastic Network Adapter (ENA)ena v2.7.0
panic: Invalid DMAP table level: 0
 
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a4
panic() at panic+0x48
pmap_change_props_locked() at pmap_change_props_locked+0x6dc
pmap_change_props_locked() at pmap_change_props_locked+0x550
pmap_change_attr() at pmap_change_attr+0x5c
ena_attach() at ena_attach+0x264
device_attach() at device_attach+0x3fc
device_probe_and_attach() at device_probe_and_attach+0x80
bus_generic_attach() at bus_generic_attach+0x1c
pci_attach() at pci_attach+0xec
...


The memory map on those systems has 2 large DRAM regions, one for each SoC. The regions are non contiguous and IO memory that includes the PCI BARs for the 2nd SoC is between those 2 regions. So the memory map looks like following

DRAM region 0: 0x10000000000-0x1c000000000
IO memory(SoC1): 0x41000000000-0x050000000000
DRAM region 1: 0x50000000000-0x5c000000000


PHYS_IN_DMAP and VIRT_IN_DMAP macros in arm64 vmparam.h that inherently assume that the DMAP region is contiguous with current implementation:

/* True if pa is in the dmap range */
#define PHYS_IN_DMAP(pa)        ((pa) >= DMAP_MIN_PHYSADDR && \
    (pa) < DMAP_MAX_PHYSADDR)

 /* True if va is in the dmap range */
 #define        VIRT_IN_DMAP(va)        ((va) >= DMAP_MIN_ADDRESS && \
     (va) < (dmap_max_addr))

This causes the following check in sys/arm64/arm64/pmap.c, pmap_change_props_locked() to wrongly conclude that PCI memory is part of DMAP:
if (!VIRT_IN_DMAP(tmpva) && PHYS_IN_DMAP(pa)) {
  /*
  * Keep the DMAP memory in sync.
  */
  rv = pmap_change_props_locked(
       PHYS_TO_DMAP(pa), pte_size,
       prot, mode, true);
  if (rv != 0)
       return (rv);
}
And consequent lookup of this memory in DMAP page tables fails, as PCI memory is not mapped in DMAP.

Attached a proposed patch that stores a list of DMAP regions and changes the macros above to functions that check the address vs the list.
(Based on https://reviews.freebsd.org/D3538)
The instance boots properly after applying this patch.
Comment 1 Zeev Zilberman 2024-04-07 13:06:57 UTC
Created attachment 249803 [details]
bug fix: support non-contiguous memory in PHYS_IN_DMAP and VIRT_IN_DMAP
Comment 2 Andrew Turner freebsd_committer freebsd_triage 2024-04-08 13:20:03 UTC
I have a possible fix in review D44677
Comment 3 Colin Percival freebsd_committer freebsd_triage 2024-04-08 14:36:36 UTC
Thank Andrew!

Zeev, can you test Andrew's patch?  I can build an AMI with it if it would help, but I don't have access to r8g.metal-48xl so I can't test it myself.
Comment 4 Zeev Zilberman 2024-04-09 15:22:52 UTC
The system boots properly with Andrew's patch, thanks!

Should VIRT_IN_DMAP macro be fixed too? The patch addresses only PHYS_IN_DMAP.
Comment 5 commit-hook freebsd_committer freebsd_triage 2024-04-24 18:32:06 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9d40492efa467095340cf3dca5860880aa441472

commit 9d40492efa467095340cf3dca5860880aa441472
Author:     Andrew Turner <andrew@FreeBSD.org>
AuthorDate: 2024-04-08 10:44:33 +0000
Commit:     Andrew Turner <andrew@FreeBSD.org>
CommitDate: 2024-04-24 18:17:19 +0000

    arm64: Check DMAP address is valid in PHYS_IN_DMAP

    When checking if a physical address is in the DMAP region we assume
    all physical addresses between DMAP_MIN_PHYSADDR and DMAP_MAX_PHYSADDR
    are able to be accesses through the DMAP. It may be the case that
    there is device memory in this range that shouldn't be accessed through
    the DMAP mappings.

    Add a check to PHYS_IN_DMAP that the translated virtual address is a
    valid kernel address. To support code that already checks the address
    is valid add PHYS_IN_DMAP_RANGE.

    PR:             278233
    Reviewed by:    alc, markj
    Sponsored by:   Arm Ltd
    Differential Revision:  https://reviews.freebsd.org/D44677

 sys/arm64/arm64/efirt_machdep.c    |  9 ++-------
 sys/arm64/arm64/machdep.c          |  2 +-
 sys/arm64/arm64/minidump_machdep.c |  7 ++++---
 sys/arm64/include/vmparam.h        | 18 +++++++++++++++---
 4 files changed, 22 insertions(+), 14 deletions(-)
Comment 6 commit-hook freebsd_committer freebsd_triage 2024-05-02 08:10:07 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c3a3b231da00e93fcd7baced74fa933b112de473

commit c3a3b231da00e93fcd7baced74fa933b112de473
Author:     Andrew Turner <andrew@FreeBSD.org>
AuthorDate: 2024-04-08 10:44:33 +0000
Commit:     Andrew Turner <andrew@FreeBSD.org>
CommitDate: 2024-05-02 07:59:31 +0000

    arm64: Check DMAP address is valid in PHYS_IN_DMAP

    When checking if a physical address is in the DMAP region we assume
    all physical addresses between DMAP_MIN_PHYSADDR and DMAP_MAX_PHYSADDR
    are able to be accesses through the DMAP. It may be the case that
    there is device memory in this range that shouldn't be accessed through
    the DMAP mappings.

    Add a check to PHYS_IN_DMAP that the translated virtual address is a
    valid kernel address. To support code that already checks the address
    is valid add PHYS_IN_DMAP_RANGE.

    PR:             278233
    Reviewed by:    alc, markj
    Sponsored by:   Arm Ltd
    Differential Revision:  https://reviews.freebsd.org/D44677

    (cherry picked from commit 9d40492efa467095340cf3dca5860880aa441472)

 sys/arm64/arm64/efirt_machdep.c    |  9 ++-------
 sys/arm64/arm64/machdep.c          |  2 +-
 sys/arm64/arm64/minidump_machdep.c |  7 ++++---
 sys/arm64/include/vmparam.h        | 18 +++++++++++++++---
 4 files changed, 22 insertions(+), 14 deletions(-)
Comment 7 Colin Percival freebsd_committer freebsd_triage 2024-05-08 15:56:50 UTC
(In reply to Zeev Zilberman from comment #4)

Can you confirm that 14.1-BETA1 boots on these instances?  (If it doesn't, we need to fix whatever other problems exist ASAP before the release.)
Comment 8 Zeev Zilberman 2024-05-13 07:08:35 UTC
Confirming that FreeBSD 14.1-BETA2 boots properly on r8g.metal-48xl system.
Comment 9 Colin Percival freebsd_committer freebsd_triage 2024-05-13 16:30:19 UTC
Thanks!