Bug 265019 - pmap_growkernel wiring excess memory on module load
Summary: pmap_growkernel wiring excess memory on module load
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Mark Johnston
Reported: 2022-07-03 21:45 UTC by Austin Shafer
Modified: 2022-10-18 21:56 UTC

Description Austin Shafer 2022-07-03 21:45:06 UTC
When loading a large kernel module (in this case nvidia.ko), a massive amount of memory gets wired. This ends up wiring 4.2 GB out of 8 GB on the system, which significantly hurts usability.

There was a lot of discussion on Discord about this; I'll try to include it here:
https://discord.com/channels/727023752348434432/757305573866733680/992940224063746058

"Maybe it is the addr param itself after all. I'm seeing some weirdness in the min address used for kernel_vm_end that I haven't explained yet. KERNBASE is 0xffffffff80000000, but I see vm_map_find using 0xfffffe0000000000 as the min address to start searching for open space at. So kernel_vm_end appears to start at fffffe00b1600000 when loading the kernel (the kernel log drops some lines there so this is the first value I see for it), and it's raised to find more space. That seems to be the general "neighborhood" of addresses passed to pmap_growkernel. Then at some point when I call kldload it passes some offset from KERNBASE instead of from that "neighborhood", and since KERNBASE and 0xfffffe0000000000 are so far away it allocates a million pages.
Sorry the logic there is a little fuzzy, but tldr is it feels like there are two different ideas about where the kernel VM address starts and kldload triggers us using the wrong one."

What we think is happening is that vm_map_find is normally called starting from VM_MIN_KERNEL_ADDRESS, but link_elf_load_file starts searching at KERNBASE. pmap_growkernel then wires a massive amount of memory to cover the gap between the two. Normally this would be fine, since we reserve a bunch of pages after KERNBASE for kernel modules, but if the kernel module is too large and overflows this region, this error might occur.
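
As a rough check of the numbers above, here is a minimal user-space sketch (not kernel code) that estimates how many page-table pages would be needed to grow the kernel map from the kernel_vm_end value in the quote up to KERNBASE. It assumes the usual amd64 layout of 4 KiB pages, where one page-table page maps 2 MiB of KVA. The helper name pt_pages_needed is made up for illustration, and the sketch ignores the smaller number of page-directory pages.

```c
#include <stdint.h>

/* Assumed amd64 constants: 4 KiB pages, one page-table page maps 2 MiB. */
#define PAGE_SIZE 4096ULL
#define NBPDR     (1ULL << 21)

/* Addresses taken from the discussion quoted above. */
#define KERNBASE          0xffffffff80000000ULL
#define OBSERVED_VM_END   0xfffffe00b1600000ULL

/*
 * Hypothetical helper: number of page-table pages needed to cover
 * [start, end) in 2 MiB steps.  Returns 0 if there is nothing to cover.
 */
static uint64_t
pt_pages_needed(uint64_t start, uint64_t end)
{
	if (end <= start)
		return (0);
	return ((end - start + NBPDR - 1) / NBPDR);
}

/* Bytes wired just for those page-table pages. */
static uint64_t
pt_bytes_needed(uint64_t start, uint64_t end)
{
	return (pt_pages_needed(start, end) * PAGE_SIZE);
}
```

Calling pt_pages_needed(OBSERVED_VM_END, KERNBASE) gives 1046133 page-table pages, or roughly 4.28e9 bytes, which is in the same range as the "million pages" and the 4.2 GB of wired memory reported here.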

This PR tracks improving pmap_growkernel so that it can correctly grow more than one region of the kernel map. We might want to have two kernel_vm_ends, and choose the start and end points for growing the kernel based on whether the address given to pmap_growkernel is above KERNBASE.
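
The idea of tracking two end points could be sketched roughly as below. This is a simplified user-space model under stated assumptions, not a patch: struct kva_model and model_growkernel are hypothetical names, the "nkpt" field simply counts 2 MiB page-table pages backing the region that starts at KERNBASE, and address wraparound at the very top of the address space is ignored.

```c
#include <stdint.h>

#define NBPDR    (1ULL << 21)            /* KVA mapped by one page-table page */
#define KERNBASE 0xffffffff80000000ULL

/* Hypothetical model state; the fields only mimic the ideas discussed here. */
struct kva_model {
	uint64_t kernel_vm_end;	/* end of the dynamically grown region (2 MiB aligned) */
	uint64_t nkpt;		/* page-table pages backing the KERNBASE region */
};

/* Round addr up to the next 2 MiB boundary (no overflow handling). */
static uint64_t
round_2m(uint64_t addr)
{
	return ((addr + NBPDR - 1) & ~(NBPDR - 1));
}

/*
 * Grow only the region that contains addr and return how many
 * page-table pages had to be added.  The other region is untouched.
 */
static uint64_t
model_growkernel(struct kva_model *m, uint64_t addr)
{
	uint64_t added, end, target;

	target = round_2m(addr);
	if (addr > KERNBASE) {
		/* Region above KERNBASE: its end is derived from nkpt. */
		end = KERNBASE + m->nkpt * NBPDR;
		if (target <= end)
			return (0);
		added = (target - end) / NBPDR;
		m->nkpt += added;
	} else {
		/* Ordinary kernel map region: its end is kernel_vm_end. */
		if (target <= m->kernel_vm_end)
			return (0);
		added = (target - m->kernel_vm_end) / NBPDR;
		m->kernel_vm_end = target;
	}
	return (added);
}
```

In this model, a request just past the pre-reserved space above KERNBASE adds only the few pages needed there, instead of bridging the whole gap from kernel_vm_end.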
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2022-09-22 23:09:25 UTC
I think this patch will fix it: https://reviews.freebsd.org/D36673
Comment 2 commit-hook freebsd_committer freebsd_triage 2022-09-24 13:38:43 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0b29f5efcc7ee8271ad2f6b6447898b489d618ec

commit 0b29f5efcc7ee8271ad2f6b6447898b489d618ec
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2022-09-24 13:19:21 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-09-24 13:27:50 +0000

    amd64: Make it possible to grow the KERNBASE region of KVA

    pmap_growkernel() may be called when mapping a region above KERNBASE,
    typically for a kernel module.  If we have enough PTPs left over from
    bootstrap, pmap_growkernel() does nothing.  However, it's possible to
    run out, and in this case pmap_growkernel() will try to grow the kernel
    map all the way from kernel_vm_end to somewhere past KERNBASE, which can
    easily run the system out of memory.  This happens with large kernel
    modules such as the nvidia GPU driver.  There is also a WIP dtrace
    provider which needs to map KVA in the region above KERNBASE (to provide
    trampolines which allow a copy of traced kernel instruction to be
    executed), and its allocations could potentially trigger this scenario.

    This change modifies pmap_growkernel() to manage the two regions
    separately, allowing them to grow independently.  The end of the
    KERNBASE region is tracked by modifying "nkpt".

    PR:             265019
    Reviewed by:    alc, imp, kib
    MFC after:      2 weeks
    Differential Revision:  https://reviews.freebsd.org/D36673

 sys/amd64/amd64/pmap.c | 68 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 45 insertions(+), 23 deletions(-)
Comment 3 commit-hook freebsd_committer freebsd_triage 2022-10-09 15:21:34 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8bebdbe494f6909221e324ec5c13700dfd30cb5e

commit 8bebdbe494f6909221e324ec5c13700dfd30cb5e
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2022-09-24 13:19:21 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-10-09 15:21:10 +0000

    amd64: Make it possible to grow the KERNBASE region of KVA

    pmap_growkernel() may be called when mapping a region above KERNBASE,
    typically for a kernel module.  If we have enough PTPs left over from
    bootstrap, pmap_growkernel() does nothing.  However, it's possible to
    run out, and in this case pmap_growkernel() will try to grow the kernel
    map all the way from kernel_vm_end to somewhere past KERNBASE, which can
    easily run the system out of memory.  This happens with large kernel
    modules such as the nvidia GPU driver.  There is also a WIP dtrace
    provider which needs to map KVA in the region above KERNBASE (to provide
    trampolines which allow a copy of traced kernel instruction to be
    executed), and its allocations could potentially trigger this scenario.

    This change modifies pmap_growkernel() to manage the two regions
    separately, allowing them to grow independently.  The end of the
    KERNBASE region is tracked by modifying "nkpt".

    PR:             265019
    Reviewed by:    alc, imp, kib

    (cherry picked from commit 0b29f5efcc7ee8271ad2f6b6447898b489d618ec)

 sys/amd64/amd64/pmap.c | 65 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 44 insertions(+), 21 deletions(-)
Comment 4 Austin Shafer 2022-10-18 18:24:09 UTC
Sorry, took a bit of time to circle back and test this. Can confirm it fixed the issue. Thanks!
Comment 5 Mark Johnston freebsd_committer freebsd_triage 2022-10-18 21:56:12 UTC
(In reply to Austin Shafer from comment #4)
No problem, thanks for testing.