Bug 273626

Summary: Memory leak in ioctl nvme passthrough commands
Product: Base System Reporter: David Sloan <david.sloan>
Component: kernAssignee: Mark Johnston <markj>
Status: Closed FIXED    
Severity: Affects Some People CC: emaste, imp, markj
Priority: ---    
Version: 13.2-STABLE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
git patch file none

Description David Sloan 2023-09-07 21:36:02 UTC
When running nvme passthrough commands through the ioctl interface buffers are never released from the kernel and increase the wired memory count permanently. This will eventually lead to system lockup if enough buffers are sent through the passthrough command interface. This can be replicated with C program at the end of this bug report, which will leak 512 KB of memory per execution. We have tested this on 13.2 and 15-current with the same result.

On inspection of sys/dev/nvme/nvme_ctrlr.c it appears that nvme_ctrlr_passthrough_cmd() is missing a call to vunmapbuf() on exit when user buffers are mapped. Applying the diff:

diff --git a/sys/dev/nvme/nvme_ctrlr.c b/sys/dev/nvme/nvme_ctrlr.c
index c4a75090174..608b738a1ff 100644
--- a/sys/dev/nvme/nvme_ctrlr.c
+++ b/sys/dev/nvme/nvme_ctrlr.c
@@ -1361,8 +1361,9 @@ nvme_ctrlr_passthrough_cmd(struct nvme_controller *ctrlr,
                mtx_sleep(pt, mtx, PRIBIO, "nvme_pt", 0);
        mtx_unlock(mtx);
 
-err:
        if (buf != NULL) {
+               vunmapbuf(buf);
+err:
                uma_zfree(pbuf_zone, buf);
                PRELE(curproc);
        }

Resolves the issue.

Example program to reproduce the issue:

#include <dev/nvme/nvme.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define SZ_512K (512 * 1024)

int nvme_read(int fd, void *data, size_t len, uint64_t pos) {
        // Assumes device with 512 byte lbas
        struct nvme_pt_command pt = {
                .cmd = {
                        .opc = NVME_OPC_READ,
                        // LBA
                        .cdw10 = (pos / 512) & 0xffffffff,
                        .cdw11 = (pos / 512) >> 32,
                        // nblocks
                        .cdw12 = len / 512 - 1,
                },
                .buf = data,
                .len = len,
                .is_read = 1,
        };
        return ioctl(fd, NVME_PASSTHROUGH_CMD, &pt);
}

int main(int argc, const char **argv) {
        void *buf;
        int rc;
        int fd;

        if (argc != 2) {
                fprintf(stderr, "Expected single nvme namespace argument\n");
                exit(1);
        }

        fd = open(argv[1], O_RDWR);
        if (fd == -1) {
                fprintf(stderr, "error opening device %s: %m\n", argv[1]);
                exit(1);
        }

        rc = posix_memalign(&buf, 4096, SZ_512K);
        if (rc) {
                errno = rc;
                fprintf(stderr, "error allocating buffer %m\n");
                exit(1);
        }

        rc = nvme_read(fd, buf, SZ_512K, 0);
        if (rc) {
                fprintf(stderr, "error reading from nvme device: %m\n");
                exit(1);
        }

        free(buf);
        close(fd);

        return 0;
}
Comment 1 David Sloan 2023-09-07 21:51:19 UTC
Created attachment 244701 [details]
git patch file
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2023-10-02 12:16:40 UTC
This looks right to me.  Warner, do you have any objections to committing this?
Comment 3 Warner Losh freebsd_committer freebsd_triage 2023-10-02 15:42:33 UTC
(In reply to Mark Johnston from comment #2)
This patch is correct. I'm likely going to be a while to commit it, so if you'd like to add a 'Reviewed-by: imp' to the commit and go for it.
This should go into stable/14 and releng/14.0 too.
Comment 4 commit-hook freebsd_committer freebsd_triage 2023-10-02 16:05:56 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7ea866eb14f8ec869a525442c03228b6701e1dab

commit 7ea866eb14f8ec869a525442c03228b6701e1dab
Author:     David Sloan <david.sloan@eideticom.com>
AuthorDate: 2023-09-07 16:22:21 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-02 15:50:14 +0000

    nvme: Fix memory leak in pt ioctl commands

    When running nvme passthrough commands through the ioctl interface
    memory is mapped with vmapbuf() but not unmapped. This results in leaked
    memory whenever a process executes an nvme passthrough command with a
    data buffer. This can be replicated with a simple c function (error
    checks skipped for brevity):

    void leak_memory(int nvme_ns_fd, uint16_t nblocks) {
            struct nvme_pt_command pt = {
                    .cmd = {
                            .opc = NVME_OPC_READ,
                            .cdw12 = nblocks - 1,
                    },
                    .len = nblocks * 512, // Assumes devices with 512 byte lba
                    .is_read = 1, // Reads and writes should both trigger leak
            }
            void *buf;

            posix_memalign(&buf, nblocks * 512);
            pt.buf = buf;
            ioctl(nvme_ns_fd, NVME_PASSTHROUGH_COMMAND, &pt);
            free(buf);
    }

    Signed-off-by: David Sloan <david.sloan@eideticom.com>

    PR:             273626
    Reviewed by:    imp, markj
    MFC after:      1 week

 sys/dev/nvme/nvme_ctrlr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Comment 5 commit-hook freebsd_committer freebsd_triage 2023-10-09 00:59:55 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=510404f2f49e0797bbef0034b3c13831bed78b35

commit 510404f2f49e0797bbef0034b3c13831bed78b35
Author:     David Sloan <david.sloan@eideticom.com>
AuthorDate: 2023-09-07 16:22:21 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-09 00:41:25 +0000

    nvme: Fix memory leak in pt ioctl commands

    When running nvme passthrough commands through the ioctl interface
    memory is mapped with vmapbuf() but not unmapped. This results in leaked
    memory whenever a process executes an nvme passthrough command with a
    data buffer. This can be replicated with a simple c function (error
    checks skipped for brevity):

    void leak_memory(int nvme_ns_fd, uint16_t nblocks) {
            struct nvme_pt_command pt = {
                    .cmd = {
                            .opc = NVME_OPC_READ,
                            .cdw12 = nblocks - 1,
                    },
                    .len = nblocks * 512, // Assumes devices with 512 byte lba
                    .is_read = 1, // Reads and writes should both trigger leak
            }
            void *buf;

            posix_memalign(&buf, nblocks * 512);
            pt.buf = buf;
            ioctl(nvme_ns_fd, NVME_PASSTHROUGH_COMMAND, &pt);
            free(buf);
    }

    Signed-off-by: David Sloan <david.sloan@eideticom.com>

    PR:             273626
    Reviewed by:    imp, markj
    MFC after:      1 week

    (cherry picked from commit 7ea866eb14f8ec869a525442c03228b6701e1dab)

 sys/dev/nvme/nvme_ctrlr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Comment 6 commit-hook freebsd_committer freebsd_triage 2023-10-09 00:59:58 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=242a062841e7166d8354470835dc1d94773b5c01

commit 242a062841e7166d8354470835dc1d94773b5c01
Author:     David Sloan <david.sloan@eideticom.com>
AuthorDate: 2023-09-07 16:22:21 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-09 00:41:52 +0000

    nvme: Fix memory leak in pt ioctl commands

    When running nvme passthrough commands through the ioctl interface
    memory is mapped with vmapbuf() but not unmapped. This results in leaked
    memory whenever a process executes an nvme passthrough command with a
    data buffer. This can be replicated with a simple c function (error
    checks skipped for brevity):

    void leak_memory(int nvme_ns_fd, uint16_t nblocks) {
            struct nvme_pt_command pt = {
                    .cmd = {
                            .opc = NVME_OPC_READ,
                            .cdw12 = nblocks - 1,
                    },
                    .len = nblocks * 512, // Assumes devices with 512 byte lba
                    .is_read = 1, // Reads and writes should both trigger leak
            }
            void *buf;

            posix_memalign(&buf, nblocks * 512);
            pt.buf = buf;
            ioctl(nvme_ns_fd, NVME_PASSTHROUGH_COMMAND, &pt);
            free(buf);
    }

    Signed-off-by: David Sloan <david.sloan@eideticom.com>

    PR:             273626
    Reviewed by:    imp, markj
    MFC after:      1 week

    (cherry picked from commit 7ea866eb14f8ec869a525442c03228b6701e1dab)

 sys/dev/nvme/nvme_ctrlr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Comment 7 commit-hook freebsd_committer freebsd_triage 2023-10-09 18:13:45 UTC
A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=11ab396fc995b72d9d5f5d54d0bc93840a7fbf76

commit 11ab396fc995b72d9d5f5d54d0bc93840a7fbf76
Author:     David Sloan <david.sloan@eideticom.com>
AuthorDate: 2023-09-07 16:22:21 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-09 18:05:36 +0000

    nvme: Fix memory leak in pt ioctl commands

    When running nvme passthrough commands through the ioctl interface
    memory is mapped with vmapbuf() but not unmapped. This results in leaked
    memory whenever a process executes an nvme passthrough command with a
    data buffer. This can be replicated with a simple c function (error
    checks skipped for brevity):

    void leak_memory(int nvme_ns_fd, uint16_t nblocks) {
            struct nvme_pt_command pt = {
                    .cmd = {
                            .opc = NVME_OPC_READ,
                            .cdw12 = nblocks - 1,
                    },
                    .len = nblocks * 512, // Assumes devices with 512 byte lba
                    .is_read = 1, // Reads and writes should both trigger leak
            }
            void *buf;

            posix_memalign(&buf, nblocks * 512);
            pt.buf = buf;
            ioctl(nvme_ns_fd, NVME_PASSTHROUGH_COMMAND, &pt);
            free(buf);
    }

    Signed-off-by: David Sloan <david.sloan@eideticom.com>

    Approved by:    re (gjb)
    PR:             273626
    Reviewed by:    imp, markj
    MFC after:      1 week

    (cherry picked from commit 7ea866eb14f8ec869a525442c03228b6701e1dab)
    (cherry picked from commit 510404f2f49e0797bbef0034b3c13831bed78b35)

 sys/dev/nvme/nvme_ctrlr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Comment 8 Mark Johnston freebsd_committer freebsd_triage 2023-10-09 18:16:32 UTC
Thanks for the analysis and the patch.
Comment 9 David Sloan 2023-10-10 22:59:14 UTC
(In reply to Mark Johnston from comment #8)
Happy to help!