When running nvme passthrough commands through the ioctl interface buffers are never released from the kernel and increase the wired memory count permanently. This will eventually lead to system lockup if enough buffers are sent through the passthrough command interface. This can be replicated with C program at the end of this bug report, which will leak 512 KB of memory per execution. We have tested this on 13.2 and 15-current with the same result. On inspection of sys/dev/nvme/nvme_ctrlr.c it appears that nvme_ctrlr_passthrough_cmd() is missing a call to vunmapbuf() on exit when user buffers are mapped. Applying the diff: diff --git a/sys/dev/nvme/nvme_ctrlr.c b/sys/dev/nvme/nvme_ctrlr.c index c4a75090174..608b738a1ff 100644 --- a/sys/dev/nvme/nvme_ctrlr.c +++ b/sys/dev/nvme/nvme_ctrlr.c @@ -1361,8 +1361,9 @@ nvme_ctrlr_passthrough_cmd(struct nvme_controller *ctrlr, mtx_sleep(pt, mtx, PRIBIO, "nvme_pt", 0); mtx_unlock(mtx); -err: if (buf != NULL) { + vunmapbuf(buf); +err: uma_zfree(pbuf_zone, buf); PRELE(curproc); } Resolves the issue. Example program to reproduce the issue: #include <dev/nvme/nvme.h> #include <errno.h> #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <sys/ioctl.h> #include <unistd.h> #define SZ_512K (512 * 1024) int nvme_read(int fd, void *data, size_t len, uint64_t pos) { // Assumes device with 512 byte lbas struct nvme_pt_command pt = { .cmd = { .opc = NVME_OPC_READ, // LBA .cdw10 = (pos / 512) & 0xffffffff, .cdw11 = (pos / 512) >> 32, // nblocks .cdw12 = len / 512 - 1, }, .buf = data, .len = len, .is_read = 1, }; return ioctl(fd, NVME_PASSTHROUGH_CMD, &pt); } int main(int argc, const char **argv) { void *buf; int rc; int fd; if (argc != 2) { fprintf(stderr, "Expected single nvme namespace argument\n"); exit(1); } fd = open(argv[1], O_RDWR); if (fd == -1) { fprintf(stderr, "error opening device %s: %m\n", argv[1]); exit(1); } rc = posix_memalign(&buf, 4096, SZ_512K); if (rc) { errno = rc; fprintf(stderr, "error allocating buffer %m\n"); exit(1); } rc = nvme_read(fd, buf, SZ_512K, 0); if (rc) { fprintf(stderr, "error reading from nvme device: %m\n"); exit(1); } free(buf); close(fd); return 0; }
Created attachment 244701 [details] git patch file
This looks right to me. Warner, do you have any objections to committing this?
(In reply to Mark Johnston from comment #2) This patch is correct. I'm likely going to be a while to commit it, so if you'd like to add a 'Reviewed-by: imp' to the commit and go for it. This should go into stable/14 and releng/14.0 too.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=7ea866eb14f8ec869a525442c03228b6701e1dab commit 7ea866eb14f8ec869a525442c03228b6701e1dab Author: David Sloan <david.sloan@eideticom.com> AuthorDate: 2023-09-07 16:22:21 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2023-10-02 15:50:14 +0000 nvme: Fix memory leak in pt ioctl commands When running nvme passthrough commands through the ioctl interface memory is mapped with vmapbuf() but not unmapped. This results in leaked memory whenever a process executes an nvme passthrough command with a data buffer. This can be replicated with a simple c function (error checks skipped for brevity): void leak_memory(int nvme_ns_fd, uint16_t nblocks) { struct nvme_pt_command pt = { .cmd = { .opc = NVME_OPC_READ, .cdw12 = nblocks - 1, }, .len = nblocks * 512, // Assumes devices with 512 byte lba .is_read = 1, // Reads and writes should both trigger leak } void *buf; posix_memalign(&buf, nblocks * 512); pt.buf = buf; ioctl(nvme_ns_fd, NVME_PASSTHROUGH_COMMAND, &pt); free(buf); } Signed-off-by: David Sloan <david.sloan@eideticom.com> PR: 273626 Reviewed by: imp, markj MFC after: 1 week sys/dev/nvme/nvme_ctrlr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=510404f2f49e0797bbef0034b3c13831bed78b35 commit 510404f2f49e0797bbef0034b3c13831bed78b35 Author: David Sloan <david.sloan@eideticom.com> AuthorDate: 2023-09-07 16:22:21 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2023-10-09 00:41:25 +0000 nvme: Fix memory leak in pt ioctl commands When running nvme passthrough commands through the ioctl interface memory is mapped with vmapbuf() but not unmapped. This results in leaked memory whenever a process executes an nvme passthrough command with a data buffer. This can be replicated with a simple c function (error checks skipped for brevity): void leak_memory(int nvme_ns_fd, uint16_t nblocks) { struct nvme_pt_command pt = { .cmd = { .opc = NVME_OPC_READ, .cdw12 = nblocks - 1, }, .len = nblocks * 512, // Assumes devices with 512 byte lba .is_read = 1, // Reads and writes should both trigger leak } void *buf; posix_memalign(&buf, nblocks * 512); pt.buf = buf; ioctl(nvme_ns_fd, NVME_PASSTHROUGH_COMMAND, &pt); free(buf); } Signed-off-by: David Sloan <david.sloan@eideticom.com> PR: 273626 Reviewed by: imp, markj MFC after: 1 week (cherry picked from commit 7ea866eb14f8ec869a525442c03228b6701e1dab) sys/dev/nvme/nvme_ctrlr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=242a062841e7166d8354470835dc1d94773b5c01 commit 242a062841e7166d8354470835dc1d94773b5c01 Author: David Sloan <david.sloan@eideticom.com> AuthorDate: 2023-09-07 16:22:21 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2023-10-09 00:41:52 +0000 nvme: Fix memory leak in pt ioctl commands When running nvme passthrough commands through the ioctl interface memory is mapped with vmapbuf() but not unmapped. This results in leaked memory whenever a process executes an nvme passthrough command with a data buffer. This can be replicated with a simple c function (error checks skipped for brevity): void leak_memory(int nvme_ns_fd, uint16_t nblocks) { struct nvme_pt_command pt = { .cmd = { .opc = NVME_OPC_READ, .cdw12 = nblocks - 1, }, .len = nblocks * 512, // Assumes devices with 512 byte lba .is_read = 1, // Reads and writes should both trigger leak } void *buf; posix_memalign(&buf, nblocks * 512); pt.buf = buf; ioctl(nvme_ns_fd, NVME_PASSTHROUGH_COMMAND, &pt); free(buf); } Signed-off-by: David Sloan <david.sloan@eideticom.com> PR: 273626 Reviewed by: imp, markj MFC after: 1 week (cherry picked from commit 7ea866eb14f8ec869a525442c03228b6701e1dab) sys/dev/nvme/nvme_ctrlr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
A commit in branch releng/14.0 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=11ab396fc995b72d9d5f5d54d0bc93840a7fbf76 commit 11ab396fc995b72d9d5f5d54d0bc93840a7fbf76 Author: David Sloan <david.sloan@eideticom.com> AuthorDate: 2023-09-07 16:22:21 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2023-10-09 18:05:36 +0000 nvme: Fix memory leak in pt ioctl commands When running nvme passthrough commands through the ioctl interface memory is mapped with vmapbuf() but not unmapped. This results in leaked memory whenever a process executes an nvme passthrough command with a data buffer. This can be replicated with a simple c function (error checks skipped for brevity): void leak_memory(int nvme_ns_fd, uint16_t nblocks) { struct nvme_pt_command pt = { .cmd = { .opc = NVME_OPC_READ, .cdw12 = nblocks - 1, }, .len = nblocks * 512, // Assumes devices with 512 byte lba .is_read = 1, // Reads and writes should both trigger leak } void *buf; posix_memalign(&buf, nblocks * 512); pt.buf = buf; ioctl(nvme_ns_fd, NVME_PASSTHROUGH_COMMAND, &pt); free(buf); } Signed-off-by: David Sloan <david.sloan@eideticom.com> Approved by: re (gjb) PR: 273626 Reviewed by: imp, markj MFC after: 1 week (cherry picked from commit 7ea866eb14f8ec869a525442c03228b6701e1dab) (cherry picked from commit 510404f2f49e0797bbef0034b3c13831bed78b35) sys/dev/nvme/nvme_ctrlr.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
Thanks for the analysis and the patch.
(In reply to Mark Johnston from comment #8) Happy to help!