The clng_dt_create function allocates global variables named dg_cv and dg_fd_locks, which are then used by the rest of the routines in clnt_dg.c. They are never freed, and live for the life of the process. These variables are arrays indexed by file descriptor. To ensure that they have enough space, they are sized according to RLIMIT_NOFILE. The problem is that resource limit can be very, very, big. On my production servers, it's autoscaled to 22608720. So any process that does _anything_ involving NIS must allocate 259 MiB just for these variables. The exact same mistake is made in clnt_vc.c, with the vc_fd_locks and vc_cv variables. These add up to about 984 MB for every sshd process on my system (I'm not sure how 259 gets multiplied to 984, but valgrind --tool=massif does show that all the memory is coming from clnt_dg_create). A few hundred of those sshd processes and my entire server falls over. Even with sshd rate limiting, about half of my server's physical RAM is used just for this one stupid array variable. Stupidly, it appears that there's no need for a huge array. Instead, the condvar could've been part of the CLIENT structure. Even stupider, this variable _never_ gets used in some simple applications like getgrouplist. So we allocate it, fault in every page, and then never use it. :facepalm:
AIX is also affected: https://www.ibm.com/support/pages/apar/IY61508 From inspection, DragonflyBSD and NetBSD are affected. OpenBSD seems to lack this functionality. Fedora is affected, or at least it was: https://bugzilla.redhat.com/show_bug.cgi?id=1829947 Debian similarly was affected at one point: https://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg1651130.html . HP-UX is affected: https://community.hpe.com/t5/operating-system-hp-ux/system-crash-quot-clnt-dg-create-out-of-memory-quot/m-p/2982314 . Illumos is _not_ affected, and hasn't been since at least 2005 when github history begins. https://github.com/illumos/illumos-gate/blob/5e90766b81ebd3571c766124cfae6ee244d7d9d2/usr/src/lib/libnsl/rpc/clnt_dg.c#L4 tirpc is _not_ affected. They fixed it by replacing the array with a linked list. https://git.linux-nfs.org/?p=steved/libtirpc.git;a=commit;h=e7c34df8f57331063b9d795812c62cec3ddfbc17 Also, for search engines' sake, I misspelled the name of the function in the OP. It should've been "clnt_dg_create".
There is an explanation in the herald comment, why the author decided to use per-fd 'lock' instead of embedding it into the CLIENT data. Arrays need to be changed to some hash map perhaps, protected by a pthread mutex or spinlock.
Kib's correct of course; I hadn't read that part yet. I have a working patch now that works by changing the array into an RB tree. I'm going to get some QA time on it before I open a Phabricator review.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=24938f9311c9c9acc1ce747f4e6a088c2dbc967d commit 24938f9311c9c9acc1ce747f4e6a088c2dbc967d Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2023-11-10 17:28:32 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2023-11-15 23:12:50 +0000 lib/libc/rpc: switch the per-fd structs in clnt_{dg,vc}.c to RB Trees This saves oodles of memory, especially when "ulimit -n" is large. It also prevents a buffer overflow if getrlimit should fail. Also replace per-fd condvars with mutexes to simplify the code. PR: 274968 MFC after: 2 weeks Sponsored by: Axcient Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D42597 include/rpc/rpc_com.h | 1 - lib/libc/rpc/clnt_dg.c | 153 ++++++++++++++++++++--------------------- lib/libc/rpc/clnt_vc.c | 165 +++++++++++++++++++++++---------------------- lib/libc/rpc/rpc_com.h | 1 - lib/libc/rpc/rpc_generic.c | 23 ------- sys/rpc/rpc_com.h | 1 - 6 files changed, 162 insertions(+), 182 deletions(-)
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=60314995efa05b81839556fb31ed036703885d84 commit 60314995efa05b81839556fb31ed036703885d84 Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2023-11-09 22:58:56 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2023-11-30 03:16:16 +0000 libc/libc/rpc: refactor some global variables * Combine dg_fd_locks and dg_cv into one array. * Similarly for vc_fd_locks and vc_cv * Turn some macros into inline functions This is a mostly cosmetic change to make refactoring these strutures in a future commit easier. Sponsored by: Axcient Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D42597 (cherry picked from commit a5c2f4e939430f0048136c39fb9fa6093d401905) lib/libc/rpc: switch the per-fd structs in clnt_{dg,vc}.c to RB Trees This saves oodles of memory, especially when "ulimit -n" is large. It also prevents a buffer overflow if getrlimit should fail. Also replace per-fd condvars with mutexes to simplify the code. PR: 274968 Sponsored by: Axcient Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D42597 (cherry picked from commit 24938f9311c9c9acc1ce747f4e6a088c2dbc967d) include/rpc/rpc_com.h | 1 - lib/libc/rpc/clnt_dg.c | 170 +++++++++++++++++++++--------------------- lib/libc/rpc/clnt_vc.c | 179 ++++++++++++++++++++++----------------------- lib/libc/rpc/rpc_com.h | 1 - lib/libc/rpc/rpc_generic.c | 23 ------ sys/rpc/rpc_com.h | 1 - 6 files changed, 168 insertions(+), 207 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=ad8b59ffe7b84cae35dd19b4285863b4b506efa9 commit ad8b59ffe7b84cae35dd19b4285863b4b506efa9 Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2023-11-09 22:58:56 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2023-11-30 20:42:55 +0000 libc/libc/rpc: refactor some global variables * Combine dg_fd_locks and dg_cv into one array. * Similarly for vc_fd_locks and vc_cv * Turn some macros into inline functions This is a mostly cosmetic change to make refactoring these strutures in a future commit easier. Sponsored by: Axcient Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D42597 (cherry picked from commit a5c2f4e939430f0048136c39fb9fa6093d401905) lib/libc/rpc: switch the per-fd structs in clnt_{dg,vc}.c to RB Trees This saves oodles of memory, especially when "ulimit -n" is large. It also prevents a buffer overflow if getrlimit should fail. Also replace per-fd condvars with mutexes to simplify the code. PR: 274968 Sponsored by: Axcient Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D42597 (cherry picked from commit 24938f9311c9c9acc1ce747f4e6a088c2dbc967d) include/rpc/rpc_com.h | 1 - lib/libc/rpc/clnt_dg.c | 170 +++++++++++++++++++++--------------------- lib/libc/rpc/clnt_vc.c | 179 ++++++++++++++++++++++----------------------- lib/libc/rpc/rpc_com.h | 1 - lib/libc/rpc/rpc_generic.c | 23 ------ sys/rpc/rpc_com.h | 1 - 6 files changed, 168 insertions(+), 207 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=c7d8a572acb2bcdf824a75af3e97b24e36463a34 commit c7d8a572acb2bcdf824a75af3e97b24e36463a34 Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2023-11-09 22:58:56 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2023-12-01 21:15:45 +0000 libc/libc/rpc: refactor some global variables * Combine dg_fd_locks and dg_cv into one array. * Similarly for vc_fd_locks and vc_cv * Turn some macros into inline functions This is a mostly cosmetic change to make refactoring these strutures in a future commit easier. Sponsored by: Axcient Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D42597 (cherry picked from commit a5c2f4e939430f0048136c39fb9fa6093d401905) lib/libc/rpc: switch the per-fd structs in clnt_{dg,vc}.c to RB Trees This saves oodles of memory, especially when "ulimit -n" is large. It also prevents a buffer overflow if getrlimit should fail. Also replace per-fd condvars with mutexes to simplify the code. PR: 274968 Sponsored by: Axcient Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D42597 (cherry picked from commit 24938f9311c9c9acc1ce747f4e6a088c2dbc967d) include/rpc/rpc_com.h | 1 - lib/libc/rpc/clnt_dg.c | 170 +++++++++++++++++++++--------------------- lib/libc/rpc/clnt_vc.c | 179 ++++++++++++++++++++++----------------------- lib/libc/rpc/rpc_com.h | 1 - lib/libc/rpc/rpc_generic.c | 23 ------ sys/rpc/rpc_com.h | 1 - 6 files changed, 168 insertions(+), 207 deletions(-)