Bug 274968 - rpc_clnt_create: The dg_cv variable uses absurdly too much memory
Summary: rpc_clnt_create: The dg_cv variable uses absurdly too much memory
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: 14.0-RELEASE
Hardware: Any Any
: --- Affects Only Me
Assignee: Alan Somers
URL: https://reviews.freebsd.org/D42597
Keywords:
Depends on:
Blocks:
 
Reported: 2023-11-08 19:29 UTC by Alan Somers
Modified: 2023-12-01 21:18 UTC (History)
0 users

See Also:
asomers: mfc-stable14+
asomers: mfc-stable13+
asomers: mfc-stable12+


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Alan Somers freebsd_committer freebsd_triage 2023-11-08 19:29:24 UTC
The clng_dt_create function allocates global variables named dg_cv and dg_fd_locks, which are then used by the rest of the routines in clnt_dg.c.  They are never freed, and live for the life of the process.  These variables are arrays indexed by file descriptor.  To ensure that they have enough space, they are sized according to RLIMIT_NOFILE.  The problem is that resource limit can be very, very, big.  On my production servers, it's autoscaled to 22608720.  So any process that does _anything_ involving NIS must allocate 259 MiB just for these variables.

The exact same mistake is made in clnt_vc.c, with the vc_fd_locks and vc_cv variables.

These add up to about 984 MB for every sshd process on my system (I'm not sure how 259 gets multiplied to 984, but valgrind --tool=massif does show that all the memory is coming from clnt_dg_create).  A few hundred of those sshd processes and my entire server falls over.  Even with sshd rate limiting, about half of my server's physical RAM is used just for this one stupid array variable.

Stupidly, it appears that there's no need for a huge array.  Instead, the condvar could've been part of the CLIENT structure.

Even stupider, this variable _never_ gets used in some simple applications like getgrouplist.  So we allocate it, fault in every page, and then never use it.  :facepalm:
Comment 1 Alan Somers freebsd_committer freebsd_triage 2023-11-08 21:52:03 UTC
AIX is also affected: https://www.ibm.com/support/pages/apar/IY61508

From inspection, DragonflyBSD and NetBSD are affected.  OpenBSD seems to lack this functionality.

Fedora is affected, or at least it was: https://bugzilla.redhat.com/show_bug.cgi?id=1829947

Debian similarly was affected at one point: https://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg1651130.html .

HP-UX is affected: https://community.hpe.com/t5/operating-system-hp-ux/system-crash-quot-clnt-dg-create-out-of-memory-quot/m-p/2982314 .

Illumos is _not_ affected, and hasn't been since at least 2005 when github history begins. https://github.com/illumos/illumos-gate/blob/5e90766b81ebd3571c766124cfae6ee244d7d9d2/usr/src/lib/libnsl/rpc/clnt_dg.c#L4

tirpc is _not_ affected.  They fixed it by replacing the array with a linked list.
https://git.linux-nfs.org/?p=steved/libtirpc.git;a=commit;h=e7c34df8f57331063b9d795812c62cec3ddfbc17

Also, for search engines' sake, I misspelled the name of the function in the OP.  It should've been "clnt_dg_create".
Comment 2 Konstantin Belousov freebsd_committer freebsd_triage 2023-11-08 23:14:41 UTC
There is an explanation in the herald comment, why the author decided to use
per-fd 'lock' instead of embedding it into the CLIENT data.

Arrays need to be changed to some hash map perhaps, protected by a pthread
mutex or spinlock.
Comment 3 Alan Somers freebsd_committer freebsd_triage 2023-11-10 20:05:42 UTC
Kib's correct of course; I hadn't read that part yet.

I have a working patch now that works by changing the array into an RB tree.  I'm going to get some QA time on it before I open a Phabricator review.
Comment 4 commit-hook freebsd_committer freebsd_triage 2023-11-15 23:13:38 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=24938f9311c9c9acc1ce747f4e6a088c2dbc967d

commit 24938f9311c9c9acc1ce747f4e6a088c2dbc967d
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2023-11-10 17:28:32 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2023-11-15 23:12:50 +0000

    lib/libc/rpc: switch the per-fd structs in clnt_{dg,vc}.c to RB Trees

    This saves oodles of memory, especially when "ulimit -n" is large.  It
    also prevents a buffer overflow if getrlimit should fail.

    Also replace per-fd condvars with mutexes to simplify the code.

    PR:             274968
    MFC after:      2 weeks
    Sponsored by:   Axcient
    Reviewed by:    kib
    Differential Revision: https://reviews.freebsd.org/D42597

 include/rpc/rpc_com.h      |   1 -
 lib/libc/rpc/clnt_dg.c     | 153 ++++++++++++++++++++---------------------
 lib/libc/rpc/clnt_vc.c     | 165 +++++++++++++++++++++++----------------------
 lib/libc/rpc/rpc_com.h     |   1 -
 lib/libc/rpc/rpc_generic.c |  23 -------
 sys/rpc/rpc_com.h          |   1 -
 6 files changed, 162 insertions(+), 182 deletions(-)
Comment 5 commit-hook freebsd_committer freebsd_triage 2023-11-30 03:16:37 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=60314995efa05b81839556fb31ed036703885d84

commit 60314995efa05b81839556fb31ed036703885d84
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2023-11-09 22:58:56 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2023-11-30 03:16:16 +0000

    libc/libc/rpc: refactor some global variables

    * Combine dg_fd_locks and dg_cv into one array.
    * Similarly for vc_fd_locks and vc_cv
    * Turn some macros into inline functions

    This is a mostly cosmetic change to make refactoring these strutures in
    a future commit easier.

    Sponsored by:   Axcient
    Reviewed by:    kib
    Differential Revision: https://reviews.freebsd.org/D42597

    (cherry picked from commit a5c2f4e939430f0048136c39fb9fa6093d401905)

    lib/libc/rpc: switch the per-fd structs in clnt_{dg,vc}.c to RB Trees

    This saves oodles of memory, especially when "ulimit -n" is large.  It
    also prevents a buffer overflow if getrlimit should fail.

    Also replace per-fd condvars with mutexes to simplify the code.

    PR:             274968
    Sponsored by:   Axcient
    Reviewed by:    kib
    Differential Revision: https://reviews.freebsd.org/D42597

    (cherry picked from commit 24938f9311c9c9acc1ce747f4e6a088c2dbc967d)

 include/rpc/rpc_com.h      |   1 -
 lib/libc/rpc/clnt_dg.c     | 170 +++++++++++++++++++++---------------------
 lib/libc/rpc/clnt_vc.c     | 179 ++++++++++++++++++++++-----------------------
 lib/libc/rpc/rpc_com.h     |   1 -
 lib/libc/rpc/rpc_generic.c |  23 ------
 sys/rpc/rpc_com.h          |   1 -
 6 files changed, 168 insertions(+), 207 deletions(-)
Comment 6 commit-hook freebsd_committer freebsd_triage 2023-11-30 23:48:54 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ad8b59ffe7b84cae35dd19b4285863b4b506efa9

commit ad8b59ffe7b84cae35dd19b4285863b4b506efa9
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2023-11-09 22:58:56 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2023-11-30 20:42:55 +0000

    libc/libc/rpc: refactor some global variables

    * Combine dg_fd_locks and dg_cv into one array.
    * Similarly for vc_fd_locks and vc_cv
    * Turn some macros into inline functions

    This is a mostly cosmetic change to make refactoring these strutures in
    a future commit easier.

    Sponsored by:   Axcient
    Reviewed by:    kib
    Differential Revision: https://reviews.freebsd.org/D42597

    (cherry picked from commit a5c2f4e939430f0048136c39fb9fa6093d401905)

    lib/libc/rpc: switch the per-fd structs in clnt_{dg,vc}.c to RB Trees

    This saves oodles of memory, especially when "ulimit -n" is large.  It
    also prevents a buffer overflow if getrlimit should fail.

    Also replace per-fd condvars with mutexes to simplify the code.

    PR:             274968
    Sponsored by:   Axcient
    Reviewed by:    kib
    Differential Revision: https://reviews.freebsd.org/D42597

    (cherry picked from commit 24938f9311c9c9acc1ce747f4e6a088c2dbc967d)

 include/rpc/rpc_com.h      |   1 -
 lib/libc/rpc/clnt_dg.c     | 170 +++++++++++++++++++++---------------------
 lib/libc/rpc/clnt_vc.c     | 179 ++++++++++++++++++++++-----------------------
 lib/libc/rpc/rpc_com.h     |   1 -
 lib/libc/rpc/rpc_generic.c |  23 ------
 sys/rpc/rpc_com.h          |   1 -
 6 files changed, 168 insertions(+), 207 deletions(-)
Comment 7 commit-hook freebsd_committer freebsd_triage 2023-12-01 21:18:05 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c7d8a572acb2bcdf824a75af3e97b24e36463a34

commit c7d8a572acb2bcdf824a75af3e97b24e36463a34
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2023-11-09 22:58:56 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2023-12-01 21:15:45 +0000

    libc/libc/rpc: refactor some global variables

    * Combine dg_fd_locks and dg_cv into one array.
    * Similarly for vc_fd_locks and vc_cv
    * Turn some macros into inline functions

    This is a mostly cosmetic change to make refactoring these strutures in
    a future commit easier.

    Sponsored by:   Axcient
    Reviewed by:    kib
    Differential Revision: https://reviews.freebsd.org/D42597

    (cherry picked from commit a5c2f4e939430f0048136c39fb9fa6093d401905)

    lib/libc/rpc: switch the per-fd structs in clnt_{dg,vc}.c to RB Trees

    This saves oodles of memory, especially when "ulimit -n" is large.  It
    also prevents a buffer overflow if getrlimit should fail.

    Also replace per-fd condvars with mutexes to simplify the code.

    PR:             274968
    Sponsored by:   Axcient
    Reviewed by:    kib
    Differential Revision: https://reviews.freebsd.org/D42597

    (cherry picked from commit 24938f9311c9c9acc1ce747f4e6a088c2dbc967d)

 include/rpc/rpc_com.h      |   1 -
 lib/libc/rpc/clnt_dg.c     | 170 +++++++++++++++++++++---------------------
 lib/libc/rpc/clnt_vc.c     | 179 ++++++++++++++++++++++-----------------------
 lib/libc/rpc/rpc_com.h     |   1 -
 lib/libc/rpc/rpc_generic.c |  23 ------
 sys/rpc/rpc_com.h          |   1 -
 6 files changed, 168 insertions(+), 207 deletions(-)