Bug 252094

Summary: nss "compat": getpw(nam|uid)_r(3) inadvertently reset next entry key for getpwent_r(3)
Product: Base System Reporter: Viktor Dukhovni <ietf-dane>
Component: binAssignee: Mark Johnston <markj>
Status: Closed FIXED    
Severity: Affects Many People CC: emaste, markj
Priority: ---    
Version: Unspecified   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
Proposed patch
none
Avoid side-effects from getgr(nam|gid) on getgrent
none
Better getgrent.c patch (Avoid side-effects from getgr(nam|gid) on getgrent)
none
Avoid side-effects from getgr(nam|gid) on getgrent: take III none

Description Viktor Dukhovni 2020-12-24 01:25:37 UTC
Created attachment 220877 [details]
Proposed patch

When testing the Haskell "unix" library, an unexpected interaction was encountered between interleaved calls to getpwent_r(3) and either of getpwnam_r(3) and getpwuid_r(3), which manifests only in the default configuration with nsswitch.conf having "passwd: compat".  Changing to "passwd: files" resolves the issue.

The reason is that the "compat" code incorrectly mutates the "global" (really thread-local) st->keynum not only when doing a getpwent() walk, but also when calling getpwnam() or getpwuid(), which should not mutate the keynum (and don't in files_passwd, but inadvertently do in compat_passwd).

A similar issue is present with getgr(nam|gid)_r(3) vs. getgrent_r(3), but there the problem is more entrenched, because the walk position for getgrent() is just an implicit file offset into the open group file.  Fixing this might require adding appropriate fseeko()/ftello() calls so that getgrent_r(3) can resume at the expected offset even after an intervening call to getgrnam_r(3), ...  Alternatively, it might simply make sense to not bother using the already open file with getgr(nam|uid)_r(3), file systems are sufficiently fast these days, and the relevant metadata will be in the cache...

I'm willing to contribute a separate fix for the group case, but first would like to get feedback on the passwd case, and preferred solution for the group case.

Link to the Haskell issue: <https://github.com/haskell/unix/pull/169>.  This can affect any language or system that interleaves multiple user-level "green" threads within a single OS thread.  Or just for some chooses to interleave getpwnam() calls in the middle of walking the list with getpwent() (somewhat less like, but not manifestly invalid).
Comment 1 Viktor Dukhovni 2021-01-02 07:48:33 UTC
Any feedback?  Did I file this in the right place?
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2021-01-05 14:48:54 UTC
(In reply to Viktor Dukhovni from comment #1)
This is in the right place, I will try to work on this within the next day.
Comment 3 Viktor Dukhovni 2021-01-05 18:34:51 UTC
(In reply to Mark Johnston from comment #2)
Cool!  Thanks.  If you would in particular let me know the general approach you'd like to see taken to address the issue in the "group" case, I can write the patch if you're pressed for time.  I just need to know whether it'd be OK to just open the file each time with the nam/gid calls, or whether you feel the documented optimisation is still important, in which case, a bit more surgery is required in the "ent" iterator to save/restore file offsets.
Comment 4 Mark Johnston freebsd_committer freebsd_triage 2021-01-08 21:35:42 UTC
Sorry for the delay.  I think the passwd patch is ok, I will queue it up for commit.

With respect to getgrnam_r() and getgrgid_r(), I note that we have this "stayopen" whose purpose seems to be exactly to allow those functions to reuse a database handle.  But files_setgrent() and compat_setgrent() never set st->stayopen, so as far as I can tell they will either always open and close the db (unless a previous getgrent() call had opened the db).  The pw database code doesn't appear to have this bug.

I don't love the idea of opening the db each time.  For the sake of sandboxing frameworks like Capsicum where arbitrary filesystem accesses are prohibited, it would be nicer if the setgroupent(3) interface behaves as documented.  Since it doesn't, it's hard to argue against simply opening and closing the group db each time getgr(nam|gid)() is called, especially since that's an easier route to fixing the original problem.
Comment 5 Mark Johnston freebsd_committer freebsd_triage 2021-01-08 21:36:46 UTC
(In reply to Mark Johnston from comment #4)
> so as far as I can tell they will either always open and close the db

s/either //
Comment 6 Mark Johnston freebsd_committer freebsd_triage 2021-01-08 21:45:05 UTC
(In reply to Mark Johnston from comment #4)
And sure enough we have PR 165527 noting the setgroupent(3) bug. :(

I am inclined to fix both issues, modifying getgrnam and getgrgid to save and restore the offset of the open file, if it was already open.
Comment 7 Viktor Dukhovni 2021-01-08 22:02:14 UTC
So I guess I should not rush off and do the group patch?
Comment 8 Mark Johnston freebsd_committer freebsd_triage 2021-01-08 22:44:29 UTC
(In reply to Viktor Dukhovni from comment #7)
Would you be interested in writing it?  If so, I haven't started yet, so please go ahead.  If not I can take care of it.
Comment 9 Viktor Dukhovni 2021-01-08 23:15:54 UTC
Created attachment 221404 [details]
Avoid side-effects from getgr(nam|gid) on getgrent

The grpent() patch.
Comment 10 Viktor Dukhovni 2021-01-08 23:16:42 UTC
I have a version for your review.  I don't have a good way to test these, so please review/build/...
Comment 11 Viktor Dukhovni 2021-01-09 00:56:59 UTC
In my patch I may be assuming that with getgr(nam|uid) it is not possible for st->stayopen to be true and the file not yet be open.  If that assumption is not correct, then perhaps the `pos = ftello(st->fp)` call should be made unconditionally, rather than only when the file is not fresh.

That would actually make the patch simpler.  What do you think?
Comment 12 Viktor Dukhovni 2021-01-09 07:13:51 UTC
Created attachment 221407 [details]
Better getgrent.c patch (Avoid side-effects from getgr(nam|gid) on getgrent)
Comment 13 Viktor Dukhovni 2021-01-09 07:15:05 UTC
I've reworked the group patch to not assume that st->stayopen implies the file is already open.
Comment 14 Mark Johnston freebsd_committer freebsd_triage 2021-01-09 16:29:06 UTC
(In reply to Viktor Dukhovni from comment #13)
Thanks.  At the end of both files_group() and compat_group() we have some confusing code:

if (st->fp != NULL && how != nss_lt_all)
    fseeko(st->fp, pos, SEEK_SET);
if (rv == NS_SUCCESS && retval != NULL)
    *(struct group **)retval = grp;
else if (rv == NS_RETURN && *errnop == ERANGE && st->fp != NULL)
    fseeko(st->fp, pos, SEEK_SET);

So if we're doing a name or gid lookup and the caller-provided buffer is not large enough to store the result, we'll seek twice.  This seems to be a minor issue though.

I think the original problem with interleaved getgrent_r() and getgr(gid|name)_r() calls is also not solved, since in the latter case we will close st->fp if stayopen == 0.  I suspect the close should be conditional on "fresh && how != nss_lt_all && !stayopen".
Comment 15 Viktor Dukhovni 2021-01-09 21:01:23 UTC
(In reply to Mark Johnston from comment #14)
Right, I should also set the local stayopen to true if !fresh.  I'll repost the patch.
Comment 16 Viktor Dukhovni 2021-01-09 21:04:58 UTC
Created attachment 221430 [details]
Avoid side-effects from getgr(nam|gid) on getgrent: take III

Set stayopen also if !fresh
Comment 17 Viktor Dukhovni 2021-01-12 16:48:00 UTC
Any feedback on the latest proposed patch?
Comment 18 Mark Johnston freebsd_committer freebsd_triage 2021-01-12 20:29:45 UTC
(In reply to Viktor Dukhovni from comment #17)
It looks right to me - thank you!  I've staged it in my tree and am writing some regression tests.
Comment 19 Viktor Dukhovni 2021-01-13 01:56:55 UTC
Thanks, much appreciated.  After these are burned in in *current*, if there is a possibility of a backports to extant stable releases, that'd be great.  It would be nice to reduce the wait-time before application-layer work-arounds can be removed by a couple of years.  It'll still probably be *at least* 5 years before GHC can assume that all supported FreeBSD systems are patched...
Comment 20 Viktor Dukhovni 2021-01-15 05:57:12 UTC
(In reply to Mark Johnston from comment #18)
Speaking of regression tests, here's a quick reproducer in Perl (it just loops printing the same group over and over if the upper bound on `$i` is removed:
```
$ perl -le '
  for ($i = 0; $i < 10 && (@x = getgrent()); ++$i) {
    print join(":", @x);
    @y=getgrgid(0);
  }' 
wheel:*:0:root viktor
wheel:*:0:root viktor
wheel:*:0:root viktor
wheel:*:0:root viktor
wheel:*:0:root viktor
wheel:*:0:root viktor
wheel:*:0:root viktor
wheel:*:0:root viktor
wheel:*:0:root viktor
wheel:*:0:root viktor
```
Correct behaviour is to output the group file and stop.
Comment 21 commit-hook freebsd_committer freebsd_triage 2021-01-21 19:31:12 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6e411d8b14ec5402b7551073e2f9edc4d680dd49

commit 6e411d8b14ec5402b7551073e2f9edc4d680dd49
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-01-21 19:30:19 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-01-21 19:30:19 +0000

    libc/nss tests: Add regression tests for commit 55444c823e1f

    PR:             252094
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 month

 lib/libc/tests/nss/getgr_test.c | 61 +++++++++++++++++++++++++++++++++--------
 lib/libc/tests/nss/getpw_test.c | 51 +++++++++++++++++++++++++++++-----
 2 files changed, 93 insertions(+), 19 deletions(-)
Comment 22 commit-hook freebsd_committer freebsd_triage 2021-01-21 19:31:15 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5619d49e07d3942e438f3d06269f3c1c466cf5b7

commit 5619d49e07d3942e438f3d06269f3c1c466cf5b7
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-01-21 19:30:19 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-01-21 19:30:19 +0000

    libc/nss: Restore iterator state when doing passwd/group lookups

    The getpwent(3) and getgrent(3) implementations maintain some internal
    iterator state.  Interleaved calls to functions which do passwd/group
    lookups using a key, such as getpwnam(3), would in some cases clobber
    this state, causing a subsequent getpwent() or getgrent() call to
    restart iteration from the beginning of the database or to terminate
    early.  This is particularly troublesome in programming environments
    where execution of green threads is interleaved within a single OS
    thread.

    Take care to restore any iterator state following a keyed lookup.  The
    "files" provider for the passwd database was already handling this
    correctly, but "compat" was not, and both providers had this problem
    when accessing the group database.

    PR:             252094
    Submitted by:   Viktor Dukhovni <ietf-dane@dukhovni.org>
    MFC after:      1 month

 lib/libc/gen/getgrent.c | 34 +++++++++++++++-------------------
 lib/libc/gen/getpwent.c | 18 +++++++++++-------
 2 files changed, 26 insertions(+), 26 deletions(-)
Comment 23 Mark Johnston freebsd_committer freebsd_triage 2021-01-21 19:33:21 UTC
(In reply to Viktor Dukhovni from comment #19)
I managed to get the change in under the wire for 13.0, and I plan to do a backport for the next 12.x release.  But yes it'll be a while before one can reasonably assume that all affected systems are patched...
Comment 24 Viktor Dukhovni 2021-01-21 22:05:00 UTC
(In reply to Mark Johnston from comment #23)
Many thanks, much appreciated!  You can close the ticket whenever you feel that'd be appropriate.
Comment 25 commit-hook freebsd_committer freebsd_triage 2021-02-20 16:37:06 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=31f3fbbcd48492e67323fcb54985129fb972801a

commit 31f3fbbcd48492e67323fcb54985129fb972801a
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-01-21 19:30:19 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-02-20 16:35:32 +0000

    libc/nss tests: Add regression tests for commit 55444c823e1f

    PR:             252094
    Sponsored by:   The FreeBSD Foundation

    (cherry picked from commit 6e411d8b14ec5402b7551073e2f9edc4d680dd49)

 lib/libc/tests/nss/getgr_test.c | 61 +++++++++++++++++++++++++++++++++--------
 lib/libc/tests/nss/getpw_test.c | 51 +++++++++++++++++++++++++++++-----
 2 files changed, 93 insertions(+), 19 deletions(-)
Comment 26 commit-hook freebsd_committer freebsd_triage 2021-02-20 16:37:08 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a8499077ab1b23c432c251874cf931e1025d8636

commit a8499077ab1b23c432c251874cf931e1025d8636
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-01-21 19:30:19 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-02-20 16:35:32 +0000

    libc/nss: Restore iterator state when doing passwd/group lookups

    The getpwent(3) and getgrent(3) implementations maintain some internal
    iterator state.  Interleaved calls to functions which do passwd/group
    lookups using a key, such as getpwnam(3), would in some cases clobber
    this state, causing a subsequent getpwent() or getgrent() call to
    restart iteration from the beginning of the database or to terminate
    early.  This is particularly troublesome in programming environments
    where execution of green threads is interleaved within a single OS
    thread.

    Take care to restore any iterator state following a keyed lookup.  The
    "files" provider for the passwd database was already handling this
    correctly, but "compat" was not, and both providers had this problem
    when accessing the group database.

    PR:             252094
    Submitted by:   Viktor Dukhovni <ietf-dane@dukhovni.org>

    (cherry picked from commit 5619d49e07d3942e438f3d06269f3c1c466cf5b7)

 lib/libc/gen/getgrent.c | 34 +++++++++++++++-------------------
 lib/libc/gen/getpwent.c | 18 +++++++++++-------
 2 files changed, 26 insertions(+), 26 deletions(-)
Comment 27 Mark Johnston freebsd_committer freebsd_triage 2021-02-20 16:38:30 UTC
Thanks again for the analysis and patches.  13.0 and 12.3 will ship with these fixes.