Bug 252094 - nss "compat": getpw(nam|uid)_r(3) inadvertently reset next entry key for getpwent_r(3)
Summary: nss "compat": getpw(nam|uid)_r(3) inadvertently reset next entry key for getp...
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: Unspecified
Hardware: Any Any
: --- Affects Many People
Assignee: Mark Johnston
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-12-24 01:25 UTC by Viktor Dukhovni
Modified: 2021-01-15 05:57 UTC (History)
2 users (show)

See Also:


Attachments
Proposed patch (1.40 KB, patch)
2020-12-24 01:25 UTC, Viktor Dukhovni
no flags Details | Diff
Avoid side-effects from getgr(nam|gid) on getgrent (2.03 KB, patch)
2021-01-08 23:15 UTC, Viktor Dukhovni
no flags Details | Diff
Better getgrent.c patch (Avoid side-effects from getgr(nam|gid) on getgrent) (2.12 KB, patch)
2021-01-09 07:13 UTC, Viktor Dukhovni
no flags Details | Diff
Avoid side-effects from getgr(nam|gid) on getgrent: take III (2.14 KB, patch)
2021-01-09 21:04 UTC, Viktor Dukhovni
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Viktor Dukhovni 2020-12-24 01:25:37 UTC
Created attachment 220877 [details]
Proposed patch

When testing the Haskell "unix" library, an unexpected interaction was encountered between interleaved calls to getpwent_r(3) and either of getpwnam_r(3) and getpwuid_r(3), which manifests only in the default configuration with nsswitch.conf having "passwd: compat".  Changing to "passwd: files" resolves the issue.

The reason is that the "compat" code incorrectly mutates the "global" (really thread-local) st->keynum not only when doing a getpwent() walk, but also when calling getpwnam() or getpwuid(), which should not mutate the keynum (and don't in files_passwd, but inadvertently do in compat_passwd).

A similar issue is present with getgr(nam|gid)_r(3) vs. getgrent_r(3), but there the problem is more entrenched, because the walk position for getgrent() is just an implicit file offset into the open group file.  Fixing this might require adding appropriate fseeko()/ftello() calls so that getgrent_r(3) can resume at the expected offset even after an intervening call to getgrnam_r(3), ...  Alternatively, it might simply make sense to not bother using the already open file with getgr(nam|uid)_r(3), file systems are sufficiently fast these days, and the relevant metadata will be in the cache...

I'm willing to contribute a separate fix for the group case, but first would like to get feedback on the passwd case, and preferred solution for the group case.

Link to the Haskell issue: <https://github.com/haskell/unix/pull/169>.  This can affect any language or system that interleaves multiple user-level "green" threads within a single OS thread.  Or just for some chooses to interleave getpwnam() calls in the middle of walking the list with getpwent() (somewhat less like, but not manifestly invalid).
Comment 1 Viktor Dukhovni 2021-01-02 07:48:33 UTC
Any feedback?  Did I file this in the right place?
Comment 2 Mark Johnston freebsd_committer 2021-01-05 14:48:54 UTC
(In reply to Viktor Dukhovni from comment #1)
This is in the right place, I will try to work on this within the next day.
Comment 3 Viktor Dukhovni 2021-01-05 18:34:51 UTC
(In reply to Mark Johnston from comment #2)
Cool!  Thanks.  If you would in particular let me know the general approach you'd like to see taken to address the issue in the "group" case, I can write the patch if you're pressed for time.  I just need to know whether it'd be OK to just open the file each time with the nam/gid calls, or whether you feel the documented optimisation is still important, in which case, a bit more surgery is required in the "ent" iterator to save/restore file offsets.
Comment 4 Mark Johnston freebsd_committer 2021-01-08 21:35:42 UTC
Sorry for the delay.  I think the passwd patch is ok, I will queue it up for commit.

With respect to getgrnam_r() and getgrgid_r(), I note that we have this "stayopen" whose purpose seems to be exactly to allow those functions to reuse a database handle.  But files_setgrent() and compat_setgrent() never set st->stayopen, so as far as I can tell they will either always open and close the db (unless a previous getgrent() call had opened the db).  The pw database code doesn't appear to have this bug.

I don't love the idea of opening the db each time.  For the sake of sandboxing frameworks like Capsicum where arbitrary filesystem accesses are prohibited, it would be nicer if the setgroupent(3) interface behaves as documented.  Since it doesn't, it's hard to argue against simply opening and closing the group db each time getgr(nam|gid)() is called, especially since that's an easier route to fixing the original problem.
Comment 5 Mark Johnston freebsd_committer 2021-01-08 21:36:46 UTC
(In reply to Mark Johnston from comment #4)
> so as far as I can tell they will either always open and close the db

s/either //
Comment 6 Mark Johnston freebsd_committer 2021-01-08 21:45:05 UTC
(In reply to Mark Johnston from comment #4)
And sure enough we have PR 165527 noting the setgroupent(3) bug. :(

I am inclined to fix both issues, modifying getgrnam and getgrgid to save and restore the offset of the open file, if it was already open.
Comment 7 Viktor Dukhovni 2021-01-08 22:02:14 UTC
So I guess I should not rush off and do the group patch?
Comment 8 Mark Johnston freebsd_committer 2021-01-08 22:44:29 UTC
(In reply to Viktor Dukhovni from comment #7)
Would you be interested in writing it?  If so, I haven't started yet, so please go ahead.  If not I can take care of it.
Comment 9 Viktor Dukhovni 2021-01-08 23:15:54 UTC
Created attachment 221404 [details]
Avoid side-effects from getgr(nam|gid) on getgrent

The grpent() patch.
Comment 10 Viktor Dukhovni 2021-01-08 23:16:42 UTC
I have a version for your review.  I don't have a good way to test these, so please review/build/...
Comment 11 Viktor Dukhovni 2021-01-09 00:56:59 UTC
In my patch I may be assuming that with getgr(nam|uid) it is not possible for st->stayopen to be true and the file not yet be open.  If that assumption is not correct, then perhaps the `pos = ftello(st->fp)` call should be made unconditionally, rather than only when the file is not fresh.

That would actually make the patch simpler.  What do you think?
Comment 12 Viktor Dukhovni 2021-01-09 07:13:51 UTC
Created attachment 221407 [details]
Better getgrent.c patch (Avoid side-effects from getgr(nam|gid) on getgrent)
Comment 13 Viktor Dukhovni 2021-01-09 07:15:05 UTC
I've reworked the group patch to not assume that st->stayopen implies the file is already open.
Comment 14 Mark Johnston freebsd_committer 2021-01-09 16:29:06 UTC
(In reply to Viktor Dukhovni from comment #13)
Thanks.  At the end of both files_group() and compat_group() we have some confusing code:

if (st->fp != NULL && how != nss_lt_all)
    fseeko(st->fp, pos, SEEK_SET);
if (rv == NS_SUCCESS && retval != NULL)
    *(struct group **)retval = grp;
else if (rv == NS_RETURN && *errnop == ERANGE && st->fp != NULL)
    fseeko(st->fp, pos, SEEK_SET);

So if we're doing a name or gid lookup and the caller-provided buffer is not large enough to store the result, we'll seek twice.  This seems to be a minor issue though.

I think the original problem with interleaved getgrent_r() and getgr(gid|name)_r() calls is also not solved, since in the latter case we will close st->fp if stayopen == 0.  I suspect the close should be conditional on "fresh && how != nss_lt_all && !stayopen".
Comment 15 Viktor Dukhovni 2021-01-09 21:01:23 UTC
(In reply to Mark Johnston from comment #14)
Right, I should also set the local stayopen to true if !fresh.  I'll repost the patch.
Comment 16 Viktor Dukhovni 2021-01-09 21:04:58 UTC
Created attachment 221430 [details]
Avoid side-effects from getgr(nam|gid) on getgrent: take III

Set stayopen also if !fresh
Comment 17 Viktor Dukhovni 2021-01-12 16:48:00 UTC
Any feedback on the latest proposed patch?
Comment 18 Mark Johnston freebsd_committer 2021-01-12 20:29:45 UTC
(In reply to Viktor Dukhovni from comment #17)
It looks right to me - thank you!  I've staged it in my tree and am writing some regression tests.
Comment 19 Viktor Dukhovni 2021-01-13 01:56:55 UTC
Thanks, much appreciated.  After these are burned in in *current*, if there is a possibility of a backports to extant stable releases, that'd be great.  It would be nice to reduce the wait-time before application-layer work-arounds can be removed by a couple of years.  It'll still probably be *at least* 5 years before GHC can assume that all supported FreeBSD systems are patched...
Comment 20 Viktor Dukhovni 2021-01-15 05:57:12 UTC
(In reply to Mark Johnston from comment #18)
Speaking of regression tests, here's a quick reproducer in Perl (it just loops printing the same group over and over if the upper bound on `$i` is removed:
```
$ perl -le '
  for ($i = 0; $i < 10 && (@x = getgrent()); ++$i) {
    print join(":", @x);
    @y=getgrgid(0);
  }' 
wheel:*:0:root viktor
wheel:*:0:root viktor
wheel:*:0:root viktor
wheel:*:0:root viktor
wheel:*:0:root viktor
wheel:*:0:root viktor
wheel:*:0:root viktor
wheel:*:0:root viktor
wheel:*:0:root viktor
wheel:*:0:root viktor
```
Correct behaviour is to output the group file and stop.