Created attachment 220877 [details] Proposed patch When testing the Haskell "unix" library, an unexpected interaction was encountered between interleaved calls to getpwent_r(3) and either of getpwnam_r(3) and getpwuid_r(3), which manifests only in the default configuration with nsswitch.conf having "passwd: compat". Changing to "passwd: files" resolves the issue. The reason is that the "compat" code incorrectly mutates the "global" (really thread-local) st->keynum not only when doing a getpwent() walk, but also when calling getpwnam() or getpwuid(), which should not mutate the keynum (and don't in files_passwd, but inadvertently do in compat_passwd). A similar issue is present with getgr(nam|gid)_r(3) vs. getgrent_r(3), but there the problem is more entrenched, because the walk position for getgrent() is just an implicit file offset into the open group file. Fixing this might require adding appropriate fseeko()/ftello() calls so that getgrent_r(3) can resume at the expected offset even after an intervening call to getgrnam_r(3), ... Alternatively, it might simply make sense to not bother using the already open file with getgr(nam|uid)_r(3), file systems are sufficiently fast these days, and the relevant metadata will be in the cache... I'm willing to contribute a separate fix for the group case, but first would like to get feedback on the passwd case, and preferred solution for the group case. Link to the Haskell issue: <https://github.com/haskell/unix/pull/169>. This can affect any language or system that interleaves multiple user-level "green" threads within a single OS thread. Or just for some chooses to interleave getpwnam() calls in the middle of walking the list with getpwent() (somewhat less like, but not manifestly invalid).
Any feedback? Did I file this in the right place?
(In reply to Viktor Dukhovni from comment #1) This is in the right place, I will try to work on this within the next day.
(In reply to Mark Johnston from comment #2) Cool! Thanks. If you would in particular let me know the general approach you'd like to see taken to address the issue in the "group" case, I can write the patch if you're pressed for time. I just need to know whether it'd be OK to just open the file each time with the nam/gid calls, or whether you feel the documented optimisation is still important, in which case, a bit more surgery is required in the "ent" iterator to save/restore file offsets.
Sorry for the delay. I think the passwd patch is ok, I will queue it up for commit. With respect to getgrnam_r() and getgrgid_r(), I note that we have this "stayopen" whose purpose seems to be exactly to allow those functions to reuse a database handle. But files_setgrent() and compat_setgrent() never set st->stayopen, so as far as I can tell they will either always open and close the db (unless a previous getgrent() call had opened the db). The pw database code doesn't appear to have this bug. I don't love the idea of opening the db each time. For the sake of sandboxing frameworks like Capsicum where arbitrary filesystem accesses are prohibited, it would be nicer if the setgroupent(3) interface behaves as documented. Since it doesn't, it's hard to argue against simply opening and closing the group db each time getgr(nam|gid)() is called, especially since that's an easier route to fixing the original problem.
(In reply to Mark Johnston from comment #4) > so as far as I can tell they will either always open and close the db s/either //
(In reply to Mark Johnston from comment #4) And sure enough we have PR 165527 noting the setgroupent(3) bug. :( I am inclined to fix both issues, modifying getgrnam and getgrgid to save and restore the offset of the open file, if it was already open.
So I guess I should not rush off and do the group patch?
(In reply to Viktor Dukhovni from comment #7) Would you be interested in writing it? If so, I haven't started yet, so please go ahead. If not I can take care of it.
Created attachment 221404 [details] Avoid side-effects from getgr(nam|gid) on getgrent The grpent() patch.
I have a version for your review. I don't have a good way to test these, so please review/build/...
In my patch I may be assuming that with getgr(nam|uid) it is not possible for st->stayopen to be true and the file not yet be open. If that assumption is not correct, then perhaps the `pos = ftello(st->fp)` call should be made unconditionally, rather than only when the file is not fresh. That would actually make the patch simpler. What do you think?
Created attachment 221407 [details] Better getgrent.c patch (Avoid side-effects from getgr(nam|gid) on getgrent)
I've reworked the group patch to not assume that st->stayopen implies the file is already open.
(In reply to Viktor Dukhovni from comment #13) Thanks. At the end of both files_group() and compat_group() we have some confusing code: if (st->fp != NULL && how != nss_lt_all) fseeko(st->fp, pos, SEEK_SET); if (rv == NS_SUCCESS && retval != NULL) *(struct group **)retval = grp; else if (rv == NS_RETURN && *errnop == ERANGE && st->fp != NULL) fseeko(st->fp, pos, SEEK_SET); So if we're doing a name or gid lookup and the caller-provided buffer is not large enough to store the result, we'll seek twice. This seems to be a minor issue though. I think the original problem with interleaved getgrent_r() and getgr(gid|name)_r() calls is also not solved, since in the latter case we will close st->fp if stayopen == 0. I suspect the close should be conditional on "fresh && how != nss_lt_all && !stayopen".
(In reply to Mark Johnston from comment #14) Right, I should also set the local stayopen to true if !fresh. I'll repost the patch.
Created attachment 221430 [details] Avoid side-effects from getgr(nam|gid) on getgrent: take III Set stayopen also if !fresh
Any feedback on the latest proposed patch?
(In reply to Viktor Dukhovni from comment #17) It looks right to me - thank you! I've staged it in my tree and am writing some regression tests.
Thanks, much appreciated. After these are burned in in *current*, if there is a possibility of a backports to extant stable releases, that'd be great. It would be nice to reduce the wait-time before application-layer work-arounds can be removed by a couple of years. It'll still probably be *at least* 5 years before GHC can assume that all supported FreeBSD systems are patched...
(In reply to Mark Johnston from comment #18) Speaking of regression tests, here's a quick reproducer in Perl (it just loops printing the same group over and over if the upper bound on `$i` is removed: ``` $ perl -le ' for ($i = 0; $i < 10 && (@x = getgrent()); ++$i) { print join(":", @x); @y=getgrgid(0); }' wheel:*:0:root viktor wheel:*:0:root viktor wheel:*:0:root viktor wheel:*:0:root viktor wheel:*:0:root viktor wheel:*:0:root viktor wheel:*:0:root viktor wheel:*:0:root viktor wheel:*:0:root viktor wheel:*:0:root viktor ``` Correct behaviour is to output the group file and stop.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=6e411d8b14ec5402b7551073e2f9edc4d680dd49 commit 6e411d8b14ec5402b7551073e2f9edc4d680dd49 Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2021-01-21 19:30:19 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2021-01-21 19:30:19 +0000 libc/nss tests: Add regression tests for commit 55444c823e1f PR: 252094 Sponsored by: The FreeBSD Foundation MFC after: 1 month lib/libc/tests/nss/getgr_test.c | 61 +++++++++++++++++++++++++++++++++-------- lib/libc/tests/nss/getpw_test.c | 51 +++++++++++++++++++++++++++++----- 2 files changed, 93 insertions(+), 19 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=5619d49e07d3942e438f3d06269f3c1c466cf5b7 commit 5619d49e07d3942e438f3d06269f3c1c466cf5b7 Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2021-01-21 19:30:19 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2021-01-21 19:30:19 +0000 libc/nss: Restore iterator state when doing passwd/group lookups The getpwent(3) and getgrent(3) implementations maintain some internal iterator state. Interleaved calls to functions which do passwd/group lookups using a key, such as getpwnam(3), would in some cases clobber this state, causing a subsequent getpwent() or getgrent() call to restart iteration from the beginning of the database or to terminate early. This is particularly troublesome in programming environments where execution of green threads is interleaved within a single OS thread. Take care to restore any iterator state following a keyed lookup. The "files" provider for the passwd database was already handling this correctly, but "compat" was not, and both providers had this problem when accessing the group database. PR: 252094 Submitted by: Viktor Dukhovni <ietf-dane@dukhovni.org> MFC after: 1 month lib/libc/gen/getgrent.c | 34 +++++++++++++++------------------- lib/libc/gen/getpwent.c | 18 +++++++++++------- 2 files changed, 26 insertions(+), 26 deletions(-)
(In reply to Viktor Dukhovni from comment #19) I managed to get the change in under the wire for 13.0, and I plan to do a backport for the next 12.x release. But yes it'll be a while before one can reasonably assume that all affected systems are patched...
(In reply to Mark Johnston from comment #23) Many thanks, much appreciated! You can close the ticket whenever you feel that'd be appropriate.
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=31f3fbbcd48492e67323fcb54985129fb972801a commit 31f3fbbcd48492e67323fcb54985129fb972801a Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2021-01-21 19:30:19 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2021-02-20 16:35:32 +0000 libc/nss tests: Add regression tests for commit 55444c823e1f PR: 252094 Sponsored by: The FreeBSD Foundation (cherry picked from commit 6e411d8b14ec5402b7551073e2f9edc4d680dd49) lib/libc/tests/nss/getgr_test.c | 61 +++++++++++++++++++++++++++++++++-------- lib/libc/tests/nss/getpw_test.c | 51 +++++++++++++++++++++++++++++----- 2 files changed, 93 insertions(+), 19 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=a8499077ab1b23c432c251874cf931e1025d8636 commit a8499077ab1b23c432c251874cf931e1025d8636 Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2021-01-21 19:30:19 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2021-02-20 16:35:32 +0000 libc/nss: Restore iterator state when doing passwd/group lookups The getpwent(3) and getgrent(3) implementations maintain some internal iterator state. Interleaved calls to functions which do passwd/group lookups using a key, such as getpwnam(3), would in some cases clobber this state, causing a subsequent getpwent() or getgrent() call to restart iteration from the beginning of the database or to terminate early. This is particularly troublesome in programming environments where execution of green threads is interleaved within a single OS thread. Take care to restore any iterator state following a keyed lookup. The "files" provider for the passwd database was already handling this correctly, but "compat" was not, and both providers had this problem when accessing the group database. PR: 252094 Submitted by: Viktor Dukhovni <ietf-dane@dukhovni.org> (cherry picked from commit 5619d49e07d3942e438f3d06269f3c1c466cf5b7) lib/libc/gen/getgrent.c | 34 +++++++++++++++------------------- lib/libc/gen/getpwent.c | 18 +++++++++++------- 2 files changed, 26 insertions(+), 26 deletions(-)
Thanks again for the analysis and patches. 13.0 and 12.3 will ship with these fixes.