Summary: | Race condition in mountd | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Patryk <patrykkotlowski> | ||||||
Component: | bin | Assignee: | Rick Macklem <rmacklem> | ||||||
Status: | Closed FIXED | ||||||||
Severity: | Affects Many People | CC: | cem, markj, rmacklem | ||||||
Priority: | --- | Flags: | rmacklem:
mfc-stable12+
rmacklem: mfc-stable11+ |
||||||
Version: | CURRENT | ||||||||
Hardware: | Any | ||||||||
OS: | Any | ||||||||
Attachments: |
|
Description
Patryk
2020-05-20 09:56:14 UTC
Does this fully fix the race? If mountd receives SIGHUP while executing get_exportlist(), it will block in select without re-reading the exports file, and the next request will be handled before the export list is re-processed. Am I missing something? Well, handling of SIGHUP to reload the exports file is asynchronous, so I don't think the race Patryk is referring to is affected by handling the next request before reloading the exports file. I think he is referring to a case where: - exports file is updated, sighup posted - mountd is in get_exportslist(); --> while mountd is in getexport_list()… - exports file is updated and sighup is posted - mountd completes get_exportlist(), but does not do so again, since it set got_sighup = 0 after the last sighup posted was handled. Patryk, do I have the above correct? Now, your fix will fix this problem, but it introduces another one. - Some file servers (Peter Eriksson) have 80000+ file systems exported on them. Yes, I did type the correct number above. He creates a separate file system for each student, etc. (Most of his servers have 20000+, but one is over 80000.) - When exports are (re)loaded on his servers, it can take minutes. I think r348590 reduced the time from something like 20minutes to 5minutes, but it is still minutes. --> What happens if a sighup is posted to mountd once every couple of minutes with your patch? All mountd does is reload exports over and over and over again (lyrics of an old Beatles toon?). The current code can suffer from this too, but it ignores sighups posted while it is in get_exportlist() which at least partially avoids this. - Part of the problem is that FreeBSD's mount command posts a sighup to mountd every time a mount is done, just in case that new file system needs to be exported. (/etc/exports hasn't changed, but it does not know if this new file system needs to be exported. I've never liked this, but it is a long standing FreeBSD tradition, so I dare not change it.;-) So, what can be done to make sure the updated exports file gets processed without just doing get_exportlist() over and over..? I think I'll post somewhere like FreeBSD-net@, since it is somewhat like a networking problem where a delay avoids congestion. An example solution might be: - time how long get_exportlist() takes. - note that sighup(s) have been posted. - do get_exportlist() again, but only after a delay at least as large as the time the last get_exportlist() took. --> That way, the updated exports would be processed, but mountd would still do other RPC work as well. --> It might be that even a delay of 1sec between get_exportlist() calls would be sufficient to allow it to do the Mount RPC requests. (There shouldn't be that many of them.) I've actually known about this for quite a while, but since I didn't have a good solution, I left it as is. Created attachment 215182 [details]
fix the race between SIGHUP and exports reload
Here's the patch that I am currently thinking might fix this problem.
I started with your change, plus I did the following:
- wrapped the test and clear of got_sighup with sigprocmask(),
so a sighup can't get "lost" if it is posted just after the test
of got_sighup.
- Added a time delay RELOADDELAY (currently set to 250msec) that
is applied before doing another reload of the exports, if there
are RPC request(s) to process.
--> I think this will avoid export reloads from "hogging" the daemon.
Only lightly tested at this point.
Patryk, would it be possible for you to test the patch I just attached? (You might have to apply it by hand, but it isn't too big.) If it works for you and others don't make comments w.r.t. how to fix this, I'll put it up for review after some more testing. Thanks for reporting this, rick (In reply to Rick Macklem from comment #2) I think he is referring to a case where: - exports file is updated, sighup posted - mountd is in get_exportslist(); --> while mountd is in getexport_list()… - exports file is updated and sighup is posted - mountd completes get_exportlist(), but does not do so again, since it set got_sighup = 0 after the last sighup posted was handled. Patryk, do I have the above correct? Exactly :) An example solution might be: - time how long get_exportlist() takes. - note that sighup(s) have been posted. - do get_exportlist() again, but only after a delay at least as large as the time the last get_exportlist() took. --> That way, the updated exports would be processed, but mountd would still do other RPC work as well. --> It might be that even a delay of 1sec between get_exportlist() calls would be sufficient to allow it to do the Mount RPC requests. (There shouldn't be that many of them.) Sounds great to me! (In reply to Rick Macklem from comment #4) I tested it but with one little difference because I'm currently working on FreeBSD 12.0 i changed the line: get_exportlist(1); to get_exportlist(0); But I assume it doesn't matter. My test was: Run 150 threads where each changes the exports file and sends SIGHUP to mountd (there is Lock while changing file so there is no race here) I run this test 4 times and without the changes in code the race condition occurred each time. After changes it didn't occur at all (also run it 4 times). Thank you for great work! :) Patryk (In reply to Patryk from comment #6) Sorry for mistake I changed the mentioned line for: get_exportlist(); Patryk Great. Thanks for testing it. Once I test it some more myself, I'll try and get it reviewed/committed. A commit references this bug: Author: rmacklem Date: Sat Jun 6 00:40:02 UTC 2020 New revision: 361854 URL: https://svnweb.freebsd.org/changeset/base/361854 Log: Fix mountd so that it will not lose SIGHUPs that indicate "reload exports". Without this patch, if a SIGHUP is handled while the process is executing get_exportlist(), that SIGHUP is essentially ignored because the got_sighup variable is reset to 0 after get_exportlist(). This results in the exports file(s) not being reloaded until another SIGHUP signal is sent to mountd. This patch fixes this by resetting got_sighup to zero before the get_exportlist() call while SIGHUP is blocked. It also defines a delay time of 250msec before doing another exports reload if there are RPC request(s) to process. This prevents repeated exports reloads from delaying handling of RPC requests significantly. PR: 246597 Reported by: patrykkotlowski@gmail.com Tested by: patrykkotlowski@gmail.com Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D25127 Changes: head/usr.sbin/mountd/mountd.c Created attachment 215283 [details]
fix the race between SIGHUP and exports reload
Fixed for clock changes by using clock_gettime(CLOCK_MONOTONIC,..)
as suggested by markj@.
Patch has been committed to head and will be MFC'd in 2 weeks. A commit references this bug: Author: rmacklem Date: Sun Jun 28 01:29:15 UTC 2020 New revision: 362713 URL: https://svnweb.freebsd.org/changeset/base/362713 Log: MFC: r361854 Fix mountd so that it will not lose SIGHUPs that indicate "reload exports". Without this patch, if a SIGHUP is handled while the process is executing get_exportlist(), that SIGHUP is essentially ignored because the got_sighup variable is reset to 0 after get_exportlist(). This results in the exports file(s) not being reloaded until another SIGHUP signal is sent to mountd. This patch fixes this by resetting got_sighup to zero before the get_exportlist() call while SIGHUP is blocked. It also defines a delay time of 250msec before doing another exports reload if there are RPC request(s) to process. This prevents repeated exports reloads from delaying handling of RPC requests significantly. PR: 246597 Changes: _U stable/12/ stable/12/usr.sbin/mountd/mountd.c A commit references this bug: Author: rmacklem Date: Sun Jun 28 04:08:42 UTC 2020 New revision: 362717 URL: https://svnweb.freebsd.org/changeset/base/362717 Log: MFC: r361854 Fix mountd so that it will not lose SIGHUPs that indicate "reload exports". Without this patch, if a SIGHUP is handled while the process is executing get_exportlist(), that SIGHUP is essentially ignored because the got_sighup variable is reset to 0 after get_exportlist(). This results in the exports file(s) not being reloaded until another SIGHUP signal is sent to mountd. This patch fixes this by resetting got_sighup to zero before the get_exportlist() call while SIGHUP is blocked. It also defines a delay time of 250msec before doing another exports reload if there are RPC request(s) to process. This prevents repeated exports reloads from delaying handling of RPC requests significantly. PR: 246597 Changes: _U stable/11/ stable/11/usr.sbin/mountd/mountd.c Patch has been committed and MFC'd. |