Bug 246597

Summary: Race condition in mountd
Product: Base System Reporter: Patryk <patrykkotlowski>
Component: binAssignee: Rick Macklem <rmacklem>
Status: Closed FIXED    
Severity: Affects Many People CC: cem, markj, rmacklem
Priority: --- Flags: rmacklem: mfc-stable12+
rmacklem: mfc-stable11+
Version: CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
fix the race between SIGHUP and exports reload
none
fix the race between SIGHUP and exports reload none

Description Patryk 2020-05-20 09:56:14 UTC
While using mountd I noticed that if there is changes in exports file and SIGHUP sent to mountd sometimes it omits some lines. It turned out that there is race condition in mountd.c file.

	/* Expand svc_run() here so that we can call get_exportlist(). */
	for (;;) {
		if (got_sighup) {
			get_exportlist();
			got_sighup = 0; //TODO raise!!!
		}
		readfds = svc_fdset;
		switch (select(svc_maxfd + 1, &readfds, NULL, NULL, NULL)) {
		case -1:
			if (errno == EINTR)
                                continue;
			syslog(LOG_ERR, "mountd died: select: %m");
			exit(1);
		case 0:
			continue;
		default:
			svc_getreqset(&readfds);
		}
	}

If mountd is processing get_exportlist and receives SIGHUP it will omit next SIGHUP handle.

In my opinion this loop should look like this:

	/* Expand svc_run() here so that we can call get_exportlist(). */
	for (;;) {
		if (got_sighup) {
			got_sighup = 0; //No race :)
			get_exportlist();
		}
		readfds = svc_fdset;
		switch (select(svc_maxfd + 1, &readfds, NULL, NULL, NULL)) {
		case -1:
			if (errno == EINTR)
                                continue;
			syslog(LOG_ERR, "mountd died: select: %m");
			exit(1);
		case 0:
			continue;
		default:
			svc_getreqset(&readfds);
		}
	}

I tested it in my production environment.
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2020-06-01 15:40:20 UTC
Does this fully fix the race?  If mountd receives SIGHUP while executing get_exportlist(), it will block in select without re-reading the exports file, and the next request will be handled before the export list is re-processed.  Am I missing something?
Comment 2 Rick Macklem freebsd_committer freebsd_triage 2020-06-02 01:39:24 UTC
Well, handling of SIGHUP to reload the exports file is asynchronous,
so I don't think the race Patryk is referring to is affected by handling
the next request before reloading the exports file.

I think he is referring to a case where:
- exports file is updated, sighup posted
- mountd is in get_exportslist();
  --> while mountd is in getexport_list()…
      - exports file is updated and sighup is posted
- mountd completes get_exportlist(), but does not
  do so again, since it set got_sighup = 0 after
  the last sighup posted was handled.

Patryk, do I have the above correct?

Now, your fix will fix this problem, but it introduces another one.
- Some file servers (Peter Eriksson) have 80000+ file systems exported
  on them. Yes, I did type the correct number above. He creates a
  separate file system for each student, etc. (Most of his servers have
  20000+, but one is over 80000.)
- When exports are (re)loaded on his servers, it can take minutes.
  I think r348590 reduced the time from something like 20minutes to
  5minutes, but it is still minutes.
--> What happens if a sighup is posted to mountd once every couple
    of minutes with your patch? All mountd does is reload exports
    over and over and over again (lyrics of an old Beatles toon?).
    The current code can suffer from this too, but it ignores sighups
    posted while it is in get_exportlist() which at least partially
    avoids this.
    - Part of the problem is that FreeBSD's mount command posts a
      sighup to mountd every time a mount is done, just in case that
      new file system needs to be exported. (/etc/exports hasn't
      changed, but it does not know if this new file system needs
      to be exported. I've never liked this, but it is a long
      standing FreeBSD tradition, so I dare not change it.;-)

So, what can be done to make sure the updated exports file gets
processed without just doing get_exportlist() over and over..?

I think I'll post somewhere like FreeBSD-net@, since it is somewhat
like a networking problem where a delay avoids congestion.

An example solution might be:
- time how long get_exportlist() takes.
- note that sighup(s) have been posted.
- do get_exportlist() again, but only after a delay at least as large
  as the time the last get_exportlist() took.
--> That way, the updated exports would be processed, but mountd
    would still do other RPC work as well.
--> It might be that even a delay of 1sec between get_exportlist() calls
    would be sufficient to allow it to do the Mount RPC requests.
    (There shouldn't be that many of them.)

I've actually known about this for quite a while, but since I didn't
have a good solution, I left it as is.
Comment 3 Rick Macklem freebsd_committer freebsd_triage 2020-06-03 00:02:12 UTC
Created attachment 215182 [details]
fix the race between SIGHUP and exports reload

Here's the patch that I am currently thinking might fix this problem.
I started with your change, plus I did the following:
- wrapped the test and clear of got_sighup with sigprocmask(),
  so a sighup can't get "lost" if it is posted just after the test
  of got_sighup.
- Added a time delay RELOADDELAY (currently set to 250msec) that
  is applied before doing another reload of the exports, if there
  are RPC request(s) to process.
  --> I think this will avoid export reloads from "hogging" the daemon.

Only lightly tested at this point.
Comment 4 Rick Macklem freebsd_committer freebsd_triage 2020-06-03 00:04:22 UTC
Patryk, would it be possible for you to test the patch I just attached?
(You might have to apply it by hand, but it isn't too big.)

If it works for you and others don't make comments w.r.t. how to
fix this, I'll put it up for review after some more testing.

Thanks for reporting this, rick
Comment 5 Patryk 2020-06-03 06:26:22 UTC
(In reply to Rick Macklem from comment #2)
   I think he is referring to a case where:
   - exports file is updated, sighup posted
   - mountd is in get_exportslist();
     --> while mountd is in getexport_list()…
         - exports file is updated and sighup is posted
   - mountd completes get_exportlist(), but does not
     do so again, since it set got_sighup = 0 after
     the last sighup posted was handled.

   Patryk, do I have the above correct?

Exactly :)

   An example solution might be:
   - time how long get_exportlist() takes.
   - note that sighup(s) have been posted.
   - do get_exportlist() again, but only after a delay at least as large
     as the time the last get_exportlist() took.
   --> That way, the updated exports would be processed, but mountd
       would still do other RPC work as well.
   --> It might be that even a delay of 1sec between get_exportlist() calls
       would be sufficient to allow it to do the Mount RPC requests.
       (There shouldn't be that many of them.)

Sounds great to me!
Comment 6 Patryk 2020-06-03 07:19:11 UTC
(In reply to Rick Macklem from comment #4)
I tested it but with one little difference because I'm currently working on FreeBSD 12.0 i changed the line:
get_exportlist(1);
to
get_exportlist(0);

But I assume it doesn't matter.

My test was:
Run 150 threads where each changes the exports file and sends SIGHUP to mountd (there is Lock while changing file so there is no race here)

I run this test 4 times and without the changes in code the race condition occurred each time.

After changes it didn't occur at all (also run it 4 times).

Thank you for great work! :)

Patryk
Comment 7 Patryk 2020-06-03 07:22:03 UTC
(In reply to Patryk from comment #6)
Sorry for mistake I changed the mentioned line for:

get_exportlist();

Patryk
Comment 8 Rick Macklem freebsd_committer freebsd_triage 2020-06-03 14:58:01 UTC
Great. Thanks for testing it.
Once I test it some more myself, I'll try and get it reviewed/committed.
Comment 9 commit-hook freebsd_committer freebsd_triage 2020-06-06 00:41:00 UTC
A commit references this bug:

Author: rmacklem
Date: Sat Jun  6 00:40:02 UTC 2020
New revision: 361854
URL: https://svnweb.freebsd.org/changeset/base/361854

Log:
  Fix mountd so that it will not lose SIGHUPs that indicate "reload exports".

  Without this patch, if a SIGHUP is handled while the process is executing
  get_exportlist(), that SIGHUP is essentially ignored because the got_sighup
  variable is reset to 0 after get_exportlist().
  This results in the exports file(s) not being reloaded until another SIGHUP
  signal is sent to mountd.
  This patch fixes this by resetting got_sighup to zero before the
  get_exportlist() call while SIGHUP is blocked.
  It also defines a delay time of 250msec before doing another exports reload
  if there are RPC request(s) to process. This prevents repeated exports reloads
  from delaying handling of RPC requests significantly.

  PR:		246597
  Reported by:	patrykkotlowski@gmail.com
  Tested by:	patrykkotlowski@gmail.com
  Reviewed by:	markj
  MFC after:	2 weeks
  Differential Revision:	https://reviews.freebsd.org/D25127

Changes:
  head/usr.sbin/mountd/mountd.c
Comment 10 Rick Macklem freebsd_committer freebsd_triage 2020-06-06 00:46:12 UTC
Created attachment 215283 [details]
fix the race between SIGHUP and exports reload

Fixed for clock changes by using clock_gettime(CLOCK_MONOTONIC,..)
as suggested by markj@.
Comment 11 Rick Macklem freebsd_committer freebsd_triage 2020-06-06 00:47:05 UTC
Patch has been committed to head and will be MFC'd in 2 weeks.
Comment 12 commit-hook freebsd_committer freebsd_triage 2020-06-28 01:30:01 UTC
A commit references this bug:

Author: rmacklem
Date: Sun Jun 28 01:29:15 UTC 2020
New revision: 362713
URL: https://svnweb.freebsd.org/changeset/base/362713

Log:
  MFC: r361854
  Fix mountd so that it will not lose SIGHUPs that indicate "reload exports".

  Without this patch, if a SIGHUP is handled while the process is executing
  get_exportlist(), that SIGHUP is essentially ignored because the got_sighup
  variable is reset to 0 after get_exportlist().
  This results in the exports file(s) not being reloaded until another SIGHUP
  signal is sent to mountd.
  This patch fixes this by resetting got_sighup to zero before the
  get_exportlist() call while SIGHUP is blocked.
  It also defines a delay time of 250msec before doing another exports reload
  if there are RPC request(s) to process. This prevents repeated exports reloads
  from delaying handling of RPC requests significantly.

  PR:		246597

Changes:
_U  stable/12/
  stable/12/usr.sbin/mountd/mountd.c
Comment 13 commit-hook freebsd_committer freebsd_triage 2020-06-28 04:09:30 UTC
A commit references this bug:

Author: rmacklem
Date: Sun Jun 28 04:08:42 UTC 2020
New revision: 362717
URL: https://svnweb.freebsd.org/changeset/base/362717

Log:
  MFC: r361854
  Fix mountd so that it will not lose SIGHUPs that indicate "reload exports".

  Without this patch, if a SIGHUP is handled while the process is executing
  get_exportlist(), that SIGHUP is essentially ignored because the got_sighup
  variable is reset to 0 after get_exportlist().
  This results in the exports file(s) not being reloaded until another SIGHUP
  signal is sent to mountd.
  This patch fixes this by resetting got_sighup to zero before the
  get_exportlist() call while SIGHUP is blocked.
  It also defines a delay time of 250msec before doing another exports reload
  if there are RPC request(s) to process. This prevents repeated exports reloads
  from delaying handling of RPC requests significantly.

  PR:		246597

Changes:
_U  stable/11/
  stable/11/usr.sbin/mountd/mountd.c
Comment 14 Rick Macklem freebsd_committer freebsd_triage 2020-06-28 06:34:04 UTC
Patch has been committed and MFC'd.