Created attachment 204343 [details] make exportlist linked list hashed and add "-I" for incremental exports updates
Peter reported via email that mountd would take 16seconds to reload the exports upon receipt of a SIGHUP for his 72,000+ file system server, which results in a 16sec suspension of the nfsd threads. By inspection, it appears that mountd.c uses a single linked list for all exportlist elements (there is one for each file system). During reload, this list is searched for a match for each entry in the exports file, often going the entire length of the list without finding a match. Assuming one export entry for each file system and 1nsec to traverse a single SLIST() element, that would take: 1 * 72000 * 36000 -> 2.6sec (the 36,000 is because the list will grow from 0 length to 72000 during the load, so on average the list will have around 36000 elements) (A simple benchmark I ran on freefall.freebsd.org showed it took about 2nsec to traverse a single SLIST() element. For the above, I assumed a faster CPU. However, the actual total time could be 5-6sec.) Changing this to a hash table of lists with a load factor of 10 should reduce the total search time to something like 1 * 10 * 72000 -> 720usec. (Even if the estimate of 1nsec is low, this will be 1-2msec.) This is the first change in the above patch. I think that accounts for part of the time. I believe a majority of the rest is that mountd.c deletes and reloads all the exports in the kernel. The second change the patch does is add a "-I" option that tells mountd to only delete/load exports in the kernel for export file entries that have changed. There will somewhat increased CPU overheads for first doing the processing of the exports file(s) and then comparing the old and new to find the changes that need to be done via nmount() for the kernel. However, the processing of the exports file(s) can be done before the nfsd threads are suspended (assuming the "-S" option), so hopefully the duration of the nfsd threads being suspended will be shortened considerably for small changes (what will typically happen for a reload) to the exports file(s).
The patch (in the attachment) generates a lot of output via syslog(LOG_ERR..) that may be useful for debugging. However, if you want to get rid of that, just delete all the lines where "syslog(LOG_ERR..." are not indented. The patch won't apply easily to a head mountd.c, since I have already committed some parts of the patch to head. If you need the patch for head, just email me and I can send you a copy. Hopefully it will apply to most other fairly recent mountd.c sources.
Created attachment 204409 [details] make exphead into a hash table of lists and add -I option This is an updated version of the patch. I found out that statfs(2) resulted in significant overheads, so compare_nmount_exportlist() now compares the mount path as a sanity check instead of comparing the fsids returned by doing statfs(2) on both old and new directory paths. This way statfs(2) only needs to be done for the cases where the exports have changed, avoiding doing statfs(2) in compare_nmount_exportlist() for the common case of "same" when reloading the exports file(s). I believe that comparing the directory paths provides as good a sanity check to ensure that the old and new refer to the same file system, since they have the same fsid (found in search) and they have the same mount path. The reload still does a statfs(2) on every entry in the current exports file(s), but that is unavoidable, to ensure it has an up to date fsid for the current/new exports file(s).
Created attachment 204435 [details] make exphead into a hash table of lists and add -I The patch is updated with a fix from Peter Errikson, where a pair of { } were needed after I added a debug syslog() call. It would only have affected the outcome if the file system in the old version of the exports file(s) had been unmounted. Good catch Peter. I have to decide whether these debug syslog() calls need to stay in the patch, enabled via something like a command line argument or signal to the deamon. If anyone has a preference w.r.t. this, please let me know.
Created attachment 204458 [details] make exphead into a hash table of lists and add -I The patch has been updated with a significant fix for two problems. The previous version didn't handle the case of a "default" (for all other hosts) correctly and also used the incorrect security flavors list under some circumstances. It wouldn't save the anonymous credentials and security flavors correctly, so they could be properly compared and updated for -I when a SIGHUP was received. It also needed to use different lists of security flavors for certain cases. Hopefully, this version has gotten it all correct.
Created attachment 204460 [details] make exphead into a hash table of lists and add -I This variant of the patch replaces the syslog(LOG_ERR...) calls with syslog(LOG_DEBUG...) calls that are enabled/disabled by posting a SIGUSR1 to the mountd daemon. This way they can be turned on/off without restarting or recompiling the daemon. The debug information logged in /var/log/debug.log (or whatever you have LOG_DEBUG configured as in your /etc/syslog.conf) may be useful if you run into problems when using the "-I" command line option to enable the incremental updates of kernel exports when SIGHUP is posted to mountd. (Actually differential might be a better term for it, since "-I" means "just update the kernel exports with changes from the previous exports".)
Created attachment 204472 [details] make exphead into a hash table of lists and add -I This version of the patch is changed in a few ways: - Handling of "-public" has been fixed for the "-I" option, although I suspect few people use it. - Handling of multiple "V4:" lines has been fixed for the "-I" option. - The code has been changed from using SIGUSR1 to checking for the existence of a file called /var/log/mountd.debug to enable the LOGDEBUG() stuff. cy@ is looking at using DTrace instead of syslog(). If he comes up with a patch, I'll update it here. Unless you use "-public" or multiple "V4:" lines in your exports file(s), the previous version of the patch should be ok. (Or at least "as ok as this one".)
Created attachment 204552 [details] make exphead into a hash table of lists and add -I This update to the patch is changed from the previous one as follows: - compare_cred() and compare_sec() are fixed to handle cases where there are duplicate entries in one list (cr_groups or secflavors) and a changed value replacing the duplicate in the new list. They also both have code for handling the simple common cases before doing the full comparison. - A few cleanups that shouldn't affect semantics. - LOGDEBUG() is replaced with macros that cy@ will be using so that dtrace can be used. Until that code works, the macros MOUNTD_xxx() are all defined null, so the code can be built/tested. This patch will be updated again when cy@ has the dtrace stuff working.
Created attachment 204560 [details] make exphead into a hash table of lists and add -I I think this might be the final version of the patch. Thanks to cy@, it now has the DTrace stuff added to it and it has all the fixes/cleanups I have come up with. This version patches the man page as well, to add the "-I" option description. I will be continuing to test it (and I think Peter is as well) and will update the patch if it needs any fixes.
Created attachment 204580 [details] make exphead into a hash table of lists and add -I Yet another variant of the patch. This one reverts the changes for using dtrace. (If cy@ knows how to make it work so that the debugging info can be turned on/off without restarting the daemon, it may come back.) It has all the fixes of previous versions and may be the final one here (unless the dtrace variant comes back). Sorry for updating it so many times, but it is a fairly involved patch.
A commit references this bug: Author: rmacklem Date: Fri May 31 01:28:49 UTC 2019 New revision: 348452 URL: https://svnweb.freebsd.org/changeset/base/348452 Log: Replace a single linked list with a hash table of lists. mountd.c uses a single linked list of "struct exportlist" structures, where there is one of these for each exported file system on the NFS server. This list gets long if there are a large number of file systems exported and the list must be searched for each line in the exports file(s) when SIGHUP causes the exports file(s) to be reloaded. A simple benchmark that traverses SLIST() elements and compares two 32bit fields in the structure for equal (which is what the search is) appears to take a couple of nsec. So, for a server with 72000 exported file systems, this can take about 5sec during reload of the exports file(s). By replacing the single linked list with a hash table with a target of 10 elements per list, the time should be reduced to less than 1msec. Peter Errikson (who has a server with 72000+ exported file systems) ran a test program using 5 hashes to see how they worked. fnv_32_buf(fsid,..., 0) fnv_32_buf(fsid,..., FNV1_32_INIT) hash32_buf(fsid,..., 0) hash32_buf(fsid,..., HASHINIT) - plus simply using the low order bits of fsid.val[0]. The first three behaved about equally well, with the first one being slightly better than the others. It has an average variation of about 4.5% about the target list length and that is what this patch uses. Peter Errikson also tested this hash table version and found that the performance wasn't measurably improved by a larger hash table, so a load factor of 10 appears adequate. Tested by: pen@lysator.liu.se (with other patches) PR: 237860 MFC after: 1 month Changes: head/usr.sbin/mountd/mountd.c
A commit references this bug: Author: rmacklem Date: Mon Jun 3 22:58:51 UTC 2019 New revision: 348590 URL: https://svnweb.freebsd.org/changeset/base/348590 Log: Modify mountd so that it incrementally updates the kernel exports upon a reload. Without this patch, mountd would delete/load all exports from the exports file(s) when it receives a SIGHUP. This works fine for small exports file(s), but can take several seconds to do when there are large numbers (10000+) of exported file systems. Most of this time is spent doing the system calls that delete/export each of these file systems. When the "-S" option has been specified (the default these days), the nfsd threads are suspended for several seconds while the reload is done. This patch changes mountd so that it only does system calls for file systems where the exports have been changed/added/deleted as compared to the exports done for the previous load/reload of the exports file(s). Basically, when SIGHUP is posted to mountd, it saves the exportlist structures from the previous load and creates a new set of structures from the current exports file(s). Then it compares the current with the previous and only does system calls for cases that have been changed/added/deleted. The nfsd threads do not need to be suspended until the comparison step is being done. This results in a suspension period of milliseconds for a server with 10000+ exported file systems. There is some code using a LOGDEBUG() macro that allow runtime debugging output via syslog(LOG_DEBUG,...) that can be enabled by creating a file called /var/log/mountd.debug. This code is expected to be replaced with code that uses dtrace by cy@ in the near future, once issues w.r.t. dtrace in stable/12 have been resolved. The patch should not change the usage of the exports file(s), but improves the performance of reloading large exports file(s) where there are only a small number of changes done to the file(s). Tested by: pen@lysator.liu.se PR: 237860 Reviewed by: kib MFC after: 1 month Relnotes: yes Differential Revision: https://reviews.freebsd.org/D20487 Changes: head/usr.sbin/mountd/mountd.c head/usr.sbin/mountd/pathnames.h
A commit references this bug: Author: rmacklem Date: Mon Jun 17 00:00:12 UTC 2019 New revision: 349123 URL: https://svnweb.freebsd.org/changeset/base/349123 Log: MFC: r347476 Factor out some exportlist list operations into separate functions. This patch moves the code that removes and frees all exportlist elements out into a separate function called free_exports(). It does the same for the insertion of a new exportlist entry into a list. It also adds a second argument to ex_search() for the list to use. None of these changes have any semantic effect. They are being done to prepare the code for future patches that convert the single linked list for the exportlist to a hash table of lists and a patch that will do incremental changes of exports in the kernel. And it fixes the argument for SLIST_HEAD_INITIALIZER() to a pointer, which doesn't really matter, since SLIST_HEAD_INITIALIZER() doesn't use the argument. PR: 237860 Changes: _U stable/12/ stable/12/usr.sbin/mountd/mountd.c
A commit references this bug: Author: rmacklem Date: Mon Jun 17 00:11:46 UTC 2019 New revision: 349124 URL: https://svnweb.freebsd.org/changeset/base/349124 Log: MFC: r347476 Factor out some exportlist list operations into separate functions. This patch moves the code that removes and frees all exportlist elements out into a separate function called free_exports(). It does the same for the insertion of a new exportlist entry into a list. It also adds a second argument to ex_search() for the list to use. None of these changes have any semantic effect. They are being done to prepare the code for future patches that convert the single linked list for the exportlist to a hash table of lists and a patch that will do incremental changes of exports in the kernel. And it fixes the argument for SLIST_HEAD_INITIALIZER() to a pointer, which doesn't really matter, since SLIST_HEAD_INITIALIZER() doesn't use the argument. PR: 237860 Changes: _U stable/11/ stable/11/usr.sbin/mountd/mountd.c
A commit references this bug: Author: rmacklem Date: Mon Jun 17 00:20:39 UTC 2019 New revision: 349125 URL: https://svnweb.freebsd.org/changeset/base/349125 Log: MFC: r347498 Factor code into two new functions in preparation for a future commit. Factor code into two functions. read_exportfile() a functon which reads the exports file(s) and calls get_exportlist_one() to process each of them. delete_export() a function which deletes the exports in the kernel for a file system. The contents of these functions is just the same code as was used to do the operations, moved into separate functions. As such, there is no semantic change. This is being done in preparation for a future commit that will add an option to do incremental changes of kernel exports upon receiving SIGHUP. PR: 237860 Changes: _U stable/12/ stable/12/usr.sbin/mountd/mountd.c
A commit references this bug: Author: rmacklem Date: Mon Jun 17 00:29:40 UTC 2019 New revision: 349126 URL: https://svnweb.freebsd.org/changeset/base/349126 Log: MFC: r347498 Factor code into two new functions in preparation for a future commit. Factor code into two functions. read_exportfile() a functon which reads the exports file(s) and calls get_exportlist_one() to process each of them. delete_export() a function which deletes the exports in the kernel for a file system. The contents of these functions is just the same code as was used to do the operations, moved into separate functions. As such, there is no semantic change. This is being done in preparation for a future commit that will add an option to do incremental changes of kernel exports upon receiving SIGHUP. PR: 237860 Changes: _U stable/11/ stable/11/usr.sbin/mountd/mountd.c
A commit references this bug: Author: rmacklem Date: Mon Jun 17 00:37:56 UTC 2019 New revision: 349127 URL: https://svnweb.freebsd.org/changeset/base/349127 Log: MFC: r347583 Replace global list for grouplist with list(s) for each exportlist element. In mountd.c, the grouplist structures are linked into a single global linked list headed by "grphead". The only use of this linked list is to free all list elements when the exportlist elements are also all being free'd at the time the exports are being reloaded. This patch replaces this one global linked list head with a list head in each exportlist structure, where the grouplist elements for that exported file system are linked. The only change is that now the grouplist elements are free'd with the associated exportlist element as they are free'd instead of all grouplist elements being free'd after the exportlist elements are free'd. This change should have no effect in practice. This is being done, since a future patch that will add a "-I" option for incrementally updating the exports in the kernel needs to know which grouplist elements are associated with each exported file system and having them linked into a list headed by the exportlist element does that. PR: 237860 Changes: _U stable/12/ stable/12/usr.sbin/mountd/mountd.c
A commit references this bug: Author: rmacklem Date: Mon Jun 17 00:58:50 UTC 2019 New revision: 349128 URL: https://svnweb.freebsd.org/changeset/base/349128 Log: MFC: r347583 Replace global list for grouplist with list(s) for each exportlist element. In mountd.c, the grouplist structures are linked into a single global linked list headed by "grphead". The only use of this linked list is to free all list elements when the exportlist elements are also all being free'd at the time the exports are being reloaded. This patch replaces this one global linked list head with a list head in each exportlist structure, where the grouplist elements for that exported file system are linked. The only change is that now the grouplist elements are free'd with the associated exportlist element as they are free'd instead of all grouplist elements being free'd after the exportlist elements are free'd. This change should have no effect in practice. This is being done, since a future patch that will add a "-I" option for incrementally updating the exports in the kernel needs to know which grouplist elements are associated with each exported file system and having them linked into a list headed by the exportlist element does that. PR: 237860 Changes: _U stable/11/ stable/11/usr.sbin/mountd/mountd.c
A commit references this bug: Author: rmacklem Date: Fri Jul 5 00:55:46 UTC 2019 New revision: 349755 URL: https://svnweb.freebsd.org/changeset/base/349755 Log: MFC: r348452 Replace a single linked list with a hash table of lists. mountd.c uses a single linked list of "struct exportlist" structures, where there is one of these for each exported file system on the NFS server. This list gets long if there are a large number of file systems exported and the list must be searched for each line in the exports file(s) when SIGHUP causes the exports file(s) to be reloaded. A simple benchmark that traverses SLIST() elements and compares two 32bit fields in the structure for equal (which is what the search is) appears to take a couple of nsec. So, for a server with 72000 exported file systems, this can take about 5sec during reload of the exports file(s). By replacing the single linked list with a hash table with a target of 10 elements per list, the time should be reduced to less than 1msec. Peter Errikson (who has a server with 72000+ exported file systems) ran a test program using 5 hashes to see how they worked. fnv_32_buf(fsid,..., 0) fnv_32_buf(fsid,..., FNV1_32_INIT) hash32_buf(fsid,..., 0) hash32_buf(fsid,..., HASHINIT) - plus simply using the low order bits of fsid.val[0]. The first three behaved about equally well, with the first one being slightly better than the others. It has an average variation of about 4.5% about the target list length and that is what this patch uses. Peter Errikson also tested this hash table version and found that the performance wasn't measurably improved by a larger hash table, so a load factor of 10 appears adequate. PR: 237860 Changes: _U stable/12/ stable/12/usr.sbin/mountd/mountd.c
A commit references this bug: Author: rmacklem Date: Fri Jul 5 01:04:59 UTC 2019 New revision: 349756 URL: https://svnweb.freebsd.org/changeset/base/349756 Log: MFC: r348452 Replace a single linked list with a hash table of lists. mountd.c uses a single linked list of "struct exportlist" structures, where there is one of these for each exported file system on the NFS server. This list gets long if there are a large number of file systems exported and the list must be searched for each line in the exports file(s) when SIGHUP causes the exports file(s) to be reloaded. A simple benchmark that traverses SLIST() elements and compares two 32bit fields in the structure for equal (which is what the search is) appears to take a couple of nsec. So, for a server with 72000 exported file systems, this can take about 5sec during reload of the exports file(s). By replacing the single linked list with a hash table with a target of 10 elements per list, the time should be reduced to less than 1msec. Peter Errikson (who has a server with 72000+ exported file systems) ran a test program using 5 hashes to see how they worked. fnv_32_buf(fsid,..., 0) fnv_32_buf(fsid,..., FNV1_32_INIT) hash32_buf(fsid,..., 0) hash32_buf(fsid,..., HASHINIT) - plus simply using the low order bits of fsid.val[0]. The first three behaved about equally well, with the first one being slightly better than the others. It has an average variation of about 4.5% about the target list length and that is what this patch uses. Peter Errikson also tested this hash table version and found that the performance wasn't measurably improved by a larger hash table, so a load factor of 10 appears adequate. PR: 237860 Changes: _U stable/11/ stable/11/usr.sbin/mountd/mountd.c
A commit references this bug: Author: rmacklem Date: Fri Jul 5 22:36:32 UTC 2019 New revision: 349771 URL: https://svnweb.freebsd.org/changeset/base/349771 Log: MFC: r348590, r348591 Modify mountd so that it incrementally updates the kernel exports upon a reload. Without this patch, mountd would delete/load all exports from the exports file(s) when it receives a SIGHUP. This works fine for small exports file(s), but can take several seconds to do when there are large numbers (10000+) of exported file systems. Most of this time is spent doing the system calls that delete/export each of these file systems. When the "-S" option has been specified (the default these days), the nfsd threads are suspended for several seconds while the reload is done. This patch changes mountd so that it only does system calls for file systems where the exports have been changed/added/deleted as compared to the exports done for the previous load/reload of the exports file(s). Basically, when SIGHUP is posted to mountd, it saves the exportlist structures from the previous load and creates a new set of structures from the current exports file(s). Then it compares the current with the previous and only does system calls for cases that have been changed/added/deleted. The nfsd threads do not need to be suspended until the comparison step is being done. This results in a suspension period of milliseconds for a server with 10000+ exported file systems. There is some code using a LOGDEBUG() macro that allow runtime debugging output via syslog(LOG_DEBUG,...) that can be enabled by creating a file called /var/log/mountd.debug. This code is expected to be replaced with code that uses dtrace by cy@ in the near future, once issues w.r.t. dtrace in stable/12 have been resolved. The patch should not change the usage of the exports file(s), but improves the performance of reloading large exports file(s) where there are only a small number of changes done to the file(s). PR: 237860 Changes: _U stable/12/ stable/12/usr.sbin/mountd/mountd.c stable/12/usr.sbin/mountd/pathnames.h
A commit references this bug: Author: rmacklem Date: Fri Jul 5 22:48:32 UTC 2019 New revision: 349772 URL: https://svnweb.freebsd.org/changeset/base/349772 Log: MFC: r348590, r348591 Modify mountd so that it incrementally updates the kernel exports upon a reload. Without this patch, mountd would delete/load all exports from the exports file(s) when it receives a SIGHUP. This works fine for small exports file(s), but can take several seconds to do when there are large numbers (10000+) of exported file systems. Most of this time is spent doing the system calls that delete/export each of these file systems. When the "-S" option has been specified (the default these days), the nfsd threads are suspended for several seconds while the reload is done. This patch changes mountd so that it only does system calls for file systems where the exports have been changed/added/deleted as compared to the exports done for the previous load/reload of the exports file(s). Basically, when SIGHUP is posted to mountd, it saves the exportlist structures from the previous load and creates a new set of structures from the current exports file(s). Then it compares the current with the previous and only does system calls for cases that have been changed/added/deleted. The nfsd threads do not need to be suspended until the comparison step is being done. This results in a suspension period of milliseconds for a server with 10000+ exported file systems. There is some code using a LOGDEBUG() macro that allow runtime debugging output via syslog(LOG_DEBUG,...) that can be enabled by creating a file called /var/log/mountd.debug. This code is expected to be replaced with code that uses dtrace by cy@ in the near future, once issues w.r.t. dtrace in stable/12 have been resolved. The patch should not change the usage of the exports file(s), but improves the performance of reloading large exports file(s) where there are only a small number of changes done to the file(s). PR: 237860 Changes: _U stable/11/ stable/11/usr.sbin/mountd/mountd.c stable/11/usr.sbin/mountd/pathnames.h
Patches have been committed and MFC'd.