Bug 221743 - [patch] mountd at 100% CPU for 24+ hours - getmntinfo() inefficient with thousands of filesystems and snapshots
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Conrad Meyer
Keywords: patch
Depends on:
Reported: 2017-08-23 15:45 UTC by Peter Eriksson
Modified: 2017-08-25 16:44 UTC

See Also:

Fixed getmntinfo.c.diff (2.02 KB, patch)
2017-08-23 15:45 UTC, Peter Eriksson

Description Peter Eriksson 2017-08-23 15:45:06 UTC
Created attachment 185694
Fixed getmntinfo.c.diff

I noticed that mountd on two of our file servers were running at 100% CPU for over 24 hours.

These systems are running FreeBSD 11.0 and are Dell 730xd servers with 256GB RAM and around 140TB of disk where there are around 16000 user filesystems (with around 20-40 hourly snapshots per filesystem).

A "truss" of one of them indicated it was busy in a loop calling getfsstat(), munmap() and mmap(), slowly trying to load more and more filesystems (one more per loop) into a dynamically allocated buffer.

At the time I looked it was up at 280000 filesystems+snapshots out of the 360000 available ones (16000 filesystems, the rest snapshots).

Looking at the code for getmntinfo() in /usr/src/lib/libc/gen/getmntinfo.c, I see that it calls getfsstat() to load the list of filesystems and, if it sees that more filesystems could have been loaded than expected, loops back and retries with the buffer resized to fit one more filesystem.

The problem seems to be that at around 250000-300000 filesystems+snapshots the loop took so long that, with 16000 new snapshots created every hour, it never really caught up...
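
A rough model of that retry policy is sketched below. It is not the libc code: fake_getfsstat() and rounds_grow_by_one() are hypothetical names, and the "kernel" is reduced to a mount count that grows by a fixed amount between rounds.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for getfsstat(2): reports how many entries would be
 * copied into a buffer with room for `capacity` entries when `total` mounts
 * exist. This is not the real system call.
 */
static size_t
fake_getfsstat(size_t total, size_t capacity)
{
	return (total < capacity ? total : capacity);
}

/*
 * Models the retry policy described above: whenever the buffer comes back
 * full, retry with room for exactly one more entry. `growth` mounts appear
 * between rounds. Returns the number of rounds needed, or 0 if the loop has
 * still not finished after `max_rounds`.
 */
size_t
rounds_grow_by_one(size_t capacity, size_t total, size_t growth,
    size_t max_rounds)
{
	size_t rounds, got;

	for (rounds = 1; rounds <= max_rounds; rounds++) {
		got = fake_getfsstat(total, capacity);
		if (got < capacity)
			return (rounds);	/* slack left: list is complete */
		capacity = got + 1;		/* one extra slot, then retry */
		total += growth;		/* mounts added in the meantime */
	}
	return (0);
}
```

In this model, starting with room for 1 entry and 1000 static mounts takes 1001 rounds, and every round copies the whole list again. If even one mount is added per round, the loop never finishes.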

In the attached patch I've modified the getmntinfo() function to call getfsstat() in the loop in order to get the new number of available filesystems, to use a larger "extra" space, and to give up after 3 rounds in the loop and return the list it has at that time...

Btw, we also noticed that the snapshots were only sometimes included in the list from getfsstat(), but not always. It seems a snapshot must be accessed to show up in the list (ls -l in ".zfs/snapshot" triggers a "mount", or, as in our case, an rsync backup job).

I include a patch for a modified getmntinfo() function.
Comment 1 Conrad Meyer freebsd_committer 2017-08-23 16:21:00 UTC
Patch looks functionally fine to me.
Comment 2 Peter Eriksson 2017-08-23 16:24:50 UTC
Would have been nice if the getfsstat() call could have taken an option to only return "exported" filesystems, or a way to exclude snapshots (look for the "MNT_IGNORE" flag perhaps?)
Comment 3 Fabian Keil 2017-08-24 07:39:58 UTC
I like the patch in general but shouldn't mntbuf be freed if realloc() fails?
Comment 4 Conrad Meyer freebsd_committer 2017-08-24 14:33:52 UTC
I'm working on it.

(In reply to Fabian Keil from comment #3)
Yes.  I've adapted the patch to just use reallocf() instead.
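
For context, reallocf(3) behaves like realloc(3) but frees the original block when resizing fails, which avoids the leak raised in comment #3. A portable sketch of that contract follows; realloc_or_free() is a hypothetical name used here for illustration, not what the commit uses.

```c
#include <stdlib.h>

/*
 * Hypothetical helper with the same contract as FreeBSD's reallocf(3):
 * behaves like realloc(3), but frees the original block if resizing fails,
 * so the caller can overwrite its only pointer without leaking.
 */
void *
realloc_or_free(void *ptr, size_t size)
{
	void *nptr;

	nptr = realloc(ptr, size);
	if (nptr == NULL && size != 0)
		free(ptr);	/* free(NULL) is a no-op, so ptr needs no check */
	return (nptr);
}
```

With a helper like this, `buf = realloc_or_free(buf, newsize)` is safe to write directly, whereas `buf = realloc(buf, newsize)` loses the old block on failure.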
Comment 5 commit-hook freebsd_committer 2017-08-25 16:38:50 UTC
A commit references this bug:

Author: cem
Date: Fri Aug 25 16:38:22 UTC 2017
New revision: 322895
URL: https://svnweb.freebsd.org/changeset/base/322895

  getmntinfo(3): Scale faster, and return sooner

  getmntinfo(3) is designed around a relatively static or slow growing set of
  current mounts.  It tried to detect a race with somewhat concurrent mount
  and re-call getfsstat(2) in that case, looping indefinitely.  It also
  allocated space for a single extra mount as slop.

  In the case where the user has a large number of mounts and is adding them
  at a rapid pace, it fell over.

  This patch makes two functional changes:

  1. Allocate even more slop.  Double whatever the last getfsstat(2) returned.

  2. Abort and return some known results after looping a few times
     (arbitrarily, 3).  If the list is constantly changing, we can't guarantee
     we return a full result to the user at any point anyways.

  While here, add very basic functional tests for getmntinfo(3) to the libc

  PR:		221743
  Submitted by:	Peter Eriksson <peter AT ifm.liu.se> (earlier version)
  Sponsored by:	Dell EMC Isilon
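
The two functional changes in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed code: struct mount_entry, struct fake_kernel, fake_getfsstat() and list_mounts() are hypothetical stand-ins so the example builds without <sys/mount.h>, and the exact slop formula here (twice the last count, plus one) is a guess at the spirit of "double whatever the last getfsstat(2) returned".

```c
#include <stdlib.h>
#include <string.h>

#define MAX_TRIES 3	/* arbitrary, as in the commit message */

/* Hypothetical stand-in for struct statfs. */
struct mount_entry {
	char name[32];
};

/* Hypothetical fake kernel: `growth` mounts appear after every call. */
struct fake_kernel {
	int total;
	int growth;
};

/*
 * Hypothetical stand-in for getfsstat(2): with buf == NULL it returns the
 * current mount count, otherwise it fills buf with as many entries as fit
 * and returns that number.
 */
int
fake_getfsstat(struct mount_entry *buf, size_t bufsize, struct fake_kernel *k)
{
	int n, room;

	if (buf == NULL) {
		n = k->total;
	} else {
		room = (int)(bufsize / sizeof(*buf));
		n = k->total < room ? k->total : room;
		memset(buf, 0, (size_t)n * sizeof(*buf));
	}
	k->total += k->growth;
	return (n);
}

/* Returns the number of entries stored in *out, or -1 on allocation failure. */
int
list_mounts(struct fake_kernel *k, struct mount_entry **out)
{
	struct mount_entry *buf = NULL, *tmp;
	size_t bufsize;
	int n, tries;

	*out = NULL;
	n = fake_getfsstat(NULL, 0, k);
	for (tries = 0; tries < MAX_TRIES; tries++) {
		/* Change 1: generous slop, twice the last count plus one. */
		bufsize = ((size_t)n * 2 + 1) * sizeof(*buf);
		if ((tmp = realloc(buf, bufsize)) == NULL) {
			free(buf);
			return (-1);
		}
		buf = tmp;
		n = fake_getfsstat(buf, bufsize, k);
		if ((size_t)n * sizeof(*buf) < bufsize)
			break;	/* slack left over: the list is complete */
		/* Change 2: the loop bound caps the number of retries. */
	}
	*out = buf;
	return (n);
}
```

With a static set of 1000 mounts this returns all of them after a single sized call. With 10000 mounts appearing between calls it stops after three tries and returns the partial list it has, instead of spinning.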

Comment 6 Conrad Meyer freebsd_committer 2017-08-25 16:44:30 UTC