Bug 223786

Summary: Remounting a UFS filesystem read-only takes very long time
Product: Base System Reporter: Mikhail T. <mi>
Component: kernAssignee: freebsd-fs mailing list <fs>
Status: New ---    
Severity: Affects Only Me CC: cem, imp, kib, mckusick, mi
Priority: --- Keywords: regression
Version: 10.4-STABLE   
Hardware: Any   
OS: Any   

Description Mikhail T. 2017-11-21 17:08:16 UTC
I usually keep my / mounted read-only -- mostly for safety. When I need to change some aspect of configuration or add a port, I remount rw, do what needs doing, and then remount back to read-only mode.

This worked for me for many years, but lately I noticed, the remounting back to the ro-mode is taking longer than it used to. Many seconds, especially, when it was active shortly before the remount command.

Today, however, with the freshly rebuilt kernel (FreeBSD 10.4-STABLE #15 r325999M), it takes well over a minute! A minute, during which the machine largely freezes -- for example, making new logins impossible.

I understand the need to flush the unwritten to disk, when switching to ro, but just how much should that take in the scenario I just had:

  1048  11:56   mount -orw -u /
  1049  11:56   chmod u+s /opt/cyrus/bin/deliver
  1050  11:56   mount -oro -u /

? The second remount took well over minute and it seems wrong.

The filesystem is based on SSD and looks like:

  /dev/ada0s1a on / (ufs, NFS exported, local, read-only)

`tunefs -p /` reports:

tunefs: POSIX.1e ACLs: (-a)                                disabled
tunefs: NFSv4 ACLs: (-N)                                   disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       disabled
tunefs: gjournal: (-J)                                     disabled
tunefs: trim: (-t)                                         enabled
tunefs: maximum blocks per file in a cylinder group: (-e)  1048576
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             8%
tunefs: space to hold for metadata blocks: (-k)            0
tunefs: optimization preference: (-o)                      time
tunefs: volume label: (-L)
Comment 1 Kirk McKusick freebsd_committer 2017-11-22 18:01:51 UTC
This report is perhaps the third or fourth about the slow sync problem. Unfortunately we have not been able to reproduce the slowdown.

We believe it started with r230553.  Note that I carefully use the word 'started' and not 'caused', because I believe that the change in r230553 is required to avoid loss of user data.  But it is not clear what makes the sync run for such a long time.

No users who reported the slowdown have provided a useful insight into their configuration. What we need is a description of your filesystem. Ideal would be a dump image that when restored demonstrates the problem.
Comment 2 Mikhail T. 2017-11-22 18:41:47 UTC
I don’t know, what to say. I’ve been seeing this problem for a while on all of the machines I manage (that’s about 8).

Besides the tunefs output already included, I’m not sure, what to add. While the mount command is waiting, it is in the biowr-state.
Comment 3 Kirk McKusick freebsd_committer 2017-11-22 21:17:47 UTC
(In reply to Mikhail T. from comment #2)
Being in the biowr state is expected as the filesystem needs to sync out all the changes. The question is why there are so many of them. Randomly guessing, perhaps it decides that there are a large number of access times that need to be written. When upgrading to read-write, include the -o noatime option so there is no attempt to update access times.

It would also be useful to collect the number of I/O (particularly writes) that the downgrade command is doing (time -l if your shell does not otherwise privide the I/O statistics). You can also get an idea of the I/Os being done using iostat.
Comment 4 Warner Losh freebsd_committer 2017-11-22 21:20:26 UTC
I think that r230553 counts as a while ago: It was made Wed Jan 25 20:54:09 2012
Comment 5 Konstantin Belousov freebsd_committer 2017-11-22 21:42:46 UTC
You may try the patch from https://reviews.freebsd.org/D13197, but I doubt that it would change much for you.
Comment 6 Mikhail Teterin freebsd_committer 2018-08-26 18:03:19 UTC
Seeing this manifested pretty bad today - on a freshly-rebuilt world (r338291M). The other thing that changed is the RAM upgrade -- from 10GB to 40GB.

Could it be, the kernel is trying to flush ALL filesystems, rather than just the one being remounted? Because, unlike the / on UFS, other filesystems could be quite busy...

While waiting, I press Ctrl-T once in a while, and get:

root@narawntapu:/etc/mail (1022) mount -oro -u /
load: 0.93  cmd: mount 26321 [biowr] 230.01r 0.00u 9.01s 2% 2224k
load: 0.47  cmd: mount 26321 [biowr] 283.84r 0.00u 10.99s 1% 2224k
load: 0.57  cmd: mount 26321 [biowr] 318.95r 0.00u 12.25s 2% 2224k

The actual change to the /-filesystem is trivial.
Comment 7 Mikhail Teterin freebsd_committer 2018-08-26 18:52:43 UTC
(In reply to Kirk McKusick from comment #3)
> include the -o noatime option

No obvious difference:

root@narawntapu:/etc/mail (1024) mount -orw,noatime /
root@narawntapu:/etc/mail (1025) touch q
root@narawntapu:/etc/mail (1026) rm q
root@narawntapu:/etc/mail (1027) mount -oro -u /
load: 0.71  cmd: mount 27675 [biowr] 7.92r 0.00u 0.72s 4% 2224k
load: 0.52  cmd: mount 27675 [biowr] 54.83r 0.00u 2.64s 2% 2224k
load: 0.47  cmd: mount 27675 [biowr] 68.05r 0.00u 3.12s 2% 2224k
load: 0.58  cmd: mount 27675 [biowr] 121.83r 0.00u 5.08s 3% 2224k
load: 0.36  cmd: mount 27675 [biowr] 254.59r 0.00u 9.97s 2% 2224k

> time -l

root@narawntapu:/etc/mail (1034) /usr/bin/time -l mount -oro /
      330,18 real         0,00 user        13,01 sys
      2212  maximum resident set size
        22  average shared memory size
         7  average unshared data size
       121  average unshared stack size
       112  page reclaims
         0  page faults
         0  swaps
         0  block input operations
    344120  block output operations
         0  messages sent
         0  messages received
         0  signals received
    344178  voluntary context switches
      1014  involuntary context switches

The entire FS is:

Filesystem   1M-blocks  Used Avail Capacity  Mounted on
/dev/ada0s1a     22808 16268  4715    78%    /

*what* 344120 "block output operations"?!
Comment 8 Conrad Meyer freebsd_committer 2018-08-26 20:03:56 UTC
(In reply to Mikhail Teterin from comment #7)
Perhaps more interestingly,

> 344178  voluntary context switches

Almost every single one of those bios is written synchronously, it looks like.
Comment 9 Conrad Meyer freebsd_committer 2018-08-26 20:05:47 UTC
How many inodes does the filesystem have?  Does mount -oro,noatime -u (in addition to rw,noatime) help in any way?