Bug 251906

Summary: autofs: automounter always only adds to /var/db/mounttab but never clears it
Product: Base System
Reporter: Martin Birgmeier <d8zNeCFG>
Component: bin
Assignee: freebsd-bugs (Nobody) <bugs>
Status: In Progress
Severity: Affects Only Me
CC: emaste, lwhsu, rew, trasz
Priority: ---
Version: 12.2-RELEASE
Hardware: Any
OS: Any
URL: https://reviews.freebsd.org/D27801
Attachments:
  Description                                      Flags
  attempt 2: use f_mntfromname for umount          none
  create a common unmount routine for automount    none
  use rpc.umntall(8) after successful unmount(2)   none

Description Martin Birgmeier 2020-12-16 20:49:23 UTC
Scenario:
- FreeBSD 12.2-RELEASE-p2 #4 r368500M
- autofs on /net using a self-developed /etc/autofs/special_hosts3 (similar to the regular special_hosts)

Result:
- On every auto mount, an entry is added to /var/db/mounttab
- On auto unmount, no entry is ever deleted from /var/db/mounttab
- When forcibly timing out all mounts and none remain, /var/db/mounttab is still full of (even duplicated) entries:

[1]# df -t nfs ; automount -u ; echo after ; df -t nfs ; cat /var/db/mounttab
Filesystem    1K-blocks   Used    Avail Capacity  Mounted on
gandalf:/z/ss  65015520 629996 64385524     1%    /net/gandalf/z/ss
after
1608103434      hal     /z/SRC/FreeBSD/releng/12.2
1608117698      hal     /z/release/FreeBSD-ports/amd64/packages-12
1608117947      hal     /z/VOL/FreeBSD-ports
1608145583      hal     /z/SRC/FreeBSD-ports/head
1608145583      hal     /z/VOL/FreeBSD-ports
1608145584      hal     /z/SRC/FreeBSD/releng/12.2
1608145643      hal     /z/VOL/FreeBSD-ports
1608145685      hal     /z/VOL/ftp
1608145715      hal     /z/VOL/FreeBSD-ports
1608145720      hal     /z/SRC/FreeBSD/releng/12.2
1608148304      hal     /z/release/FreeBSD-ports/amd64/packages-12
1608148333      hal     /z/VOL/FreeBSD-ports
1608148364      gandalf /z/SRC/src.local
1608148531      gandalf /z/SRC/src.local
1608148859      gandalf /z/SRC/src.local
1608151096      gandalf /z/SRC/src.local
1608151236      gandalf /z/ss
[0]# \rm /var/db/mounttab
[0]# 

Expected result:
- The autounmountd should remove one matching line from /var/db/mounttab whenever it times out a mount

Note:
- See also bug #251395.

-- Martin
Comment 1 Martin Birgmeier 2020-12-28 13:11:42 UTC
It seems there is also a similar issue with /var/db/mountdtab: On the NFS server, entries only accumulate in that file but are never cleared.

-- Martin
Comment 2 Robert Wing freebsd_committer 2020-12-28 20:08:47 UTC
(In reply to Martin Birgmeier from comment #1)

Hey Martin,

This patch should also fix that.

With this patch, `automount -u` will try to use umount(8) when unmounting filesystems.

umount(8) does some additional work, such as notifying the mountd server that an NFS mount has been unmounted, which will remove the /var/db/mountdtab entry on the mountd server. If the notification to the mountd server is successful, the /var/db/mounttab entry will also be removed.

-Rob
Comment 3 Martin Birgmeier 2020-12-29 09:28:06 UTC
Hi Robert,

Thank you for your quick reaction.

Will this be merged to releng/12.2 eventually, or only to stable/12? And do you know when?

Or should I merge myself (to releng/12.2) and report the results?

Best regards,

Martin
Comment 4 Robert Wing freebsd_committer 2020-12-30 06:08:59 UTC
(In reply to Martin Birgmeier from comment #3)

Only to stable/12.

Maybe 2-3 weeks into CURRENT and then merged into stable/12 a week after that.
Comment 5 Martin Birgmeier 2020-12-30 19:51:55 UTC
I have merged the patch in D27801 to releng/12.2 (together with D27832 for bug #224601). This now results in the following:

[0]# df -t nfs ; automount -u ; echo after ; df -t nfs ; wc /var/db/mounttab && sort -u +1 /var/db/mounttab
Filesystem                          1K-blocks    Used     Avail Capacity  Mounted on
hal:/z/SRC/FreeBSD/base/releng/12.2 807739509 2810359 804929150     0%    /net/hal/z/SRC/FreeBSD/base/releng/12.2
umount: 04ff003a3a: statfs: No such file or directory
umount: 04ff003a3a: unknown file system
automount: "umount \M-p\M-]\M^?\M^?\M^?\^?04ff003a3a", pid 1101, terminated with exit status 1
after
       1       3      47 /var/db/mounttab
1609357704      hal     /z/SRC/FreeBSD/base/releng/12.2
[0]# 

-- Martin
Comment 6 Robert Wing freebsd_committer 2020-12-30 20:16:06 UTC
Thanks for testing this.

Looks like an issue with how I'm building up the fsid, I'll try to reproduce and see what the hang up is..

Would it be possible to get the output of `mount -v` while the automount filesystem is mounted? Mostly interested in the fsid field.
Comment 7 Martin Birgmeier 2020-12-30 20:25:16 UTC
Hmmm... I already reverted the change (it's all on zfs and I rolled back).

Here is the output on the system running without the two changes mentioned in comment #5:

[0]# mount -v
/dev/ufs/disk908a on / (ufs, NFS exported, local, soft-updates, writes: sync 20 async 80, reads: sync 747 async 5, fsid 2cfcd657fffe02fc)
devfs on /dev (devfs, fsid 00ff007171000000)
/dev/ufs/disk908d on /usr (ufs, NFS exported, local, soft-updates, writes: sync 2 async 162, reads: sync 1141 async 25, fsid 3afcd65706f40533)
/dev/md0 on /tmp (ufs, local, soft-updates, writes: sync 2 async 14, reads: sync 9 async 0, fsid 68e1ec5f11845b81)
procfs on /proc (procfs, local, fsid 01ff000202000000)
fdescfs on /dev/fd (fdescfs, fsid 02ff005959000000)
map -hosts3 on /net (autofs, fsid 03ff00cfcf000000)
hal:/z/SRC/FreeBSD/base/releng/12.2 on /net/hal/z/SRC/FreeBSD/base/releng/12.2 (nfs, nosuid, automounted, fsid 04ff003a3a000000)
[0]# 

-- Martin
Comment 8 Robert Wing freebsd_committer 2020-12-30 20:55:43 UTC
Created attachment 221111 [details]
attempt 2: use f_mntfromname for umount

This uses f_mntfromname instead, let me know if you get around to testing it.
Comment 9 Martin Birgmeier 2021-01-04 15:12:22 UTC
I applied the patch to releng/12.2 and tested it. It seems to be working. Both /var/db/mounttab on the client and /var/db/mountdtab on the server are updated as expected.

Thank you for your efforts.

-- Martin
Comment 10 Martin Birgmeier 2021-01-04 15:13:23 UTC
One more thing - how are parallel mounts and unmounts handled with respect to updating /var/db/mounttab?

-- Martin
Comment 11 Martin Birgmeier 2021-01-05 12:21:46 UTC
I installed the patch on all my machines now.

Something is still amiss because very often, even though everything has been unmounted, /var/db/mounttab and the corresponding entries on the server in /var/db/mountdtab are not fully cleared.

Such a situation can be worked around by automounting the still-existing entries again and then auto-unmounting them. In most cases, this will clear /var/db/mounttab (and the corresponding entries on the server in /var/db/mountdtab).

Maybe there is another path whereby the unmount of an auto-mounted directory can take place?

-- Martin
Comment 12 Martin Birgmeier 2021-01-05 15:38:31 UTC
It seems that a similar patch needs to be done in usr.sbin/autofs/autounmountd.c in order to catch the case where filesystems are unmounted after not being used for some time.

-- Martin
Comment 13 Martin Birgmeier 2021-01-05 16:03:06 UTC
It seems that there are further problems with mounttab handling... I am using chroot to change into different environments, and in each environment the same automount structure is being used:

#
# $FreeBSD: releng/12.2/usr.sbin/autofs/auto_master 337749 2018-08-14 13:52:08Z trasz $
#
# Automounter master map, see auto_master(5) for details.
#
/net            -hosts3         -nobrowse,nosuid,intr
# When using the -media special map, make sure to edit devd.conf(5)
# to move the call to "automount -c" out of the comments section.
#/media         -media          -nosuid,noatime,autoro
#/-             -noauto

/z/netboot/920/net   -hosts3 -nobrowse,nosuid,intr

/z/netboot/921/net   -hosts3 -nobrowse,nosuid,intr

(special_hosts3 is an improved version of special_hosts.)

This means that one server directory can well be mounted into multiple locations on the client.

But in both mounttab and mountdtab, only the host path is added, so that when unmounting on the client it is not clear how many mounts of the same server directory are still active.

It seems that the handling of /var/db/mounttab and /var/db/mountdtab needs to be thoroughly reworked, including recording the client path (or in the case of mountdtab maybe just a count), and made race-free when multiple programs try to modify these files.
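
To illustrate the ambiguity described above, here is a minimal sketch that counts how many lines of a mounttab-style file match a given host and path. It assumes each line has the form "<timestamp> <host> <path>" separated by whitespace, as in the listings quoted in this report. The function name `mounttab_count` is hypothetical and is not taken from the FreeBSD sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the lines in a mounttab-style stream whose host and path match.
 * Assumed line format: "<timestamp> <host> <path>", whitespace separated.
 * Paths containing whitespace are not handled by this sketch.
 */
static int
mounttab_count(FILE *fp, const char *host, const char *path)
{
	char line[2048], h[256], p[1024];
	long long ts;
	int n = 0;

	assert(fp != NULL && host != NULL && path != NULL);
	while (fgets(line, sizeof(line), fp) != NULL) {
		if (sscanf(line, "%lld %255s %1023s", &ts, h, p) != 3)
			continue;	/* skip malformed lines */
		if (strcmp(h, host) == 0 && strcmp(p, path) == 0)
			n++;
	}
	return (n);
}
```

A count of 2 for one host/path pair says that two mounts were recorded, but not which client mount points they belong to, so an unmount cannot tell which entry it should retire.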

-- Martin
Comment 14 Martin Birgmeier 2021-01-06 11:22:33 UTC
Next issue: It seems that if the automount is a loopback mount (instead of NFS, achieved by using "-fstype=nullfs" in the automount map), unmounting always fails first with the following messages but then succeeds anyway:

umount: unmount of /z/NCVS/cvs.local failed: Device busy
automount: "umount /z/NCVS/cvs.local", pid 65867, terminated with exit status 1

-- Martin
Comment 15 Robert Wing freebsd_committer 2021-01-08 01:26:10 UTC
Created attachment 221372 [details]
create a common unmount routine for automount

This patch goes back to using FSIDs. The problem with my original FSID-based patch was that I was calling `strlen()` on an uninitialized character array, which produced undefined results. I was able to reproduce the problem.
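
As an illustration only (this is not the code from the attachment), the following sketch formats an FSID as the 16-digit hex string that `mount -v` prints, writing into the buffer with snprintf(3) so that it is always NUL-terminated before anything reads it. The struct `fsid_sketch` and the function `fsid_to_hex` are hypothetical names. The bytes are emitted least-significant first, which is an assumption that matches the pair "FSID:973143831:58" and "17ff003a3a000000" seen in comment 23.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for fsid_t: two 32-bit values. */
struct fsid_sketch {
	int32_t val[2];
};

/*
 * Write the 8 FSID bytes as 16 hex digits plus a terminating NUL.
 * Bytes are emitted least-significant first (little-endian layout).
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
fsid_to_hex(const struct fsid_sketch *fsid, char *buf, size_t buflen)
{
	size_t off = 0;

	assert(fsid != NULL && buf != NULL);
	if (buflen < 17)
		return (-1);
	for (int v = 0; v < 2; v++) {
		uint32_t word = (uint32_t)fsid->val[v];

		for (int b = 0; b < 4; b++) {
			/* snprintf NUL-terminates, so buf is never read uninitialized. */
			snprintf(buf + off, buflen - off, "%02x",
			    (unsigned)((word >> (8 * b)) & 0xff));
			off += 2;
		}
	}
	return (0);
}
```

Tracking the write offset explicitly avoids calling `strlen()` on the destination buffer at all, which sidesteps the class of bug described above.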

This patch also creates a common unmount routine that first tries to unmount using umount(8) and, if that fails, falls back to unmount(2).

Comment #14 was likely from an error when using `f_mntfromname` and should be fixed with this patch by using the FSID instead.

Comment #13 is a separate problem from this one. I was able to reproduce that issue as well though - I'll look into it.

Thanks for the reporting and testing Martin, much appreciated!

-Rob
Comment 16 Martin Birgmeier 2021-01-08 14:10:47 UTC
I have installed the new patch on my machines. First tests seem to indicate it is not working correctly. Specifically, /var/db/mounttab is not cleared, and neither is /var/db/mountdtab on the NFS server.

-- Martin
Comment 17 Martin Birgmeier 2021-01-08 14:12:52 UTC
... it seems the issue is with what I wrote in comment 12 - the patch also needs to be done in autounmountd.c.

-- Martin
Comment 18 Martin Birgmeier 2021-01-08 14:17:07 UTC
Sorry, I see it is, but it is not working for timed-out unmounts, only when using automount -u.

Something is still broken for autounmountd.

-- Martin
Comment 19 Martin Birgmeier 2021-01-08 15:26:47 UTC
I have recompiled again, this time cleaning out the autofs obj directory entirely (I am recompiling using "make -DNO_CLEAN buildworld").

It seems to be working now... maybe there is a dependency issue in the Makefile for autofs?

-- Martin
Comment 20 Martin Birgmeier 2021-01-08 15:39:18 UTC
Another problem remaining is that for busy NFS mounts, /var/log/messages is now spammed with error output from umount(8).

-- Martin
Comment 21 Martin Birgmeier 2021-01-08 16:45:42 UTC
Maybe instead of spawning umount(8) it would be better to use the routines from usr.sbin/rpc.umntall/mounttab.c together with unmount(2), similar to what umount(8) is doing.

-- Martin
Comment 22 Robert Wing freebsd_committer 2021-01-08 17:23:33 UTC
(In reply to Martin Birgmeier from comment #21)

Darn, I'm really coming up short on this one so far..

You may be right, trying to bring umount(8) into the scene is proving to be fraught with errors. I'll look into putting the mounttab and mountdtab handling directly in automount.

Out of curiosity, what were the spam errors in /var/log/messages? If they're gone, don't worry about it - just curious.

-Rob
Comment 23 Martin Birgmeier 2021-01-08 17:25:59 UTC
Like this for example:

Jan  8 16:59:02 mizar autounmountd[46891]: "umount 17ff003a3a000000", pid 68544, terminated with exit status 1
Jan  8 16:59:02 mizar autounmountd[46891]: cannot unmount /net/hal/z/SRC/FreeBSD/ports/head (FSID:973143831:58): Device busy
Jan  8 16:59:02 mizar autounmountd[46891]: "umount 17ff003a3a000000", pid 68553, terminated with exit status 1
Jan  8 16:59:02 mizar autounmountd[46891]: cannot unmount /net/hal/z/SRC/FreeBSD/ports/head (FSID:973143831:58): Device busy
Jan  8 16:59:02 mizar autounmountd[46891]: "umount 17ff003a3a000000", pid 68556, terminated with exit status 1
Jan  8 16:59:02 mizar autounmountd[46891]: cannot unmount /net/hal/z/SRC/FreeBSD/ports/head (FSID:973143831:58): Device busy
Jan  8 16:59:32 mizar autounmountd[46891]: "umount 17ff003a3a000000", pid 77196, terminated with exit status 1
Jan  8 16:59:32 mizar autounmountd[46891]: cannot unmount /net/hal/z/SRC/FreeBSD/ports/head (FSID:973143831:58): Device busy
Jan  8 16:59:32 mizar autounmountd[46891]: "umount 17ff003a3a000000", pid 77246, terminated with exit status 1
Jan  8 16:59:32 mizar autounmountd[46891]: cannot unmount /net/hal/z/SRC/FreeBSD/ports/head (FSID:973143831:58): Device busy
Jan  8 16:59:34 mizar autounmountd[46891]: "umount 17ff003a3a000000", pid 77654, terminated with exit status 1
Jan  8 16:59:34 mizar autounmountd[46891]: cannot unmount /net/hal/z/SRC/FreeBSD/ports/head (FSID:973143831:58): Device busy
Jan  8 17:00:04 mizar autounmountd[46891]: "umount 17ff003a3a000000", pid 86766, terminated with exit status 1
Jan  8 17:00:04 mizar autounmountd[46891]: cannot unmount /net/hal/z/SRC/FreeBSD/ports/head (FSID:973143831:58): Device busy
Jan  8 17:00:04 mizar autounmountd[46891]: "umount 17ff003a3a000000", pid 86777, terminated with exit status 1
Jan  8 17:00:04 mizar autounmountd[46891]: cannot unmount /net/hal/z/SRC/FreeBSD/ports/head (FSID:973143831:58): Device busy
Jan  8 17:00:04 mizar autounmountd[46891]: "umount 17ff003a3a000000", pid 86783, terminated with exit status 1
Jan  8 17:00:04 mizar autounmountd[46891]: cannot unmount /net/hal/z/SRC/FreeBSD/ports/head (FSID:973143831:58): Device busy
Jan  8 17:00:34 mizar autounmountd[46891]: "umount 17ff003a3a000000", pid 1512, terminated with exit status 1
Jan  8 17:00:34 mizar autounmountd[46891]: cannot unmount /net/hal/z/SRC/FreeBSD/ports/head (FSID:973143831:58): Device busy
Jan  8 17:00:34 mizar autounmountd[46891]: "umount 17ff003a3a000000", pid 1521, terminated with exit status 1
Jan  8 17:00:34 mizar autounmountd[46891]: cannot unmount /net/hal/z/SRC/FreeBSD/ports/head (FSID:973143831:58): Device busy
Jan  8 17:00:47 mizar autounmountd[46891]: "umount 17ff003a3a000000", pid 5179, terminated with exit status 1
Jan  8 17:00:47 mizar autounmountd[46891]: cannot unmount /net/hal/z/SRC/FreeBSD/ports/head (FSID:973143831:58): Device busy
Jan  8 17:00:59 mizar autounmountd[46891]: "umount 17ff003a3a000000", pid 10264, terminated with exit status 1
Jan  8 17:00:59 mizar autounmountd[46891]: cannot unmount /net/hal/z/SRC/FreeBSD/ports/head (FSID:973143831:58): Device busy

The same can be seen on the console when manually using "automount -u".

-- Martin
Comment 24 Martin Birgmeier 2021-01-08 17:27:42 UTC
Maybe the stuff from usr.sbin/rpc.umntall/mounttab.[ch] should be put in a library which is then being used by mount(8), umount(8), autofs(8), rpc.umntall, ...

-- Martin
Comment 25 Robert Wing freebsd_committer 2021-01-08 17:46:57 UTC
It looks like unmount(2) is failing there too, because the error is only logged after both umount(8) and unmount(2) fail.
Comment 26 Martin Birgmeier 2021-01-08 18:05:59 UTC
Yes, but this is expected because these directories/mount points are in use on the client and so cannot be unmounted.

This is a normal use case for an auto(un)mounter.

-- Martin
Comment 27 Robert Wing freebsd_committer 2021-01-08 21:09:30 UTC
(In reply to Martin Birgmeier from comment #26)

Those are expected results, got it. At first, I thought those were unexpected.

Even unpatched `automount -u` will report 'Device Busy' errors.

An unpatched `autounmountd` doesn't log 'Device Busy' errors, unless `autounmountd` was called with '-d' or '-v' - in which case, you'll see the 'Device Busy' errors. My patch did change the behavior of `autounmountd` to log all errors (including 'Device Busy' errors), it appears that was a mistake.
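
The logging policy described here can be sketched as a small predicate. This is a hedged illustration, not the autounmountd source: the name `should_log_unmount_error` and the `debug` parameter (standing in for the '-d' or '-v' options) are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether a failed unmount should be logged.
 * EBUSY is routine for a mount that is still in use, so it is only
 * reported when debugging/verbose output was requested.
 */
static bool
should_log_unmount_error(int error, bool debug)
{
	assert(error != 0);
	if (error == EBUSY)
		return (debug);
	return (true);
}
```

With a check like this in front of the log call, busy mounts would stay quiet in /var/log/messages during normal operation while other failures would still be reported.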

Other than the logging issues, it sounds like the patch works as expected? 

If the patch is working, I'll be more inclined to get the log messages dialed in. The error messages being spammed in /var/log/messages are all logged from the automount code.

I agree, it would be handy to have a library to share some of the code between these programs.
Comment 28 Martin Birgmeier 2021-01-09 19:35:00 UTC
Indeed, the patch is working.

Thank you for your efforts!

-- Martin
Comment 29 Robert Wing freebsd_committer 2021-02-05 17:18:47 UTC
(In reply to Martin Birgmeier from comment #28)

Hey Martin,

I spoke too soon about this patch making it into base. The consensus is that my approach to fixing the problem with umount(8) is flawed.

Maybe someone else might come up with a version of this patch that will be acceptable.

Your testing and detailed bug reports are highly appreciated, thanks for your help. 

Sorry for not being able to follow through on this.

-Rob
Comment 30 Martin Birgmeier 2021-02-05 17:32:54 UTC
Thank you. I guess what I've written in comment #24 would need to be done.

Is there any concrete time plan when an improved solution could be available?

-- Martin
Comment 31 Robert Wing freebsd_committer 2021-02-05 18:09:11 UTC
That I do not know.

I've cc'ed Edward (@trasz), the author/maintainer of automount, on this. He might be able to shed some light on that.
Comment 32 Robert Wing freebsd_committer 2021-03-03 00:27:33 UTC
Created attachment 222930 [details]
use rpc.umntall(8) after successful unmount(2)

Hey Martin,

Here's another patch, if you get around to trying it out - let me know the results.

Thanks,
Rob
Comment 33 Martin Birgmeier 2021-03-10 19:32:17 UTC
Hello Robert,

Please excuse me - I currently do not have the time to test this (but I am still happily running your previous patches :-)). I have briefly looked at the patch and assume it will be working. It looks like quite a sledgehammer method, first because it seems to indiscriminately unmount everything, and second because it still uses an exec (popen).

I believe it would be better to build a small library for dealing with the interaction of NFS mounts and maintaining the mounttab (and probably also the mountdtab on the server) file and call that from both the automounter and rpc.umntall etc. That library should also take care of possibly simultaneous accesses to these files and properly lock/unlock them so that they cannot get corrupted while they are being updated/in use. Finally, this library should also take care of counting any possible mounts - one and the same NFS client might (and in my case, does) mount the same export twice in different places, and this must be properly handled in both mounttab and mountdtab.

Best regards,

Martin
Comment 34 Robert Wing freebsd_committer 2021-03-12 15:25:54 UTC
(In reply to Martin Birgmeier from comment #33)

No worries.

Keep in mind that rpc.umntall(8) doesn't unmount anything - it only notifies the NFS server of an unmounted NFS file system.

Couple other points:

- popen(rpc.umntall) is only called after a successful unmount(2)
- rpc.umntall -k only notifies the NFS server of an unmounted NFS file system
  when the NFS entry is found in the mounttab and is no longer mounted.
Comment 35 commit-hook freebsd_committer 2021-03-12 15:48:57 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=88e531f38c2412bf030f4e8dd563efc45b70797e

commit 88e531f38c2412bf030f4e8dd563efc45b70797e
Author:     Robert Wing <rew@FreeBSD.org>
AuthorDate: 2021-02-17 07:51:38 +0000
Commit:     Robert Wing <rew@FreeBSD.org>
CommitDate: 2021-03-12 15:41:55 +0000

    autofs: best effort to maintain mounttab and mountdtab

    When an automounted filesystem is successfully unmounted, call
    rpc.umntall(8) with the -k flag.

    rpc.umntall(8) is used to clean up /var/db/mounttab on the client and
    /var/db/mountdtab on the server. This is only useful for NFSv3.

    PR:     251906
    Reviewed by: trasz
    Differential Revision:  https://reviews.freebsd.org/D27801

 usr.sbin/autofs/automount.c    |  2 ++
 usr.sbin/autofs/autounmountd.c |  3 ++-
 usr.sbin/autofs/common.c       | 13 +++++++++++++
 usr.sbin/autofs/common.h       |  1 +
 4 files changed, 18 insertions(+), 1 deletion(-)