Summary:    Unresponsive NFS mount on AWS EFS
Product:    Base System        Reporter:  Alex Dupre <ale>
Component:  kern               Assignee:  freebsd-bugs (Nobody) <bugs>
Severity:   Affects Only Me    CC:        cperciva, emaste, rmacklem
Description Alex Dupre 2021-11-24 09:08:04 UTC
I'm experiencing annoying issues with an AWS EFS mount point on FreeBSD 13 EC2 instances. The filesystem is mounted by 3 instances (2 with the same access patterns, 1 with a different one).

Initially I had the /etc/fstab entry configured with:

`rw,nosuid,noatime,bg,nfsv4,minorversion=1,rsize=1048576,wsize=1048576,timeo=600,oneopenown`

After a few days this led my Java application to have all threads blocked on `stat64` kernel calls that never returned, without the ability to even kill -9 the process. After digging into it, this seems to be the normal behavior for hard mount points, even if I fail to understand why one should prefer to have the system completely frozen when the NFS mount point is not responding. So I later changed the configuration to:

`rw,nosuid,noatime,bg,nfsv4,minorversion=1,intr,soft,retrans=2,rsize=1048576,wsize=1048576,timeo=600,oneopenown`

i.e. by adding `intr,soft,retrans=2`. By the way, I think there is a typo in mount_nfs(8): it says to set `retrycnt` instead of `retrans` for the `soft` option, can you confirm?

After the change `nfsstat -m` reports:

`nfsv4,minorversion=1,oneopenown,tcp,resvport,soft,intr,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2`

I wonder why the timeo, rsize, and wsize options seem to have been ignored, but this is irrelevant to the issue.

After a few days the application on the two similar EC2 instances stopped working again, though. Any command accessing the mounted EFS filesystem didn't complete in a reasonable time (ls, df, umount, etc.), but this time I could kill the processes. The only way to recover the situation was still to reboot the instances.
On one of them I've seen the following kernel messages, but they were generated only when I tried to debug the issue hours later, and only on one EC2 instance, so I'm not sure whether they are relevant or helpful:

```
kernel: newnfs: server 'fs-xxx.efs.us-east-1.amazonaws.com' error: fileid changed. fsid 0:0: expected fileid 0x4d2369b89a58a920, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)
kernel: nfs server fs-xxx.efs.us-east-1.amazonaws.com:/: not responding
```

The third EC2 instance survived and was still able to access the filesystem, but I suspect it simply wasn't accessing the filesystem when the network/NFS issue that affected the other two occurred.
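For reference, the second (soft-mount) configuration described in the report would correspond to an /etc/fstab line along these lines. This is a sketch: the EFS hostname is the redacted placeholder from the report, and the /mnt/efs mount point is made up for illustration.

```
# Hypothetical /etc/fstab entry matching the soft-mount options above
fs-xxx.efs.us-east-1.amazonaws.com:/  /mnt/efs  nfs  rw,nosuid,noatime,bg,nfsv4,minorversion=1,intr,soft,retrans=2,rsize=1048576,wsize=1048576,timeo=600,oneopenown  0  0
```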
Comment 1 Rick Macklem 2021-11-24 15:44:42 UTC
Assorted comments:

- FreeBSD 13.0 shipped with a bug in the TCP stack which could result in a missed socket receive upcall. This normally causes problems for the server, but could cause problems for the client side as well. When you do a "netstat -a" on the hung system, the TCP connection for the mount is shown as ESTABLISHED with Recv-Q non-zero. The fix is to upgrade to stable/13. There is more info on this in PR#254590.

- AWS uses a small, fixed number of open_owners, which is why "oneopenown" is needed. If it also uses a small, fixed number of lock_owners (for byte-range locking instead of Windows Open locks), then you could run out of these. If so, you are SOL (Shit Outa Luck) and my only suggestion would be to try an NFSv3 mount with the "nolockd" mount option. "nfsstat -E -c" should show you how many lock_owners are allocated at the time of the hang.

Other than that, if AWS has now added support for delegations, there could be assorted breakage. Not running the nfscbd(8) daemon should avoid issuing of delegations, if that is the case.

If a soft mount fails a syscall, then the session slot is screwed up and this makes the mount fail in weird ways. "umount -N <mnt_path>" is a much better way to deal with hung mounts.

As a starting point, posting the output of:

ps axHl
procstat -kk
netstat -a
nfsstat -E -c

on the client when hung will give us more information.

Good luck with it, rick
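The four diagnostic commands above could be captured in one pass with a small script along these lines. This is a sketch, not part of the report: the script and log path are made up, and any command not present on the system (procstat and nfsstat are FreeBSD-specific) is noted and skipped rather than failing.

```shell
#!/bin/sh
# Collect the debugging output requested above into a single log file.
LOG="/tmp/nfs-hang-debug.log"
: > "$LOG"
for cmd in "ps axHl" "procstat -kk" "netstat -a" "nfsstat -E -c"; do
    set -- $cmd                          # split the command string into words
    if command -v "$1" >/dev/null 2>&1; then
        printf '===== %s =====\n' "$cmd" >> "$LOG"
        $cmd >> "$LOG" 2>&1
    else
        printf '===== %s (not available on this system) =====\n' "$cmd" >> "$LOG"
    fi
done
echo "wrote $LOG"
```

Attaching the resulting log to the PR while the mount is hung would cover everything asked for in one go.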
Comment 2 Colin Percival 2021-11-24 16:46:17 UTC
No ideas about the hanging, unless it's networking related, but...

> expected fileid 0x4d2369b89a58a920, got 0x2

This looks suspiciously like ENOENT (errno 2) is getting turned into a fileid value somewhere.
Comment 3 Rick Macklem 2021-11-24 22:45:10 UTC
Or the server has followed the *unix* tradition of using fileno == 2 for the root directory of a file system, but did not indicate that a server file system boundary was crossed by changing the FSID (file system ID), which is an attribute that identifies which server file system is which. I have no idea whether Amazon's EFS uses multiple file systems.
Comment 4 Alex Dupre 2021-11-25 10:15:20 UTC
Thanks for the debug suggestions, I'll run those commands next time it happens and report here. For the record, I'm not running any additional NFS daemon and don't have anything NFS-specific in rc.conf; it's just a plain mount, and it's not a heavily accessed file system either.

I see you recommend using `hard` mounts. I tried the `soft` option to avoid the infinite hanging and the inability to kill processes, hoping that would help with recovery, but from what I understand now even a hard mount point should recover when the NFS server comes back, so the problem was really a different one and affects both mount types.

Any idea why a few options seem to have been ignored? Does it make sense to set higher rsize/wsize on TCP endpoints? I see that the recent EFS automounter doesn't use any of them, so probably it is not worth it.
Comment 5 Rick Macklem 2021-11-25 16:41:04 UTC
Mount options are "negotiated" with the NFS server and with other tunables in the system. For example, to increase rsize/wsize to 128K, you must set vfs.maxbcachebuf=131072 in /boot/loader.conf. To increase rsize/wsize to 1Mbyte, you must set vfs.maxbcachebuf=1048576 in /boot/loader.conf and set kern.ipc.maxsockbuf=4737024 (or larger) in /etc/sysctl.conf. This assumes you have at least 4Gbytes of RAM on the system.

The further you move away from the defaults, the less widely tested your configuration is. Also, in the case of rsize/wsize, the system will use the largest size that is "negotiable" given other tuning; the rsize/wsize options are mainly useful to reduce the size below the maximum negotiable. From my limited testing, sizes above 256K do not perform better, but what works best for EFS? I have no idea.

If a server restarts, clients should recover. If a client is hung like you describe, whether due to an unresponsive server, a broken server (that generates bogus replies or no replies to certain RPCs), or a client bug:

# umount -N <mnt_path>

is your best bet at getting rid of the mount.
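Put together, the 1Mbyte rsize/wsize tuning described above would look like the following fragments (values taken straight from the comment; this assumes at least 4Gbytes of RAM, and the loader.conf change takes effect only after a reboot):

```
# /boot/loader.conf
vfs.maxbcachebuf=1048576

# /etc/sysctl.conf
kern.ipc.maxsockbuf=4737024
```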
Comment 6 Rick Macklem 2021-11-25 17:13:59 UTC
Oh, and my earlier comment w.r.t. not running the nfscbd may not be good advice. Here's a more complete answer:

The nfscbd(8) daemon provides callback handling when it is running. Callbacks need to be working for the server to issue delegations or layouts (the latter is pNFS only). When nfscbd(8) is not running, the server should work fine and never issue delegations or layouts. It should set a flag in the reply to the Sequence operation (the first one in each compound RPC) called SEQ4_STATUS_CB_PATH_DOWN. This is an FYI for the client.

I found another round of bugs related to delegations during a recent IETF NFSv4 testing event. These are fixed in stable/13, but not in 13.0. As such, delegations are problematic and you don't want them being issued, so don't run the nfscbd(8) daemon.

Unfortunately Amazon does not attend these testing events, so what their server does is ??? for me. However, if it is known that Amazon EFS never issues delegations or layouts (I believe cperciva@ said that was the case three years ago), then the server might be broken and get "perturbed" by the callbacks not working. In that case, you should run the nfscbd daemon by setting nfscbd_enable="YES" in your /etc/rc.conf, or start it manually.

And, given the above, I think you can see why my initial advice was just "don't run it".
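If EFS turns out to be in the "never issues delegations or layouts" category described above, the suggested configuration is just the one rc.conf line from the comment:

```
# /etc/rc.conf: enable the NFSv4 callback daemon -- only advisable when the
# server is known to never issue delegations or layouts (see discussion above)
nfscbd_enable="YES"
```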
Comment 7 Colin Percival 2021-11-25 18:24:04 UTC
FWIW, EFS is still documented as "does not support... Client delegation or callbacks of any type".