Bug 250770

Summary: AWS EC2 system freezes up possibly associated with NFS (EFS)
Product: Base System Reporter: Gunther Schadow <raj>
Component: kernAssignee: freebsd-virtualization (Nobody) <virtualization>
Status: Closed Not A Bug    
Severity: Affects Some People CC: cperciva, rmacklem
Priority: ---    
Version: 12.2-RELEASE   
Hardware: Any   
OS: Any   

Description Gunther Schadow 2020-11-01 09:40:17 UTC
I use light-weight small cloneable FreeBSD systems built from the basic AWS EC2 AMI, first with 12.0-RELEASE and just now re-built with 12.2-RELEASE version. The idea is to have a minimal system rootfs while the bulk of the system disk, the /usr tree, is mounted from EFS (NFSv4). The AMI out of the box for 12.0 required 8 GB system disk, and now for 12.2 it's 10 GB, if I can do it in 1 GB I have some significant cost savings in redundant EBS volumes while also an easier time to distribute application software updates to my worker and server farm. I have described the principle here: https://forums.freebsd.org/threads/amazon-aws-ec2-super-light-configuration.69084/

The problem is that no matter what I do, the system locks up after about 4 hours. And it's not a kernel panic, it's not associated with any error, and I cannot see it at all. Nothing on the console screenshot, nothing in the logs and nothing in the /var/log/messages upon force-stop & re-start. 

Moreover, AWS EC2 console thinks the system is up in its status check. But any attempt to do anything results in hangs. SSH connection hangs. telnet to port 22, no response at all (not even connection reset, etc.) just hangs. If I happen to be logged in on the system whatever shell command I am doing will just hang, unable to CTRL-C break out nor CTRL-Z to the background. Just hang.

So I have nothing to show you. But I can provide anyone who wants to see it with a setup and would appreciate if someone would help out here. 

Here is my /etc/fstab:

# Custom /etc/fstab for FreeBSD VM images
/dev/gpt/rootfs  /        ufs     rw      1       1
/dev/gpt/varfs   /var     ufs     rw      1       1
fdesc            /dev/fd  fdescfs rw      0       0
proc             /proc    procfs  rw      0       0
/usr             /.usr    nullfs  rw      0       0
fs-xxxxxx.efs.us-xxxxx-x.amazonaws.com:/ /usr nfs rw,nfsv4,minorversion=1,oneopenown,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,late,bg 0 0

It is freezing up within around 4 hours, even just idling around doing nothing, and it doesn't seem like heavy work activity would accelerate the lock-up situation. Also doesn't depend on which instance type, specifically I have tried t3.nano, t3.micro, c5.large, m5a.large, it happens on all of them about the same time.
Comment 1 Gunther Schadow 2020-11-01 10:04:40 UTC
Note, this is not like the ENA kernel panic nor the other AWS EC2 freeze on t3 bug (apparently fixed in 12.1-RELENG) nor does the presence or absence of the "intsmb0: Could not allocate I/O space" error during boot (open since 2018) make any difference. So this is not a duplicate of those bugs.

sysctl -a |...
kern.features.nfsd: 1
kern.features.nfscl: 1
vfs.nfs.downdelayinitial: 12
vfs.nfs.downdelayinterval: 30
vfs.nfs.defect: 0
vfs.nfs.iodmax: 20
vfs.nfs.iodmin: 0
vfs.nfs.iodmaxidle: 120
vfs.nfs.use_buf_pager: 1
vfs.nfs.fileid_maxwarnings: 10
vfs.nfs.diskless_rootpath:
vfs.nfs.diskless_valid: 0
vfs.nfs.nfs_ip_paranoia: 1
vfs.nfs.nfs_directio_allow_mmap: 1
vfs.nfs.nfs_keep_dirty_on_error: 0
vfs.nfs.nfs_directio_enable: 0
vfs.nfs.clean_pages_on_close: 1
vfs.nfs.commit_on_close: 0
vfs.nfs.prime_access_cache: 0
vfs.nfs.access_cache_timeout: 60
vfs.nfs.dssameconn: 0
vfs.nfs.ignore_eexist: 0
vfs.nfs.pnfsiothreads: -1
vfs.nfs.userhashsize: 100
vfs.nfs.debuglevel: 0
vfs.nfs.callback_addr:
vfs.nfs.realign_count: 0
vfs.nfs.realign_test: 0
vfs.nfs.pnfsmirror: 1
vfs.nfs.enable_uidtostring: 0
vfs.nfs.dsretries: 2
vfs.nfs.skip_wcc_data_onerr: 1
vfs.nfs.nfs3_jukebox_delay: 10
vfs.nfs.reconnects: 0
vfs.nfs.bufpackets: 4

searching for ways to do some debug logging, should I enable this?

debug.fail_point.status_nfscl_force_fileid_warning: off
debug.fail_point.nfscl_force_fileid_warning: off

How might I log NFS warnings? Current setup with out of the box unchanged /etc/syslog.conf

# $FreeBSD: releng/12.2/usr.sbin/syslogd/syslog.conf 338146 2018-08-21 17:01:47Z brd $
#
#       Spaces ARE valid field separators in this file. However,
#       other *nix-like systems still insist on using tabs as field
#       separators. If you are sharing this file between systems, you
#       may want to use only tabs as field separators here.
#       Consult the syslog.conf(5) manpage.
*.err;kern.warning;auth.notice;mail.crit                /dev/console
*.notice;authpriv.none;kern.debug;lpr.info;mail.crit;news.err   /var/log/messages
security.*                                      /var/log/security
auth.info;authpriv.info                         /var/log/auth.log
mail.info                                       /var/log/maillog
cron.*                                          /var/log/cron
!-devd
*.=debug                                        /var/log/debug.log
*.emerg                                         *
# uncomment this to log all writes to /dev/console to /var/log/console.log
# touch /var/log/console.log and chmod it to mode 600 before it will work
#console.info                                   /var/log/console.log
# uncomment this to enable logging of all log messages to /var/log/all.log
# touch /var/log/all.log and chmod it to mode 600 before it will work
#*.*                                            /var/log/all.log
# uncomment this to enable logging to a remote loghost named loghost
#*.*                                            @loghost
# uncomment these if you're running inn
# news.crit                                     /var/log/news/news.crit
# news.err                                      /var/log/news/news.err
# news.notice                                   /var/log/news/news.notice
# Uncomment this if you wish to see messages produced by devd
# !devd
# *.>=notice                                    /var/log/devd.log
!*
include                                         /etc/syslog.d
include                                         /usr/local/etc/syslog.d

in /var/log/messages you see the reboot without any issues reported:

Nov  1 04:00:13 freebsd ec2[777]: #############################################################
...
Nov  1 05:49:13 freebsd su[1602]: ec2-user to root on /dev/pts/5
Nov  1 08:56:09 freebsd syslogd: exiting on signal 15
Nov  1 08:59:01 freebsd syslogd: kernel boot file is /boot/kernel/kernel
Nov  1 08:59:01 freebsd kernel: ---<<BOOT>>---

not sure why syslogd is exiting on signal 15, that might have to do wit the lock-up, but more likely has to do with my force-stop action on the AWS console. Here is the earlier lock-up sequence:

Oct 31 23:58:12 freebsd su[888]: ec2-user to root on /dev/pts/2
Oct 31 23:58:23 freebsd su[891]: ec2-user to root on /dev/pts/1
Oct 31 23:58:43 freebsd fsck[861]: /dev/gpt/varfs: 550 files, 1650 used, 125234 free (106 frags, 15641 blocks, 0.1% fragmentation)
Nov  1 03:48:09 freebsd syslogd: exiting on signal 15
Nov  1 03:50:35 freebsd syslogd: kernel boot file is /boot/kernel/kernel
Nov  1 03:50:35 freebsd kernel: ---<<BOOT>>---

about 4 hours like I estimated, and the "syslogd: exiting on signal 15" is just too close to the next boot that I don't think it is a herald of the lock-up, but rather it happens when I do the force-stop on the AWS EC2 management dashboard. And this would prove that we are not losing log messages due to the lock-up. If the system had anything to say, it would have said it!
Comment 2 Gunther Schadow 2020-11-01 14:54:45 UTC
Before anyone spends time on this, I am still trying some options.

For now I have made 3 modification in my NFS mount options:

  1. timeo=150 down from 600 - that alone didn't help
  2. oneopenown - that alone didn't help
  3. minorversion=1 added, the manual isn't clear if it is a requirement for oneopenown or if it implicitly sets minorversion=1, but since I added that, I have not frozen up for almost 6 hours. That's new.

I also enabled crude NFS logging with 

  sysctl vfs.nfs.debuglevel=4

And this is what I see in the /var/log/messages:

Nov  1 08:59:23 freebsd su[926]: ec2-user to root on /dev/pts/1
Nov  1 10:28:20 freebsd su[1271]: ec2-user to root on /dev/pts/2
Nov  1 10:28:38 freebsd kernel: clnt call=0
Nov  1 10:28:43 freebsd syslogd: last message repeated 8 times
Nov  1 10:28:43 freebsd kernel: readrpc: aft doiods=5
Nov  1 10:28:43 freebsd kernel: clnt call=0
Nov  1 10:29:14 freebsd syslogd: last message repeated 1125 times
Nov  1 10:31:20 freebsd syslogd: last message repeated 63 times
Nov  1 10:36:05 freebsd syslogd: last message repeated 368 times
Nov  1 10:36:05 freebsd kernel: readrpc: aft doiods=5
Nov  1 10:36:05 freebsd kernel: clnt call=0
...
Nov  1 13:25:37 freebsd kernel: clnt call=0
Nov  1 13:25:37 freebsd kernel: readrpc: aft doiods=5
Nov  1 13:25:37 freebsd kernel: clnt call=0
Nov  1 13:25:37 freebsd syslogd: last message repeated 6 times
Nov  1 13:25:37 freebsd kernel: readrpc: aft doiods=5
Nov  1 13:25:37 freebsd kernel: clnt call=0
Nov  1 13:25:37 freebsd kernel: readrpc: aft doiods=5
Nov  1 13:25:37 freebsd kernel: clnt call=0
Nov  1 13:25:37 freebsd kernel: readrpc: aft doiods=5
Nov  1 13:25:37 freebsd kernel: clnt call=0
Nov  1 13:25:37 freebsd syslogd: last message repeated 6 times
Nov  1 13:25:37 freebsd kernel: readrpc: aft doiods=5
Nov  1 13:25:37 freebsd kernel: clnt call=0
Nov  1 13:25:37 freebsd syslogd: last message repeated 6 times
Nov  1 13:25:37 freebsd kernel: readrpc: aft doiods=5
Nov  1 13:25:37 freebsd kernel: clnt call=0
Nov  1 13:25:37 freebsd kernel: readrpc: aft doiods=5
Nov  1 13:25:37 freebsd kernel: clnt call=0
Nov  1 13:25:56 freebsd syslogd: last message repeated 53 times
Nov  1 13:25:56 freebsd kernel: readrpc: aft doiods=5
Nov  1 13:25:56 freebsd kernel: clnt call=0
Nov  1 13:26:27 freebsd syslogd: last message repeated 80 times
Nov  1 13:26:55 freebsd syslogd: last message repeated 204 times
Nov  1 13:26:55 freebsd kernel: readrpc: aft doiods=5
Nov  1 13:26:55 freebsd kernel: clnt call=0
Nov  1 13:27:36 freebsd syslogd: last message repeated 58 times
Nov  1 13:29:26 freebsd syslogd: last message repeated 87 times
Nov  1 13:30:47 freebsd syslogd: last message repeated 78 times
Nov  1 13:30:47 freebsd kernel: readrpc: aft doiods=5
Nov  1 13:30:47 freebsd kernel: clnt call=0
Nov  1 13:31:21 freebsd syslogd: last message repeated 26 times
Nov  1 13:33:00 freebsd syslogd: last message repeated 58 times
Nov  1 13:43:21 freebsd syslogd: last message repeated 119 times
Nov  1 13:53:06 freebsd syslogd: last message repeated 277 times
Nov  1 14:03:00 freebsd syslogd: last message repeated 263 times
Nov  1 14:13:22 freebsd syslogd: last message repeated 216 times
Nov  1 14:23:06 freebsd syslogd: last message repeated 5110 times
Nov  1 14:33:00 freebsd syslogd: last message repeated 491 times
Nov  1 14:43:22 freebsd syslogd: last message repeated 120 times

nothing really meaningful to me. But I hope that if it does lock up, I will see something peculiar. 

On the other hand, now on c5.large with:
 1. minorversion=1
 2. sysctl vfs.nfs.debuglevel=4

I have not locked up again. If I can make it a full day, I will take back these 3 changes one by one, first the debug logging, then the minorversion=1, to see when the problem recurs.
Comment 3 Colin Percival freebsd_committer freebsd_triage 2020-11-01 17:04:02 UTC
My understanding is that EFS should have oneopenown and minorversion=1.  BTW on recent EC2 images I have an "efs" special automount target (https://www.daemonology.net/blog/2020-05-31-Some-new-FreeBSD-EC2-features.html) -- not sure if this is useful to you here.

Something you haven't mentioned: Is your EFS I/O quota ok?  Running out of I/O quota can make a system extremely slow (especially with a small filesystem which gets more I/O tokens slowly).
Comment 4 Colin Percival freebsd_committer freebsd_triage 2020-11-01 17:04:40 UTC
Adding rmacklem to CC since he's the NFS expert...
Comment 5 Gunther Schadow 2020-11-01 19:17:01 UTC
Thank you so much for the feedback. 

I am up over 10 hours now. So it looks like the spell has finally been broken!

So, it may well be that the minorversion=1 is necessary for the oneopenown to work. I suggest updating the man page. It currently reads:

             minorversion=<value>
                     Override the default of 0 for the minor version of the
                     NFS Version 4 protocol.  The only minor version currently
                     supported is 1.  This option is only meaningful when used
                     with the nfsv4 option.

             oneopenown
                     Make a minor version 1 of the NFS Version 4 protocol
                     mount use a single OpenOwner for all Opens.  This may be
                     useful for a server with a very low limit on OpenOwners,
                     such as AmazonEFS.  It can only be used with an NFSv4.1
                     mount.  It may not work correctly when Delegations are
                     being issued by a server, but note that the AmazonEFS
                     server does not issued delegations at this time.

             pnfs    Enable support for parallel NFS (pNFS) for minor version
                     1 of the NFS Version 4 protocol.  This option is only
                     meaningful when used with the minorversion option.

The clear statement of pnfs "This option is only meaningful when used with the minorversion option." would remove all ambiguity.

I do wonder also about the speed compared to a "native" Amazon Linux mount -t efs mount. I feel like it's slower.

My throughput allocation utilization is barely a blip, not an issue for me. 

I personally can't really use your efs automounter because I need to be in precise control of the mount /usr during boot. I wonder if more improvements can be gleaned from the mount_efs Linux code.

Anyway, it looks more and more like this is a no-bug and just somehow a lack of clear and conspiciously placed documentation and warning. This week, if nothing else fails, I will close this issue and I will update all the places where I had left some negative commments. I love FreeBSD after all. Thank you.
Comment 6 Colin Percival freebsd_committer freebsd_triage 2020-11-01 19:23:55 UTC
> I wonder if more improvements can be gleaned from the mount_efs Linux code.

My impression from the last time I looked at that code is that they're setting a lot of values explicitly, but those values are all the same as our defaults.  If you notice any setting changes which could improve FreeBSD's EFS performance please let me know though -- I'll be happy to adjust our automount defaults.

I've considered creating a "mount_efs" for FreeBSD but my impression was that for most users the automount option all they need -- you're clearly an exception to this.
Comment 7 Rick Macklem freebsd_committer freebsd_triage 2020-11-01 22:00:27 UTC
In your initial post you show:
 nfsv4,minorversion=1,oneopenown
so why was it hanging then?

However, you are correct that oneopenown
is needed for EFS and it only applies to minorversion
1 (or higher for FreeBSD13 if EFS supports it).

I suppose I thought that NFSv4.1 was obvious
shorthand for NFS version 4 minorversion 1,
but the man page can be updated to clarify that.

Btw, if you do "nfsstat -m" on the client,
you see what mount options are actually being
used. (The rsize, wsize is capped at
vfs.maxbcachebuf, which is 64K by default and
can be increased to MAXPHYS.)

Oh, and I find the best way to debug these things
is looking at a packet capture in wireshark.
You can do the capture via "tcpdump -s 0 -w out.pcap host <server>"
on the client and then load out.pcap into wireshark.
(Unlike tcpdump, wireshark knows NFS.)

Anyhow, sounds like you have it working now, rick
Comment 8 Gunther Schadow 2020-11-02 00:22:24 UTC
Yes, I apologize, I had one by one edited my options:

  1. timeo=150 down from 600 - that alone didn't help
  2. oneopenown - that alone didn't help
  3. minorversion=1 added, the manual isn't clear if it is a requirement for oneopenown or if it implicitly sets minorversion=1, but since I added that, I have not frozen up for almost 6 hours. That's new.

and I wrote this bug report when I was already at step 3, yet no time had passed yet for me to trust that now it would work.
Comment 9 Rick Macklem freebsd_committer freebsd_triage 2020-11-02 01:35:58 UTC
Cool. So is an update to the man page sufficient to
resolve this?
Comment 10 Gunther Schadow 2020-11-02 18:45:14 UTC
Confirmed, 33 hours of uptime without problems. Thank you for being a sounding board so I could solve my problem myself. It seems like you didn't do much other than reading this and giving some comments, but without you guys being there and giving me the challenge do a decent job in reporting my problem in detail and helping myself, I would have gone another year without figuring this out.
Comment 11 Rick Macklem freebsd_committer freebsd_triage 2020-11-02 23:52:50 UTC
Good. I will put improving the man page on my todo list.
Comment 12 commit-hook freebsd_committer freebsd_triage 2020-12-01 23:33:27 UTC
A commit references this bug:

Author: rmacklem
Date: Tue Dec  1 23:33:11 UTC 2020
New revision: 368268
URL: https://svnweb.freebsd.org/changeset/base/368268

Log:
  Improve man page for AmazonEFS mounts.

  PR#250770 was actually just a misunderstanding of what
  NFS mount options are needed for AmazonEFS mounts.
  This patch attempts to clarify the manpage to clarify this.

  This is a content change.

  PR:		250770
  Reviewed by:	bcr
  MFC after:	1 week
  Differential Revision:	https://reviews.freebsd.org/D27430

Changes:
  head/sbin/mount_nfs/mount_nfs.8
Comment 13 commit-hook freebsd_committer freebsd_triage 2020-12-08 22:37:50 UTC
A commit references this bug:

Author: rmacklem
Date: Tue Dec  8 22:37:30 UTC 2020
New revision: 368466
URL: https://svnweb.freebsd.org/changeset/base/368466

Log:
  MFC: r368268
  Improve man page for AmazonEFS mounts.

  PR#250770 was actually just a misunderstanding of what
  NFS mount options are needed for AmazonEFS mounts.
  This patch attempts to clarify the manpage to clarify this.

  This is a content change.

  PR:		250770

Changes:
_U  stable/12/
  stable/12/sbin/mount_nfs/mount_nfs.8