Created attachment 219934
tar file with all configs and command outputs of server, client and jail
I ran into a really nasty problem after introducing FreeBSD to a really big company.
All machines are running FreeBSD 12.1-RELEASE and use 10 Gbit/s Ethernet links with Cisco Nexus 55xx switches. Additional 1 Gbit/s networks for admin- and security-related purposes exist.
I have this problem on all my NFS client machines running FreeBSD.
The NFS server (FreeBSD 12.1) exports a ZFS dataset of approx. 1.5 TB.
Linux clients mounting this dataset with nfsv3 TCP mounts have no problems.
FreeBSD hosts mount this dataset with NFSv3 over TCP. There are three scenarios:
Scenario 1: FreeBSD client mounts the share. No problems.
Scenario 2: FreeBSD client mounts the share and relays it using nullfs mounts into VNET (if_bridge) jails.
Scenario 3: FreeBSD client mounts the share twice. One mount point is located in its primary file system tree and the other one is inside the jail's subtree.
Accessing the share from inside the jail hangs the NFS mount after only a few minutes of usage.
In Scenario 3 only the mount inside the jail's directory subtree is blocked. The other one is working. Nevertheless df & friends on the main host are blocked too, because they touch the jail's subtree on their traversal down.
Same in Scenario 2. There the only NFS mount is blocked.
There are no error messages at this point of time. No console text about stale NFS links etc. Nothing in the Jail's console file.
umount of the mount blocks too, but the line for the blocked mount vanishes from the mount(8) output.
The only way to unfreeze it is to stop the jail whose subtree contains the nullfs or NFS mount.
Stopping the jail immediately frees the hanging df and ls etc. processes.
On the console I can read after umount -f / jail stop:
newnfs: server '192.168.67.38' error: fileid changed. fsid 0:0: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)
I read the comments about this error message in /usr/src/sys/fs/nfsclient/nfs_clport.c. They do not match this setup. There is no broken middleware... only Cisco switches, an HP ProLiant Gen9 FreeBSD NFS server and a few Oracle Sunfire-something x86_64 FreeBSD clients.
The NFS Server is running FreeBSD 12.1 and the network is as simple as a 10 GBit/s network can be.
Playing with enforce_statfs seems to have no effect.
What's going wrong here?
I use the jib script from /usr/src/share/examples/jails after adding two lines for MTU support to handle the jumbo-frame 10 Gbit/s network.
Firewall support is enabled in the kernel. To isolate the problem, all firewalls are currently disabled or in pass-through configurations.
There are no other visible problems.
I attach these files:
-rw-r--r-- 1 arne ego 21833 Nov 24 18:24 NFS-Client-ADM001
-rw------- 1 arne ego 3254 Nov 24 00:07 NFS-Client-devfs.rules
-rw-r----- 1 arne ego 17288 Nov 24 18:57 NFS-Client-df
-rw-r--r-- 1 arne ego 19051 Nov 24 18:24 NFS-Client-dmesg.boot
-rw-r--r-- 1 arne ego 636 Nov 24 14:41 NFS-Client-fstab
-rw-r----- 1 arne ego 25965 Nov 24 18:24 NFS-Client-ifconfig-a
-rw-r--r-- 1 arne ego 3600 Nov 24 18:33 NFS-Client-jail.conf
-rw-r--r-- 1 arne ego 1239 Nov 23 23:31 NFS-Client-loader.conf
-rw-r----- 1 arne ego 522 Nov 24 18:58 NFS-Client-nfsstat-m
-rw-r----- 1 arne ego 76519 Nov 24 03:23 NFS-Client-pf.conf
-rw-r--r-- 1 arne ego 25053 Nov 24 18:21 NFS-Client-rc.conf
-rw-r--r-- 1 arne ego 1084 Nov 23 23:33 NFS-Client-sysctl.conf
-rw-r--r-- 1 arne ego 21767 Nov 24 19:05 Fileserver-FS001
-rw-r--r-- 1 arne ego 27395 Nov 24 19:06 Fileserver-dmesg.boot
-rw-r----- 1 arne ego 14781 Nov 24 19:03 Fileserver-ifconfig-a
-rw-r--r-- 1 arne ego 1020 Nov 16 21:40 Fileserver-loader.conf
-rw-r--r-- 1 arne ego 14356 Nov 24 19:04 Fileserver-rc.conf
-rw-r--r-- 1 arne ego 1106 Nov 16 21:38 Fileserver-sysctl.conf
-rw-r----- 1 arne ego 6398 Nov 24 18:14 Fileserver-zfs_properties
-rw-r----- 1 arne ego 1671 Nov 24 19:01 inside-jail-ifconfig-a
-r-------- 1 arne ego 9 Nov 24 00:22 inside-jail-pf.conf
-rw-r--r-- 1 arne ego 636 Nov 24 19:00 inside-jail-rc.conf
-rw-r--r-- 1 arne ego 709 Nov 24 00:17 inside-jail-sysctl.conf
Thanks in advance,
It seems like the mount -> nullfs -> export setup is producing corrupt NFS RPC responses (i.e., "BROKEN NFS SERVER").
Only the fileserver is exporting. Do you mean the jails on the fileserver?
The nfs clients I have trouble with are all not exporting anything.
A few new findings:
Binding portmapper, nfsd, mountd, et al. to one IP address on server and client changed nothing. This is no surprise; I'm in the trial-and-error phase.
Switching to NFSv4 makes no difference but gives a more detailed error message on the console every time I stop the jail associated with the nullfs mount.
newnfs: server '192.168.67.38' error: fileid changed. fsid deaf3afe:f75a86de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)
This message is only on the console. There is nothing in dmesg output or log files.
Can someone enlighten me how to decode the error message? I got the exact same numbers twice now so I'm curious.
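As a rough aid for reading the console line, the fields can be pulled apart mechanically. The sketch below is only an illustration: the function name `parse_fileid_changed` and the regular expression are assumptions written against the two messages quoted in this report, not anything taken from the kernel source. Based on the explanations elsewhere in this thread, "expected fileid" is read as the file number the client had remembered for the file and "got" as the file number in the newly received attributes; the fsid is shown as two hex words.

```python
import re

# Pattern written against the two console lines quoted in this report (an assumption,
# not copied from the kernel source).
FILEID_CHANGED = re.compile(
    r"newnfs: server '(?P<server>[^']+)' error: fileid changed\. "
    r"fsid (?P<fsid0>[0-9a-f]+):(?P<fsid1>[0-9a-f]+): "
    r"expected fileid (?P<expected>0x[0-9a-f]+), got (?P<got>0x[0-9a-f]+)\."
)


def parse_fileid_changed(line):
    """Split a 'fileid changed' console line into its fields, or return None."""
    match = FILEID_CHANGED.search(line)
    if match is None:
        return None
    return {
        "server": match["server"],
        # The fsid is printed as two hex words separated by a colon.
        "fsid": (int(match["fsid0"], 16), int(match["fsid1"], 16)),
        # File number the client had on record for this file.
        "expected_fileid": int(match["expected"], 16),
        # File number found in the attributes that triggered the message.
        "got_fileid": int(match["got"], 16),
    }
```

Feeding it the NFSv4 message from this comment yields server 192.168.67.38, fsid (0xdeaf3afe, 0xf75a86de), expected fileid 4 and received fileid 2.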
Thanks in advance,
The FreeBSD-based ZFS NFS server does not seem to be the problem.
I now get exactly the same problem with another share mounted from a Linux server.
First, switch to 12.2. There were fixes for some NFS client issues that might be
relevant (or might not).
I am sure that NFS client is not VNET-aware, i.e. it does not switch context
to the proper VNET as needed. Esp. when offloading async io to nfsiod daemons.
In other words, I do not expect the scenario 3 to work.
For the second scenario, nullfs remount of NFS mount, VNET jail should be
irrelevant. When the hang occurs, gather the procstat -kk -a output (on the
host). Also it might be worth trying '-o nocache' for nullfs mounts; I remember there were some problems with nullfs+nfs+caching.
I am not a jails guy, so I don't know if that was a factor.
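Since the hang can take anywhere from seconds to an hour to appear, the procstat capture suggested above could be automated. The following is a minimal sketch under stated assumptions: the helper names `mount_responds` and `capture_kernel_stacks`, the 5-second timeout and the output file are all hypothetical. Only `procstat -kk -a` comes from the suggestion above. The probe runs in a child process because a process touching a hung NFS mount may block and not react to signals.

```python
import subprocess
import sys


def mount_responds(path, timeout=5.0):
    """Return True if a statvfs() of `path` in a child process finishes in time.

    A fast failure (for example, the path does not exist) also counts as a
    response; only a probe that does not return within `timeout` is a hang.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.statvfs(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Ask for the child to be killed but do not wait for it: a process
        # blocked inside an NFS call may not go away until the mount recovers.
        child.kill()
        return False
    return True


def capture_kernel_stacks(outfile):
    """Save the kernel stacks of all threads, as requested in this comment."""
    with open(outfile, "w") as fh:
        subprocess.run(["procstat", "-kk", "-a"], stdout=fh, check=True)
```

On the jail host, a loop could call `mount_responds()` on the jail-side mount point every few seconds and call `capture_kernel_stacks()` once it returns False, so the stacks are recorded while the hang is still in progress.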
W.r.t "..BROKEN MIDDLEWARE.."
It refers to a case where something between the NFS server
and client tries to cache Getattr replies, but doesn't get
Try testing where the NFS client connects to the server
without anything like a cisco switch in between. Just a wire
or a really dumb no frills switch.
Also, try an NFSv4 mount, since the NFSv4 RPCs are much
harder to cache and send replies.
If the problem persists, add another comment to this bug
report and we can try some other stuff.
(The error is generated because a file's fileno has changed
and that should never happen.)
My guess is that having the two mounts for the same file
system has somehow triggered the generation of a bogus cached
(In reply to Rick Macklem from comment #6)
as said before:
Using NFSv4 changes nothing :-(
I'm unable to change the wiring because the complete setup is in a computer center 600km from here but I have access to all lights out service processors, the switches, etc.
I have the same problem now with a Linux NFS server so I think the problem is not on the server side.
To make the picture clearer:
two main administrative servers: adm001 and adm002
Both have a couple of VNET jails (if_bridge, not netgraph) running nameserver etc.
And both have so-called login server jails: just a jail that people can ssh to and from there jump (ssh) into the development networks.
Two file servers, one Linux and one FreeBSD, share the $HOMEs and a pool share.
These are mounted on adm001 and adm002 at /l/home and /l/pool.
NULLFS mounts forward these into the two login jails:
/l/home --> /l/prison/login1/l/home and
/l/pool --> /l/prison/login1/l/pool.
Same procedure on adm002 with login002.
After 30 seconds to 60 minutes one of the two NFS mounts freezes.
df, ls, umount etc. hang.
And now the strange part:
"sh /etc/rc.d/jail stop login1"
immediately unfreezes the NFS mount and everything works normally again but, of course, without the jail.
The second the jail is dead, the "BROKEN NFS..." text is displayed, but only on the console, not in the dmesg buffer or log files, and, strangely, only on adm001.
adm001 and adm002 are different hardware (different CPUs), but they are clones, real clones made with zfs send/receive. Of course the hostids are different.
The workaround is to allow the users to use adm001 and adm002 directly. But this is not the way we want to have it.
When I don't use the NULLFS and make two independent NFS mounts to /l/home and /l/prison/login1/l/home then only the second freezes.
I guess this is not normal and a real bug...
Thanks in advance
Ok, I took a quick look and I don't think
an NFS mount within a vnet jail is going to work.
When the kernel RPC does a socreate() it passes
the argument cred as curthread->cred.
If this is done by an nfsiod thread, the credential
won't be for the correct vnet.
This can be fixed by adding cred arguments to assorted
functions so that the credential used at mount time can
be passed in, but the fix is not trivial.
I've never used a vnet jail, so there might be other
Sorry, but the short answer is that you will need to
figure out a way to do what you are doing without an
NFS mount in a vnet jail, I think.
I'll take this bug, but don't expect a quick fix.
(In reply to Rick Macklem from comment #8)
Yes, we know that NFS does not work from inside a jail.
This is why using nullfs has been best practice for years now to give a jail (VNET or not) access to an NFS mount point.
The real NFS mount point is outside the jail.
And nullfs should cover this, shouldn't it?
I have no idea what triggers the freeze...
I simulate heavy (!) load from inside the jail and it works for up to approx. one hour and many gigabytes without problems...
Really, the answer can't be to use a fuse'ed NTFS filesystem to share mount points between two FreeBSD servers...
Well, I took a closer look at the krpc code and it does
do TCP reconnects with the mount time credentials.
Although I don't test it, I do know some people
use NFS mounts within jails, but I don't know
if anyone does an NFS mount within a vnet prison?
Just to say that we have been using NFS (v4) mounts inside (non-VNET) jails for years and it works like a charm; we never had an issue.
For example, on one of our production servers:
orval% mount -t nfs
filer.prod.lan:/pictures/collections on /usr/jails/www1/filer/pictures/collection (nfs, read-only, nfsv4acls)
filer.prod.lan:/webapps on /usr/jails/www1/filer/webapps (nfs, nfsv4acls)
filer.prod.lan:/documents on /usr/jails/www1/filer/documents (nfs, read-only, nfsv4acls)
filer.prod.lan:/geoserver on /usr/jails/java1/filer/geoserver (nfs, nfsv4acls)
filer.prod.lan:/pictures/collections on /usr/jails/j_www1/filer/pictures/collection (nfs, read-only, nfsv4acls)
filer.prod.lan:/webapps on /usr/jails/j_www1/filer/webapps (nfs, nfsv4acls)
filer.prod.lan:/documents on /usr/jails/j_www1/filer/documents (nfs, read-only, nfsv4acls)
filer.prod.lan:/apache on /usr/jails/j_www1/filer/apache (nfs, nfsv4acls)
filer.prod.lan:/pypi on /usr/jails/j_www1/filer/devpi (nfs, nfsv4acls)
filer.prod.lan:/webapps/phegea on /usr/jails/j_www1/filer/webapps/phegea (nfs, read-only, nfsv4acls)
filer.prod.lan:/pictures/collections on /usr/jails/j_www1_rb1/filer/pictures/collection (nfs, read-only, nfsv4acls)
filer.prod.lan:/webapps on /usr/jails/j_www1_rb1/filer/webapps (nfs, nfsv4acls)
filer.prod.lan:/ipt on /usr/jails/j_ipt1/filer/ipt (nfs, nfsv4acls)
filer.prod.lan:/geoserver on /usr/jails/geoserver1/filer/geoserver (nfs, nfsv4acls)
(BTW, I remember wondering some years ago what the best way of mounting several NFS shares in a jail would be: one mount on the host plus nullfs, or multiple mounts of the same share in different jails. I found the latter to be best.)
Guys, I have the same problem, which I didn't have back with the 11.0 kernel on both client and server.
I never had any problems with insecure shares on both sides.
But I use full Kerberos protection (krb5i), where all RPC calls pass through Kerberos authentication.
Again, zero problems for both client and server with the 11.0 kernel.
With the 12.1 kernel on the client, I get this message again (I also had it back in version 9).
the "BROKEN NFS SERVER OR MIDDLEWARE" message refers to MIDDLEWARE for the RPC CALL. The RPC call is the MIDDLEWARE.
Now, Kerberos must issue a new key for each of the mounts. A TGT is produced at mount time and then all RPC calls use the KDC for key creation. Theoretically this slows down the process, but today most hardware has a crypto device on the motherboard, which the FreeBSD kernel uses very well, so the crypto processing is fast and almost transparent. Yes, there is a difference in speed, but not enough to worry about.
I DO BELIEVE it is about file attributes not commonly translated.
But honestly, I have not a clear answer.
I hope I helped on what "MIDDLEWARE" is, but guys, I face the same here.
Have a nice day.
Ok, let me try to explain what the "...BROKEN MIDDLEWARE OR.."
message means. There are certain file attributes, such as fileno
(think i-node#) that should *never change*.
When the NFS client receives file attributes where fileno for a
given file has changed, it knows something is "badly broken".
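The consistency check described here can be modelled in a few lines. This is a toy illustration only, not the FreeBSD client code: the class `AttrCache`, the exception `FileidChanged` and the use of a bytes file handle as the lookup key are all hypothetical names chosen for the sketch. It records the fileno first seen for a file and refuses any later attributes that carry a different one.

```python
class FileidChanged(Exception):
    """Raised when attributes for a known file carry a different fileno."""


class AttrCache:
    """Toy model of a client-side attribute cache keyed by file handle."""

    def __init__(self):
        # file handle (bytes) -> fileno recorded the first time the file was seen
        self._fileid = {}

    def load_attrs(self, file_handle, fileid):
        # Remember the fileno on first sight, otherwise fetch the recorded one.
        cached = self._fileid.setdefault(file_handle, fileid)
        if cached != fileid:
            # fileno must never change for a given file, so the reply is bogus.
            raise FileidChanged(f"expected fileid {cached:#x}, got {fileid:#x}")
        return fileid
```

With this model, recording fileno 0x4 for a file and then receiving 0x2 for the same file raises `FileidChanged` with the text "expected fileid 0x4, got 0x2", which mirrors the numbers in the console message reported above.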
One cause of this was a middleware box (hardware/software that
sits between the NFS client and NFS server in the network
infrastructure) that could fail.
- This "middleware box" cached NFS requests/replies. If it saw
a request from the NFS client for attributes for the same file
it replied to the Getattr with cached attributes.
--> This reduced NFS server load, since the NFS server never saw
the Getattr RPC request.
Such a technology existed and would sometimes reply with bogus
attributes for a different file. What was this device called?
I have no idea. The guy who told me about this gave no details
w.r.t. vendor/product/... (I assumed he was under NDA and could
not disclose details beyond this broken device generating the above
Since it seems that the FreeBSD server is not broken in this regard
(I would see a lot more bug reports about this if it was), then
what else might cause this to happen? (ie. fileno mysteriously changes)
Here are some unlikely, but possible theories:
- Flakey memory in the NFS server that sometimes flips a bit
that happens to be used to store the "fileno" attribute.
- Flakey network interface transmit side that flips a bit before
calculating the network checksum, so that the network checksum
--> It would seem that most garbled network packets would be
caught by checksum failures, but checksums are not infallible.
You may be able to dream up more. Mostly within the network fabric
between the client<-->server.
Given how unlikely these latter possibilities are, you can see why
the known case of the "broken middleware box" gets mention in the
The problem I reported in my (this) bug report is that NFS client and server do NOT WORK inside a VNET/VIMAGE jail environment, which is a real pain in my projects using FreeBSD/ZFS as secure storage heads.
(In reply to Arne Steinkamm from comment #14)
According to lsvfs, NFS is not jail-friendly. I resorted to mounting the NFS share on the jail host into the jail dataset.
This only works with non-VNET-Jails.
I can build secure, jailed Samba servers, but NFS cannot be used as server OR client in a VNET jail.
To make the difference clear:
NFS client side works with traditional jails when the host makes the mount.
Yes, I am aware that NFS mounts do not work well
with VNET jails. Unless someone conversant with VNET
jails comes up with a patch, it won't be fixed anytime soon.
I think one problem (probably not the only one) happens
when the NFS client needs to re-connect to the server.
The code needs to do a soconnect() and assorted other
calls. These are only allowed if in the correct VNET jail,
but the NFS client code does not know what that is.
--> reconnect fails and mountpoint is dead.
However, I do not know how using an NFS mount with a
VNET jail will produce the "..BROKEN MIDDLEWARE.."