Bug 257187

Summary: NFSv3 server creates symlinks that local clients can't read [ZFS]
Product: Base System Reporter: Garrett Wollman <wollman>
Component: kernAssignee: freebsd-fs (Nobody) <fs>
Status: New ---    
Severity: Affects Only Me CC: asomers
Priority: ---    
Version: 12.2-RELEASE   
Hardware: Any   
OS: Any   

Description Garrett Wollman freebsd_committer 2021-07-14 16:20:16 UTC
Our backup vendor alerted us to this issue.  Most of our NFS clients are Ubuntu systems, and use NFSv3.  Somehow, one of our users managed to create a bunch of symbolic links for which lstat(2) succeeds, but readlink(2) fails with [ENOENT].  The links work just fine on the Linux clients over NFS.

Example ktrace on the server side:

  7597 readlink CALL  fstatat(AT_FDCWD,0x7fffffffed99,0x7fffffffe9d0,0x200<AT_SYMLINK_NOFOLLOW>)
  7597 readlink NAMI  "/export/sanchez/results/pls/victory/scc/2021/04-ugf/symlinks/RC/ugf_cc_scc-road_central"
  7597 readlink STRU  struct stat {dev=16608010843616580276, ino=14916386, mode=0120777, nlink=1, uid=23398, gid=21158, rdev=18446744073709551615, atime=1619701157.648845720, mtime=1619701157.648845720, ctime=1619701157.648845720, birthtime=1619701157.648845720, size=298, blksize=4096, blocks=1, flags=0x800 }
  7597 readlink RET   fstatat 0
  7597 readlink CALL  readlink(0x7fffffffed99,0x7fffffffdff4,0x3ff)
  7597 readlink NAMI  "/export/sanchez/results/pls/victory/scc/2021/04-ugf/symlinks/RC/ugf_cc_scc-road_central"
  7597 readlink RET   readlink -1 errno 2 No such file or directory

Results of Linux stat(1) utility on the client side:

$ stat /data/sanchez/results/pls/victory/scc/2021/04-ugf/symlinks/RC/ugf_cc_scc-road_central
  File: /data/sanchez/results/pls/victory/scc/2021/04-ugf/symlinks/RC/ugf_cc_scc-road_central -> ../../41-cc-clean/2021_04_25-01-r5326-ugf_scc-1_4_16_64cores-34cqCap-200gvtP-OoO-t_2-prio_Timestamp-issWidth_4-bfrs_16-l1KB_32-l2PerCoreKB_256-l2Latency_9-llcPerCoreKB_1024-llcLatency_12-memGBps_192-freq_3500-lnkByts_24-subnts_4-serializeHints_True-tbPolicy_Dequeue-clea.../ugf_cc_scc-road_central/
  Size: 298             Blocks: 27         IO Block: 131072 symbolic link
Device: 36h/54d Inode: 14916386    Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (23398/ victory)   Gid: (21158/sanchez-grp)
Access: 2021-07-14 12:06:14.024176419 -0400
Modify: 2021-04-29 08:59:17.648845720 -0400
Change: 2021-04-29 08:59:17.648845720 -0400
 Birth: -

This seems like a bug in either the ZPL or the NFS server.  I can't even see how local lstat() and NFSPROC_READLINK could succeed and local readlink() still fail.
Comment 1 Alan Somers freebsd_committer 2021-07-14 23:10:44 UTC
Firstly, does that symlink cross a mountpoint?  It may be that the target exists as seen from the NFS client, but not on the server.

Assuming that isn't the problem, dtrace is your friend.  Try something like this:

sudo dtrace -i 'sdt:zfs::set-error /arg0 == 2/ {stack();}' -c "readlink /foo/bar/baz"

If the error is coming from ZFS, you should find out where.  But if it isn't coming from ZFS, then run this instead:

sudo dtrace -i 'fbt:kernel::return /arg1 == 2 && pid == $target/ {trace(".");}' -c "readlink /foo/bar/baz"

That command will have plenty of false positives, so don't print the stack traces just yet.  If you spy a function that looks likely, then suitably modify your D script and prints its stack trace.
Comment 2 Garrett Wollman freebsd_committer 2021-07-15 00:03:46 UTC
(In reply to Alan Somers from comment #1)
I get two hits on sdt:zfs::set-error, which I think are actually the same:

dtrace: description 'sdt:zfs::set-error ' matched 1 probe
dtrace: pid 87135 exited with status 1
CPU     ID                    FUNCTION:NAME
 14  47740                   none:set-error 

 14  47740                   none:set-error 

Looks like the only place this can happen is:

                switch (data_op) {
                case SA_LOOKUP:
                        if (bulk[i].sa_addr == NULL)
                                return (SET_ERROR(ENOENT));

which I guess means that zfs_freebsd_readlink() tried to read the ZPL_SYMLINK attribute and it wasn't found -- but that still doesn't explain how it is that readlink() succeeds over NFS and not locally.  (A symbolic link is just uninterpreted text as far as readlink() is concerned, it shouldn't matter what its value is!)

This indicates a second bug: that [ENOENT] should get transformed into a more appropriate error on exit to indicate an internal filesystem inconsistency.
Comment 3 Alan Somers freebsd_committer 2021-07-15 00:34:53 UTC
I would suggest using dtrace to trace both the local and NFS paths.  They might be looking up the link differently.  For example, in zfs_readlink you can see there are two paths, based on zp->z_is_sa.

That raises another question, what is the value of the xattr property for your filesystem?  And are you using the builtin ZFS in FreeBSD 12.2, or the openzfs kmod?
Comment 4 Garrett Wollman freebsd_committer 2021-07-15 01:05:31 UTC
(In reply to Alan Somers from comment #3)
Stock 12.2, xattr is off.