Bug 254000

Summary: net/glusterfs: Directory entries corrupted
Product: Ports & Packages Reporter: Eirik Oeverby <ltning-freebsd>
Component: Individual Port(s) Assignee: freebsd-ports-bugs (Nobody) <ports-bugs>
Status: New ---    
Severity: Affects Some People CC: chris, daniel
Priority: --- Flags: bugzilla: maintainer-feedback? (daniel)
Version: Latest   
Hardware: Any   
OS: Any   

Description Eirik Oeverby 2021-03-03 22:44:02 UTC
When using glusterfs (both the old 3.x version and the new 8.x), we observe that while most (all?) files are correctly replicated, listing a directory does not show all of the files in it.

Tested with older and current glusterfs on FreeBSD versions from 10.x via 11.x to (now) 12.2.

Example: Source filesystem has ~12k files across ~100 directories. We rsync this to a gluster volume mounted on /mnt.

Afterwards we run
  find /mnt -type f | wc -l
on two different nodes in the cluster. The numbers are not the same.

Then, on the same two nodes, we run
  find /mnt -type f | sort > /mnt/$(hostname).lst

When comparing the two resulting lists, the sets of files differ.
HOWEVER: When performing spot-checks, we DO find the allegedly missing files in the filesystem, but ONLY when referring to them directly. An 'ls' in the directory that contains a "missing" file will not show the file, but a 'file <filename>' or 'ls -la <file>' or any other operation that directly accesses the file works fine.
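
The manual spot-check described above could be automated roughly as follows. This is only a sketch: the function name missing_but_reachable is made up for illustration, and it assumes both listings were produced with 'find /mnt -type f | sort' under the same locale, so that comm(1) can compare them.

```sh
#!/bin/sh
# Hypothetical helper (not part of glusterfs): print every path that is in
# listing $1 but not in listing $2, and say whether a direct lookup by name
# still succeeds. Both listings must already be sorted.
missing_but_reachable() {
    # comm -23 keeps only the lines unique to the first file.
    comm -23 "$1" "$2" | while IFS= read -r path; do
        if [ -e "$path" ]; then
            # The entry was not returned by the other node's directory
            # listing, yet accessing it by name works.
            printf 'hidden from readdir, reachable by name: %s\n' "$path"
        else
            printf 'absent: %s\n' "$path"
        fi
    done
}
```

Run on the node that produced the second listing, for example 'missing_but_reachable /mnt/nodeA.lst /mnt/nodeB.lst' (nodeA and nodeB are placeholder hostnames), every "hidden from readdir" line would be a file that the directory listing omitted but that is still reachable by direct access.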

Again, we've seen this every time we've tested glusterfs, causing us to abort our attempts. Now that we have a "fresh" maintainer, I'm hoping this can be solved. I don't know if the problem is in gluster, fuse, or elsewhere.
Comment 1 Eirik Oeverby 2021-03-03 22:49:02 UTC
/var/log/glusterfs/mnt.log has the following entries; presumably one for each directory where one or more file entries are missing:

[2021-03-03 22:48:00.165217] W [fuse-bridge.c:3589:fuse_readdir_cbk] 0-glusterfs-fuse: 164414: READDIR => -1 (Invalid argument)
[2021-03-03 22:48:00.271681] W [fuse-bridge.c:3589:fuse_readdir_cbk] 0-glusterfs-fuse: 165663: READDIR => -1 (Invalid argument)
[2021-03-03 22:48:00.360047] W [fuse-bridge.c:3589:fuse_readdir_cbk] 0-glusterfs-fuse: 166731: READDIR => -1 (Invalid argument)
[2021-03-03 22:48:00.462738] W [fuse-bridge.c:3589:fuse_readdir_cbk] 0-glusterfs-fuse: 167974: READDIR => -1 (Invalid argument)
[2021-03-03 22:48:00.551223] W [fuse-bridge.c:3589:fuse_readdir_cbk] 0-glusterfs-fuse: 169088: READDIR => -1 (Invalid argument)
[2021-03-03 22:48:00.741055] W [fuse-bridge.c:3589:fuse_readdir_cbk] 0-glusterfs-fuse: 171454: READDIR => -1 (Invalid argument)
[2021-03-03 22:48:00.828434] W [fuse-bridge.c:3589:fuse_readdir_cbk] 0-glusterfs-fuse: 172534: READDIR => -1 (Invalid argument)
[2021-03-03 22:48:00.918418] W [fuse-bridge.c:3589:fuse_readdir_cbk] 0-glusterfs-fuse: 173662: READDIR => -1 (Invalid argument)
[2021-03-03 22:48:01.169557] W [fuse-bridge.c:3589:fuse_readdir_cbk] 0-glusterfs-fuse: 176589: READDIR => -1 (Invalid argument)
[2021-03-03 22:48:01.266421] W [fuse-bridge.c:3589:fuse_readdir_cbk] 0-glusterfs-fuse: 177692: READDIR => -1 (Invalid argument)
[2021-03-03 22:48:01.522581] W [fuse-bridge.c:3589:fuse_readdir_cbk] 0-glusterfs-fuse: 180580: READDIR => -1 (Invalid argument)
[2021-03-03 22:48:01.611598] W [fuse-bridge.c:3589:fuse_readdir_cbk] 0-glusterfs-fuse: 181700: READDIR => -1 (Invalid argument)
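
To check the presumption that there is one such warning per affected directory, the entries can be counted and compared against the number of directories with missing files. A minimal sketch, assuming the log format shown above; the function name count_readdir_errors is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: count failed READDIR callbacks in a glusterfs
# client log. The pattern matches the warning lines quoted above.
count_readdir_errors() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'fuse_readdir_cbk.*READDIR => -1' "$1"
}
```

For example, 'count_readdir_errors /var/log/glusterfs/mnt.log' right after a single 'find' run over the mount point.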
Comment 2 Eirik Oeverby 2021-03-03 22:49:30 UTC
(In reply to Eirik Oeverby from comment #1)
(These log entries are the result of a simple 'find . | wc -l' in the mount point.)
Comment 3 Daniel Morante 2021-03-05 04:12:08 UTC
Thank you for testing and reporting this. I never noticed or ran into this issue myself in my limited testing, but it does sound terrible.

I will try to duplicate this issue using my Vagrant test setup (https://github.com/tuaris/Vagrant_GlusterFS) on a new branch in that repo this weekend.

This way I can go to the upstream devs with more information and (hopefully) have someone there take a closer look.