Bug 252989

Summary: net/glusterfs glusterfs mount not accessible if second node is down
Product: Ports & Packages
Component: Individual Port(s)
Version: Latest
Hardware: Any
OS: Any
Status: Closed FIXED
Severity: Affects Only Me
Priority: ---
Reporter: ekoort <eimar.koort>
Assignee: freebsd-ports-bugs (Nobody) <ports-bugs>
CC: daniel
Flags: daniel: maintainer-feedback+

Description ekoort 2021-01-25 10:17:27 UTC
Created a simple Gluster volume.
One "node" is a Raspberry Pi 4 running FreeBSD 13.0-ALPHA1 (fpi4).
The second "node" is a VirtualBox guest running FreeBSD 12.2-RELEASE-p1 (fbsd).
Both have the latest GlusterFS, 8.0_2.

/etc/fstab on each host contains the corresponding line:

fpi4:/gluvol /mnt fusefs rw,late,backupvolfile-server=fbsd:/gluvol,mountprog=/usr/local/sbin/mount_glusterfs 0 0

fbsd:/gluvol /mnt fusefs rw,late,backupvolfile-server=fpi4:/gluvol,mountprog=/usr/local/sbin/mount_glusterfs 0 0

Now, if one of the nodes is powered off, the GlusterFS mount becomes inaccessible.

root@fpi4:~ # ls -la /mnt
total 5
drwxr-xr-x   3 root  wheel    4 Jan 25 11:13 .
drwxr-xr-x  21 root  wheel  512 Jan 25 09:11 ..
-rw-r--r--   1 root  wheel   34 Jan 25 11:14 test.txt

root@fbsd:~ # halt -p

root@fpi4:~ # gluster peer status
Number of Peers: 1

Hostname: fbsd
Uuid: 8b93554c-7389-40fd-9447-b622b5c9a444
State: Peer in Cluster (Disconnected)

root@fpi4:~ # ls -la /mnt
total 0
ls: /mnt: Socket is not connected


I would expect /mnt to remain accessible regardless of the second node's availability.
Comment 1 Daniel Morante 2021-01-26 20:16:33 UTC
I have run into this issue myself.  Try using "backup-volfile-servers" instead of "backupvolfile-server" in your /etc/fstab.

For example, I have three Gluster nodes, "sun", "earth", and "moon", with a volume named "replicated".  Each of those nodes has this in its fstab:

```
earth:replicated	/mnt/replicated	fusefs	rw,_netdev,backup-volfile-servers=sun:moon,mountprog=/usr/local/sbin/mount_glusterfs,late	0	0

moon:replicated	/mnt/replicated	fusefs	rw,_netdev,backup-volfile-servers=sun:earth,mountprog=/usr/local/sbin/mount_glusterfs,late	0	0

sun:replicated	/mnt/replicated	fusefs	rw,_netdev,backup-volfile-servers=earth:moon,mountprog=/usr/local/sbin/mount_glusterfs,late	0	0
```
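For anyone who wants to verify the option without editing fstab and rebooting, here is a minimal one-off mount sketch. It assumes mount(8)'s generic "mountprog" option hands the remaining options to mount_glusterfs exactly as the fstab entries above do:

```
# hedged sketch: one-off mount equivalent to the first fstab entry above
mount -t fusefs -o rw,backup-volfile-servers=sun:moon,mountprog=/usr/local/sbin/mount_glusterfs earth:replicated /mnt/replicated
```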
Comment 2 Daniel Morante 2021-01-27 04:22:44 UTC
I did an additional test by adding a fourth host as a client only, with this in its /etc/fstab:

```
sun.gluster:replicated	/mnt/replicated	fusefs	rw,_netdev,backup-volfile-servers=earth.gluster:moon.gluster,mountprog=/usr/local/sbin/mount_glusterfs,late	0	0
```

And mounted it:

```
daniel@pacyworld-pc1 ~> df -h /mnt/replicated/
Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/fuse      19G    275M     19G     1%    /mnt/replicated
```

I then rebooted the node named "sun" while writing to the location.  Gluster failed over properly and there was no interruption in accessibility.

```
[2021-01-27 04:17:39.572025 +0000] I [glusterfsd-mgmt.c:2642:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: sun.gluster
[2021-01-27 04:17:39.572062 +0000] I [glusterfsd-mgmt.c:2682:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server earth.gluster
[2021-01-27 04:17:42.592661 +0000] I [glusterfsd-mgmt.c:2171:mgmt_getspec_cbk] 0-glusterfs: Received list of available volfile servers: moon:24007 
```
Comment 3 ekoort 2021-01-27 04:37:38 UTC
Hello,
Do you have the Gluster servers and client(s) on separate hosts?
In my setup I have only two hosts, both acting as Gluster servers and clients.

fstab on one host:
fpi4:/gluvol /mnt fusefs rw,late,backup-volfile-servers=fpi4:fbsd:/gluvol,mountprog=/usr/local/sbin/mount_glusterfs 0 0

And it still gives me the "Socket is not connected" error. Note the "df" output below.

root@fpi4:~ # mount |grep mnt
/dev/fuse on /mnt (fusefs)
root@fpi4:~ # df -h /mnt
Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/fuse       0B      0B      0B   100%    /mnt
root@fpi4:~ # ls -la /mnt
total 0
ls: /mnt: Socket is not connected
root@fpi4:~ # gluster peer status
Number of Peers: 1

Hostname: fbsd
Uuid: 8b93554c-7389-40fd-9447-b622b5c9a444
State: Peer in Cluster (Disconnected)

The second host is down. Without touching the first node (fpi4), I power the VirtualBox guest back on:

root@fpi4:~ # gluster peer status
Number of Peers: 1

Hostname: fbsd
Uuid: 8b93554c-7389-40fd-9447-b622b5c9a444
State: Peer in Cluster (Connected)

root@fpi4:~ # df -h /mnt
Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/fuse      13G    2.5G     11G    19%    /mnt

root@fpi4:~ # ls -la /mnt
total 5
drwxr-xr-x   3 root  wheel    4 Jan 25 11:13 .
drwxr-xr-x  21 root  wheel  512 Jan 27 04:27 ..
-rw-r--r--   1 root  wheel   34 Jan 25 11:14 test
Comment 4 ekoort 2021-01-27 04:45:48 UTC
I re-read your previous comments.
Also played a bit with the fstab line:

fpi4:/gluvol /mnt fusefs rw,late,backup-volfile-servers=fbsd:/gluvol,mountprog=/usr/local/sbin/mount_glusterfs 0 0

root@fpi4:~ # df /mnt
Filesystem 1K-blocks Used Avail Capacity  Mounted on
/dev/fuse          0    0     0   100%    /mnt

I'll test with Linux today and see how it behaves in that environment.
Comment 5 ekoort 2021-01-27 09:12:00 UTC
It works as expected with Ubuntu 18.04 & GlusterFS 7.5.
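For reference, a sketch of the Linux-side fstab entry used for the test; the exact options were not recorded in this report, so treat the line below as an assumption based on Ubuntu's native glusterfs mount helper:

```
# hedged sketch: Linux fstab equivalent (native glusterfs type, no mountprog needed)
fpi4:/gluvol  /mnt  glusterfs  defaults,_netdev,backupvolfile-server=fbsd  0 0
```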
Comment 6 Daniel Morante 2021-01-29 05:04:57 UTC
(In reply to eimar.koort from comment #5)

Does that mean it is automatically failing over using the fstab from your initial comment?

Could I ask you to try it with Gluster 9.0? I tried your version of fstab with that and it worked for me.  It's not in the ports tree yet, but the port can be found here: https://github.com/tuaris/freebsd-glusterfs/tree/master/net/glusterfs9.  It's possible that it's a bug that was introduced sometime after 7.5 and fixed in 9.0.
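For anyone following along, a hedged sketch of building that out-of-tree port; it assumes the standard ports framework is already installed under /usr/ports:

```
# hedged sketch: build net/glusterfs9 from the out-of-tree repository
git clone https://github.com/tuaris/freebsd-glusterfs.git
cd freebsd-glusterfs/net/glusterfs9
make install clean
```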
Comment 7 ekoort 2021-01-29 05:09:55 UTC
With Linux, /mnt was still accessible from the running host while the second host/node was powered off.
I'll try GlusterFS 9 on FreeBSD and let you know.
Thanks!
Comment 8 ekoort 2021-01-29 08:27:54 UTC
GlusterFS 9 works as expected.
Nothing else changed; I just compiled 9 and it works.
Thank you!
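(For reference, the running version can be confirmed with the standard CLI:)

```
# confirm the installed GlusterFS version
gluster --version
```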
Comment 9 ekoort 2021-03-07 06:03:48 UTC
Hi,
I think this bug is solved and can be closed.
Thanks!