Bug 223671 - net/glusterfs: glusterfs volume status not showing online
Summary: net/glusterfs: glusterfs volume status not showing online
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-ports-bugs (Nobody)
URL:
Keywords: needs-patch, needs-qa
Depends on:
Blocks:
 
Reported: 2017-11-14 20:25 UTC by markhamb
Modified: 2020-07-29 20:37 UTC
CC: 9 users

See Also:
linimon: maintainer-feedback? (craig001)


Attachments

Description markhamb 2017-11-14 20:25:21 UTC
After I set up a simple glusterfs volume (replica 2), gluster shows all bricks as offline, but I can mount the volume and it appears to work (read/write files).

# gluster volume status
Status of volume: apple
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gtest4:/groot/apple                   N/A       N/A        N       N/A
Brick gtest3:/groot/apple                   N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        N       967
Self-heal Daemon on gtest4                  N/A       N/A        N       970

Task Status of Volume apple
------------------------------------------------------------------------------
There are no active volume tasks

I don't see anything particularly obvious in the logs.

------------------
How to Reproduce:
------------------

 # pkg install glusterfs
 # zfs create -omountpoint=/groot zroot/groot
 # service glusterd onestart
 # gluster peer probe gtest4
 # gluster volume create apple replica 2 gtest4:/groot/apple gtest3:/groot/apple
 # gluster volume start apple
 # gluster volume status
Status of volume: apple
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gtest4:/groot/apple                   N/A       N/A        N       N/A
Brick gtest3:/groot/apple                   N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        N       967
Self-heal Daemon on gtest4                  N/A       N/A        N       970

Task Status of Volume apple
------------------------------------------------------------------------------
There are no active volume tasks


--------------------
LOGS:
--------------------

----- glusterd.log -----
[2017-11-08 22:29:09.132427] I [MSGID: 100030] [glusterfsd.c:2476:main]
0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
version 3.11.1 (args: /usr/local/sbin/glusterd
--pid-file=/var/run/glusterd.pid)
[2017-11-08 22:29:09.138608] I [MSGID: 106478] [glusterd.c:1422:init]
0-management: Maximum allowed open file descriptors set to 65536
[2017-11-08 22:29:09.138676] I [MSGID: 106479] [glusterd.c:1469:init]
0-management: Using /var/db/glusterd as working directory
[2017-11-08 22:29:09.143727] E [rpc-transport.c:283:rpc_transport_load]
0-rpc-transport: Cannot open
"/usr/local/lib/glusterfs/3.11.1/rpc-transport/rdma.so"
[2017-11-08 22:29:09.143758] W [rpc-transport.c:287:rpc_transport_load]
0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not
valid or not found on this machine
[2017-11-08 22:29:09.143777] W [rpcsvc.c:1660:rpcsvc_create_listener]
0-rpc-service: cannot create listener, initing the transport failed
[2017-11-08 22:29:09.143796] E [MSGID: 106243] [glusterd.c:1693:init]
0-management: creation of 1 listeners failed, continuing with succeeded
transport
[2017-11-08 22:29:09.147373] E [MSGID: 101032]
[store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to
/var/db/glusterd/glusterd.info. [No such file or directory]
[2017-11-08 22:29:09.147414] E [MSGID: 101032]
[store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to
/var/db/glusterd/glusterd.info. [No such file or directory]
[2017-11-08 22:29:09.147416] I [MSGID: 106514]
[glusterd-store.c:2215:glusterd_restore_op_version] 0-management:
Detected new install. Setting op-version to maximum : 31100
[2017-11-08 22:29:09.147477] E [MSGID: 101032]
[store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to
/var/db/glusterd/options. [No such file or directory]
[2017-11-08 22:29:09.168706] I [MSGID: 106194]
[glusterd-store.c:3772:glusterd_store_retrieve_missed_snaps_list]
0-management: No missed snaps list.
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.socket.listen-backlog 128
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/db/glusterd
 15: end-volume
 16:
+------------------------------------------------------------------------------+
[2017-11-08 22:29:41.245073] I [MSGID: 106487]
[glusterd-handler.c:1242:__glusterd_handle_cli_probe] 0-glusterd:
Received CLI probe req gtest4 24007
[2017-11-08 22:29:41.269548] I [MSGID: 106129]
[glusterd-handler.c:3623:glusterd_probe_begin] 0-glusterd: Unable to
find peerinfo for host: gtest4 (24007)
[2017-11-08 22:29:41.310764] W [MSGID: 106062]
[glusterd-handler.c:3399:glusterd_transport_inet_options_build]
0-glusterd: Failed to get tcp-user-timeout
[2017-11-08 22:29:41.310813] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting
frame-timeout to 600
[2017-11-08 22:29:41.310890] W [MSGID: 101002]
[options.c:954:xl_opt_validate] 0-management: option 'address-family' is
deprecated, preferred is 'transport.address-family', continuing with
correction
[2017-11-08 22:29:41.317314] I [MSGID: 106498]
[glusterd-handler.c:3549:glusterd_friend_add] 0-management: connect
returned 0
[2017-11-08 22:29:41.318182] E [MSGID: 101032]
[store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to
/var/db/glusterd/glusterd.info. [No such file or directory]
[2017-11-08 22:29:41.318258] I [MSGID: 106477]
[glusterd.c:190:glusterd_uuid_generate_save] 0-management: generated
UUID: cee1794a-5e83-415e-88e0-eae018f9e745
[2017-11-08 22:29:41.426904] I [MSGID: 106511]
[glusterd-rpc-ops.c:261:__glusterd_probe_cbk] 0-management: Received
probe resp from uuid: 2dcbff56-88e8-48c8-814a-101f3d32c2b3, host: gtest4
[2017-11-08 22:29:41.426950] I [MSGID: 106511]
[glusterd-rpc-ops.c:421:__glusterd_probe_cbk] 0-glusterd: Received resp
to probe req
[2017-11-08 22:29:41.455508] I [MSGID: 106493]
[glusterd-rpc-ops.c:485:__glusterd_friend_add_cbk] 0-glusterd: Received
ACC from uuid: 2dcbff56-88e8-48c8-814a-101f3d32c2b3, host: gtest4, port: 0
[2017-11-08 22:29:41.496977] I [MSGID: 106163]
[glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack]
0-management: using the op-version 31100
[2017-11-08 22:29:41.521751] I [MSGID: 106490]
[glusterd-handler.c:2890:__glusterd_handle_probe_query] 0-glusterd:
Received probe from uuid: 2dcbff56-88e8-48c8-814a-101f3d32c2b3
[2017-11-08 22:29:41.525628] I [MSGID: 106493]
[glusterd-handler.c:2953:__glusterd_handle_probe_query] 0-glusterd:
Responded to gtest4.ssimicro.com, op_ret: 0, op_errno: 0, ret: 0
[2017-11-08 22:29:41.538527] I [MSGID: 106490]
[glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req]
0-glusterd: Received probe from uuid: 2dcbff56-88e8-48c8-814a-101f3d32c2b3
[2017-11-08 22:29:41.557101] I [MSGID: 106493]
[glusterd-handler.c:3799:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to gtest4 (0), ret: 0, op_ret: 0
[2017-11-08 22:29:41.574502] I [MSGID: 106492]
[glusterd-handler.c:2717:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 2dcbff56-88e8-48c8-814a-101f3d32c2b3
[2017-11-08 22:29:41.574567] I [MSGID: 106502]
[glusterd-handler.c:2762:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2017-11-08 22:29:41.574634] I [MSGID: 106493]
[glusterd-rpc-ops.c:700:__glusterd_friend_update_cbk] 0-management:
Received ACC from uuid: 2dcbff56-88e8-48c8-814a-101f3d32c2b3
[2017-11-08 22:30:47.866210] W [MSGID: 101095]
[xlator.c:162:xlator_volopt_dynload] 0-xlator: Cannot open
"/usr/local/lib/glusterfs/3.11.1/xlator/nfs/server.so"
[2017-11-08 22:30:55.267120] I [MSGID: 106143]
[glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick
/groot/apple on port 49152
[2017-11-08 22:30:55.267548] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting
frame-timeout to 600
[2017-11-08 22:30:55.346373] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-snapd: setting
frame-timeout to 600
[2017-11-08 22:30:55.346761] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-nfs: setting frame-timeout
to 600
[2017-11-08 22:30:55.346875] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already
stopped
[2017-11-08 22:30:55.346924] I [MSGID: 106568]
[glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: nfs service is
stopped
[2017-11-08 22:30:55.346968] I [MSGID: 106600]
[glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management:
nfs/server.so xlator is not installed
[2017-11-08 22:30:55.347136] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-glustershd: setting
frame-timeout to 600
[2017-11-08 22:30:55.347911] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd
already stopped
[2017-11-08 22:30:55.347950] I [MSGID: 106568]
[glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: glustershd
service is stopped
[2017-11-08 22:30:55.348012] I [MSGID: 106567]
[glusterd-svc-mgmt.c:196:glusterd_svc_start] 0-management: Starting
glustershd service
[2017-11-08 22:30:56.365516] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-quotad: setting
frame-timeout to 600
[2017-11-08 22:30:56.365837] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-bitd: setting frame-timeout
to 600
[2017-11-08 22:30:56.366019] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already
stopped
[2017-11-08 22:30:56.366071] I [MSGID: 106568]
[glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: bitd service
is stopped
[2017-11-08 22:30:56.366235] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-scrub: setting
frame-timeout to 600
[2017-11-08 22:30:56.366372] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already
stopped
[2017-11-08 22:30:56.366409] I [MSGID: 106568]
[glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: scrub service
is stopped
[2017-11-08 22:30:57.707989] E [run.c:190:runner_log] (-->0x803ecad43
<notify+0xb1de3> at
/usr/local/lib/glusterfs/3.11.1/xlator/mgmt/glusterd.so -->0x8008a6504
<runner_log+0x104> at /usr/local/lib/libglusterfs.so.0 ) 0-management:
Failed to execute script:
/var/db/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=apple
--first=yes --version=1 --volume-op=start --gd-workdir=/var/db/glusterd
[2017-11-08 22:30:57.722114] E [run.c:190:runner_log] (-->0x803ecad43
<notify+0xb1de3> at
/usr/local/lib/glusterfs/3.11.1/xlator/mgmt/glusterd.so -->0x8008a6504
<runner_log+0x104> at /usr/local/lib/libglusterfs.so.0 ) 0-management:
Failed to execute script:
/var/db/glusterd/hooks/1/start/post/S30samba-start.sh --volname=apple
--first=yes --version=1 --volume-op=start --gd-workdir=/var/db/glusterd
[2017-11-08 22:31:01.537697] I [MSGID: 106499]
[glusterd-handler.c:4302:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume apple
[2017-11-08 22:31:05.749436] I [MSGID: 106499]
[glusterd-handler.c:4302:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume apple
[2017-11-08 22:31:55.783261] I [MSGID: 106499]
[glusterd-handler.c:4302:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume apple
[2017-11-08 22:34:59.660782] I [MSGID: 106487]
[glusterd-handler.c:1484:__glusterd_handle_cli_list_friends] 0-glusterd:
Received cli list req
Comment 1 craig001 2017-11-21 23:14:21 UTC
Hello Folks

Thanks for bringing this to my attention; I will look into it and report back shortly.  There are a couple of issues with GlusterFS that need upstreaming and more hands-on work to correct.
Hopefully I'll have something worthwhile soon.

Kind Regards

Craig Butler
Comment 2 markhamb 2017-12-18 16:04:19 UTC
Any updates on this?  Anything I can do to help move this forwards?

Best,
-Markham
Comment 3 Eshin Kunishima 2018-01-12 08:25:06 UTC
I have the same issue. I started the volume but it still shows as offline. Is there any progress?
Comment 4 Roman Serbski 2018-04-26 13:04:24 UTC
+1

Same here under FreeBSD 11.1-RELEASE-p8 with glusterfs-3.11.1_4 installed.
Comment 5 r00t 2018-05-10 13:08:05 UTC
The reason you see N/As is that 'gluster volume status' relies on RDMA (libibverbs in particular, which as far as I understand doesn't exist in FreeBSD).

If you try to compile glusterfs manually you'll notice that:

...
checking for ibv_get_device_list in -libverbs... no
checking for rdma_create_id in -lrdmacm... no
...

Which will result in:

GlusterFS configure summary
===========================
...
Infiniband verbs     : no
...

And consequently will produce the following in the logs:

E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: Cannot open "/usr/local/lib/glusterfs/3.13.2/rpc-transport/rdma.so"
W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine

I'm not sure about support for userspace RDMA access in FreeBSD. According to https://wiki.freebsd.org/InfiniBand you can try adding WITH_OFED=yes to /etc/src.conf and rebuilding/reinstalling world.
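
For what it's worth, RDMA can also be taken out of the picture by requesting the TCP transport explicitly when the volume is created. This is a sketch only, reusing the volume and host names from the original report; the transport option is part of the stock gluster CLI:

 # gluster volume create apple replica 2 transport tcp gtest4:/groot/apple gtest3:/groot/apple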
Comment 6 bgorbutt 2018-07-13 19:32:02 UTC
I am also experiencing this issue on FreeBSD 11.1-RELEASE-p11 running GlusterFS 3.11.1.
Comment 7 Vincent Milum Jr 2019-01-27 18:18:02 UTC
When I first ran into this issue, I thought it was just a minor UI inconvenience, because the cluster still functioned properly for read/write operations. However...

It turns out that "gluster volume heal" commands fail instantly because gluster checks the "online" status of each node before issuing the command. This makes it impossible to issue certain administrative commands on the cluster at all.
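
For reference, these are the standard heal invocations that fail (sketch only, using the volume name from the original report):

 # gluster volume heal apple
 # gluster volume heal apple info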

Also, the Gluster version available in FreeBSD is significantly outdated and contains a known memory-leak bug that has already been fixed upstream. If needed, I can open a second issue about this here.
Comment 8 Kubilay Kocak freebsd_committer freebsd_triage 2019-02-09 03:18:05 UTC
This issue needs isolation and a patch (if relevant) to progress.

Since the port is out of date, I would also encourage attempting to update the port to a later version and testing whether the issue still reproduces, to identify whether it has been fixed upstream.
Comment 9 gnoma 2019-10-23 11:40:56 UTC
Hello,

Any update on this? 

I'm running on glusterfs-3.11.1_6 and still have the same issue. 

Not being able to run gluster volume heal is a huge problem, since in the case of data corruption you will not be able to fix it. 


Thanks
Comment 10 Daniel Morante 2020-07-21 18:36:02 UTC
The quick fix for this problem is to make sure you have procfs mounted as per https://www.freebsd.org/doc/en_US.ISO8859-1/articles/linux-users/procfs.html

After doing so, the brick shows as online.  This was tested on Gluster 7.6 and probably also works for 8.0 (https://reviews.freebsd.org/D25037).
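
For completeness, procfs can also be mounted at boot with the standard /etc/fstab entry (documented FreeBSD syntax, nothing Gluster-specific):

proc    /proc    procfs    rw    0    0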

root@moon:~ # mount proc
root@moon:~ # df -h
Filesystem    Size    Used   Avail Capacity  Mounted on
/dev/da0p2     18G    2.4G     15G    14%    /
devfs         1.0K    1.0K      0B   100%    /dev
gluster        19G    461M     18G     2%    /gluster
/dev/fuse      19G    655M     18G     3%    /mnt/replicated
procfs        4.0K    4.0K      0B   100%    /proc

root@moon:~ # service glusterd restart
Stopping glusterd.
Waiting for PIDS: 490, 490.
Starting glusterd.
root@moon:~ # gluster volume status
Status of volume: replicated
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick sun:/gluster/replicated               N/A       N/A        N       N/A  
Brick earth:/gluster/replicated             N/A       N/A        N       N/A  
Brick moon:/gluster/replicated              49152     0          Y       831  
Self-heal Daemon on localhost               N/A       N/A        Y       877  
Self-heal Daemon on earth                   N/A       N/A        N       N/A  
Self-heal Daemon on sun.gluster.morante.com N/A       N/A        N       N/A  
 
Task Status of Volume replicated
------------------------------------------------------------------------------
There are no active volume tasks

The core of the problem appears to be in the "gf_is_pid_running" function in 
https://github.com/gluster/glusterfs/blob/v7.6/libglusterfs/src/common-utils.c#L4098

The function needs to be patched with a FreeBSD equivalent that doesn't rely on procfs.
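
For illustration only (this is not the actual patch from the port, just a minimal sketch of one procfs-free approach): on FreeBSD a PID's liveness can be checked with kill(2) and signal 0, which performs the existence and permission checks without delivering a signal.

#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/* Sketch of a procfs-free liveness check; not the code used in the port's patch. */
static bool
pid_is_running(pid_t pid)
{
    if (pid <= 0)
        return false;
    if (kill(pid, 0) == 0)
        return true;         /* process exists and we may signal it */
    return (errno == EPERM); /* process exists but is owned by another user */
}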
Comment 11 Daniel Morante 2020-07-23 22:58:25 UTC
I've made patches to remove the procfs requirement:
https://github.com/tuaris/freebsd-glusterfs7/tree/master/net/glusterfs7/files
Comment 12 gnoma 2020-07-24 14:50:06 UTC
Thank you very much for the update and for testing it on such recent versions of glusterfs. 
However, the FreeBSD ports tree and pkg repositories still carry GlusterFS version 3.11, so your update doesn't really help.


root@hal9000:/usr/ports/net/glusterfs # cat distinfo 
TIMESTAMP = 1499632037
SHA256 (glusterfs-3.11.1.tar.gz) = c7e0502631c9bc9da05795b666b74ef40a30a0344f5a2e205e65bd2faefe1442
SIZE (glusterfs-3.11.1.tar.gz) = 9155001


root@hal9000:/usr/ports/net/glusterfs # pkg search gluster
glusterfs-3.11.1_7             GlusterFS distributed file system
root@hal9000:/usr/ports/net/glusterfs # 

So unless there's a link to instructions on how to build and install these later versions on FreeBSD, your fix doesn't help me. 
Don't get me wrong; thank you so much for the effort, but I would still like to be able to install and use it on FreeBSD as a regular user. 

Thank you
Comment 13 Daniel Morante 2020-07-24 18:37:36 UTC
The port is currently in the process of being updated to 8.0.  In the meantime, you can use the GitHub repo I linked to in order to test version 7.6.
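
Roughly (a sketch, not official instructions; the category and path are assumed from the repository layout):

 # git clone https://github.com/tuaris/freebsd-glusterfs7.git
 # cp -R freebsd-glusterfs7/net/glusterfs7 /usr/ports/net/
 # cd /usr/ports/net/glusterfs7
 # make install clean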
Comment 14 commit-hook freebsd_committer freebsd_triage 2020-07-29 20:34:17 UTC
A commit references this bug:

Author: flo
Date: Wed Jul 29 20:34:02 UTC 2020
New revision: 543674
URL: https://svnweb.freebsd.org/changeset/ports/543674

Log:
  Update to 8.0, this is a collaborative effort between Daniel Morante and
  myself.

  - update to 8.0
  - make it possible to mount gluster volumes on boot [1]
  - reset maintainer [1], I would have set it to ports@ but Daniel volunteered
    to maintain the port
  - add pkg-message to point out that procfs is required for some operations
    like "gluster volume status" which is also required for self healing. [2]

  This version works although I still see the same memory leak as with the
  3.X series.

  PR:		236112 [1], 223671 [2]
  Submitted by:	Daniel Morante <daniel@morante.net>, flo
  Obtained from:	https://github.com/tuaris/freebsd-glusterfs7
  Differential Revision:	D25037

Changes:
  head/net/glusterfs/Makefile
  head/net/glusterfs/distinfo
  head/net/glusterfs/files/glusterd.in
  head/net/glusterfs/files/patch-configure
  head/net/glusterfs/files/patch-configure.ac
  head/net/glusterfs/files/patch-contrib_fuse-lib_mount.c
  head/net/glusterfs/files/patch-extras_Makefile.in
  head/net/glusterfs/files/patch-libglusterfs_src_common-utils.c
  head/net/glusterfs/files/patch-libglusterfs_src_syscall.c
  head/net/glusterfs/files/patch-xlators_mgmt_glusterd_src_Makefile.am
  head/net/glusterfs/pkg-message
  head/net/glusterfs/pkg-plist
Comment 15 Florian Smeets freebsd_committer freebsd_triage 2020-07-29 20:37:07 UTC
Should be fixed.