Bug 223671

Summary: net/glusterfs: glusterfs volume status not showing online
Product: Ports & Packages        Reporter: markham_breitbach
Component: Individual Port(s)    Assignee: freebsd-ports-bugs mailing list <ports-bugs>
Status: Open
Severity: Affects Only Me        CC: bgorbutt, craig001, freebsd, kunishima, mefystofel, r00t
Priority: ---                    Keywords: needs-patch, needs-qa
Version: Latest                  Flags: linimon: maintainer-feedback? (craig001)
Hardware: amd64
OS: Any

Description markham_breitbach 2017-11-14 20:25:21 UTC
After I set up a simple glusterfs volume (replica 2), gluster shows all bricks offline, but I can still mount the volume and it appears to work (I can read and write files).

# gluster volume status
Status of volume: apple
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gtest4:/groot/apple                   N/A       N/A        N       N/A
Brick gtest3:/groot/apple                   N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        N       967
Self-heal Daemon on gtest4                  N/A       N/A        N       970

Task Status of Volume apple
------------------------------------------------------------------------------
There are no active volume tasks

I don't see anything particularly obvious in the logs.

------------------
How to Reproduce:
------------------

 # pkg install glusterfs
 # zfs create -omountpoint=/groot zroot/groot
 # service glusterd onestart
 # gluster peer probe gtest4
 # gluster volume create apple replica 2 gtest4:/groot/apple gtest3:/groot/apple
 # gluster volume start apple
 # gluster volume status
Status of volume: apple
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gtest4:/groot/apple                   N/A       N/A        N       N/A
Brick gtest3:/groot/apple                   N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        N       967
Self-heal Daemon on gtest4                  N/A       N/A        N       970

Task Status of Volume apple
------------------------------------------------------------------------------
There are no active volume tasks
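
Despite the status above, the volume can still be mounted and reads/writes appear to work, as noted in the description. A minimal check (a sketch only; it assumes the native FUSE client installed by the port, and the mountpoint /mnt/apple is just an example):

 # mkdir -p /mnt/apple
 # glusterfs --volfile-server=gtest3 --volfile-id=apple /mnt/apple
 # echo hello > /mnt/apple/testfile
 # cat /mnt/apple/testfile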


--------------------
LOGS:
--------------------

----- glusterd.log -----
[2017-11-08 22:29:09.132427] I [MSGID: 100030] [glusterfsd.c:2476:main]
0-/usr/local/sbin/glusterd: Started running /usr/local/sbin/glusterd
version 3.11.1 (args: /usr/local/sbin/glusterd
--pid-file=/var/run/glusterd.pid)
[2017-11-08 22:29:09.138608] I [MSGID: 106478] [glusterd.c:1422:init]
0-management: Maximum allowed open file descriptors set to 65536
[2017-11-08 22:29:09.138676] I [MSGID: 106479] [glusterd.c:1469:init]
0-management: Using /var/db/glusterd as working directory
[2017-11-08 22:29:09.143727] E [rpc-transport.c:283:rpc_transport_load]
0-rpc-transport: Cannot open
"/usr/local/lib/glusterfs/3.11.1/rpc-transport/rdma.so"
[2017-11-08 22:29:09.143758] W [rpc-transport.c:287:rpc_transport_load]
0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not
valid or not found on this machine
[2017-11-08 22:29:09.143777] W [rpcsvc.c:1660:rpcsvc_create_listener]
0-rpc-service: cannot create listener, initing the transport failed
[2017-11-08 22:29:09.143796] E [MSGID: 106243] [glusterd.c:1693:init]
0-management: creation of 1 listeners failed, continuing with succeeded
transport
[2017-11-08 22:29:09.147373] E [MSGID: 101032]
[store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to
/var/db/glusterd/glusterd.info. [No such file or directory]
[2017-11-08 22:29:09.147414] E [MSGID: 101032]
[store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to
/var/db/glusterd/glusterd.info. [No such file or directory]
[2017-11-08 22:29:09.147416] I [MSGID: 106514]
[glusterd-store.c:2215:glusterd_restore_op_version] 0-management:
Detected new install. Setting op-version to maximum : 31100
[2017-11-08 22:29:09.147477] E [MSGID: 101032]
[store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to
/var/db/glusterd/options. [No such file or directory]
[2017-11-08 22:29:09.168706] I [MSGID: 106194]
[glusterd-store.c:3772:glusterd_store_retrieve_missed_snaps_list]
0-management: No missed snaps list.
Final graph:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option rpc-auth.auth-glusterfs on
  4:     option rpc-auth.auth-unix on
  5:     option rpc-auth.auth-null on
  6:     option rpc-auth-allow-insecure on
  7:     option transport.socket.listen-backlog 128
  8:     option event-threads 1
  9:     option ping-timeout 0
 10:     option transport.socket.read-fail-log off
 11:     option transport.socket.keepalive-interval 2
 12:     option transport.socket.keepalive-time 10
 13:     option transport-type rdma
 14:     option working-directory /var/db/glusterd
 15: end-volume
 16:
+------------------------------------------------------------------------------+
[2017-11-08 22:29:41.245073] I [MSGID: 106487]
[glusterd-handler.c:1242:__glusterd_handle_cli_probe] 0-glusterd:
Received CLI probe req gtest4 24007
[2017-11-08 22:29:41.269548] I [MSGID: 106129]
[glusterd-handler.c:3623:glusterd_probe_begin] 0-glusterd: Unable to
find peerinfo for host: gtest4 (24007)
[2017-11-08 22:29:41.310764] W [MSGID: 106062]
[glusterd-handler.c:3399:glusterd_transport_inet_options_build]
0-glusterd: Failed to get tcp-user-timeout
[2017-11-08 22:29:41.310813] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting
frame-timeout to 600
[2017-11-08 22:29:41.310890] W [MSGID: 101002]
[options.c:954:xl_opt_validate] 0-management: option 'address-family' is
deprecated, preferred is 'transport.address-family', continuing with
correction
[2017-11-08 22:29:41.317314] I [MSGID: 106498]
[glusterd-handler.c:3549:glusterd_friend_add] 0-management: connect
returned 0
[2017-11-08 22:29:41.318182] E [MSGID: 101032]
[store.c:433:gf_store_handle_retrieve] 0-: Path corresponding to
/var/db/glusterd/glusterd.info. [No such file or directory]
[2017-11-08 22:29:41.318258] I [MSGID: 106477]
[glusterd.c:190:glusterd_uuid_generate_save] 0-management: generated
UUID: cee1794a-5e83-415e-88e0-eae018f9e745
[2017-11-08 22:29:41.426904] I [MSGID: 106511]
[glusterd-rpc-ops.c:261:__glusterd_probe_cbk] 0-management: Received
probe resp from uuid: 2dcbff56-88e8-48c8-814a-101f3d32c2b3, host: gtest4
[2017-11-08 22:29:41.426950] I [MSGID: 106511]
[glusterd-rpc-ops.c:421:__glusterd_probe_cbk] 0-glusterd: Received resp
to probe req
[2017-11-08 22:29:41.455508] I [MSGID: 106493]
[glusterd-rpc-ops.c:485:__glusterd_friend_add_cbk] 0-glusterd: Received
ACC from uuid: 2dcbff56-88e8-48c8-814a-101f3d32c2b3, host: gtest4, port: 0
[2017-11-08 22:29:41.496977] I [MSGID: 106163]
[glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack]
0-management: using the op-version 31100
[2017-11-08 22:29:41.521751] I [MSGID: 106490]
[glusterd-handler.c:2890:__glusterd_handle_probe_query] 0-glusterd:
Received probe from uuid: 2dcbff56-88e8-48c8-814a-101f3d32c2b3
[2017-11-08 22:29:41.525628] I [MSGID: 106493]
[glusterd-handler.c:2953:__glusterd_handle_probe_query] 0-glusterd:
Responded to gtest4.ssimicro.com, op_ret: 0, op_errno: 0, ret: 0
[2017-11-08 22:29:41.538527] I [MSGID: 106490]
[glusterd-handler.c:2539:__glusterd_handle_incoming_friend_req]
0-glusterd: Received probe from uuid: 2dcbff56-88e8-48c8-814a-101f3d32c2b3
[2017-11-08 22:29:41.557101] I [MSGID: 106493]
[glusterd-handler.c:3799:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to gtest4 (0), ret: 0, op_ret: 0
[2017-11-08 22:29:41.574502] I [MSGID: 106492]
[glusterd-handler.c:2717:__glusterd_handle_friend_update] 0-glusterd:
Received friend update from uuid: 2dcbff56-88e8-48c8-814a-101f3d32c2b3
[2017-11-08 22:29:41.574567] I [MSGID: 106502]
[glusterd-handler.c:2762:__glusterd_handle_friend_update] 0-management:
Received my uuid as Friend
[2017-11-08 22:29:41.574634] I [MSGID: 106493]
[glusterd-rpc-ops.c:700:__glusterd_friend_update_cbk] 0-management:
Received ACC from uuid: 2dcbff56-88e8-48c8-814a-101f3d32c2b3
[2017-11-08 22:30:47.866210] W [MSGID: 101095]
[xlator.c:162:xlator_volopt_dynload] 0-xlator: Cannot open
"/usr/local/lib/glusterfs/3.11.1/xlator/nfs/server.so"
[2017-11-08 22:30:55.267120] I [MSGID: 106143]
[glusterd-pmap.c:279:pmap_registry_bind] 0-pmap: adding brick
/groot/apple on port 49152
[2017-11-08 22:30:55.267548] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: setting
frame-timeout to 600
[2017-11-08 22:30:55.346373] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-snapd: setting
frame-timeout to 600
[2017-11-08 22:30:55.346761] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-nfs: setting frame-timeout
to 600
[2017-11-08 22:30:55.346875] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already
stopped
[2017-11-08 22:30:55.346924] I [MSGID: 106568]
[glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: nfs service is
stopped
[2017-11-08 22:30:55.346968] I [MSGID: 106600]
[glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management:
nfs/server.so xlator is not installed
[2017-11-08 22:30:55.347136] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-glustershd: setting
frame-timeout to 600
[2017-11-08 22:30:55.347911] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd
already stopped
[2017-11-08 22:30:55.347950] I [MSGID: 106568]
[glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: glustershd
service is stopped
[2017-11-08 22:30:55.348012] I [MSGID: 106567]
[glusterd-svc-mgmt.c:196:glusterd_svc_start] 0-management: Starting
glustershd service
[2017-11-08 22:30:56.365516] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-quotad: setting
frame-timeout to 600
[2017-11-08 22:30:56.365837] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-bitd: setting frame-timeout
to 600
[2017-11-08 22:30:56.366019] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already
stopped
[2017-11-08 22:30:56.366071] I [MSGID: 106568]
[glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: bitd service
is stopped
[2017-11-08 22:30:56.366235] I
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-scrub: setting
frame-timeout to 600
[2017-11-08 22:30:56.366372] I [MSGID: 106132]
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already
stopped
[2017-11-08 22:30:56.366409] I [MSGID: 106568]
[glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: scrub service
is stopped
[2017-11-08 22:30:57.707989] E [run.c:190:runner_log] (-->0x803ecad43
<notify+0xb1de3> at
/usr/local/lib/glusterfs/3.11.1/xlator/mgmt/glusterd.so -->0x8008a6504
<runner_log+0x104> at /usr/local/lib/libglusterfs.so.0 ) 0-management:
Failed to execute script:
/var/db/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=apple
--first=yes --version=1 --volume-op=start --gd-workdir=/var/db/glusterd
[2017-11-08 22:30:57.722114] E [run.c:190:runner_log] (-->0x803ecad43
<notify+0xb1de3> at
/usr/local/lib/glusterfs/3.11.1/xlator/mgmt/glusterd.so -->0x8008a6504
<runner_log+0x104> at /usr/local/lib/libglusterfs.so.0 ) 0-management:
Failed to execute script:
/var/db/glusterd/hooks/1/start/post/S30samba-start.sh --volname=apple
--first=yes --version=1 --volume-op=start --gd-workdir=/var/db/glusterd
[2017-11-08 22:31:01.537697] I [MSGID: 106499]
[glusterd-handler.c:4302:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume apple
[2017-11-08 22:31:05.749436] I [MSGID: 106499]
[glusterd-handler.c:4302:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume apple
[2017-11-08 22:31:55.783261] I [MSGID: 106499]
[glusterd-handler.c:4302:__glusterd_handle_status_volume] 0-management:
Received status volume req for volume apple
[2017-11-08 22:34:59.660782] I [MSGID: 106487]
[glusterd-handler.c:1484:__glusterd_handle_cli_list_friends] 0-glusterd:
Received cli list req
Comment 1 craig001 2017-11-21 23:14:21 UTC
Hello Folks

Thanks for bringing this to my attention; I will look into it and report back shortly. There are a couple of issues with GlusterFS that need to be upstreamed, and I need to get more hands-on to correct them.
Hopefully I will have something worthwhile soon.

Kind Regards

Craig Butler
Comment 2 markham_breitbach 2017-12-18 16:04:19 UTC
Any updates on this? Anything I can do to help move this forward?

Best,
-Markham
Comment 3 Eshin Kunishima 2018-01-12 08:25:06 UTC
I have the same issue. The volume is started, but its bricks still show as offline. Is there any progress?
Comment 4 Roman Serbski 2018-04-26 13:04:24 UTC
+1

Same here under FreeBSD 11.1-RELEASE-p8 with glusterfs-3.11.1_4 installed.
Comment 5 r00t 2018-05-10 13:08:05 UTC
You see N/As because 'gluster volume status' relies on RDMA (libverbs in particular, which as far as I understand doesn't exist in FreeBSD).

If you try to compile glusterfs manually you'll notice that:

...
checking for ibv_get_device_list in -libverbs... no
checking for rdma_create_id in -lrdmacm... no
...

Which will result in:

GlusterFS configure summary
===========================
...
Infiniband verbs     : no
...

And consequently will produce the following in the logs:

E [rpc-transport.c:283:rpc_transport_load] 0-rpc-transport: Cannot open "/usr/local/lib/glusterfs/3.13.2/rpc-transport/rdma.so"
W [rpc-transport.c:287:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine

I'm not sure about the support of userspace access to RDMA in FreeBSD. According to https://wiki.freebsd.org/InfiniBand you can try to add WITH_OFED=yes to /etc/src.conf and build/installworld.
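
For reference, a rough sketch of that suggestion from the wiki (untested here; WITH_OFED pulls the OFED userland, including libibverbs, into the base system build):

 # echo 'WITH_OFED=yes' >> /etc/src.conf
 # cd /usr/src
 # make buildworld
 # make installworld
 # shutdown -r now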
Comment 6 bgorbutt 2018-07-13 19:32:02 UTC
I am also experiencing this issue on FreeBSD 11.1-RELEASE-p11 running GlusterFS 3.11.1.
Comment 7 Vincent Milum 2019-01-27 18:18:02 UTC
At first, when I ran into this issue, I thought it was just a minor UI inconvenience, because the cluster still functioned properly for read/write operations. However...

It turns out that "gluster volume heal" commands instantly fail, because the CLI checks the "online" status of each node before issuing the command. This makes it impossible to issue certain administrative commands on the cluster at all.
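
For illustration, these are the kinds of commands that are blocked (volume name taken from this report; the exact failure message will vary):

 # gluster volume heal apple
 # gluster volume heal apple info

Both reportedly fail immediately while the bricks are shown offline, even though the bricks are actually serving I/O.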

Also, the Gluster version available in FreeBSD is significantly outdated and contains a known memory leak that has already been fixed upstream. If needed, I can open a second issue about this here.
Comment 8 Kubilay Kocak (freebsd_committer, freebsd_triage) 2019-02-09 03:18:05 UTC
This issue needs isolation and a patch (if relevant) to progress.

Since the port is out of date, I would also encourage updating the port to a later version and testing reproduction of the issue again, to identify whether it has been fixed upstream.
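
For anyone attempting that, a rough outline of updating and retesting the port locally might look like this (a sketch only; a real update will likely also need patch and pkg-plist changes):

 # portsnap fetch update
 # cd /usr/ports/net/glusterfs
 # vi Makefile        (bump the version to a newer upstream release)
 # make makesum
 # make deinstall
 # make install clean
 # service glusterd restart
 # gluster volume status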