There are known memory leaks in upstream Gluster 3.11 and 3.12 which has since been resolved. Those patches have yet to hit FreeBSD, and I can easily confirm that I'm hitting the exact same memory leaks described in the upstream bugs. Working with a large number of files (in my case, writing 100k-1m files into Gluster) will eventually cause GlusterFS to consume all RAM and SWAP until the system becomes unresponsive due to OOM. Upstream details: https://bugzilla.redhat.com/show_bug.cgi?id=1593884
Upstream "backport to 3.12.x" request: https://bugzilla.redhat.com/show_bug.cgi?id=1613512 "This is fixed with 3.12.13 now.
please patch and change maintainer
I guess the previous comment could be read as a resignation from maintainership and, thus, this PR should not anymore be waiting on "maintainer feedback". Hoping this will be fixed soon! (and so soon as I have some time on hand I could try the backport myself)
Created attachment 205495 [details] upgrade to 3.12.15 This is the easiest upgrade (with current patchset) from current version that contains a fix for this leak (was fixed upstream in 3.12.13). Unfortunately, to my testing, it doesn't change much (if at all) as there are other leaks. Upgrading to 3.13 or later might solve those, but that needs new patches that I didn't attempt.
(In reply to Lapo Luchini from comment #5) Thank you for the patch Lapo. Could you reference from where this patch came? Upstream issue/pr/commit references would be great to comment the patch with
It's a simple upgrade s/3.11.1/3.12.15/ but that had a compilation error, which I fixed by adding a missing "gf_" in a function name (as suggested by the compiler itself). (I don't understand how that even compiles upstream) Truth said: I'm not *sure* what that function does and if my fix is strictly correct, but given the absence of "uuid_is_null()" in the sources, the presence of "gf_uuid_is_null()" and the fact that the variable is called "gfid" and the following line calls "gf_uuid_copy()" I'd say it's safe to assume that's the correct fix.
Found it upstream: https://github.com/gluster/glusterfs/commit/d945c44c6289eadcda805f728fcc638586658c37
Hello, Any update on this? The package I see in the repository is still glusterfs-3.11.1_6. Thanks
Hello, There is 3.13.2 available. But it doesn't compile on 12.1-STABLE: --- glfs-fops.lo --- glfs-fops.c:4894:13: warning: logical not is only applied to the left hand side of this bitwise operator [-Wlogical-not-parentheses] if (!xstat->flags_handled & GFAPI_XREADDIRP_STAT) ^ ~ glfs-fops.c:4894:13: note: add parentheses after the '!' to evaluate the bitwise operator first if (!xstat->flags_handled & GFAPI_XREADDIRP_STAT) ^ ( ) glfs-fops.c:4894:13: note: add parentheses around left hand side expression to silence this warning if (!xstat->flags_handled & GFAPI_XREADDIRP_STAT) ^ ( ) --- glfs-mgmt.lo --- mv -f .deps/glfs-mgmt.Tpo .deps/glfs-mgmt.Plo --- glfs-resolve.lo --- mv -f .deps/glfs-resolve.Tpo .deps/glfs-resolve.Plo --- glfs.lo --- error: versioned symbol glfs_upcall_register@@GFAPI_3.13.0 must be defined error: versioned symbol glfs_upcall_unregister@@GFAPI_3.13.0 must be defined 2 errors generated. *** [glfs.lo] Error code 1 make[5]: stopped in /usr/ports/net/glusterfs/work/glusterfs-3.13.2/api/src --- glfs-handleops.lo --- mv -f .deps/glfs-handleops.Tpo .deps/glfs-handleops.Plo --- glfs-fops.lo --- 1 warning generated. mv -f .deps/glfs-fops.Tpo .deps/glfs-fops.Plo 1 error make[5]: stopped in /usr/ports/net/glusterfs/work/glusterfs-3.13.2/api/src *** [all-recursive] Error code 1 make[4]: stopped in /usr/ports/net/glusterfs/work/glusterfs-3.13.2/api 1 error make[4]: stopped in /usr/ports/net/glusterfs/work/glusterfs-3.13.2/api *** [all-recursive] Error code 1 make[3]: stopped in /usr/ports/net/glusterfs/work/glusterfs-3.13.2 1 error make[3]: stopped in /usr/ports/net/glusterfs/work/glusterfs-3.13.2 *** [all] Error code 2 make[2]: stopped in /usr/ports/net/glusterfs/work/glusterfs-3.13.2 1 error Is there way to fix it?
I'm still seeing memory leaks in what I think is the client part (the mounted gluster fs) after the Update to 8.0
Is is PR still relevant after the port had been updated to version 8.0 with ports r543674?
It seems that the supposed memory leak is with the glusterfs `fuse` process. For example: ``` /usr/local/sbin/glusterfs --process-name fuse --volfile-server=moon --volfile-server=sun --volfile-server=earth --volfile-id=replicated /mnt/replicated (glusterfsd) ``` I was seeing some OOM kills on some servers but that was on an older version. I will upgrade a test server to 8.0 and let it run for a few days and see if the process's memory usage keeps growing.
I have run a 3 node cluster and left it up for 50 days so far. Not seeing any OOM kills or any memory leaks. Mem: 21M Active, 2006M Inact, 908M Wired, 395M Buf, 1289M Free Mem: 19M Active, 2635M Inact, 760M Wired, 395M Buf, 819M Free Mem: 18M Active, 1615M Inact, 983M Wired, 395M Buf, 1704M Free 1169 root 8 20 0 51M 18M select 0 932:34 1.33% glusterf 1261 root 14 20 0 128M 79M select 1 908:46 1.25% glusterf 1264 root 8 20 0 107M 61M select 1 830:05 1.17% glusterf 692 root 10 20 0 61M 19M select 0 788:17 1.08% glusterf 1546 root 8 20 0 53M 18M select 1 655:57 0.84% glusterf 1549 root 13 20 0 144M 96M select 1 625:14 0.81% glusterf 1552 root 8 20 0 105M 59M select 1 565:24 0.71% glusterf 703 root 10 20 0 62M 18M select 1 532:11 0.67% glusterf 663 root 8 20 0 48M 16M select 0 96:22 0.92% glusterf 726 root 13 20 0 80M 30M select 1 92:24 0.85% glusterf 729 root 8 20 0 64M 18M select 1 84:52 0.72% glusterf 723 root 10 20 0 63M 17M select 1 81:32 0.70% glusterf I will close this bug since it seems the issue has been resolved in the 8.0 release.
Turns out the memory leak occurs when you read/write to the fuse client. After some digging around I found these two bug reports: https://github.com/gluster/glusterfs/issues/1440 https://github.com/gluster/glusterfs/issues/1413 Looks like this is a known issue upstream and a fix has yet to be committed. The suggested workaround is to disable open-behind: ``` gluster volume set <volname> open-behind off ```
(In reply to Daniel Morante from comment #14) If the root cause issues are yet to be resolved, should we not leave this issue open? One resolution option pending resolution upstream is a pkg-message indicating the issue, with details on the workaround (until permanently resolved)
(In reply to Kubilay Kocak from comment #16) Yes, I agree it makes sense to leave it open. Is there a way to mark this bug as waiting for something external? Or do we just leave it as is?
(In reply to Daniel Morante from comment #17) @Maintainer / Reporter Updates/progress on this? Can we resolve this in any other way?
From the looks of it the upstream project isn't interested in helping resolve this leak. I don't have the needed skills to track down memory leak. The best way to move forward is to see if someone with the correct experience is willing to volunteer some time to (1) tracking down the leak, and (2) provide a patch to fix it. I can assist in the process of having the patch applied upstream.
The Gluster version if Ports has been updated well beyond 3.x and from what I can tell, there is no active memory leak anymore in the current version. I've been running 8.x for some time now without any issues.