Bug 235305 - net/glusterfs: Memory leak in 3.11/3.12
Summary: net/glusterfs: Memory leak in 3.11/3.12
Status: Open
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Many People
Assignee: freebsd-ports-bugs (Nobody)
URL:
Keywords: needs-patch, needs-qa
Depends on:
Blocks:
 
Reported: 2019-01-29 19:39 UTC by Vincent Milum Jr
Modified: 2021-02-23 02:12 UTC
CC List: 8 users

See Also:
koobs: maintainer-feedback? (daniel)
koobs: merge-quarterly?


Attachments
upgrade to 3.12.15 (1.81 KB, patch)
2019-07-03 07:54 UTC, Lapo Luchini

Description Vincent Milum Jr 2019-01-29 19:39:26 UTC
There are known memory leaks in upstream Gluster 3.11 and 3.12 which have since been resolved upstream. Those fixes have yet to reach the FreeBSD port, and I can confirm that I'm hitting the exact same memory leaks described in the upstream bugs.

Working with a large number of files (in my case, writing 100k to 1M files into Gluster) eventually causes GlusterFS to consume all RAM and swap until the system becomes unresponsive due to OOM.

Upstream details: https://bugzilla.redhat.com/show_bug.cgi?id=1593884
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2019-01-30 03:04:09 UTC
Upstream "backport to 3.12.x" request: https://bugzilla.redhat.com/show_bug.cgi?id=1613512

"This is fixed with 3.12.13 now."
Comment 2 craig001 2019-02-14 14:42:44 UTC
Please patch and change the maintainer.
Comment 3 Lapo Luchini 2019-06-14 13:10:24 UTC
I guess the previous comment could be read as a resignation from maintainership and, thus, this PR should no longer be waiting on "maintainer feedback".

Hoping this will be fixed soon!
(and as soon as I have some time on hand I could try the backport myself)
Comment 5 Lapo Luchini 2019-07-03 07:54:16 UTC
Created attachment 205495 [details]
upgrade to 3.12.15

This is the easiest upgrade (keeping the current patchset) from the current version to one that contains a fix for this leak (it was fixed upstream in 3.12.13).

Unfortunately, in my testing, it doesn't change much (if anything), as there are other leaks.

Upgrading to 3.13 or later might solve those, but that needs new patches, which I didn't attempt.
Comment 6 Kubilay Kocak freebsd_committer freebsd_triage 2019-07-03 09:08:18 UTC
(In reply to Lapo Luchini from comment #5)

Thank you for the patch, Lapo. Could you say where this patch came from?

Upstream issue/PR/commit references would be great to annotate the patch with.
Comment 7 Lapo Luchini 2019-07-03 10:12:09 UTC
It's a simple upgrade s/3.11.1/3.12.15/, but that produced a compilation error, which I fixed by adding a missing "gf_" to a function name (as suggested by the compiler itself). (I don't understand how that even compiles upstream.)

Truth be told, I'm not *sure* what that function does or whether my fix is strictly correct. But given the absence of "uuid_is_null()" in the sources, the presence of "gf_uuid_is_null()", the fact that the variable is called "gfid", and that the following line calls "gf_uuid_copy()", I'd say it's safe to assume that's the correct fix.
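For readers unfamiliar with this kind of helper, here is a minimal sketch of what a "UUID is null" check followed by a copy typically looks like. Everything in it is hypothetical: `demo_uuid_t`, `demo_uuid_is_null()`, `demo_uuid_copy()` and `demo_set_gfid_if_unset()` are illustrative stand-ins, not the GlusterFS definitions, and the caller's condition is only one plausible shape for the code the comment describes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a 16-byte UUID/GFID; not the GlusterFS type. */
typedef unsigned char demo_uuid_t[16];

/* Returns 1 if every byte of the UUID is zero, 0 otherwise. */
static int demo_uuid_is_null(const demo_uuid_t id)
{
    for (size_t i = 0; i < sizeof(demo_uuid_t); i++) {
        if (id[i] != 0)
            return 0;
    }
    return 1;
}

/* Copies src into dst, byte for byte. */
static void demo_uuid_copy(demo_uuid_t dst, const demo_uuid_t src)
{
    memcpy(dst, src, sizeof(demo_uuid_t));
}

/* Assumed caller pattern: fill in gfid only while it is still unset. */
static void demo_set_gfid_if_unset(demo_uuid_t gfid, const demo_uuid_t candidate)
{
    if (demo_uuid_is_null(gfid))
        demo_uuid_copy(gfid, candidate);
}
```

The point is only that a null check and a copy on the same variable are consistent with each other, which supports the reasoning above that the "gf_"-prefixed helper is the intended one.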
Comment 9 gnoma 2019-10-23 11:37:42 UTC
Hello, 

Any update on this? 
The package I see in the repository is still glusterfs-3.11.1_6. 


Thanks
Comment 10 iron.udjin 2019-12-15 22:15:00 UTC
Hello,

Version 3.13.2 is available, but it doesn't compile on 12.1-STABLE:

```
--- glfs-fops.lo ---
glfs-fops.c:4894:13: warning: logical not is only applied to the left hand side of this bitwise operator [-Wlogical-not-parentheses]
        if (!xstat->flags_handled & GFAPI_XREADDIRP_STAT)
            ^                     ~
glfs-fops.c:4894:13: note: add parentheses after the '!' to evaluate the bitwise operator first
        if (!xstat->flags_handled & GFAPI_XREADDIRP_STAT)
            ^
             (                                          )
glfs-fops.c:4894:13: note: add parentheses around left hand side expression to silence this warning
        if (!xstat->flags_handled & GFAPI_XREADDIRP_STAT)
            ^
            (                    )
--- glfs-mgmt.lo ---
mv -f .deps/glfs-mgmt.Tpo .deps/glfs-mgmt.Plo
--- glfs-resolve.lo ---
mv -f .deps/glfs-resolve.Tpo .deps/glfs-resolve.Plo
--- glfs.lo ---
error: versioned symbol glfs_upcall_register@@GFAPI_3.13.0 must be defined
error: versioned symbol glfs_upcall_unregister@@GFAPI_3.13.0 must be defined
2 errors generated.
*** [glfs.lo] Error code 1

make[5]: stopped in /usr/ports/net/glusterfs/work/glusterfs-3.13.2/api/src
--- glfs-handleops.lo ---
mv -f .deps/glfs-handleops.Tpo .deps/glfs-handleops.Plo
--- glfs-fops.lo ---
1 warning generated.
mv -f .deps/glfs-fops.Tpo .deps/glfs-fops.Plo
1 error

make[5]: stopped in /usr/ports/net/glusterfs/work/glusterfs-3.13.2/api/src
*** [all-recursive] Error code 1

make[4]: stopped in /usr/ports/net/glusterfs/work/glusterfs-3.13.2/api
1 error

make[4]: stopped in /usr/ports/net/glusterfs/work/glusterfs-3.13.2/api
*** [all-recursive] Error code 1

make[3]: stopped in /usr/ports/net/glusterfs/work/glusterfs-3.13.2
1 error

make[3]: stopped in /usr/ports/net/glusterfs/work/glusterfs-3.13.2
*** [all] Error code 2

make[2]: stopped in /usr/ports/net/glusterfs/work/glusterfs-3.13.2
1 error
```

Is there a way to fix it?
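A side note on the log above: the build stops on the two "versioned symbol ... must be defined" errors in glfs.lo, not on the `-Wlogical-not-parentheses` warning in glfs-fops.c. The warning still points at a likely logic bug, because `!` binds tighter than `&`. The sketch below illustrates the difference with hypothetical flag values (`DEMO_FLAG_STAT` and `DEMO_FLAG_OTHER` are made up here and are not the real GFAPI constants); which form upstream intended is an assumption.

```c
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the real GFAPI values. */
#define DEMO_FLAG_STAT  0x1u
#define DEMO_FLAG_OTHER 0x2u

/* What the flagged line computes: `!` binds tighter than `&`,
 * so `!flags & mask` means `(!flags) & mask`. */
static int demo_check_as_written(uint32_t flags_handled, uint32_t mask)
{
    return (!flags_handled) & mask;
}

/* Probable intent: true when the mask bit is NOT set in flags_handled. */
static int demo_check_parenthesized(uint32_t flags_handled, uint32_t mask)
{
    return !(flags_handled & mask);
}
```

With `flags_handled = DEMO_FLAG_OTHER` and `mask = DEMO_FLAG_STAT`, the as-written form yields 0 while the parenthesized form yields 1, so the two disagree whenever some other flag is set but the tested one is not.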
Comment 11 Florian Smeets freebsd_committer 2020-07-29 20:38:53 UTC
I'm still seeing memory leaks in what I think is the client part (the mounted Gluster filesystem) after the update to 8.0.
Comment 12 Alexey Dokuchaev freebsd_committer 2020-12-04 03:28:48 UTC
Is this PR still relevant now that the port has been updated to version 8.0 with ports r543674?
Comment 13 Daniel Morante 2020-12-06 12:39:13 UTC
It seems that the supposed memory leak is with the glusterfs `fuse` process.

For example:

```
/usr/local/sbin/glusterfs --process-name fuse --volfile-server=moon --volfile-server=sun --volfile-server=earth --volfile-id=replicated /mnt/replicated (glusterfsd)
```

I was seeing some OOM kills on some servers but that was on an older version.

I will upgrade a test server to 8.0 and let it run for a few days and see if the process's memory usage keeps growing.
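As a rough illustration of what "keeps growing" could mean in practice, here is a small sketch that classifies a series of resident-size samples. The function name `demo_rss_keeps_growing()`, the KiB unit, and the growth threshold are all assumptions made for this example; it is not part of GlusterFS or of any test actually run in this PR.

```c
#include <stddef.h>

/* Returns 1 if the samples (resident size in KiB, oldest first) never
 * decrease and the total increase is at least min_growth_kib; else 0. */
static int demo_rss_keeps_growing(const long *rss_kib, size_t n,
                                  long min_growth_kib)
{
    if (rss_kib == NULL || n < 2)
        return 0;
    for (size_t i = 1; i < n; i++) {
        if (rss_kib[i] < rss_kib[i - 1])
            return 0;   /* usage dropped at least once: not steady growth */
    }
    return (rss_kib[n - 1] - rss_kib[0]) >= min_growth_kib;
}
```

The samples could be collected periodically with something like `ps -o rss= -p <pid>` for the `glusterfs` fuse process. A real leak check would want a longer window and some tolerance for normal fluctuation, which this sketch deliberately leaves out.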
Comment 14 Daniel Morante 2021-01-26 20:26:14 UTC
I have been running a 3-node cluster for 50 days so far. I'm not seeing any OOM kills or memory leaks.

```
Mem: 21M Active, 2006M Inact, 908M Wired, 395M Buf, 1289M Free
Mem: 19M Active, 2635M Inact, 760M Wired, 395M Buf, 819M Free
Mem: 18M Active, 1615M Inact, 983M Wired, 395M Buf, 1704M Free

 1169 root          8  20    0    51M    18M select   0 932:34   1.33% glusterf
 1261 root         14  20    0   128M    79M select   1 908:46   1.25% glusterf
 1264 root          8  20    0   107M    61M select   1 830:05   1.17% glusterf
  692 root         10  20    0    61M    19M select   0 788:17   1.08% glusterf

 1546 root          8  20    0    53M    18M select   1 655:57   0.84% glusterf
 1549 root         13  20    0   144M    96M select   1 625:14   0.81% glusterf
 1552 root          8  20    0   105M    59M select   1 565:24   0.71% glusterf
  703 root         10  20    0    62M    18M select   1 532:11   0.67% glusterf

  663 root          8  20    0    48M    16M select   0  96:22   0.92% glusterf
  726 root         13  20    0    80M    30M select   1  92:24   0.85% glusterf
  729 root          8  20    0    64M    18M select   1  84:52   0.72% glusterf
  723 root         10  20    0    63M    17M select   1  81:32   0.70% glusterf
```

I will close this bug since it seems the issue has been resolved in the 8.0 release.
Comment 15 Daniel Morante 2021-01-29 07:39:14 UTC
It turns out the memory leak occurs when you read from or write to the fuse client. After some digging around I found these two bug reports:

https://github.com/gluster/glusterfs/issues/1440
https://github.com/gluster/glusterfs/issues/1413

It looks like this is a known issue upstream and a fix has yet to be committed. The suggested workaround is to disable open-behind:

```
gluster volume set <volname> open-behind off
```
Comment 16 Kubilay Kocak freebsd_committer freebsd_triage 2021-01-31 01:21:43 UTC
(In reply to Daniel Morante from comment #14)

If the root cause has yet to be resolved, should we not leave this issue open? One interim option, pending an upstream fix, is a pkg-message describing the issue and the workaround (until it is permanently resolved).
Comment 17 Daniel Morante 2021-02-23 02:12:56 UTC
(In reply to Kubilay Kocak from comment #16)
Yes, I agree it makes sense to leave it open.  Is there a way to mark this bug as waiting for something external?  Or do we just leave it as is?