Summary:     All the memory eaten away by ZFS 'solaris' malloc
Product:     Base System
Component:   kern
Version:     11.2-RELEASE
Hardware:    Any
OS:          Any
Status:      Closed FIXED
Severity:    Affects Some People
Priority:    ---
Keywords:    regression
Reporter:    Mark.Martinec
Assignee:    Mark Johnston <markj>
CC:          asomers, avg, markj, peter.x.eriksson
Attachments: proposed patch (attachment 197091)
Description

Mark.Martinec 2018-08-17 13:33:29 UTC

Apparently a regression in ZFS relative to 10.x. Running zfsd also triggers this memory leak (or a similar one) when a defunct pool is present, and zfsd seems to probe the pool fairly often. As a result, a machine with 256 GB of RAM ran out of memory in a day or so :)

Mark Johnston (markj):

Created attachment 197091: proposed patch

Mark, are you still able to reproduce the problem? Could you give the attached patch a try?
Mark.Martinec:

Thanks, great! Will try it on Monday (I'm on vacation now, with limited connectivity).

Mark.Martinec:

Tried it now (based on 11.2-RELEASE-p3). It is a major improvement: the 'vmstat -m' InUse count for the solaris malloc type came down to 18 (from about 520 previously) per invocation of 'zpool list' on a defunct pool:

# (while true; do zpool list stuff >/dev/null; vmstat -m | fgrep solaris; sleep 1; done) | awk '{print $2-a, $3; a=$2}'
41167 20851K
18 20854K
18 20856K
18 20859K
18 20861K
18 20863K
18 20866K
18 20868K
18 20870K
18 20873K
18 20875K
18 20878K
18 20880K
18 20882K
18 20885K

So instead of four days, this host would now stay up 30 times longer.

Mark Johnston (markj):

(In reply to Mark.Martinec from comment #5)

This might be the result of zfs_dbgmsg(). If so, the memory usage will stop increasing once zfs_dbgmsg_size hits the 4MB limit. Could you run zpool -Hp for a while and see if the memory usage stops increasing?

Mark.Martinec:

> This might be the result of zfs_dbgmsg(). If so, the memory usage
> will stop increasing once zfs_dbgmsg_size hits the 4MB limit.
> Could you run zpool -Hp for a while and see if the memory usage
> stops increasing?

Indeed, this seems to be the case.
The growth stopped when the MemUse on solaris reached about 5 MB more:
18 20854K
18 20856K
18 20859K
[...]
18 26371K
18 26373K
18 26376K
18 26378K
18 26381K
18 26383K
18 26385K
18 26388K
18 26390K
-1 26390K
1 26390K
0 26390K
2 26390K
0 26390K
0 26390K
0 26390K
0 26390K
0 26390K
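For anyone curious why the numbers level off: below is a minimal user-space C sketch of a size-capped debug-message buffer. It is not the actual zfs_dbgmsg() code; every identifier in it is hypothetical, and only the 4 MB figure is taken from the exchange above. Once the cap is reached, appending a new message evicts older ones instead of enlarging the buffer, so total memory use stays flat, which matches the InUse deltas dropping to around zero in the output above.

/*
 * Illustrative user-space sketch, not the in-kernel zfs_dbgmsg()
 * implementation: a debug-message buffer whose total size is capped,
 * mirroring the behaviour described above where memory use stops
 * growing once zfs_dbgmsg_size reaches the 4 MB limit.  All
 * identifiers here are hypothetical.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

#define DBGMSG_MAXSIZE  (4 * 1024 * 1024)   /* the 4 MB cap */

struct dbgmsg {
    TAILQ_ENTRY(dbgmsg) link;
    size_t size;                            /* bytes charged to the cap */
    char   text[];                          /* message body */
};

static TAILQ_HEAD(, dbgmsg) dbgmsgs = TAILQ_HEAD_INITIALIZER(dbgmsgs);
static size_t dbgmsg_size;                  /* bytes currently buffered */

static void
dbgmsg_append(const char *msg)
{
    size_t len = strlen(msg) + 1;
    struct dbgmsg *d = malloc(sizeof(*d) + len);

    if (d == NULL)
        return;
    d->size = sizeof(*d) + len;
    memcpy(d->text, msg, len);

    /* Evict the oldest messages until the new one fits under the cap,
     * so total memory use levels off instead of growing forever. */
    while (dbgmsg_size + d->size > DBGMSG_MAXSIZE &&
        !TAILQ_EMPTY(&dbgmsgs)) {
        struct dbgmsg *old = TAILQ_FIRST(&dbgmsgs);

        TAILQ_REMOVE(&dbgmsgs, old, link);
        dbgmsg_size -= old->size;
        free(old);
    }

    TAILQ_INSERT_TAIL(&dbgmsgs, d, link);
    dbgmsg_size += d->size;
}

int
main(void)
{
    /* Simulate a long run of pool probes, each logging one message. */
    for (int i = 0; i < 1000000; i++) {
        char buf[128];

        snprintf(buf, sizeof(buf), "probe iteration %d", i);
        dbgmsg_append(buf);
    }
    printf("buffered %zu bytes\n", dbgmsg_size);
    return (0);
}

That matches the observation above: MemUse grows by roughly 5 MB (from about 20.8M to about 26.4M) and then flattens out.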
Mark Johnston (markj):

(In reply to Mark.Martinec from comment #8)

Great, thank you!

A commit references this bug:

Author: markj
Date: Mon Sep 17 16:16:58 UTC 2018
New revision: 338724
URL: https://svnweb.freebsd.org/changeset/base/338724

Log:
  Fix an nvpair leak in vdev_geom_read_config().

  Also change the behaviour slightly: instead of freeing "config" if the
  last nvlist doesn't pass the tests, return the last config that did
  pass those tests. This matches the comment at the beginning of the
  function.

  PR:           230704
  Diagnosed by: avg
  Reviewed by:  asomers, avg
  Tested by:    Mark Martinec <Mark.Martinec@ijs.si>
  Approved by:  re (gjb)
  MFC after:    1 week
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D17202

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c

A commit references this bug:

Author: markj
Date: Mon Sep 24 14:48:27 UTC 2018
New revision: 338904
URL: https://svnweb.freebsd.org/changeset/base/338904

Log:
  MFC r338724:
  Fix an nvpair leak in vdev_geom_read_config().

  PR: 230704

Changes:
  _U  stable/11/
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c

A commit references this bug:

Author: markj
Date: Mon Sep 24 14:50:45 UTC 2018
New revision: 338905
URL: https://svnweb.freebsd.org/changeset/base/338905

Log:
  MFC r338724:
  Fix an nvpair leak in vdev_geom_read_config().

  PR: 230704

Changes:
  _U  stable/10/
  stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c

Mark Johnston (markj):

Thanks for your help in tracking this down, Mark.
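For context, the shape of the leak fixed by r338724 can be sketched as follows. This is not the actual vdev_geom_read_config() code; read_label() and config_is_valid() are hypothetical stand-ins for the real label-reading and sanity-check logic, and only the libnvpair calls (nvlist_alloc, nvlist_add_int32, nvlist_lookup_int32, nvlist_free) are real. The leak arose when a config nvlist retained from an earlier label was replaced by a later one without being freed; per the commit log, the fix also makes the function return the last config that passed the validity checks.

/*
 * Illustrative sketch only, not the vdev_geom_read_config() code from
 * r338724.  Header/include paths for libnvpair vary by platform.
 */
#include <stdio.h>
#include <libnvpair.h>

#define VDEV_LABELS 4   /* a vdev keeps several copies of its label */

/* Hypothetical: pretend labels 1 and 3 are readable, the rest damaged. */
static nvlist_t *
read_label(int label)
{
    nvlist_t *nvl;

    if (label != 1 && label != 3)
        return (NULL);
    if (nvlist_alloc(&nvl, NV_UNIQUE_NAME, 0) != 0)
        return (NULL);
    (void) nvlist_add_int32(nvl, "label", label);
    return (nvl);
}

/* Hypothetical: the real code checks pool state, txg, and so on. */
static int
config_is_valid(nvlist_t *nvl)
{
    return (nvl != NULL);
}

static nvlist_t *
read_config_sketch(void)
{
    nvlist_t *config = NULL;

    for (int l = 0; l < VDEV_LABELS; l++) {
        nvlist_t *nvl = read_label(l);

        if (nvl == NULL)
            continue;
        if (!config_is_valid(nvl)) {
            /* Reject this label, but keep whatever config was already
             * accepted so the caller gets the last one that passed. */
            nvlist_free(nvl);
            continue;
        }
        /* A later valid config supersedes the earlier one; freeing the
         * superseded nvlist here is what plugs the leak that showed up
         * as 'solaris' malloc growth. */
        if (config != NULL)
            nvlist_free(config);
        config = nvl;
    }
    return (config);
}

int
main(void)
{
    nvlist_t *config = read_config_sketch();
    int32_t label;

    if (config != NULL) {
        if (nvlist_lookup_int32(config, "label", &label) == 0)
            printf("kept config from label %d\n", (int)label);
        nvlist_free(config);
    }
    return (0);
}

Freeing the superseded config at the point of replacement, rather than only at the end, keeps the function leak-free even when several labels yield valid configs; guarding the nvlist_free() call also avoids relying on a NULL argument being accepted.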