Affected: ZFS on 11.2-RELEASE and 11.1-RELEASE-p11 (but not on 10.3).

Running commands like 'zpool list', 'zpool status' or 'zpool iostat' on a
defunct pool (one with broken underlying disks) leaks memory. When such a
command runs frequently (e.g. invoked by a monitoring tool such as
'telegraf'), the system runs out of memory within a couple of days,
applications start swapping, and eventually everything grinds to a
standstill, requiring a forced reboot.

After a few days, shortly before the freeze, 'vmstat -m' shows the 'solaris'
malloc type approaching the total size of memory (prior to that the number
grows steadily and linearly):

  $ vmstat -m
           Type    InUse   MemUse HighUse  Requests  Size(s)
        solaris 39359484 2652696K       - 234986296  ...

How to repeat:

  # create a test pool on md
  mdconfig -a -t swap -s 1Gb
  gpart create -s gpt /dev/md0
  gpart add -t freebsd-zfs -a 4k /dev/md0
  zpool create test /dev/md0p1

  # destroy the disk underneath the pool, making it "unavailable"
  mdconfig -d -u 0 -o force

Reboot now (before a reboot the trouble does not start yet).

Now run 'zpool list' periodically, monitoring the growth of the 'solaris'
malloc type:

  (while true; do zpool list >/dev/null; vmstat -m | \
     fgrep solaris; sleep 0.5; done) | awk '{print $2-a; a=$2}'
  12224540
  2509
  3121
  5022
  2507
  1834
  2508
  2505

As suggested by Mark Johnston, here is the dtrace output
  https://www.ijs.si/usr/mark/tmp/dtrace-cmd.out.bz2
obtained from the following command:

  # dtrace -c "zpool list -Hp" -x temporal=off -n '
    dtmalloc::solaris:malloc /pid == $target/{@allocs[stack(), args[3]] = count()}
    dtmalloc::solaris:free /pid == $target/{@frees[stack(), args[3]] = count();}'

This records all allocations and frees from a single instance of
"zpool list".

Andriy Gapon wrote on the mailing list:

  I see one memory leak, not sure if it's the only one. It looks like
  vdev_geom_read_config() leaks all parsed vdev nvlist-s but the last.
  The problem seems to come from r316760. Before that commit the function
  would return upon finding the first valid config, but now it keeps
  iterating.

  The memory leak should not be a problem when vdev-s are probed
  sufficiently rarely, but it appears that with an unhealthy pool the
  probing can happen much more frequently (e.g., every time pools are
  listed).

The whole discussion leading to the above findings is on the
stable@freebsd.org mailing list, 2018-07-23 to 2018-08-14, subject:
"All the memory eaten away by ZFS 'solaris' malloc - on 11.1-R amd64".
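For illustration, here is a minimal stand-alone C sketch of the loop shape
Andriy describes. It is NOT the actual vdev_geom_read_config() code:
"struct label_config" and parse_label() are hypothetical stand-ins for a
parsed vdev label nvlist, and the four iterations mirror the four labels
kept per vdev.

  /*
   * Hypothetical sketch of the diagnosed leak pattern; not FreeBSD code.
   */
  #include <stdio.h>
  #include <stdlib.h>

  #define VDEV_LABELS 4

  struct label_config { char payload[512]; };

  /* Pretend every label on the disk parses into a fresh allocation. */
  static struct label_config *
  parse_label(int l)
  {
      (void)l;
      return calloc(1, sizeof(struct label_config));
  }

  int
  main(void)
  {
      struct label_config *config = NULL;

      /*
       * Before r316760 the loop returned on the first valid label.
       * After r316760 it iterates over all labels, and each assignment
       * drops the previously parsed config without freeing it: every
       * parsed config but the last is leaked, on each probe of the pool.
       */
      for (int l = 0; l < VDEV_LABELS; l++)
          config = parse_label(l);    /* previous config leaked here */

      printf("leaked %d of %d parsed configs\n",
          VDEV_LABELS - 1, VDEV_LABELS);
      free(config);
      return 0;
  }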
Apparently a regression in ZFS vs. 10.x.
Running zfsd also triggers this memory leak (or a similar one) when you have a defunct pool. It seems zfsd probes the pool fairly often as well... So a machine with 256 GB of RAM ran out of it in a day or so :)
Created attachment 197091 [details]
proposed patch

Mark, are you still able to reproduce the problem? Could you give the
attached patch a try?
Thanks, great! Will try it on Monday (I'm on vacation now, with limited connectivity).
Tried it now (based on 11.2-RELEASE-p3). It is a major improvement: the
'vmstat -m' InUse count of the solaris malloc type now grows by 18 per
invocation of 'zpool list' on a defunct pool, down from about 520
previously:

  # (while true; do zpool list stuff >/dev/null; vmstat -m | \
       fgrep solaris; sleep 1; done) | awk '{print $2-a, $3; a=$2}'
  41167 20851K
  18 20854K
  18 20856K
  18 20859K
  18 20861K
  18 20863K
  18 20866K
  18 20868K
  18 20870K
  18 20873K
  18 20875K
  18 20878K
  18 20880K
  18 20882K
  18 20885K

So instead of four days, this host would now stay up 30 times longer.
(In reply to Mark.Martinec from comment #5)

This might be the result of zfs_dbgmsg(). If so, the memory usage will stop
increasing once zfs_dbgmsg_size hits the 4MB limit. Could you run
'zpool list -Hp' for a while and see if the memory usage stops increasing?
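To illustrate why the growth would plateau, here is a hedged user-space
sketch of a size-capped debug-message buffer, assuming that entries
accumulate until a 4 MB cap is reached and the oldest are then evicted. It
is NOT the kernel's zfs_dbgmsg() implementation; DBGMSG_MAXSIZE and
dbgmsg_append() are made up for the example.

  #include <stdlib.h>
  #include <string.h>
  #include <sys/queue.h>

  #define DBGMSG_MAXSIZE (4 * 1024 * 1024)  /* the 4 MB limit mentioned above */

  struct dbgmsg {
      STAILQ_ENTRY(dbgmsg) link;
      size_t size;
      char text[];
  };

  static STAILQ_HEAD(, dbgmsg) msgs = STAILQ_HEAD_INITIALIZER(msgs);
  static size_t msgs_size;

  void
  dbgmsg_append(const char *text)
  {
      size_t len = strlen(text) + 1;
      struct dbgmsg *m = malloc(sizeof(*m) + len);

      if (m == NULL)
          return;
      m->size = sizeof(*m) + len;
      memcpy(m->text, text, len);
      STAILQ_INSERT_TAIL(&msgs, m, link);
      msgs_size += m->size;

      /* Once the cap is reached, evict from the head: memory use plateaus. */
      while (msgs_size > DBGMSG_MAXSIZE) {
          struct dbgmsg *old = STAILQ_FIRST(&msgs);

          STAILQ_REMOVE_HEAD(&msgs, link);
          msgs_size -= old->size;
          free(old);
      }
  }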
https://reviews.freebsd.org/D17202
> This might be the result of zfs_dbgmsg(). If so, the memory usage
> will stop increasing once zfs_dbgmsg_size hits the 4MB limit.
> Could you run 'zpool list -Hp' for a while and see if the memory
> usage stops increasing?

Indeed this seems to be the case. The growth stopped after the MemUse of
the solaris malloc type had increased by about 5 MB:

  18 20854K
  18 20856K
  18 20859K
  [...]
  18 26371K
  18 26373K
  18 26376K
  18 26378K
  18 26381K
  18 26383K
  18 26385K
  18 26388K
  18 26390K
  -1 26390K
  1 26390K
  0 26390K
  2 26390K
  0 26390K
  0 26390K
  0 26390K
  0 26390K
  0 26390K
(In reply to Mark.Martinec from comment #8)

Great, thank you!
A commit references this bug:

Author: markj
Date: Mon Sep 17 16:16:58 UTC 2018
New revision: 338724
URL: https://svnweb.freebsd.org/changeset/base/338724

Log:
  Fix an nvpair leak in vdev_geom_read_config().

  Also change the behaviour slightly: instead of freeing "config" if the
  last nvlist doesn't pass the tests, return the last config that did
  pass those tests. This matches the comment at the beginning of the
  function.

  PR:            230704
  Diagnosed by:  avg
  Reviewed by:   asomers, avg
  Tested by:     Mark Martinec <Mark.Martinec@ijs.si>
  Approved by:   re (gjb)
  MFC after:     1 week
  Sponsored by:  The FreeBSD Foundation
  Differential revision:  https://reviews.freebsd.org/D17202

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
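As a companion to the leaky sketch in the description, here is a hedged
sketch of the corrected loop shape the commit log describes: free the
previously kept config before replacing it, and return the last config
that passed the tests. The actual change is to vdev_geom_read_config()
(see D17202); parse_label() and config_is_valid() are hypothetical
stand-ins, not the real nvlist-based checks.

  #include <stdlib.h>

  #define VDEV_LABELS 4

  struct label_config { char payload[512]; };

  static struct label_config *
  parse_label(int l)
  {
      (void)l;
      return calloc(1, sizeof(struct label_config));  /* stub */
  }

  static int
  config_is_valid(const struct label_config *c)
  {
      return c != NULL;                               /* stub validity test */
  }

  struct label_config *
  read_config_fixed(void)
  {
      struct label_config *config = NULL;

      for (int l = 0; l < VDEV_LABELS; l++) {
          struct label_config *cand = parse_label(l);

          if (cand == NULL)
              continue;
          if (!config_is_valid(cand)) {
              /* Reject this label, but keep any earlier valid config. */
              free(cand);
              continue;
          }
          /* Free the previously kept config instead of leaking it. */
          free(config);
          config = cand;
      }
      /* Return the last config that passed the tests (may be NULL). */
      return config;
  }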
A commit references this bug:

Author: markj
Date: Mon Sep 24 14:48:27 UTC 2018
New revision: 338904
URL: https://svnweb.freebsd.org/changeset/base/338904

Log:
  MFC r338724:
  Fix an nvpair leak in vdev_geom_read_config().

  PR:  230704

Changes:
  _U  stable/11/
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
A commit references this bug:

Author: markj
Date: Mon Sep 24 14:50:45 UTC 2018
New revision: 338905
URL: https://svnweb.freebsd.org/changeset/base/338905

Log:
  MFC r338724:
  Fix an nvpair leak in vdev_geom_read_config().

  PR:  230704

Changes:
  _U  stable/10/
  stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
Thanks for your help in tracking this down, Mark.