Bug 250226

Summary: graphics/drm-fbsd12.0-kmod: massive memory leak
Product: Ports & Packages
Reporter: Dirk Meyer <dinoex>
Component: Individual Port(s)
Assignee: freebsd-x11 (Nobody) <x11>
Status: New ---
Severity: Affects Some People
CC: bennett, gljennjohn, swills
Priority: ---
Flags: bugzilla: maintainer-feedback? (x11)
Version: Latest
Hardware: amd64
OS: Any

Description Dirk Meyer freebsd_committer 2020-10-09 14:36:27 UTC
drm-fbsd12.0-kmod is 4.16.g20200221.

My desktop computer has 16 GiB RAM, but it seems that drm-fbsd12.0-kmod keeps leaking memory, using up all memory in one day and forcing me to reboot.

This computer uses an Intel i5-6200U CPU and its integrated Intel Skylake graphics.
The operating system is FreeBSD 12.1.

When the system was just booted, wired memory was below 1 GiB.

top:
Mem: 3049M Active, 393M Inact, 174M Laundry, 981M Wired, 497M Buf, 11G Free
Swap: 16G Total, 16G Free

After the system had run for a few hours, wired memory became 14 GiB.
The desktop was still responsive, but the usage of swap was increasing.

After the system had run longer, wired memory became 16 GiB.
The desktop was non-responsive, and even shell commands froze for 40 seconds.

The "vmstat -z" total stays stable:
1,392,114,304 TOTAL 1,333,297,844 58,816,460
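
One way to record how wired memory grows between reboots is to parse the "Mem:" line that top prints. The following is a minimal sketch, not part of the original report. The function name parse_top_mem is hypothetical, and the size suffixes are assumed to be binary units.

```python
import re

# Multipliers for the size suffixes on top's "Mem:" line
# (assumption: K, M, G and T are binary units).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_top_mem(line):
    """Return a dict mapping each label on a top 'Mem:' line to a byte count."""
    fields = {}
    for number, unit, label in re.findall(r"(\d+)([KMGT])\s+(\w+)", line):
        fields[label] = int(number) * UNITS[unit]
    return fields


# The line reported just after boot in this bug report.
sample = "Mem: 3049M Active, 393M Inact, 174M Laundry, 981M Wired, 497M Buf, 11G Free"
mem = parse_top_mem(sample)
print(mem["Wired"] // 1024**2, "MiB wired")
```

Logging the "Wired" value at regular intervals would show whether the growth is steady or tied to particular activity.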

See also:
https://github.com/FreeBSDDesktop/kms-drm/issues/247
Comment 1 Scott Bennett 2020-10-12 06:42:15 UTC
     Your description matches that of a collection of bugs present in 11.2 to
11.4 and 12.x that others began complaining of on other lists within days of the
release of 11.2.  I have complained on the x11 and stable lists about them a few
times this year because they are still a problem and because the FreeBSD
developers have not addressed them.  Over time others have found a number of
sysctl workarounds, and I have found a few others and have accumulated several
that, in combination, allow me to keep my system usable for weeks at a time,
rather than a day to three or four days.  They do not fix the bugs, however.
     One glaring bug is that vm.max_wired is now ignored by the kernel.  With
vm.max_wired=786432 I have seen the amount of real memory tied up in pagefixing
(a.k.a. "wiring" in Berkeley dialect) exceed 6700 MB on an 8 GB machine.  My view is that
vm.max_wired should either be honored or removed from the source code tree.
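
     To put those numbers side by side, here is a small arithmetic sketch. It assumes that vm.max_wired is expressed in pages and that the page size is 4 KiB, as is usual on amd64. It also treats the observed 6700 MB as MiB for a rough comparison. The helper name pages_to_mib is hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as is usual on amd64


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to whole MiB."""
    return pages * page_size // (1024 * 1024)


limit_mib = pages_to_mib(786432)  # the vm.max_wired setting quoted above
observed_mib = 6700               # wired memory observed, taken as MiB

print("limit:", limit_mib, "MiB")
print("excess over limit:", observed_mib - limit_mib, "MiB")
```

Under those assumptions the limit works out to 3072 MiB, so the observed wired memory is more than twice the configured limit.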
     It is worth noting that ZFS ARC does not appear to be to blame.  It rarely
exceeds the quasi-limit of vfs.zfs.arc_max by more than ~200 MB.
     The following sysctl variables, when set well above their default values,
seem to help keep a system able to do work, or to let it recover to that
condition without a reboot:  vm.v_free_min,
vm.pageout_wakeup_thresh, vm.pageout_oom_seq.  Reducing the value of
vfs.zfs.arc_max may also help, depending upon your system's configuration.
Also, be aware that maintaining a large ccache directory tree to use with "make
buildworld", "make buildkernel", or "portmaster -a" will save you a great deal
of time, but will also hasten the day when a reboot will be necessary.  My
advice for those cases is to keep CCACHE_DIR and, for buildworld and
buildkernel, /usr/ports or other DESTDIR, in a file system that can be easily
unmounted and, perhaps, remounted in order to free up its associated buffer
cache memory (most of which should have been pagefreed immediately upon
completion of an I/O operation long since anyway).  It appears that the kernel
no longer frees pages at the appropriate times, as it used to do.
Also, if you use ZFS, it will help to reduce the limit
on ZFS ARC size by setting vfs.zfs.arc_max.  Note, however, that that is merely
a crutch to give you more operational time before you have to intervene
manually.
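
     As an illustration of the workaround described above, the sketch below builds a sysctl.conf fragment that raises the three named variables by a common factor. It is only a sketch: the factor and the sample values are placeholders, not recommendations from this report, and the function names are hypothetical. The current values are assumed to be read with "sysctl -n".

```python
import subprocess

# The variables named in this comment.
TUNABLES = ["vm.v_free_min", "vm.pageout_wakeup_thresh", "vm.pageout_oom_seq"]


def read_sysctl(name):
    """Read one integer sysctl value via 'sysctl -n' (FreeBSD only)."""
    out = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return int(out.stdout.strip())


def scaled_fragment(current, factor):
    """Return sysctl.conf lines with each current value multiplied by factor."""
    return "\n".join(f"{name}={current[name] * factor}" for name in TUNABLES)


# Placeholder values for illustration only; on a real system use
# {name: read_sysctl(name) for name in TUNABLES} instead.
example = {
    "vm.v_free_min": 1000,
    "vm.pageout_wakeup_thresh": 2000,
    "vm.pageout_oom_seq": 10,
}
print(scaled_fragment(example, factor=4))
```

The right factor depends on the machine, and the three variables may need different scaling, so the output should be reviewed before it is placed in /etc/sysctl.conf.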
     FWIW, my speculation is that this mess of bugs, of the kind that should have
vanished from FreeBSD by release 1.x, was introduced into 13-CURRENT and later
backported into 11.2 and 12.x.  (I am aware that it affects 12.1, but I do not
know whether 12.0 was affected.)  I currently am running 11.4-STABLE at r364474.
I do not think these bugs have much, if anything, to do with the graphics stack,
but rather appear to be VM subsystem bugs.  My system still suffers from them,
even though there is no safe-to-use graphics support for a Radeon HD 5770 card,
and therefore my system is not running X11. :-(
Comment 2 Gary Jennejohn 2020-10-12 08:29:13 UTC
(In reply to Scott Bennett from comment #1)
I use HEAD, and vm.max_wired no longer exists there.  However, there is vm.max_user_wired, which has this description: "vm.max_user_wired: system-wide limit to user-wired page count".  If this also exists in your version, it might help to set it.
Comment 3 Scott Bennett 2020-10-12 11:13:26 UTC
     Given that the only user who can pagefix is root, I am not sure how the new
vm.max_user_wired will be useful.  Further, if it really does what the
description says, it will not solve the problems that vm.max_wired could have
helped with and possibly solved if it were properly supported in 11 and 12, which
appear to be caused by the kernel, not a user program.  And no, there is only
vm.max_wired in 11.  I can't say for 12 at the moment because the laptop that
has 12.1 on it is shut off.  Next time I use it I can look to see whether
vm.max_user_wired is present.
     If a sysctl to limit pagefixing on a per-process basis were added, but not
replacing vm.max_wired, that could be useful in a case of root running something
that might pagefix too much memory concurrently.  It's a shame that vm.max_wired
is being dumped instead of repaired.
     In any case, this PR probably ought to be reassigned to the proper team for
the VM system.