Bug 231457 - Out of swap space on ZFS
Summary: Out of swap space on ZFS
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-18 16:08 UTC by dimka
Modified: 2018-12-13 21:40 UTC
CC: 6 users

Description dimka 2018-09-18 16:08:07 UTC
After all RAM and several hundred megabytes of swap space have been consumed, the kernel kills large processes with messages like:

Sep 14 03:04:30 hosting kernel: pid 2078 (mysqld), uid 88, was killed: out of swap space
Sep 14 03:06:26 hosting kernel: pid 7068 (mysqld), uid 88, was killed: out of swap space
Sep 14 03:06:32 hosting kernel: pid 2085 (clamd), uid 106, was killed: out of swap space

Tested on three physical machines and one virtual machine with 1/2/4 GB RAM and an 8 GB swap volume, all running 11.2-RELEASE/amd64.
I have not checked this on 11.0-RELEASE or 11.1-RELEASE.

This looks like another bug,
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199189
but I never hit that bug on 10.*/amd64, and the swap volume/sysctl tuning advice from its discussion did not help me on 11.2/amd64.

Detailed installation procedure

# Boot from install DVD, select "Shell" in "Partitioning" menu.
#
# gnop create -S 4096 /dev/ada0
# zpool create -O mountpoint=none zroot /dev/ada0.nop
#
# zfs create -V 8GB -o org.freebsd:swap=on zroot/swap
# zfs create -o quota=2GB -o mountpoint=/mnt zroot/root
# zfs create -o quota=15GB -o mountpoint=/mnt/tmp zroot/tmp
# zfs create -o quota=30GB -o mountpoint=/mnt/var zroot/var
# zfs create -o quota=30GB -o mountpoint=/mnt/usr zroot/usr
# zfs create -o quota=15GB -o mountpoint=/mnt/home zroot/home
#
# zpool export zroot
# gnop destroy /dev/ada0.nop
# dd if=/boot/zfsboot of=/dev/ada0 bs=512 count=1
# dd if=/boot/zfsboot of=/dev/ada0 bs=512 skip=1 seek=1024
# zpool import zroot
#
# exit
#
# Post-install, select "Live CD" mode.
#
# echo zfs_enable=\"YES\" >> /mnt/etc/rc.conf
# zfs umount -a
# zfs set mountpoint=legacy zroot/root
# zfs set mountpoint=/tmp zroot/tmp
# zfs set mountpoint=/var zroot/var
# zfs set mountpoint=/usr zroot/usr
# zfs set mountpoint=/home zroot/home
# zpool set bootfs=zroot/root zroot
#
# exit
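#
# After the first boot, the zvol-backed swap can be sanity-checked with
# something like the following (assuming the layout above):
#
# swapinfo -h
# zfs get volsize,org.freebsd:swap zroot/swap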
Comment 1 Mike 2018-09-24 17:44:14 UTC
Experiencing same issue.

uname -a
FreeBSD sword 11.2-RELEASE-p3 FreeBSD 11.2-RELEASE-p3 #0: Thu Sep  6 07:14:16 UTC 2018     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

swapinfo -h
Device          1K-blocks     Used    Avail Capacity
/dev/da0p2       12582912     4.2M      12G     0%
/dev/da1p2       12582912     4.1M      12G     0%
Total            25165824     8.3M      24G     0%

sysctl hw | egrep 'hw.(phys|user|real)'
hw.physmem: 34297905152
hw.usermem: 1263333376
hw.realmem: 34359738368

tail /var/log/messages
Sep 24 12:30:29 sword kernel: pid 12993 (getty), uid 0, was killed: out of swap space
Sep 24 12:30:43 sword kernel: pid 12994 (getty), uid 0, was killed: out of swap space
Sep 24 12:30:58 sword kernel: pid 12995 (getty), uid 0, was killed: out of swap space
Sep 24 12:31:14 sword kernel: pid 12996 (getty), uid 0, was killed: out of swap space
Sep 24 12:31:28 sword kernel: pid 12997 (getty), uid 0, was killed: out of swap space
Sep 24 12:31:42 sword kernel: pid 12998 (getty), uid 0, was killed: out of swap space
Sep 24 12:31:57 sword kernel: pid 12999 (getty), uid 0, was killed: out of swap space
Sep 24 12:32:12 sword kernel: pid 13000 (getty), uid 0, was killed: out of swap space
Sep 24 12:32:27 sword kernel: pid 13001 (getty), uid 0, was killed: out of swap space
Sep 24 12:32:42 sword kernel: pid 13002 (getty), uid 0, was killed: out of swap spac

cat /boot/loader.conf

accf_data_load="YES"
accf_http_load="YES"
autoboot_delay=3
cc_htcp_load="YES"
hw.igb.rx_abs_int_delay=1024
hw.igb.rx_int_delay=512
hw.igb.rxd=4096
hw.igb.tx_abs_int_delay=1024
hw.igb.tx_int_delay=512
hw.igb.txd=4096
hw.intr_storm_threshold=9000
if_bridge_load="YES"
if_tap_load="YES"
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
kern.ipc.nmbclusters=262144
kern.ipc.nmbjumbo16=32768
kern.ipc.nmbjumbo9=65536
kern.ipc.nmbjumbop=262144
kern.ipc.semaem=32767
kern.ipc.semmni=32767
kern.ipc.semmns=8192
kern.ipc.semmnu=4096
kern.ipc.semmsl=120
kern.ipc.semopm=200
kern.ipc.semume=80
kern.ipc.semusz=184
kern.ipc.semvmx=65534
kern.maxusers=1024
mlx4en_load="YES"
net.fibs=2
net.inet.tcp.hostcache.cachelimit="0"
net.inet.tcp.tcbhashsize=65536
net.inet.tcp.tso=0
net.isr.bindthreads=0
nmdm_load="YES"
vfs.zfs.arc_max="36G"
vfs.zfs.txg.timeout="5"
vfs.zfs.write_limit_override="536870912"
vmm_load="YES"
zfs_load="YES"


One possible culprit: vfs.zfs.arc_max is set to (more than) the size of physical memory. I am adjusting it to half of RAM and rebooting.
Comment 2 Mike 2018-09-25 16:49:14 UTC
Update: since adjusting vfs.zfs.arc_max to half of RAM (rather than all of it, which was an oversight) and rebooting, the problem has not recurred.
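For reference, the change amounts to a single line in /boot/loader.conf; 16G here is just an example sized to roughly half of this machine's ~32 GB of physical memory:

vfs.zfs.arc_max="16G"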
Comment 3 dimka 2018-09-26 04:45:11 UTC
vfs.zfs.arc_max defaults to about 0.6 * RAM,
at least on machines with 1/2/4 GB of RAM.
Tuning it to 0.5 * RAM in /boot/loader.conf (and rebooting) has no effect in my case.

Remember that you need to occupy all physical memory, and almost the entire swap space, to reproduce this problem.
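To compare the effective limit with physical memory on a given machine, the two values can be read directly, e.g.:

sysctl vfs.zfs.arc_max hw.physmem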
Comment 4 Shane 2018-09-28 06:16:31 UTC
Actually, you don't have to use all your swap to get "out of swap" errors; they also happen when too much RAM is wired, which prevents that RAM from being paged out.

I started getting out-of-swap errors on 10.1 with 8 GB of RAM.

Look at vm.stats.vm.v_wire_count * hw.pagesize at the time of the out-of-swap errors; this is the wired amount shown in top.

As I mentioned in bug #229764, vm.max_wired is 30%, so the ARC should be less than 70% of RAM. Another thing that often wires RAM is bhyve, so any guest RAM should also be taken into account when setting arc_max.
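A rough one-liner to read the wired amount in bytes at the moment of the errors (a sketch using the stock sysctl and bc tools; vm.stats.vm.v_page_size reports the page size):

echo "$(sysctl -n vm.stats.vm.v_wire_count) * $(sysctl -n vm.stats.vm.v_page_size)" | bc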
Comment 5 dimka 2018-10-01 12:05:14 UTC
I tried this, and saw that the system comes out of its stupor immediately after the processes are killed. Without this tuning, the system remained very slow for a few more seconds.
However, even halving vfs.zfs.arc_max and vm.max_wired did not solve the problem in my case.
Also, if I suspend the memory absorption once
(vm.stats.vm.v_laundry_count * vm.stats.vm.v_page_size) > 10M,
the system does not kill processes, but it also cannot flush the laundry pages to swap.
Probably there is some kind of deadlock or other similar problem.

CPU:  0.0% user,  0.0% nice, 11.3% system,  0.0% interrupt, 88.7% idle
Mem: 566M Active, 147M Inact, 130M Laundry, 94M Wired, 22M Free
ARC: 29M Total, 1048K MFU, 25M MRU, 32K Anon, 1152K Header, 1387K Other
     12M Compressed, 19M Uncompressed, 1.56:1 Ratio
Swap: 7678M Total, 30M Used, 7648M Free
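A simple loop like this can be used to watch the laundry and free page counts while reproducing (a sketch, one sample per second):

while :; do sysctl vm.stats.vm.v_laundry_count vm.stats.vm.v_free_count; sleep 1; done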
Comment 6 Mark Millard 2018-10-01 19:13:17 UTC
As I understand it, any condition that leads to sustained low
free RAM, via pressure from one or more processes that keep
active RAM usage high, leads to the killing of processes to
free memory. The default vm.pageout_oom_seq=12 can be increased
to extend how long the low-free-RAM condition is tolerated.
(It increases how many attempts to free RAM are made first.)
I assign vm.pageout_oom_seq in /etc/sysctl.conf.

FreeBSD does not swap out processes that stay active. This
is documented in the book by McKusick, Neville-Neil, and
Watson (The Design and Implementation of the FreeBSD
Operating System, 2nd edition). So if one or more processes
keep active RAM use high, free RAM tends to stay low.

There can be lots of swap available and the process killing
can still happen. The console log message produced for this
case is very misleading: it refers to being out of swap
space rather than to a sustained period of low free RAM.

Real "out of swap" conditions tend to also have messages
of the form:

Aug  5 17:54:01 sentinel kernel: swap_pager_getswapspace(32): failed

On small-board computers such as ARM boards I've been
using vm.pageout_oom_seq=120, and one person with storage
devices with I/O latency problems used something like
vm.pageout_oom_seq=1024 to allow -j4 buildworld buildkernel
to complete. (No attempt was made to find the smallest value
that would have worked.) There was a long June-through-September
2018 freebsd-arm list exchange, under various subjects, that
eventually exposed this vm.pageout_oom_seq control and the
FreeBSD swapping criteria I noted above.

This does not address why free RAM is low over a sustained
period; it just makes the system more tolerant of it. There
could also be other mechanisms at play that vm.pageout_oom_seq
does not affect.
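For reference, the setting is one line in /etc/sysctl.conf
(120 is what I use on the ARM boards):

vm.pageout_oom_seq=120

It can also be changed at runtime, without a reboot:

sysctl vm.pageout_oom_seq=120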
Comment 7 Gordon Hartley 2018-12-07 10:16:52 UTC
Just wanted to report that I also triggered this issue while running a zfs scrub on an 11.2 system with default settings (albeit updated from earlier releases via freebsd-update) and with (intentionally) no dedicated swap partition. I never seemed to have problems like this in the past.

Stopped the scrub via "zpool scrub -s" and the problem stopped occurring.

Added just in case it helps someone diagnose the underlying cause(s).
Comment 8 Gordon Hartley 2018-12-07 12:23:08 UTC
In addition to the above: it's not just during scrubs, although a scrub seems to exacerbate the behaviour; I'm not sure what is going on. I'm going to reinstall the OS with a dedicated swap partition as a workaround.
Comment 9 Max Kostikov 2018-12-13 21:40:30 UTC
I can confirm this bug on an 11.2-p5 system, on a bare-metal Xeon server with 8 GB RAM / 4 GB swap on ZFS.
It looks like a spike: within less than one minute there were thousands of log messages:

root@beta:/home/xm # zcat /var/log/messages.*.bz2 | grep "Dec 13 21:31" | grep "swap_pager" | wc -l
    6285

I have never seen such behaviour under FreeBSD in previous years.