Summary:     Out of swap space on ZFS
Product:     Base System
Reporter:    dimka
Component:   kern
Assignee:    freebsd-fs (Nobody) <fs>
Status:      Open
Severity:    Affects Some People
CC:          FreeBSD, alaa.alassafin, che, gro.dsbeerf.sgub, ish, lwhsu, mail, marklmi26-fbsd, max, mikeowens, ota, parashiva, sigsys
Priority:    ---
Version:     11.2-RELEASE
Hardware:    amd64
OS:          Any
Attachments: Memory occupation test tool (attachment 201015)
Description

dimka 2018-09-18 16:08:07 UTC

Experiencing the same issue.

# uname -a
FreeBSD sword 11.2-RELEASE-p3 FreeBSD 11.2-RELEASE-p3 #0: Thu Sep  6 07:14:16 UTC 2018 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

# swapinfo -h
Device          1K-blocks     Used    Avail Capacity
/dev/da0p2       12582912     4.2M      12G     0%
/dev/da1p2       12582912     4.1M      12G     0%
Total            25165824     8.3M      24G     0%

# sysctl hw | egrep 'hw.(phys|user|real)'
hw.physmem: 34297905152
hw.usermem: 1263333376
hw.realmem: 34359738368

# tail /var/log/messages
Sep 24 12:30:29 sword kernel: pid 12993 (getty), uid 0, was killed: out of swap space
Sep 24 12:30:43 sword kernel: pid 12994 (getty), uid 0, was killed: out of swap space
Sep 24 12:30:58 sword kernel: pid 12995 (getty), uid 0, was killed: out of swap space
Sep 24 12:31:14 sword kernel: pid 12996 (getty), uid 0, was killed: out of swap space
Sep 24 12:31:28 sword kernel: pid 12997 (getty), uid 0, was killed: out of swap space
Sep 24 12:31:42 sword kernel: pid 12998 (getty), uid 0, was killed: out of swap space
Sep 24 12:31:57 sword kernel: pid 12999 (getty), uid 0, was killed: out of swap space
Sep 24 12:32:12 sword kernel: pid 13000 (getty), uid 0, was killed: out of swap space
Sep 24 12:32:27 sword kernel: pid 13001 (getty), uid 0, was killed: out of swap space
Sep 24 12:32:42 sword kernel: pid 13002 (getty), uid 0, was killed: out of swap space

# cat /boot/loader.conf
accf_data_load="YES"
accf_http_load="YES"
autoboot_delay=3
cc_htcp_load="YES"
hw.igb.rx_abs_int_delay=1024
hw.igb.rx_int_delay=512
hw.igb.rxd=4096
hw.igb.tx_abs_int_delay=1024
hw.igb.tx_int_delay=512
hw.igb.txd=4096
hw.intr_storm_threshold=9000
if_bridge_load="YES"
if_tap_load="YES"
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
kern.ipc.nmbclusters=262144
kern.ipc.nmbjumbo16=32768
kern.ipc.nmbjumbo9=65536
kern.ipc.nmbjumbop=262144
kern.ipc.semaem=32767
kern.ipc.semmni=32767
kern.ipc.semmns=8192
kern.ipc.semmnu=4096
kern.ipc.semmsl=120
kern.ipc.semopm=200
kern.ipc.semume=80
kern.ipc.semusz=184
kern.ipc.semvmx=65534
kern.maxusers=1024
mlx4en_load="YES"
net.fibs=2
net.inet.tcp.hostcache.cachelimit="0"
net.inet.tcp.tcbhashsize=65536
net.inet.tcp.tso=0
net.isr.bindthreads=0
nmdm_load="YES"
vfs.zfs.arc_max="36G"
vfs.zfs.txg.timeout="5"
vfs.zfs.write_limit_override="536870912"
vfs.zfs.write_limit_override="536870912"
vmm_load="YES"
zfs_load="YES"

One possible culprit: vfs.zfs.arc_max is set to the size of physical memory. I am adjusting that to half of memory and rebooting.

Update: since adjusting vfs.zfs.arc_max to half of RAM (rather than all of it, which was an oversight) and rebooting, the problem has not manifested.

vfs.zfs.arc_max is 0.6 * RAM by default, at least on systems with 1/2/4 GB RAM. Tuning it to 0.5 * RAM in /boot/loader.conf (and rebooting) has no effect in my case. Remember that you need to occupy all physical memory, and almost the entire swap space, to reproduce this problem.

Actually, you don't have to use all your swap to get out-of-swap errors; they also happen when too much RAM is wired, which prevents any RAM from being swapped in or out. I started getting out-of-swap errors on 10.1 with 8 GB RAM. Look at vm.stats.vm.v_wire_count * hw.pagesize at the time of the errors; this is the wired amount shown in top. As I mentioned in bug #229764, max_wired is 30%, so arc should be less than 70% of RAM. Another thing that often wires RAM is bhyve, so any guest RAM should also be considered when setting arc_max.

I tried this, and saw that the system comes out of its stupor immediately after the killing of the processes. Without this tuning, the system remained very slow for a few seconds. However, even halving vfs.zfs.arc_max and vm.max_wired did not solve the problem in my case. Also, if I suspend memory absorption once (vm.stats.vm.v_laundry_count * vm.stats.vm.v_page_size) > 10M, the system does not kill processes, but cannot purge laundry pages to swap. Probably there is some kind of deadlock or other similar problem.
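The wired-memory check described above amounts to simple arithmetic. A minimal sketch follows; the values are illustrative examples, not readings from the affected systems. On a live FreeBSD box they would come from `sysctl -n vm.stats.vm.v_wire_count`, `sysctl -n hw.pagesize`, and `sysctl -n hw.physmem`:

```shell
#!/bin/sh
# Illustrative values only; read the real ones with sysctl -n on FreeBSD.
wire_count=24576        # vm.stats.vm.v_wire_count (pages currently wired)
page_size=4096          # hw.pagesize (bytes per page, typical on amd64)
physmem=34359738368     # hw.physmem (32 GiB, example value)

# Wired memory in bytes = wired pages * page size (the "Wired" figure in top).
wired_bytes=$((wire_count * page_size))
echo "wired: $((wired_bytes / 1048576)) MiB"

# Per the comment above, with max_wired at 30% of RAM, ARC should be kept
# below roughly 70% of RAM (minus any bhyve guest RAM).
arc_ceiling=$((physmem * 70 / 100))
echo "suggested arc_max ceiling: $((arc_ceiling / 1073741824)) GiB"
```

With the example numbers this prints a wired size of 96 MiB and an arc_max ceiling of 22 GiB; substitute your own sysctl readings to size vfs.zfs.arc_max.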
CPU:  0.0% user,  0.0% nice, 11.3% system,  0.0% interrupt, 88.7% idle
Mem: 566M Active, 147M Inact, 130M Laundry, 94M Wired, 22M Free
ARC: 29M Total, 1048K MFU, 25M MRU, 32K Anon, 1152K Header, 1387K Other
     12M Compressed, 19M Uncompressed, 1.56:1 Ratio
Swap: 7678M Total, 30M Used, 7648M Free

As I understand it, any condition that leads to sustained low free RAM, via pressure from one or more processes that keep active RAM usage high, leads to killing of processes to free memory. The default vm.pageout_oom_seq=12 can be increased to lengthen how long the low-free-RAM condition is tolerated (it increases how many attempts to free RAM are made first). I assign vm.pageout_oom_seq in /etc/sysctl.conf.

FreeBSD does not swap out processes that stay active. This is documented in the book by McKusick, Neville-Neil, and Watson (2nd edition; last names listed). So if one or more processes keep active RAM use high, free RAM tends to stay low. There can be lots of swap available and the process killing can still happen. The console log messages produced for this case are very misleading, referencing "out of swap" instead of a sustained period of low free RAM. Real out-of-swap conditions tend to also produce messages of the form:

Aug  5 17:54:01 sentinel kernel: swap_pager_getswapspace(32): failed

On small-board computers such as ARM boards I've been using vm.pageout_oom_seq=120, and one person with storage devices with I/O latency problems used something like vm.pageout_oom_seq=1024 to allow a -j4 buildworld buildkernel to complete. (No attempt was made to approximate the smallest value that would have worked.) There was a long June-through-September 2018 freebsd-arm list exchange under various subjects that eventually exposed this vm.pageout_oom_seq control and FreeBSD's swapping criteria noted above. This does not address why free RAM is low over a sustained period; it just makes the system more tolerant of it.
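For reference, the tuning described above is a one-line /etc/sysctl.conf entry. This is a sketch using the value the comment reports for ARM boards; the right number depends on your workload and storage latency:

```
# /etc/sysctl.conf
# Make the pageout daemon try longer to reclaim RAM before the OOM killer
# runs (default is 12; 120 and even 1024 are cited above for slow storage).
vm.pageout_oom_seq=120
```

The value takes effect at boot, or immediately via `sysctl vm.pageout_oom_seq=120`.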
It could be that there are also other mechanisms that do not involve vm.pageout_oom_seq.

Just wanted to report that I also triggered this issue when doing a zfs scrub on an 11.2 system with system defaults (albeit updated from earlier releases via freebsd-update) with, intentionally, no dedicated swap filesystem. I never seem to have had problems in the past. I stopped the scrub via "zpool scrub -s" and the problem stopped occurring. Added just in case it helps someone diagnose the underlying cause(s).

In addition to the above: it's not just during scrubs, although those seem to exacerbate the behaviour; not sure what is going on. Going to reinstall the OS with dedicated swap as a workaround.

I can confirm this bug with an 11.2-p5 system on a bare-metal Xeon server with 8 GB RAM / 4 GB swap with ZFS. It looks like a burst: within less than one minute there were thousands of log messages:

root@beta:/home/xm # zcat /var/log/messages.*.bz2 | grep "Dec 13 21:31" | grep "swap_pager" | wc -l
    6285

I never saw such behaviour under FreeBSD in previous years.

I experimented with the stable/11 branch on 2 GB RAM and 8 GB swap space with an r320475 "world", looking for the moment the kernel problems begin. Swap on ZFS is unstable from revision r321453 (2017-07-25), which touched:

/usr/src/sys/kern/subr_blist.c
/usr/src/sys/sys/blist.h

And fully broken from r321554 (2017-07-26), which touched:

/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h

This happened between the 11.1 and 11.2 release forks. Also, swap on ZFS is fully broken in the current revision of stable/11, r342915 (kernel and world used).

Created attachment 201015 [details]
Memory occupation test tool
I checked and made sure that 12.0-RELEASE/amd64 is also affected by this problem; the test tool and sendmail processes were killed by the OOM killer after physical memory was exhausted and several megabytes of swap were used. Short hardware details:

CPU: Intel(R) Celeron(R) CPU 847 @ 1.10GHz (1097.53-MHz K8-class CPU)
real memory  = 2147483648 (2048 MB)
avail memory = 1987403776 (1895 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
da0: <SanDisk Ultra USB 3.0 1.00> Removable Direct Access SPC-4 SCSI device
da0: 14663MB (30031250 512 byte sectors)

You can run a very simple test on your hardware or VM:

1. Install the system from DVD to a cleaned drive in "Auto (ZFS)" mode.
2. Comment out the GPT swap entry (created by the installer) in /etc/fstab.
3. Create swap space on the ZFS pool:
   # zfs create -V 8GB -o org.freebsd:swap=on zroot/swap
4. Reboot.
5. Compile and use the "Memory occupation test tool" from the bug report attachments:
   # ./memphage 9500
   The argument is the memory occupation limit (mem free + swap free - 500) in megabytes.

I'm having the same behavior with FreeBSD 11.2-p8. Although the swap partition is not on ZFS, we do have a ZFS pool on this machine, as described here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235125

I tried memphage. Total memory is 128 GB; free memory at the time I started memphage was 110 GB; swap is 8 GB.

./memphage 115000 took around 3 minutes with no issues; swap got 3.2 GB filled.

./memphage 120000 took 5 minutes and got killed at 118672 MB. Swap was completely filled and I got multiple:

Jan 23 14:56:27 san2 kernel: swap_pager_getswapspace(2): failed

and then memphage got killed:

Jan 23 14:56:48 san2 kernel: pid 6945 (memphage), uid 0, was killed: out of swap space
Jan 23 14:56:48 san2 kernel: pid 6945 (memphage), uid 0, was killed: out of swap space

No other processes were killed this time, only memphage.

The other day I received an OOM kill on the current revision of 11.1; it happened at night, apparently during "periodic" execution. I had to roll back this system to stable/11 r321452, and there have been no problems in 5 days.

Hi, we are seeing similar behaviour on one of our zfs-nfs servers as well.

Jan 31 10:41:13 volume1 kernel: pid 17505 (collectd), uid 0, was killed: out of swap space
Jan 31 10:41:13 volume1 kernel: pid 51659 (ntpd), uid 0, was killed: out of swap space
Jan 31 10:42:54 volume1 kernel: pid 73673 (devd), uid 0, was killed: out of swap space
Jan 31 10:43:11 volume1 kernel: pid 31167 (mountd), uid 0, was killed: out of swap space
Jan 31 10:44:12 volume1 kernel: pid 50359 (nfsd), uid 0, was killed: out of swap space
Jan 31 10:44:36 volume1 kernel: pid 81152 (zsh), uid 0, was killed: out of swap space
Jan 31 10:44:54 volume1 kernel: pid 49005 (zsh), uid 4002, was killed: out of swap space
Jan 31 10:46:13 volume1 kernel: pid 95263 (nrpe3), uid 181, was killed: out of swap space
Jan 31 10:46:36 volume1 kernel: pid 48518 (sshd), uid 4002, was killed: out of swap space
Jan 31 10:46:55 volume1 kernel: pid 92367 (rpcbind), uid 0, was killed: out of swap space
Jan 31 10:47:11 volume1 kernel: pid 56206 (nfsd), uid 0, was killed: out of swap space
Jan 31 10:47:23 volume1 kernel: pid 68827 (dhclient), uid 65, was killed: out of swap space
Jan 31 10:47:38 volume1 kernel: pid 87548 (getty), uid 0, was killed: out of swap space
Jan 31 10:47:50 volume1 kernel: pid 24945 (getty), uid 0, was killed: out of swap space
Jan 31 10:49:14 volume1 kernel: pid 29466 (getty), uid 0, was killed: out of swap space
Jan 31 10:49:37 volume1 kernel: pid 77339 (getty), uid 0, was killed: out of swap space
Jan 31 10:49:51 volume1 kernel: pid 78317 (getty), uid 0, was killed: out of swap space
Jan 31 10:50:13 volume1 kernel: pid 81831 (getty), uid 0, was killed: out of swap space
Jan 31 10:50:37 volume1 kernel: pid 89762 (getty), uid 0, was killed: out of swap space
Jan 31 10:50:51 volume1 kernel: pid 92067 (getty), uid 0, was killed: out of swap space
Jan 31 10:51:49 volume1 kernel: pid 97499 (getty), uid 0, was killed: out of swap space
Jan 31 10:52:14 volume1 kernel: pid 96091 (getty), uid 0, was killed: out of swap space
Jan 31 10:52:37 volume1 kernel: pid 98907 (getty), uid 0, was killed: out of swap space
Jan 31 10:52:51 volume1 kernel: pid 99595 (getty), uid 0, was killed: out of swap space
Jan 31 10:55:47 volume1 kernel: pid 60068 (zsh), uid 0, was killed: out of swap space
Feb  7 09:57:40 volume1 collectd[25157]: plugin_read_thread: read-function of the `swap' plugin took 19.765 seconds, which is above its read interval (10.000 seconds). You might want to adjust the `Interval' or `ReadThreads' settings.
Feb  7 09:59:48 volume1 kernel: pid 25157 (collectd), uid 0, was killed: out of swap space
Feb  7 09:59:48 volume1 kernel: pid 94240 (atop), uid 0, was killed: out of swap space
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 327109, size: 16384
Feb  7 09:59:48 volume1 kernel: pid 51515 (ntpd), uid 0, was killed: out of swap space
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 326787, size: 4096
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 102263, size: 4096
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 327152, size: 4096
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 100915, size: 8192
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 326754, size: 8192
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 8471, size: 4096
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 106028, size: 12288
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 8229, size: 8192
Feb  7 09:59:48 volume1 kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 103890, size: 8192
Feb  7 10:03:11 volume1 kernel: swap_pager_getswapspace(32): failed
Feb  7 10:06:00 volume1 kernel: swap_pager_getswapspace(32): failed

root@volume1:~ # grep arc /boot/loader.conf
vfs.zfs.arc_min="10024M"
vfs.zfs.arc_max="13084M"
root@volume1:~ # sysctl -a | grep phys
kern.ipc.shm_use_phys: 0
vm.phys_segs:
vm.phys_free:
vm.phys_pager_cluster: 1024
hw.physmem: 17139478528
root@volume1:~ # sysctl vm.pageout_oom_seq
vm.pageout_oom_seq: 120
root@volume1:~ # swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/gpt/swap     8388608    26080  8362528     0%
root@volume1:~ # freebsd-version -uk
11.2-RELEASE-p8
11.2-RELEASE-p8

We actually have reason to assume the VM's storage backend might be periodically affected by an extremely slow storage provider (it is running as a VM on OpenStack), as indicated by the "swap_pager: indefinite wait buffer" messages. It is worrisome that important processes (nfsd, for instance) are shot down by the OOM killer with the default value of vm.pageout_oom_seq (if the default setting of that sysctl turns out to cause the OOM kills). We've just changed vm.pageout_oom_seq from its default of 12 to 120 and are monitoring the impact of that change. Ruben

(In reply to Billg from comment #13)
Hello, my server also fails with the same "mysqld killed: out of swap space" error when I import a 70+ GB MySQL dump (mysqldump -u root -p database -r dump.sql). The server's hardware is 2x 256 GB SSD, 32 GB RAM, CPU E3-1245 v3, with the latest FreeBSD 12, ZFS mirror, atime=off, primarycache=all, secondarycache=none. I tried the solutions below:

1. A trick learned from stackoverflow.com:
   set global net_buffer_length=1048576;
   set global max_allowed_packet=1073741824;
   SET foreign_key_checks = 0;
   Not working.
2. Disable swap. Not working.
3. Then I thought it was related to RAM or swap, so should I disable ARC?
   zfs set primarycache=none tank
   Working!!!

So I have my database working now. Hope my experience could help someone. Thank you, best regards.

(In reply to Parashiva from comment #16)
My arc limit is:
vfs.zfs.arc_max="4G"
vfs.zfs.arc_min="2G"

I'm using ZFS on a VPS with very little RAM (512 MB). It operated well on 10.3R, 11.0R, 11.1R, 11.2R and 12.0R. Recently I upgraded to 12.1R and many processes were killed because of 'out of swap space', but with no 'swap_pager_getswapspace failed'. So I set 'vm.pageout_oom_seq=1024' in /boot/loader.conf and this reduced the number of killed processes. Now I'm watching the situation after increasing 1024 to 10240.

This submittal and its comments are so old that the kernel messages for "was killed" have since been made more specific (in 2 of the 3 types of contexts that lead to such kills). The usually inaccurate "out of swap space" text is not the typical text reported any more. Changing this submittal from New to Open at this point is just misleading/confusing versus the modern details (13.1+, say).