After upgrading from 10.4 to 11.2, processes get killed due to "out of swap space". Swap usage is 1%.

Starting jails:
pid 2535 (mysqld), uid 88, was killed: out of swap space
pid 791 (ntpd), uid 0, was killed: out of swap space
Oct 29 21:12:42 host kernel: pid 2535 (mysqld), uid 88, was killed: out of swap space
Oct 29 21:12:42 host kernel: pid 791 (ntpd), uid 0, was killed: out of swap space
sqljls: jail "sql" not found
Oct 29 21:12:53 host jail: login_getclass: unknown class 'root'
php www.

This was not an issue with 10.4.

Also, ntpd seems like a really odd choice to kill: it doesn't take much memory, and it is running at nice -20.

The VM has enough swap; nothing changed there from 10.4. The processes are killed at boot. After starting the sql jail and ntpd manually, swap usage is 10%.
[These notes are generated from activity with head. In places they presume that vm.pageout_oom_seq is available in 11.2, but that might not be the case. As I understand it, the other aspects of the notes below still apply.]

Unfortunately, some messages about "out of swap" can be misnomers, more tied to low free RAM after enough attempts to gain more free RAM (so after enough time). Real "out of swap" conditions tend to also have messages similar to:

Aug 5 17:54:01 sentinel kernel: swap_pager_getswapspace(32): failed

If you are not seeing such messages, then it is likely that the amount of swap space still free is not the actual thing driving the kills.

Poor I/O performance for paging and/or swapping can contribute to the kills happening, but I have no clue whether that is an issue in your context.

The default vm.pageout_oom_seq=12 can be increased to lengthen how long a low free RAM condition is tolerated. (It increases how many attempts to free RAM are made first.) I assign vm.pageout_oom_seq in /etc/sysctl.conf, but that may not be the best choice for your context.

vm.pageout_oom_seq=120 has proved useful. In some extreme situations (buildworld buildkernel in a low-RAM, slow context, including long I/O latencies), vm.pageout_oom_seq=1024 or more has been used to avoid kills when there was plenty of swap space.
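As a rough illustration of the /etc/sysctl.conf assignment described above, here is a minimal sketch. The helper name set_sysctl_conf is hypothetical (it is not a FreeBSD tool), and whether vm.pageout_oom_seq exists on 11.2 is still an open question, so check with `sysctl vm.pageout_oom_seq` first. The helper only edits a file; the commands that touch the live system are left as comments.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): set or update a key=value
# line in a sysctl.conf-style file without leaving duplicate entries.
set_sysctl_conf() {
    key=$1
    value=$2
    file=$3
    touch "$file"
    # Keep every line whose key differs, then append the new assignment.
    awk -F= -v k="$key" '$1 != k' "$file" > "${file}.tmp"
    printf '%s=%s\n' "$key" "$value" >> "${file}.tmp"
    mv "${file}.tmp" "$file"
}

# Assumed usage on the affected host (as root):
#   sysctl vm.pageout_oom_seq        # errors out if the OID does not exist
#   set_sysctl_conf vm.pageout_oom_seq 120 /etc/sysctl.conf
#   sysctl vm.pageout_oom_seq=120    # apply now, without a reboot
```

The value 120 is only the starting point mentioned above; larger values simply make the kernel try longer before it picks a process to kill.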
(In reply to Mark Millard from comment #1)

Thank you for your feedback.

I've added vm.pageout_oom_seq=120 to sysctl.conf.

Both of the following messages have been logged:

Oct 30 23:08:20 host kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1143, size: 16384
Oct 30 23:16:52 host kernel: swap_pager: I/O error - pageout failed; blkno 477,size 69632, error 12

Swap is file-backed:

[user@host /var/log]$ cat /etc/fstab
# Device Mountpoint FStype Options Dump Pass#
...
md99 none swap sw,late,file=/usr/swap0 0 0

The swap file is in /usr, which is the same fs as / (it was provisioned this way by the cloud provider).

I can roll back the VM to a 10.4 snapshot. I'm confident the issue wouldn't be present. The 10.4 install has been solid, and this began /immediately/ after the upgrade.

Reboot 1

[user@host /var/log]$ sudo less all.log
...
Oct 30 23:05:58 host kernel: Starting jails:
Oct 30 23:08:20 host kernel:
Oct 30 23:08:20 host kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1143, size: 16384
Oct 30 23:08:20 host last message repeated 2 times
Oct 30 23:08:20 host kernel: pid 2532 (mysqld), uid 88, was killed: out of swap space
Oct 30 23:08:20 host kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1143, size: 16384
Oct 30 23:08:20 host last message repeated 2 times
Oct 30 23:08:20 host kernel: pid 791 (ntpd), uid 0, was killed: out of swap space
Oct 30 23:08:20 host kernel:
Oct 30 23:08:20 host kernel: pid 2532 (mysqld), uid 88, was killed: out of swap space
Oct 30 23:08:20 host kernel:
Oct 30 23:08:20 host kernel: pid 791 (ntpd), uid 0, was killed: out of swap space
...
[user@host /var/log]$ uname -a
FreeBSD host.domain.com 11.2-RELEASE-p4 FreeBSD 11.2-RELEASE-p4 #0: Thu Sep 27 08:16:24 UTC 2018 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

[user@host /var/log]$ swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/md99          524288     5296   518992     1%

[user@host /var/log]$ uptime
11:13PM up 8 mins, 2 users, load averages: 0.06, 0.11, 0.08

[user@host /var/log]$ sudo service ntpd start
Starting ntpd.

[user@host /var/log]$ sudo jail -c sql
sql: created

[user@host /var/log]$ swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/md99          524288    59912   464376    11%

[user@host /var/log]$ jls
   JID  IP Address      Hostname                      Path
     2  127.0.0.253     php.domain.com                /usr/local/jails/php
     3  127.0.0.254     www.domain.com                /usr/local/jails/www
     4  127.0.0.252     sql.domain.com                /usr/local/jails/sql

Reboot 2

[user@host /var/log]$ sudo less all.log
Oct 30 23:16:50 host kernel: Starting jails:
Oct 30 23:16:52 host kernel:
Oct 30 23:16:52 host kernel: swap_pager: I/O error - pageout failed; blkno 445,size 131072, error 12
Oct 30 23:16:52 host kernel: swap_pager: I/O error - pageout failed; blkno 477,size 69632, error 12
Oct 30 23:16:52 host kernel: swap_pager: I/O error - pageout failed; blkno 494,size 8192, error 12
...
Oct 30 23:16:52 host kernel: swap_pager: I/O error - pageout failed; blkno 946,size 69632, error 12
Oct 30 23:16:52 host kernel: swap_pager: I/O error - pageout failed; blkno 963,size 36864, error 12
Oct 30 23:16:52 host kernel: swap_pager: I/O error - pageout failed; blkno 972,size 12288, error 12
Oct 30 23:18:12 host kernel: pid 2532 (mysqld), uid 88, was killed: out of swap space
Oct 30 23:18:12 host kernel:
Oct 30 23:18:12 host kernel: pid 2532 (mysqld), uid 88, was killed: out of swap space

[user@host ~]$ uptime
11:18PM up 2 mins, 1 user, load averages: 0.23, 0.25, 0.11

[user@host ~]$ jls
   JID  IP Address      Hostname                      Path
     2  127.0.0.253     php.domain.com                /usr/local/jails/php
     3  127.0.0.254     www.domain.com                /usr/local/jails/www

[user@host ~]$ swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/md99          524288     5752   518536     1%

[user@host ~]$ sudo jail -c sql
sql: created

[user@host ~]$ swapinfo
Device          1K-blocks     Used    Avail Capacity
/dev/md99          524288    54272   470016    10%
(In reply to teksimian from comment #2)

See bugzilla's

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206048

for why "swap is filebacked" should be avoided if one wants to avoid deadlocks and such. In particular, see its comments 7 and 8. I'd use a partition as the area for paging/swapping.

Also, messages like:

Oct 30 23:16:52 host kernel: swap_pager: I/O error - pageout failed; blkno 477,size 69632, error 12

suggest an unreliable page/swap media.

And, quoting Trev's reply (and the original question) from a list exchange:

QUOTE
What does the error swap_pager: indefinite wait buffer: mean?

This means that a process is trying to page memory to disk, and the page attempt has hung trying to access the disk for more than 20 seconds. It might be caused by bad blocks on the disk drive, disk wiring, cables, or any other disk I/O-related hardware. If the drive itself is bad, disk errors will appear in /var/log/messages and in the output of dmesg. Otherwise, check the cables and connections.
ENDQUOTE

It is possible for some systems to queue up more than the I/O system can process in 20 seconds, even when the I/O is working well (but is relatively slow compared to the workload).
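To tell apart the cases discussed in this thread (kills with real swap exhaustion, kills with paging stalls, kills with pageout I/O errors), something like the following sketch may help. The function name swap_triage is hypothetical, and the log path in the usage comment is an assumption; the reporter was reading all.log. The patterns are taken from the messages quoted above.

```shell
#!/bin/sh
# Hypothetical helper: count the swap-related kernel messages from this
# thread in a log file and print them on one summary line.
swap_triage() {
    log=$1
    # Kill notices, which may or may not mean swap was really exhausted.
    kills=$(grep -c 'was killed: out of swap space' "$log")
    # Messages that tend to accompany genuine swap exhaustion.
    exhausted=$(grep -c 'swap_pager_getswapspace(.*): failed' "$log")
    # Paging I/O that hung for more than 20 seconds.
    stalls=$(grep -c 'swap_pager: indefinite wait buffer' "$log")
    # Pageouts that failed outright.
    ioerrors=$(grep -c 'swap_pager: I/O error - pageout failed' "$log")
    printf 'kills=%s exhausted=%s stalls=%s ioerrors=%s\n' \
        "$kills" "$exhausted" "$stalls" "$ioerrors"
}

# Assumed usage:
#   swap_triage /var/log/messages
```

A result with kills above zero and exhausted at zero would match the "misnomer" case described in comment #1. Lines such as "last message repeated 2 times" are not expanded, so the counts are lower bounds.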