Bug 224975

Summary: shutdown(8) needs to wait longer for swapoff to avoid a “Cannot allocate memory” error
Product: Base System
Reporter: Wolfram Schneider <wosch>
Component: bin
Assignee: freebsd-bugs mailing list <bugs>
Status: New
Severity: Affects Only Me
CC: fbsdbugs4, jilles
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   
Bug Depends on: 224479    
Bug Blocks: 187081    

Description Wolfram Schneider freebsd_committer 2018-01-07 18:27:32 UTC
While analysing bug #224479 I noticed that “shutdown -r now” runs too fast and fails to swapoff a swap device or swap file.

I see on the console the error message:

  swapoff: /dev/md99: Cannot allocate memory

and soon after, a kernel panic. Not good.

This happens when more swap space is in use than there is free memory available. For example, you have 2.5GB of swap space, 69MB of it is in use, and only 49MB of free memory is available (according to the top(1) command).

How to repeat:

# start some processes, which need a little bit more RAM than available
for i in $(seq 1 20);do perl -e '$a=`man tcsh`; for(0..100) { $b.=$a}; sleep 100' & done

top(1) reports:

Mem: 611M Active, 51M Inact, 112M Laundry, 142M Wired, 103M Buf, 49M Free
Swap: 2500M Total, 69M Used, 2431M Free, 2% Inuse


# now reboot with shutdown
$ shutdown -r now


You will see the “swapoff: /dev/md99: Cannot allocate memory” error message, because the 49M of free memory is less than the 69M of used swap, followed by a kernel swap_pager I/O error message.
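
The condition described above can be written as a small check. This is only a sketch of the comparison as stated in this report (used swap greater than free memory), not the kernel's actual test; the function name swapoff_at_risk is hypothetical, and both arguments are plain numbers in the same unit.

```sh
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): succeeds (exit 0) when used
# swap exceeds free memory, the situation in which swapoff failed here.
# Both arguments are integers in the same unit, e.g. MB.
swapoff_at_risk() {
    swap_used=$1
    mem_free=$2
    [ "$swap_used" -gt "$mem_free" ]
}

# Values taken from the top(1) output above: 69M swap used, 49M free.
if swapoff_at_risk 69 49; then
    echo "at risk: used swap exceeds free memory"
else
    echo "ok: used swap fits in free memory"
fi
```

With the numbers from this report the check takes the first branch, since 69 is greater than 49.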

In case of low memory I think that shutdown/reboot needs to wait a little bit (3..10 seconds) after we kill the processes. Then there will be enough free memory available, and the swapoff call will run successfully.
Comment 1 Jilles Tjoelker freebsd_committer 2018-01-07 21:19:50 UTC
Just "waiting for a few seconds" will not help. The order of operations would have to be adjusted. The current order is (incomplete):

 * shutdown(8) prints final warning message
 * shutdown(8) signals init(8)
 * init(8) sends SIGHUP to all /etc/ttys session leaders and revokes the terminals
 * init(8) starts rc.shutdown
 * rc.shutdown shuts down some daemons
 * rc.shutdown runs /etc/rc.d/swaplate, turning off swap with the late flag
 * rc.shutdown shuts down other daemons
 * init(8) revokes /dev/console
 * init(8) signals all processes with SIGTERM and then SIGKILL, waiting up to 20 seconds for them to terminate
 * init(8) calls reboot(2) with appropriate arguments
 * kernel syncs
 * kernel unmounts (forcibly) all filesystems
 * kernel turns off all swap
 * kernel instructs hardware to power off, reboot, etc.

As a result, any swap files must be turned off by /etc/rc.d/swaplate. If not, the kernel will panic when trying to read data from the swap file when turning it off, since the filesystems have already been unmounted.

You can make scenarios like yours work (without changes to FreeBSD) if you ensure the memory-eating processes are either shut down by an rc.d script that runs before swaplate in the shutdown order or are in the foreground of a tty which is enabled in /etc/ttys.

What could be done in FreeBSD is adding unforced unmount and swapoff after all processes have been signaled. This could be in init(8) or the kernel. Some looping may be beneficial since turning off a swap file may make it possible to unmount a filesystem without forcing.
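
A rough sketch of such a loop, written as shell for illustration only (the real change would live in init(8) or the kernel). The names release_all, try_swapoff and try_umount are hypothetical; the default hooks simply call swapoff and umount without any force flag, and the loop repeats until everything is released or a full pass makes no progress.

```sh
#!/bin/sh
# Hypothetical hooks: unforced swapoff and umount. They can be redefined,
# e.g. with stubs for testing.
try_swapoff() { swapoff "$1" 2>/dev/null; }
try_umount()  { umount "$1" 2>/dev/null; }

# release_all "swap devices" "mount points"
# Both arguments are space-separated lists. Each pass tries every remaining
# item once; an item that fails is kept for the next pass. The loop stops
# when nothing is left or when a whole pass released nothing.
# Prints what is left and returns 0 only if nothing is left.
release_all() {
    swaps=$1
    mounts=$2
    progress=1
    while [ "$progress" -eq 1 ] && [ -n "$swaps$mounts" ]; do
        progress=0
        left=
        for s in $swaps; do
            if try_swapoff "$s"; then progress=1; else left="$left $s"; fi
        done
        swaps=$left
        left=
        for m in $mounts; do
            if try_umount "$m"; then progress=1; else left="$left $m"; fi
        done
        mounts=$left
    done
    echo "remaining:$swaps$mounts"
    [ -z "$swaps$mounts" ]
}
```

It would be called after all processes have been signaled, for example as release_all "/dev/md99" "/mnt", where /mnt stands in for whichever filesystem holds the swap file. Anything still listed as remaining would be left to the existing forced path in the kernel.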

In case of swap on fuse or the like, it is necessary to turn off the swap before stopping the fuse daemon. However, it is best to kill as many processes as possible before turning off swap to avoid paging in useless things and to avoid high memory pressure.