Created attachment 224295
Patch to add a progress indicator for filesystem unmounting at reboot/shutdown
On a server with about 140000 zfs filesystems a "reboot" sometimes takes a very long time (hours). That time is spent by the kernel unmounting the filesystems. Unfortunately the kernel doesn't print any kind of progress indication when this occurs, so you just see the "Uptime: xxxx" printed and then nothing...
Please find enclosed a patch that adds a progress indication for the unmounting part.
(There used to be a related issue with the ZFS kmem freeing also taking a very long time, but that fix is included, so that part goes quickly once the unmounting part has passed.)
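A minimal userland sketch of the kind of progress reporting described above. This is not the code from the attached patch: the function names `progress_due` and `progress_format`, the message text, and the reporting interval are all assumptions for illustration. A kernel version would print with the kernel's own printf rather than format into a buffer.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not taken from the attached patch: decide whether a
 * progress line is due after `done` of `total` filesystems are unmounted.
 * Reports every `interval` filesystems and always on the last one.
 */
int
progress_due(unsigned long done, unsigned long total, unsigned long interval)
{
	assert(interval > 0);
	return (done == total || done % interval == 0);
}

/*
 * Format one progress line into buf. Returns the snprintf() result.
 * The message wording is an assumption, not the patch's actual output.
 */
int
progress_format(char *buf, size_t len, unsigned long done, unsigned long total)
{
	/* Integer percentage, truncated; an empty list counts as complete. */
	unsigned long pct = (total == 0) ? 100 : (done * 100 / total);

	return (snprintf(buf, len, "Unmounting filesystems: %lu/%lu (%lu%%)",
	    done, total, pct));
}
```

Reporting only every N filesystems keeps the console readable when there are on the order of 140000 of them. The interval would need tuning so that a slow unmount still shows activity.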
I probably should have mentioned some more details:
149354 ZFS filesystems
# zpool list
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG   CAP  DEDUP  HEALTH  ALTROOT
DATA      582T   272T   310T        -         -    7%   46%  1.00x  ONLINE  -
FILUR00  65.5T   107G  65.4T        -         -    0%    0%  1.00x  ONLINE  -
FILUR02   196T  72.6T   124T        -         -   11%   36%  1.00x  ONLINE  -
FILUR03   196T  78.2T   118T        -         -   11%   39%  1.00x  ONLINE  -
RUNUR01  65.5T  11.8M  65.5T        -         -    0%    0%  1.00x  ONLINE  -
SUSPECT  9.06T  2.79M  9.06T        -         -    0%    0%  1.00x  ONLINE  -
UNUSED    196T  5.38M   196T        -         -    0%    0%  1.00x  ONLINE  -
zroot     444G  9.03G   435G        -         -   21%    2%  1.00x  ONLINE  -
At the reboot time when it was really slow it unmounted around 6-10 filesystems per second -> estimated reboot time around 4-5 hours... (Needless to say I hard-rebooted it with an "ipmitool power reset" after an hour :-)
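As a rough check of that estimate, the arithmetic can be sketched as below. The helper names are hypothetical. The inputs are the figures reported in this comment: 149354 filesystems and 6-10 unmounts per second.

```c
#include <assert.h>

/*
 * Hypothetical helper: seconds needed to unmount `count` filesystems at
 * `per_sec` unmounts per second, rounded up.
 */
unsigned long
unmount_eta_seconds(unsigned long count, unsigned long per_sec)
{
	assert(per_sec > 0);
	return ((count + per_sec - 1) / per_sec);
}

/* Convert seconds to tenths of an hour (41 means 4.1 h), truncating. */
unsigned long
seconds_to_tenth_hours(unsigned long seconds)
{
	return (seconds / 360);
}
```

With 149354 filesystems this gives about 4.1 hours at 10 unmounts per second and about 6.9 hours at 6 per second, so the 4-5 hour figure corresponds to the faster end of the observed range.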
(It hasn't been this slow before, but we don't reboot these servers very often).
Anyway, a more verbose vfs_unmount() would be a good thing even with a more sane number of filesystems - in my opinion.
(And preferably a faster zfs unmount operation :-)
"That part is included" ... where? I have only 400-odd filesystems, but I still find that long running ZFS can take minutes if not hours to let the computer reboot. I'd like to test your patch, but I'd like this other patch, too.
(In reply to dgilbert from comment #2)
Ah, sorry. Wasn't really clear there. The fix for that other problem is already in the normal FreeBSD 12.2 kernel.
Information about that fix (from FreeBSD 11.3) is in bug 242427 (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242427) - however that is something that happens after all the filesystems are unmounted.
For a machine with a more modest number of zfs filesystems (3000) the unmounting is much quicker (in my tests) - like 3000 filesystems unmounted in 10 seconds. That is for a fairly recently rebooted machine (same hardware as the slow one - an HP DL380g9 with 512-768GB of RAM, with 140 10TB drives on the big one and 10 10TB drives on the small one).
Hrm. Not matching my experience, then. I recently updated from 12.2 to 13 --- basically at the beginning of the release schedule. My server has 128G RAM, an 8C/16T threadripper and 60T of disk.
The disk has many uses, but I also do poudriere builds on this machine --- which tend to thrash it pretty hard.
If a reboot is done after low uptime (under a week or so), things are fine. But the reboot time rises as uptime does ... roughly. Poudriere runs seem to frustrate it.
So far, with 13, I have seen a reboot take ~ 5 minutes. Small amounts of disk activity on the array (just noticed the blinken lights). This is after the buffer messages but before the uptime is printed.
FreeBSD 13 uses the new OpenZFS code whereas 12.2 uses the older FreeBSD ZFS code, so it probably differs quite a bit.
If you're seeing the "Uptime: xx" message then it's not the unmounting of the filesystems, since "Uptime: xx" is printed after they are all unmounted. After that it frees kernel memory and other stuff... I have another patch that adds more verbose printing in various parts that were very slow back in the FreeBSD 11.3 days, but I'm not sure how much of it applies directly to FB13 due to the new ZFS code.
Hmm.. I wonder if the changes in the memory handling code in FreeBSD might have caused the kmem_cache stuff to become slow again for some reason.
I too suffer from long reboots, on both 12.2 and 13.0, on servers with large numbers of disks. I plan to test your patch. On my systems, the long hang (1m - 30m) happens after "All buffers synced".
Created attachment 224325
Progress indicator during vfs_unmount at shutdown
An updated vfs_unmount progress indicator patch with more details. Applies to 12.2 and 13.0 (and compiles on current).
Created attachment 224326
More verbose during shutdown
Adds a sysctl to control how much more verbose to be during shutdown and prints some progress indicators (for FreeBSD 13), but probably applies without much fuzz for 12.2 too.
Created attachment 224327
Patch to make zfs_fini more verbose at shutdown
A patch to print more details while closing down zfs at shutdown/reboot. Requires the "More verbose during shutdown" patch. For FreeBSD 13 and newer.
Created attachment 224366
Improved FreeBSD 12.2 version of "verbose shutdown" patch
Created attachment 224367
Improved FreeBSD 13 version of "verbose shutdown" patch