After a large filesystem is marked dirty (by a panic or a ^C'd fsck) and the server is rebooted, the background fsck starts. Roughly 1-2 minutes later the server slows down, and within about 5-10 minutes all disk access hangs: the server stops responding even to hitting return in bash. You can still ping it, and an SSH connection goes through all the motions right up until it is about to spawn login. This happens even though the partition being fsck'd is not in use. As far as I can tell it never recovers; I have given it over 12 hours. It does not panic, unfortunately, or give any indication on the console of why it is having trouble. fsck works fine when you run it from the command line, in the foreground.

Fix: Disable background fsck in /etc/rc.conf:

background_fsck="NO"

It may be that using UFS2 also fixes the problem (but we've had other issues with that; I'll open another PR when I can reproduce them).

How-To-Repeat:
1. Install 5.4-STABLE on a multi-TB server, creating a 36GB / partition and one or more 2TB partitions (you will need to use auto-carving).
2. Format the large partitions as UFS1 with soft updates enabled.
3. Leave the large target partition completely empty and unmount it.
4. Start "fsck /dev/whatever" and hit ^C part way through. Verify it says "FILE SYSTEM MARKED DIRTY".
5. Reboot.
6. Log in again to monitor the server. It will eventually stop responding to your commands.
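For reference, a minimal sketch of the workaround and of the step that triggers the dirty filesystem, assuming a hypothetical target partition /dev/da0s1d (substitute whatever device matches your layout):

    # Workaround: disable background fsck so a dirty filesystem is
    # checked in the foreground at boot instead.
    echo 'background_fsck="NO"' >> /etc/rc.conf

    # Reproduction trigger (device name is a placeholder -- use the large,
    # empty partition from the steps above):
    umount /dev/da0s1d     # make sure the target is not mounted
    fsck /dev/da0s1d       # interrupt with ^C part way through; it should
                           # report "FILE SYSTEM MARKED DIRTY"
    shutdown -r now        # reboot; the background fsck starts shortly after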
This bug has been reproduced on a different server (similar hardware) running 6.0-RELEASE and UFS2. I forgot to disable background fsck on that server (big d'oh!), and about 12 hours after it rebooted, disk access started slowing down, eventually becoming completely unresponsive and forcing a reboot. The reboot took about 2 minutes to take effect, probably because the server was still "busy" with the fsck. I was able to log in before it locked up and tried ktrace'ing the fsck_ffs process; it showed no activity, so I suspect it deadlocked against something else. Unfortunately the server was an NFS server, so the NFS client also had to be rebooted due to a separate NFS client deadlock bug.

The how-to-repeat is the same; the ^C'd fsck step is just one way to end up with a dirty filesystem. Really, really easy to duplicate. The workaround is also the same: disable background_fsck on all 5.4 or 6.0 servers (or on any server capable of performing a background fsck). FWIW, the foreground fsck takes far less than 12 hours to complete.
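A rough sketch of how one might inspect the stuck fsck_ffs process before the machine wedges completely, using only base-system tools (the PID below is a placeholder):

    # Find the background fsck process; the MWCHAN column in ps -l output
    # may hint at what it is blocked on.
    ps -axl | grep fsck_ffs

    # Attach ktrace to the process and dump the trace after a while
    # (in the report above, the trace showed no activity at all).
    ktrace -p 1234
    sleep 60
    ktrace -C              # stop tracing
    kdump | tail           # reads ktrace.out from the current directory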
Responsible Changed From-To: freebsd-bugs->freebsd-fs Over to maintainer(s).
State Changed From-To: open->feedback Is this still a problem for you? r184934 might have improved snapshot creation on large file systems.
Responsible Changed From-To: freebsd-fs->jh Track.
State Changed From-To: feedback->closed Feedback timeout.