Whenever fsck runs in background mode, it creates a file called fsck_snapshot in the .snap directory of the volume being checked. Fsck then runs its check on this file instead of the live filesystem. Filesystem snapshots (which is essentially what fsck_snapshot is) are designed to persist across mounts and reboots, so if fsck does not terminate properly for some reason (hard reboot, etc.), the file is left behind. This is partially solved on the next background fsck run (commonly just after the system reboots, if the filesystem is marked dirty), since fsck overwrites the leftover fsck_snapshot with a new one and removes it when it is done. The problem occurs when the filesystem is marked clean before the next background fsck run (for example, by running fsck in single-user mode). In that case the fsck_snapshot file persists and may consume most of the filesystem (depending on the state of the filesystem when the snapshot was made).

Fix: Implement code (perhaps in the loader, after the filesystem is mounted) to check for leftover fsck snapshots and remove them if appropriate.

How-To-Repeat:
1) Run fsck in background mode.
2) Run halt -qn before fsck finishes (or otherwise terminate it improperly; SIGKILL does not seem to work since fsck is in the biord state).
3) Boot into single-user mode.
4) Run fsck to mark the filesystem clean.
5) Reboot into normal mode and watch the file grow with every change on the live filesystem.
Responsible Changed From-To: freebsd-bugs->freebsd-fs Over to maintainer(s).
For bugs matching the following criteria:
Status: In Progress
Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Please, does this bug explain the clean-then-dirty behaviour that's observed in the following transcript? <https://lists.freebsd.org/archives/freebsd-current/att-0339/2021-07-16_00.53_typescript.txt> First observed (and reproducible) whilst working with faulty hardware. Reproducible today with a new SSD.
(In reply to Graham Perrin from comment #3) Your transcript does not use snapshots, so this bug which is about snapshots does not apply to your transcript. The update to the block counts should not have affected the file type, so it would appear that when the inode block with the updated count and size fields was written to disk, other parts of it were scrambled. This implies some kind of error in writing the inode block to the disk.
(In reply to Kirk McKusick from comment #4) Thank you. (I wondered whether there might be a shared underlying cause.) I'll raise a separate bug report.
Anyone, please: are the symptoms in opening comment #0 (2006) likely to be reproducible with any current branch of FreeBSD? <https://www.freebsd.org/where/> I recall relatively recent attention to background fsck in <https://cgit.freebsd.org/src/commit/?h=releng/13.1&id=fb2feceac34cc9c3fb47ba4a7b0ca31637f8fdf0> (a cherry-pick to releng/13.1) …
This problem was fixed not long after this bug report was filed (though I was unaware of the bug report until now so did not close it). The fix is to open the snapshot and then unlink it immediately after it is created. Since the snapshot will have no references it will be removed when fsck closes the file and/or exits. If the system crashes prior to fsck exiting then the next run of fsck will find the unreferenced snapshot and remove it.
(In reply to Kirk McKusick from comment #7) Mostly an FYI that there might still be an oddity possible. Context:

# uname -apKU
FreeBSD amd64_UFS 14.0-CURRENT FreeBSD 14.0-CURRENT #61 main-n261026-d04c86717c8c-dirty: Sun Feb 19 15:03:52 PST 2023 root@amd64_ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG amd64 amd64 1400081 1400081

I booted this system and "df -m" listed 57% Capacity, but my du -xsm /* listed more like a third of the capacity (totalling the lines). The du result was closer to what I expected. (Note: This system is normally booted from, and uses, distinct ZFS media, so today's activity was the first use of the UFS media in some days.) It turned out there was a /.snap/fsck_snapshot roughly matching the du -xsm /* total. "ls -Tld" indicated today's date on the /.snap/fsck_snapshot . I removed the file after a while (I noticed no evidence of it being in use) and rebooted, and it did not return. The removal resulted in "df -m" reporting 29% Capacity, both before and after the reboot, which is closer to what I expected.
(In reply to Mark Millard from comment #8) Is this filesystem mounted and having background fsck run on it? Background fsck is the only application that would create .snap/fsck_snapshot.
(In reply to Kirk McKusick from comment #9) I expect that the background fsck ran on the system. But by the time I noticed the high % Capacity and then finally noticed the /.snap/fsck_snapshot , I did not find any evidence of a background fsck still being active. This means that I did not see the background fsck run, unfortunately. But the date/time on /.snap/fsck_snapshot made reasonable sense for the time frame. My seeing the file indicates that the unlink had not happened by the time I noticed the file. That, of itself, may be odd. (And it is what prompted me to submit the note.)
(In reply to Mark Millard from comment #10) There is a brief window when the file exists. The snapshot request is made. The resulting file is opened by fsck. The file is then unlinked. If fsck exits or the kernel dies after the snapshot is created and before the unlink is done then the file will remain. It is possible that you somehow hit that window.
(In reply to Kirk McKusick from comment #11) FYI: There were no kernel crashes before, during, or after. Other than the % Capacity and /.snap/fsck_snapshot existence issues, things seemed normal. So it would seem that fsck probably exited before the unlink was done, leaving the file in place. It sounds like that is an expected possibility.
(In reply to Mark Millard from comment #12) The open of the snapshot also includes reading the superblock on the snapshot. Until recently rather few checks were done, so a bad field in the superblock could create a wild pointer that would cause fsck to fail with a segmentation fault. That would be my best guess as to what caused the premature exit with the snapshot still in place. I should reorganize the code to do the unlink immediately after the open to close that window.
(In reply to Kirk McKusick from comment #13) FYI: I did not find a .core file for fsck_ffs (or for any other program) when I did a 'find / -name "*.core" -print' . Similarly, /var/log/messages showed no messages for that day of the form: pid ???? (????), jid ????, uid ????: exited on signal ???? Separately: It does sound like moving the unlink to just after the open would be appropriate.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d3a36e4b7459b2d62c4cd50de7a8e3195d7241c7

commit d3a36e4b7459b2d62c4cd50de7a8e3195d7241c7
Author: Kirk McKusick <mckusick@FreeBSD.org>
AuthorDate: 2023-10-25 22:36:45 +0000
Commit: Kirk McKusick <mckusick@FreeBSD.org>
CommitDate: 2023-10-25 22:38:11 +0000

    Delete snapshot after opening it when running fsck_ffs(9) in background.

    When fsck_ffs(8) runs in background, it creates a snapshot named fsck_snapshot in the filesystem's .snap directory. The fsck_snapshot file was removed when the background fsck finished. If the system crashed or the fsck exited unexpectedly, the fsck_snapshot file would remain. The snapshot would consume ever more space as the filesystem changed over time until it was removed by a system administrator or a future run of background fsck removed it to create a new snapshot file.

    This commit unlinks the .snap/fsck_snapshot file immediately after opening it so that it will be reclaimed when fsck closes it at the conclusion of its run. After a system crash, it will be removed as part of the filesystem cleanup because of its zero reference count. As only a few milliseconds pass between its creation and unlinking, there is far less opportunity for it to be accidentally left behind.

    PR: 106107
    MFC-after: 1 week

 sbin/fsck_ffs/fsck.h   | 1 -
 sbin/fsck_ffs/fsutil.c | 1 -
 sbin/fsck_ffs/globs.c  | 2 --
 sbin/fsck_ffs/main.c   | 8 +++++---
 sbin/fsck_ffs/setup.c  | 8 ++++----
 5 files changed, 9 insertions(+), 11 deletions(-)
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=27133e6e86c16f86642a16d15ea2c910e9642616

commit 27133e6e86c16f86642a16d15ea2c910e9642616
Author: Kirk McKusick <mckusick@FreeBSD.org>
AuthorDate: 2023-10-25 22:36:45 +0000
Commit: Kirk McKusick <mckusick@FreeBSD.org>
CommitDate: 2023-11-12 06:48:25 +0000

    Delete snapshot after opening it when running fsck_ffs(9) in background.

    PR: 106107

    (cherry picked from commit d3a36e4b7459b2d62c4cd50de7a8e3195d7241c7)

 sbin/fsck_ffs/fsck.h   | 1 -
 sbin/fsck_ffs/fsutil.c | 1 -
 sbin/fsck_ffs/globs.c  | 2 --
 sbin/fsck_ffs/main.c   | 8 +++++---
 sbin/fsck_ffs/setup.c  | 8 ++++----
 5 files changed, 9 insertions(+), 11 deletions(-)
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d3d779f6475474bd7fc80d1f9cce91c7b42fc958

commit d3d779f6475474bd7fc80d1f9cce91c7b42fc958
Author: Kirk McKusick <mckusick@FreeBSD.org>
AuthorDate: 2023-10-25 22:36:45 +0000
Commit: Kirk McKusick <mckusick@FreeBSD.org>
CommitDate: 2023-11-12 06:51:14 +0000

    Delete snapshot after opening it when running fsck_ffs(9) in background.

    PR: 106107

    (cherry picked from commit d3a36e4b7459b2d62c4cd50de7a8e3195d7241c7)

 sbin/fsck_ffs/fsck.h   | 1 -
 sbin/fsck_ffs/fsutil.c | 1 -
 sbin/fsck_ffs/globs.c  | 2 --
 sbin/fsck_ffs/main.c   | 8 +++++---
 sbin/fsck_ffs/setup.c  | 8 ++++----
 5 files changed, 9 insertions(+), 11 deletions(-)