I rented a virtual machine running FreeBSD in a data center. The machine crashed for unknown reasons and fsck did not run successfully at boot. The console showed:

    WARNING: / was not properly dismounted
    WARNING: /: mount pending error: blocks 0 files 1
    Starting file system checks:
    [...]
    ** Resolving unreferenced inode list
    ** Processing journal entries.
    fsck_ufs: Directory XXXXX name not found
    Unknown error 1; help!
    ERROR: ABORTING BOOT (sending SIGTERM to parent)!

and then the root shell started. I ran fsck manually:

    # fsck -y
    ** /dev/vtbd0p2

    USE JOURNAL? Yes

    ** SU+J Recovering /dev/vtbd0p2
    ** Reading 33554432 byte journal from inode 4

    RECOVER? Yes

    ** Building recovery table.
    ** Resolving unreferenced inode list.
    ** Processing journal entries.
    fsck_ufs: Directory XXXXX name not found

and it failed again. Searching the web, I found users recommending `fsck -y -f'. I tried this and it seemed to work: I could repair the file system, run fsck again (without -f), and boot the machine.

I see two issues here. First, why does fsck fail, and with such a strange error message? Second, the fsck(8) manual page is misleading. It says:

    -f  Force checking of file systems, even when they are marked clean
        (for file systems that support this).

This sounds like -f is meant for checking a clean file system. But in my case the file system was corrupted. I didn't want to check it, I needed to repair it.
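The manual recovery sequence described above can be sketched as a small sh(1) function. This is a minimal sketch, not an official procedure: it assumes single-user mode with the file system unmounted or mounted read-only, and both the function name `repair_ufs` and the `FSCK` override variable are hypothetical (the override only exists so the sequence can be dry-run). It tries the journal replay first, falls back to a full check with -f (which ignores the journal), and then runs fsck once more to confirm the file system is clean.

```sh
#!/bin/sh
# Hypothetical helper: repair a UFS file system whose SU+J journal replay
# fails. Assumes single-user mode with the file system unmounted or
# mounted read-only. FSCK is a hypothetical override (default: fsck(8)).
: "${FSCK:=fsck}"

repair_ufs() {
    dev="$1"

    # 1. Fast path: fsck offers to replay the soft updates journal.
    if "$FSCK" -y "$dev"; then
        return 0
    fi

    # 2. Journal replay failed (e.g. "Directory ... name not found").
    #    -f forces a full consistency check that ignores the journal.
    echo "journal recovery failed on $dev; running full check" >&2
    "$FSCK" -y -f "$dev" || return 1

    # 3. Run fsck again (without -f) to confirm the file system is clean.
    "$FSCK" -y "$dev"
}
```

On the machine from this report the call would be `repair_ufs /dev/vtbd0p2`.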
What virtual machine infrastructure is this? Is the virtual disk lying about flushed writes? It seems like this inconsistency could happen if a virtual disk reordered some writes and then the host crashed.
It is an Intel 64-bit CPU with a shared SSD. I guess the data center is using QEMU/KVM for virtualization, but I'm not sure about that.

    $ df -h
    Filesystem      Size    Used   Avail Capacity  Mounted on
    /dev/vtbd0p2     93G     63G     23G    73%    /
    devfs           1.0K    1.0K      0B   100%    /dev

Excerpt from sysctl -a:

    kern.geom.label.disk_ident.enable: 1
    kern.geom.disk.cd0.flags: 0
    kern.geom.disk.cd0.led:
    kern.geom.disk.vtbd0.flags: 3a<OPEN,CANFLUSHCACHE,UNMAPPEDBIO,DIRECTCOMPLETION>
    kern.geom.disk.vtbd0.led:
    kern.disks: cd0 vtbd0

SSD write performance: ca. 140 MB/s
SSD read performance: ca. 560 MB/s

Boot messages (excerpt):

    FreeBSD 12.0-CURRENT #0 fcca5326804(master): Mon Oct 23 13:24:36 CEST 2017
        root@freebsd:/usr/obj/usr/src/sys/GENERIC-NODEBUG amd64
    FreeBSD clang version 5.0.0 (tags/RELEASE_500/final 312559) (based on LLVM 5.0.0svn)
    CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2200.05-MHz K8-class CPU)
    real memory = 8589934592 (8192 MB)
    FreeBSD/SMP: Multiprocessor System Detected: 6 CPUs
    virtio_pci0: <VirtIO PCI Block adapter> port 0xc080-0xc0bf mem 0xf2060000-0xf2060fff irq 11 at device 4.0 on pci0
    vtblk0: <VirtIO Block Adapter> on virtio_pci0
    vtblk0: 102400MB (209715200 512 byte sectors)
    virtio_pci1: <VirtIO PCI Balloon adapter> port 0xc0c0-0xc0df irq 10 at device 5.0 on pci0
    vtballoon0: <VirtIO Balloon Adapter> on virtio_pci1
    ugen0.2: <QEMU 0.12.1 QEMU USB Tablet> at usbus0
    Trying to mount root from ufs:/dev/vtbd0p2 [rw]...
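The `kern.geom.disk.vtbd0.flags` value above includes CANFLUSHCACHE, i.e. the guest's disk driver accepts cache-flush (BIO_FLUSH) requests. A minimal sketch for checking this on every disk is shown below; `has_flush_support` and `report_flush_support` are hypothetical helper names. Note the limitation: this only shows what the guest can ask for. Whether the host actually writes data to stable storage on a flush depends on its own configuration (QEMU's `cache=unsafe` mode, for example, ignores flush requests), and that is invisible from inside the VM.

```sh
#!/bin/sh
# has_flush_support FLAGS  (hypothetical helper)
# Succeeds if a GEOM disk flags string, as printed by
# `sysctl -n kern.geom.disk.<name>.flags`, contains CANFLUSHCACHE.
has_flush_support() {
    case "$1" in
        *CANFLUSHCACHE*) return 0 ;;
        *) return 1 ;;
    esac
}

# report_flush_support  (hypothetical helper, FreeBSD only)
# Prints, for every disk listed in kern.disks, whether the guest kernel
# can send cache-flush requests to it.
report_flush_support() {
    for d in $(sysctl -n kern.disks); do
        flags=$(sysctl -n "kern.geom.disk.$d.flags" 2>/dev/null) || continue
        if has_flush_support "$flags"; then
            echo "$d: CANFLUSHCACHE set ($flags)"
        else
            echo "$d: no cache-flush support reported ($flags)"
        fi
    done
}
```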
Correction: the VM uses VirtIO devices on KVM (Linux): https://www.linux-kvm.org/page/Virtio
I see exactly the same problem with FreeBSD installed on an SSD. After simulated power outages, the system refuses to boot. Booting into single-user mode and running fsck ends with "fsck_ufs: Directory 1284157 name not found". This is FreeBSD 10.3-RELEASE-p12; the SSD is an mSATA iSLC unit.
(In reply to Maxim Usatov from comment #4)
A non-power-protected SSD typically lies, for performance reasons, about whether it has actually committed changes to stable storage. As such, the kernel has no idea what is and isn't there. Your issue is likely unrelated; there's no solution to what you're experiencing other than "don't do that" (meaning either put in an SSD that has power protection or don't unexpectedly kill its power).
(In reply to karl from comment #5)
On some devices it may be possible to disable the volatile write cache, if there is one. This is necessary for correct UFS power-fail recovery. (Some disks also lie about the write cache being disabled, which is extra fun.)
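On FreeBSD the write-cache setting can be requested at boot through loader tunables. The sketch below appends two of them to a loader.conf-style file; `add_write_cache_tunables` is a hypothetical helper. `kern.cam.ada.write_cache` applies to SATA disks attached via ada(4) (0 disables the cache, -1 leaves the drive's default), and `hw.vtblk.writecache_mode` applies to virtio_blk(4) disks (0 requests write-through) but only takes effect if the host offers the write-cache configuration feature. Check the ada(4) and virtio_blk(4) manual pages for your release before relying on either.

```sh
#!/bin/sh
# add_write_cache_tunables CONF  (hypothetical helper)
# Appends loader(8) tunables asking FreeBSD to turn off volatile write
# caching at the next boot. CONF is normally /boot/loader.conf.
add_write_cache_tunables() {
    conf="$1"
    cat >> "$conf" <<'EOF'
# ada(4) SATA disks: 1 = enable, 0 = disable, -1 = leave as-is
kern.cam.ada.write_cache="0"
# virtio_blk(4): 0 = write-through, 1 = write-back; only honoured if
# the host offers the write-cache configuration feature
hw.vtblk.writecache_mode="0"
EOF
}
```

On a KVM guest this only changes what the guest asks for; it cannot make a host that ignores flushes honour them. Running the helper twice appends duplicate lines.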
I reproduced this issue on 12.0-RC1 on a t1.small.x86 bare-metal machine at packet.net. This occurred after a panic, with (I presume) stable power.
This bug falls into the class of bugs that occur when using journalling. Specifically, the journal only tracks inconsistencies that can occur based on the order of the write completions to the disk. If the disk lies about write completions, then the file system on disk will be inconsistent in ways that the journal does not know about. Recovery using journalling is quick because it only checks the things that it knows may be wrong. If other errors have occurred, the journal will not fix them.

Running fsck -f ignores the journal and does a full consistency check of the disk, so it will find and fix the errors about which the journal is unaware.

When running on lying disks, you should NOT use journalling. Rather, run with just soft updates. After a crash it will take longer to come back up, but all of the problems will be found and fixed. You can disable journalling using the command `tunefs -j disable'. When creating new file systems, use `newfs -U ...' instead of `newfs -j ...'.
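Applying this advice to an existing file system might look like the sketch below. The helpers `suj_enabled` and `disable_suj` are hypothetical; they rely on `tunefs -p` printing a line such as `soft update journaling: (-j) enabled`, and the `2>&1` captures that output whether tunefs writes it to standard output or standard error. Run it from single-user mode with the file system unmounted or mounted read-only; plain soft updates stay enabled.

```sh
#!/bin/sh
# suj_enabled  (hypothetical helper)
# Reads `tunefs -p` output on stdin; succeeds if soft update journaling
# (SU+J) is reported as enabled.
suj_enabled() {
    grep -q 'soft update journaling: (-j)[[:space:]]*enabled$'
}

# disable_suj DEVICE  (hypothetical helper)
# Turns off SU+J but leaves plain soft updates on, as recommended for
# disks that may lie about write completion. Run from single-user mode
# with DEVICE unmounted or mounted read-only.
disable_suj() {
    dev="$1"
    if tunefs -p "$dev" 2>&1 | suj_enabled; then
        tunefs -j disable "$dev"
    else
        echo "$dev: SU+J not reported as enabled; nothing to do" >&2
    fi
}

# For new file systems, create them with soft updates only:
#   newfs -U /dev/<partition>     (instead of newfs -j)
```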
A commit references this bug:

Author: wosch
Date: Fri Mar 8 10:03:16 UTC 2019
New revision: 344922
URL: https://svnweb.freebsd.org/changeset/base/344922

Log:
  explain ``fsck -f'' more in detail

  PR: 223491
  Approved by: mckusick, 0mp, imp
  Differential Revision: https://reviews.freebsd.org/D19437

Changes:
  head/sbin/fsck/fsck.8
A commit references this bug:

Author: gbe
Date: Sun Dec 6 07:38:59 UTC 2020
New revision: 368378
URL: https://svnweb.freebsd.org/changeset/base/368378

Log:
  MFC r344922 (by wosch):

  explain ``fsck -f'' more in detail

  PR: 223491
  Approved by: mckusick, 0mp, imp
  Differential Revision: https://reviews.freebsd.org/D19437

Changes:
_U  stable/12/
    stable/12/sbin/fsck/fsck.8
This happens every now and then at Hetzner (SSD), while ext4, XFS, and other journalled file systems do not exhibit such a bug on the same VM/hardware.