Bug 223491 - fsck_ufs: Directory XXXX name not found
Summary: fsck_ufs: Directory XXXX name not found
Status: Closed Not A Bug
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
URL: https://reviews.freebsd.org/D19437
Keywords:
Depends on: 224292
Blocks:
Reported: 2017-11-07 10:33 UTC by Wolfram Schneider
Modified: 2019-03-08 10:03 UTC
Description Wolfram Schneider freebsd_committer 2017-11-07 10:33:08 UTC
I rented a virtual machine running FreeBSD in a data center. The machine crashed for unknown reasons, and fsck did not run successfully. I see the following messages on the console:

WARNING: / was not properly dismounted
WARNING: /: mount pending error: blocks 0 files 1
Starting file system checks:
[...]
** Resolving unreferenced inode list
** Processing journal entries.
fsck_ufs: Directory XXXXX name not found
Unknown error 1; help!
ERROR: ABORTING BOOT (sending SIGTERM to parent)!


Then a root shell started.

I ran fsck manually:

# fsck -y 
** /dev/vtbd0p2
USE JOURNAL? Yes

** SU+J Recovering /dev/vtbd0p2
** Reading 33554432 byte journal from inode 4

RECOVER? Yes
** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.
fsck_ufs: Directory XXXXX name not found

And it failed again. 

I googled, and some users recommend using `fsck -y -f'.

I tried this and it seemed to work: with it I could repair the file system, run fsck again (without -f), and boot the machine.
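
The sequence described above can be sketched as a small helper. This is only an illustration, not an official procedure: the function name recovery_plan is hypothetical, and the device /dev/vtbd0p2 is taken from the console output above. The helper only prints the commands, so nothing touches the disk unless the output is deliberately piped to sh from the single-user root shell.

```shell
#!/bin/sh
# Hypothetical helper: print the manual recovery commands for one device.
# It prints instead of executing so that the plan can be reviewed first.
recovery_plan() {
    # Step 1: forced full check (-f), answering yes (-y) to every repair prompt.
    printf 'fsck -y -f %s\n' "$1"
    # Step 2: ordinary second pass (no -f) to confirm nothing is left to fix.
    printf 'fsck -y %s\n' "$1"
}

# Device name is an assumption taken from the report; adjust as needed.
recovery_plan /dev/vtbd0p2
```

To actually run the two passes, one could use `recovery_plan /dev/vtbd0p2 | sh' as root.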

I see two issues here. First: why does fsck fail and print a strange error message?

And second: the manual page fsck(8) is misleading. It says:

   -f      Force checking of file systems, even when they are marked clean
           (for file systems that support this).

This sounds like you use -f for checking a clean file system. But in my case it was corrupted. I didn't want to check it; I needed to repair it.
Comment 1 Conrad Meyer freebsd_committer 2017-11-08 00:19:38 UTC
What virtual machine infrastructure?  Is the virtual disk lying about flushed writes?  It seems like this inconsistency could happen if a virtual disk reordered some writes and then the host crashed.
Comment 2 Wolfram Schneider freebsd_committer 2017-11-08 09:04:48 UTC
It is an Intel 64bit CPU with a shared SSD. I guess the data center is using QEMU/KVM for the virtualization, but I’m not sure about that.

$ df -h
Filesystem      Size    Used   Avail Capacity  Mounted on
/dev/vtbd0p2     93G     63G     23G    73%    /
devfs           1.0K    1.0K      0B   100%    /dev


sysctl -a
kern.geom.label.disk_ident.enable: 1
kern.geom.disk.cd0.flags: 0
kern.geom.disk.cd0.led: 
kern.geom.disk.vtbd0.flags: 3a<OPEN,CANFLUSHCACHE,UNMAPPEDBIO,DIRECTCOMPLETION>
kern.geom.disk.vtbd0.led: 
kern.disks: cd0 vtbd0

SSD write performance: ca. 140 MB/s
SSD read performance: ca. 560 MB/s

FreeBSD 12.0-CURRENT #0 fcca5326804(master): Mon Oct 23 13:24:36 CEST 2017
    root@freebsd:/usr/obj/usr/src/sys/GENERIC-NODEBUG amd64
FreeBSD clang version 5.0.0 (tags/RELEASE_500/final 312559) (based on LLVM 5.0.0svn)
CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2200.05-MHz K8-class CPU)
real memory  = 8589934592 (8192 MB)
FreeBSD/SMP: Multiprocessor System Detected: 6 CPUs
virtio_pci0: <VirtIO PCI Block adapter> port 0xc080-0xc0bf mem 0xf2060000-0xf2060fff irq 11 at device 4.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci0
vtblk0: 102400MB (209715200 512 byte sectors)
virtio_pci1: <VirtIO PCI Balloon adapter> port 0xc0c0-0xc0df irq 10 at device 5.0 on pci0
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci1
ugen0.2: <QEMU 0.12.1 QEMU USB Tablet> at usbus0
Trying to mount root from ufs:/dev/vtbd0p2 [rw]...
Comment 3 Wolfram Schneider freebsd_committer 2017-12-12 19:10:35 UTC
Correction: the VM is running on Virtio (kvm/linux)

https://www.linux-kvm.org/page/Virtio
Comment 4 Maxim Usatov 2019-01-14 17:30:15 UTC
I see exactly the same problem with FreeBSD installed on an SSD. After simulating power outages, the system refuses to boot. Booting into single-user mode and running fsck ends with "fsck_ufs: Directory 1284157 name not found".

This is FreeBSD 10.3-RELEASE-p12. The SSD is an mSATA iSLC unit.
Comment 5 karl 2019-01-14 17:54:36 UTC
(In reply to Maxim Usatov from comment #4)

An SSD without power-loss protection typically lies, for performance reasons, about whether it has actually committed changes to stable storage. As a result, the kernel has no idea what is and isn't there.

Your issue is likely unrelated; there's no solution to what you're experiencing other than "don't do that" (meaning either put in an SSD that has power protection or don't unexpectedly kill the power to it).
Comment 6 Conrad Meyer freebsd_committer 2019-01-14 18:31:30 UTC
(In reply to karl from comment #5)
On some devices it may be possible to disable the volatile write cache, if any.  This is necessary for correct UFS power-fail recovery.  (Some disks also lie about having disabled the write cache, which is extra fun.)
Comment 7 Ed Maste freebsd_committer 2019-02-25 15:13:53 UTC
I reproduced this issue on 12.0-RC1 on a t1.small.x86 bare metal machine at packet.net. This occurred after a panic, with (I presume) stable power.
Comment 8 Kirk McKusick freebsd_committer 2019-02-25 23:45:26 UTC
This bug falls into the class of bugs that occur when using journalling. Specifically, the journal only tracks inconsistencies that can occur based on the order of the write completions to the disk. If the disk lies about write completions, then the disk will be inconsistent in ways that the journal does not know about.

Recovery using journalling is quick because it only checks the things that it knows may be wrong. If other errors have occurred, the journal will not fix them. Running fsck -f ignores the journal and does a full consistency check of the disk, so it will find and fix the errors of which the journal is unaware.

When running on lying disks, you should NOT use journalling. Instead, run with just soft updates. After a crash it will take longer to come back up, but all of the problems will be found and fixed. You can disable journalling using the command `tunefs -j disable'. When creating new filesystems, use `newfs -U ...' instead of `newfs -j ...'.
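
As a rough sketch of the two commands mentioned above: the helper names suj_disable_plan and newfs_su_plan are hypothetical, /dev/vtbd0p2 is the device from the original report, and /dev/da1p1 is an invented placeholder for a new, empty partition. The helpers only print commands. This assumes tunefs is run while the file system is unmounted or mounted read-only (for the root file system, typically from single-user mode), and that disabling the journal should leave soft updates enabled.

```shell
#!/bin/sh
# Hypothetical helper: print the commands to turn off SU+J journalling
# on an existing UFS file system.
suj_disable_plan() {
    # Disable the soft updates journal.
    printf 'tunefs -j disable %s\n' "$1"
    # Print the current tunables so the change can be verified.
    printf 'tunefs -p %s\n' "$1"
}

# Hypothetical helper: print the command to create a new file system
# with soft updates (-U) but without journalling (no -j).
newfs_su_plan() {
    printf 'newfs -U %s\n' "$1"
}

# Device names are assumptions; newfs destroys any data on its target.
suj_disable_plan /dev/vtbd0p2
newfs_su_plan /dev/da1p1
```

Review the printed commands before running any of them by hand; with journalling off, the next unclean shutdown should lead to a full fsck rather than a journal replay.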
Comment 9 commit-hook freebsd_committer 2019-03-08 10:03:30 UTC
A commit references this bug:

Author: wosch
Date: Fri Mar  8 10:03:16 UTC 2019
New revision: 344922
URL: https://svnweb.freebsd.org/changeset/base/344922

Log:
  explain ``fsck -f'' more in detail

  PR:	223491
  Approved by: mckusick, 0mp, imp
  Differential Revision:	https://reviews.freebsd.org/D19437

Changes:
  head/sbin/fsck/fsck.8