Bug 230851 - fsck sets last modified file size to zero on crash on UFS filesystem
Summary: fsck sets last modified file size to zero on crash on UFS filesystem
Status: Closed Not A Bug
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 11.2-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-23 20:47 UTC by Ali Abdallah
Modified: 2018-08-28 14:17 UTC
CC List: 2 users

See Also:



Description Ali Abdallah 2018-08-23 20:47:19 UTC
On a system crash, fsck sets the size of the most recently modified files to 0, losing their content.

Steps to reproduce

1) edit a file, for example /boot/loader.conf
2) cause a panic, for example sysctl debug.kdb.panic=1

On next boot the content of loader.conf is gone, and its size is set to zero. 

I'm not sure whether this is expected behavior, but losing the content of the most recently modified files on a crash/panic is really bad. I don't expect to find my latest modifications saved, but I would expect at least the original file to survive.

My UFS settings are as follows. I've tried different combinations, mainly toggling the -n and -j options, with the same result.

tunefs: POSIX.1e ACLs: (-a)                                disabled
tunefs: NFSv4 ACLs: (-N)                                   disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       enabled
tunefs: gjournal: (-J)                                     disabled
tunefs: trim: (-t)                                         enabled
tunefs: maximum blocks per file in a cylinder group: (-e)  4096
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             8%
tunefs: space to hold for metadata blocks: (-k)            6408
tunefs: optimization preference: (-o)                      time
tunefs: volume label: (-L)
Comment 1 Kirk McKusick freebsd_committer freebsd_triage 2018-08-24 05:29:49 UTC
What editor are you using? Most editors follow the safe update practice:

write file to new_name
fsync(new_name);
rename(new_name, orig_name);

The fsync will not return until the contents are on the disk, and the rename is atomic, so orig_name will point at either the original file contents or the new_name file contents, which, because of the fsync, will already be on the disk.
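The three-step safe update protocol above can be sketched in C roughly as follows. This is a minimal illustration assuming a POSIX system; the helper names `safe_update` and `write_all` and the `.new` suffix for the temporary name are hypothetical choices for this sketch, not taken from any particular editor.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes. Returns 0 or -1. */
static int
write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);
		if (n < 0)
			return (-1);
		buf += n;
		len -= (size_t)n;
	}
	return (0);
}

/*
 * Hypothetical safe update: write to new_name, fsync(2) it, then
 * rename(2) it over orig_name. A real editor would also preserve the
 * original file's mode and ownership; this sketch just uses 0644.
 */
int
safe_update(const char *orig_name, const char *data, size_t len)
{
	char new_name[1024];
	int fd, r;

	r = snprintf(new_name, sizeof(new_name), "%s.new", orig_name);
	if (r < 0 || (size_t)r >= sizeof(new_name))
		return (-1);
	fd = open(new_name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return (-1);
	/* fsync does not return until the new contents are on disk. */
	if (write_all(fd, data, len) != 0 || fsync(fd) != 0) {
		close(fd);
		unlink(new_name);
		return (-1);
	}
	if (close(fd) != 0) {
		unlink(new_name);
		return (-1);
	}
	/* Atomic: orig_name now names either the old or the new contents. */
	if (rename(new_name, orig_name) != 0) {
		unlink(new_name);
		return (-1);
	}
	return (0);
}
```

Called as `safe_update("/boot/loader.conf", buf, len)`, this should leave loader.conf with either its old or its new contents after a crash, rather than a zero-length file.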
Comment 2 Ali Abdallah 2018-08-24 09:09:24 UTC
I had this issue when I edited loader.conf with nano. 

I actually tested the edit/panic cycle many times on an old test machine. The problem does not always occur, only sometimes.
Comment 3 Kirk McKusick freebsd_committer freebsd_triage 2018-08-24 23:27:41 UTC
(In reply to Ali Abdallah from comment #2)
I have looked at nano and it does not follow the proper protocol for writing out the file (which is detailed in my comment #1). It simply does:

fd = open(file, O_WRONLY|O_CREAT|O_TRUNC, 0666);
write new contents
close(fd);

The O_TRUNC flag truncates the file to zero length. If the system crashes before the new contents are written, you get a zero-length file. By default the contents will not be written for up to 30 seconds, so if you panic within 30 seconds of the file being written, you will get a zero-length file. The proper way to fix this is detailed in my comment #1. The gap could be narrowed significantly by adding an fsync(fd) before calling close, as that would cause the file to reach the disk within a few milliseconds of the write finishing (though it would still be possible to get a zero-length file). So it really should be fixed properly.
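For comparison, here is a sketch of the in-place pattern quoted above with the suggested fsync(fd) added before close. It assumes a POSIX system, and `truncate_write_fsync` is a hypothetical name. As the comment says, this only narrows the window: a crash between the truncating open() and the completion of fsync() can still leave a zero-length file.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * In-place rewrite (O_TRUNC) plus fsync(2) before close(2).
 * Narrows, but does not close, the zero-length-file window.
 */
int
truncate_write_fsync(const char *file, const char *data, size_t len)
{
	int fd = open(file, O_WRONLY | O_CREAT | O_TRUNC, 0666);
	if (fd < 0)
		return (-1);
	/* Write everything, handling short writes. */
	while (len > 0) {
		ssize_t n = write(fd, data, len);
		if (n < 0) {
			close(fd);
			return (-1);
		}
		data += n;
		len -= (size_t)n;
	}
	/* Force the new contents to stable storage before closing. */
	if (fsync(fd) != 0) {
		close(fd);
		return (-1);
	}
	return (close(fd));
}
```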

I am closing this bug because it is a bug in nano and not in FreeBSD.
Comment 4 Conrad Meyer freebsd_committer freebsd_triage 2018-08-25 02:37:55 UTC
@Ali, please file a bug with the nano project and chase up getting it fixed there. I know nano has a lot of users, and it is a shame it isn't safe!
Comment 5 Ali Abdallah 2018-08-25 07:04:18 UTC
@Conrad I will file a bug against nano. 

@Kirk thanks for looking into this. 

BTW, while testing on my test system, I did the following on a clean 11.2 installation:

1) pkg install nano 
2) nano /etc/sysctl.conf, modify, save and exit
3) sysctl debug.kdb.panic=1

On the next boot, pkg listed nano as installed, but the nano binary (together with its indexinfo and gettext-runtime files) was not present on disk.

I think pkg does not fsync the installed files in this case, right?
Comment 6 Kirk McKusick freebsd_committer freebsd_triage 2018-08-25 16:08:24 UTC
(In reply to Ali Abdallah from comment #5)
You are correct that the installed binaries are not being fsync'ed before the database is updated (the database software, by contrast, DOES properly fsync). I have not looked at pkg, but if it does the writes itself, it should do the fsync before it updates the database. If instead it runs a shell script that uses the install(1) utility, then it is install that should be doing the fsync. Obviously, more checking needs to be done.
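The ordering described here (make the installed file durable first, update the database second) might be sketched as below. The package database is replaced by a hypothetical plain-text manifest, and `install_then_record` and `write_and_sync` are illustrative names only; this is not how pkg actually stores or writes its data.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write buf completely to fd, then fsync(2) it. Returns 0 or -1. */
static int
write_and_sync(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);
		if (n < 0)
			return (-1);
		buf += n;
		len -= (size_t)n;
	}
	return (fsync(fd));
}

/*
 * Hypothetical installer step: fsync the installed file first, and
 * only then append its name to the manifest (also fsync'ed). The
 * manifest therefore only records files that were already fsync'ed.
 */
int
install_then_record(const char *path, const char *data, size_t len,
    const char *manifest)
{
	int fd, mfd, rv;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0755);
	if (fd < 0)
		return (-1);
	rv = write_and_sync(fd, data, len);
	if (close(fd) != 0 || rv != 0)
		return (-1);

	/* Only now update the "database". */
	mfd = open(manifest, O_WRONLY | O_CREAT | O_APPEND, 0644);
	if (mfd < 0)
		return (-1);
	rv = write_and_sync(mfd, path, strlen(path));
	if (rv == 0)
		rv = write_and_sync(mfd, "\n", 1);
	if (close(mfd) != 0)
		rv = -1;
	return (rv);
}
```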

Thanks for your help in tracking down these issues.
Comment 7 commit-hook freebsd_committer freebsd_triage 2018-08-27 15:21:21 UTC
A commit references this bug:

Author: mckusick
Date: Mon Aug 27 15:20:42 UTC 2018
New revision: 338340
URL: https://svnweb.freebsd.org/changeset/base/338340

Log:
  When doing a -S "safe copy", the install command should do an
  fsync(2) system call after copying the installed file to ensure
  that it is on stable storage.

  PR:          230851
  Reviewed by: kib
  Approved by: re (marius)

Changes:
  head/usr.bin/xinstall/xinstall.c
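A rough sketch of a "safe copy" with the fsync(2) step the commit log describes: the source is copied into a temporary file created with mkstemp(3) next to the destination, fsync'ed after copying, and then renamed into place. The function name `safe_copy` and the `.XXXXXX` temporary-name template are assumptions for illustration; this is not the code in xinstall.c.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical safe copy: copy src to a temp file, fsync, rename to dst. */
int
safe_copy(const char *src, const char *dst)
{
	char tmp[1024], buf[8192];
	ssize_t n;
	int in, out, r;

	r = snprintf(tmp, sizeof(tmp), "%s.XXXXXX", dst);
	if (r < 0 || (size_t)r >= sizeof(tmp))
		return (-1);
	if ((in = open(src, O_RDONLY)) < 0)
		return (-1);
	if ((out = mkstemp(tmp)) < 0) {
		close(in);
		return (-1);
	}
	/* Copy loop, handling short writes. */
	while ((n = read(in, buf, sizeof(buf))) > 0) {
		char *p = buf;
		while (n > 0) {
			ssize_t w = write(out, p, (size_t)n);
			if (w < 0)
				goto fail;
			p += w;
			n -= w;
		}
	}
	if (n < 0)
		goto fail;
	/* fsync after copying, so the data is stable before the rename. */
	if (fsync(out) != 0)
		goto fail;
	close(in);
	if (close(out) != 0 || rename(tmp, dst) != 0) {
		unlink(tmp);
		return (-1);
	}
	return (0);
fail:
	close(in);
	close(out);
	unlink(tmp);
	return (-1);
}
```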
Comment 8 Ali Abdallah 2018-08-28 06:17:28 UTC
(In reply to commit-hook from comment #7)

Thanks a lot for your effort; this will make UFS even better.

Unfortunately, not all software follows the safe update practice. For example, I lost my .zsh_history in a crash yesterday. I assume that zsh doesn't fsync after every command entered in the shell. I don't think I can do anything in those situations except gjournal my home UFS partition or rely on backups/snapshots.
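As an illustration of what fsync-per-command would involve, the sketch below appends one entry to a history file and fsyncs it before returning. `append_history_entry` is a hypothetical function, not zsh's implementation; it trades one fsync(2) per command for the guarantee that entries already appended are on stable storage.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: append "entry\n" to histfile and fsync before returning. */
int
append_history_entry(const char *histfile, const char *entry)
{
	size_t len = strlen(entry);
	int fd, rv = 0;

	fd = open(histfile, O_WRONLY | O_CREAT | O_APPEND, 0600);
	if (fd < 0)
		return (-1);
	while (len > 0) {
		ssize_t n = write(fd, entry, len);
		if (n < 0) {
			rv = -1;
			break;
		}
		entry += n;
		len -= (size_t)n;
	}
	/* Terminate the entry, then push it to stable storage. */
	if (rv == 0 && (write(fd, "\n", 1) != 1 || fsync(fd) != 0))
		rv = -1;
	if (close(fd) != 0)
		rv = -1;
	return (rv);
}
```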

I had never experienced crashes before; all my troubles started when I moved my production system from 11.1 to 11.2 (vboxdrv has caused me all kinds of trouble when running my Windows 10 VM, which unfortunately I need for my work).

Thanks!
Comment 9 Conrad Meyer freebsd_committer freebsd_triage 2018-08-28 14:17:50 UTC
(In reply to Ali Abdallah from comment #8)
Yeah, with zsh I end up taking backups.  I rely on ^R history a lot to work quickly, so losing it is painful.  Here's a really ugly kludge I use (saves a new copy any time .zhistory is modified): https://github.com/cemeyer/backup_zhistory