Bug 90512 - [64-bit] Snapshot corruption after fs activity
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 6.0-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
 
Reported: 2005-12-16 19:20 UTC by Nate Eldredge
Modified: 2007-01-06 00:04 UTC (History)


Description Nate Eldredge 2005-12-16 19:20:03 UTC
When you use mksnap_ffs to make a snapshot of a filesystem, and many files
on that filesystem are then deleted and re-created, the snapshot becomes
corrupt.

I think this is fairly serious since snapshots may be used for backup
purposes.  That's how I originally discovered the problem; I made a
snapshot on /usr before making a bunch of changes, during which I
accidentally moved most of /usr/local to another partition :).  I moved
it back but wanted to verify that everything was back as it was,
which is when I discovered my snapshot was no good.

Note this is on amd64.  I have not tried i386.

Fix: 

Unknown.

Thanks!
How-To-Repeat:
# dd if=/dev/zero of=snaptest.img bs=1024k count=1000
# mdconfig -a -t vnode -f snaptest.img
md0
# newfs /dev/md0
# mount /dev/md0 /mnt/md0
# cd /mnt/md0
# tar xjf /usr/ports/distfiles/gap/gap4r4p6.tar.bz2 
# mksnap_ffs /mnt/md0 /mnt/md0/.snap/snap1
# mdconfig -a -t vnode -f .snap/snap1
WARNING: opening backing store: /mnt/md0/.snap/snap1 readonly
md1
# mount -r /dev/md1 /mnt/md1
###### inspecting /mnt/md1 reveals the snapshot is apparently okay
# rm -r gap4r4
###### snapshot still apparently okay
# !tar
tar xjf /usr/ports/distfiles/gap/gap4r4p6.tar.bz2
# ls -l /mnt/md1/gap4r4
ls: Makefile.in: Bad file descriptor
ls: bin: Bad file descriptor
ls: cnf: Bad file descriptor
ls: configure: Bad file descriptor
ls: doc: Bad file descriptor
ls: etc: Bad file descriptor
ls: gap.shi: Bad file descriptor
ls: grp: Bad file descriptor
ls: pkg: Bad file descriptor
ls: prim: Bad file descriptor
ls: small: Bad file descriptor
ls: src: Bad file descriptor
ls: sysinfo.in: Bad file descriptor
ls: trans: Bad file descriptor
ls: tst: Bad file descriptor
total 38
-rw-r--r--  1 nate  nate   4782 Aug 29 06:19 README
-rw-r--r--  1 nate  nate   9725 May 11  2005 description4r4p5
-rw-r--r--  1 nate  nate  11660 Aug 29 06:05 description4r4p6
drwxr-xr-x  2 nate  nate   9728 Aug 30 06:27 lib
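
The "apparently okay" checks in the transcript above could be made mechanical
by recording a checksum manifest of the tree before the changes and comparing
it with the mounted snapshot afterwards. A minimal sh sketch, assuming only
the standard find, cksum, sort and diff utilities; manifest is a hypothetical
helper name and /tmp/before.txt an arbitrary path, neither is part of the
original report:

```sh
# manifest DIR: print "checksum size path" for every regular file under DIR,
# sorted by path so that two manifests can be compared line by line.
# Errors from find or cksum (for example EBADF on a damaged snapshot) are kept
# in the output, so they also show up as differences.
manifest() {
    ( cd "$1" && find . -type f -exec cksum {} + 2>&1 | LC_ALL=C sort -k 3 )
}

# Intended use with the paths from the transcript (run as root):
#   manifest /mnt/md0/gap4r4 > /tmp/before.txt     # right after the first tar
#   manifest /mnt/md1/gap4r4 | diff /tmp/before.txt -   # after rm and re-extract
```

An empty diff would mean the snapshot still matches the original tree; any
output points at the files that changed or could not be read.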


Running truss on ls shows that lstat() returns EBADF for the offending
files (which makes no sense, as no file descriptor is involved;
EIO might be better).  Also, unmounting and then running fsck on /dev/md1
produces a large number of errors, of which this is a representative sample:

PARTIALLY TRUNCATED INODE I=70662
3689066227402421815 BAD I=70662
4121129229942796344 BAD I=70662
3833180345978203193 BAD I=70662
4051046384641915184 BAD I=70662
3688509874569295664 BAD I=70662
3472592161990062385 BAD I=70662
3906084542581519160 BAD I=70662
4049637910162848049 BAD I=70662
4123381021216356400 BAD I=70662
3979273551213759020 BAD I=70662
4051327820913194809 BAD I=70662
EXCESSIVE BAD BLKS I=70662
INCORRECT BLOCK COUNT I=70662 (960 should be 736)
PARTIALLY TRUNCATED INODE I=70719
UNALLOCATED  I=23552  OWNER=nate MODE=0
DIRECTORY CORRUPTED  I=70660  OWNER=nate MODE=40755
MISSING '.'  I=71129  OWNER=nate MODE=40755
SIZE=1536 MTIME=Aug 30 06:27 2005 
UNREF DIR  I=117760  OWNER=nate MODE=40755
SIZE=512 MTIME=Aug 30 06:27 2005 
LINK COUNT DIR I=2  OWNER=root MODE=40755
SIZE=512 MTIME=Dec 16 10:34 2005  COUNT 4 SHOULD BE 3
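
A possible reading of the huge "BAD" values above, offered as an assumption
rather than anything established in this report: viewed as eight little-endian
bytes (the amd64 byte order), such a value may simply be ASCII text, which
would suggest that fsck is reading file data where it expects block pointers.
A minimal sh sketch to check this; decode_blkno is a hypothetical helper name,
not an existing tool:

```sh
# decode_blkno N: print the 8 bytes of the 64-bit integer N as characters,
# least significant byte first (the in-memory order on a little-endian machine).
# Needs a shell with 64-bit arithmetic, which is the usual case on amd64.
decode_blkno() {
    n=$1
    i=0
    while [ "$i" -lt 8 ]; do
        byte=$(( n & 255 ))
        # %b expands \0ooo to the byte with that octal value
        printf '%b' "\\0$(printf '%03o' "$byte")"
        n=$(( n >> 8 ))
        i=$(( i + 1 ))
    done
    printf '\n'
}
```

For example, decode_blkno 3472592161990062385 prints "1199, 10", which looks
like file contents rather than a block number.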

The original filesystem /dev/md0 apparently
remains okay and fsck reports no errors for it.

There are no kernel error messages this time, though a previous attempt
(when the snapshot was on /dev/md0) yielded

/mnt/md0: bad dir ino 3182535 at offset 0: mangled entry
/mnt/md0: bad dir ino 2953 at offset 0: mangled entry
...4 or 5 more...

Also at that time there were directories which changed to files of size 1 
which dumped many, many bytes of garbage when cat'ted.
Comment 1 Nate Eldredge 2005-12-17 06:49:55 UTC
FWIW, I can't seem to reproduce this on my i386/CURRENT box.  Since there 
don't appear to be any significant FFS changes between 6.0-RELEASE and 
CURRENT, this may be a 64-bit issue.

-- 
Nate Eldredge
nge@cs.hmc.edu
Comment 2 Konstantin Belousov 2007-01-05 13:53:48 UTC
State Changed
From-To: open->feedback

The fix for this problem was committed as rev. 1.103.2.17 on RELENG_6
and will be included in 6.2. Please retest.
Comment 3 Mark Linimon 2007-01-06 00:03:54 UTC
State Changed
From-To: feedback->closed

A fix has been committed.