When you use mksnap_ffs to make a snapshot on a filesystem which then has a lot of stuff deleted and re-created, the snapshot becomes corrupt. I think this is fairly serious since snapshots may be used for backup purposes. That's how I originally discovered the problem; I made a snapshot on /usr before making a bunch of changes, during which I accidentally moved most of /usr/local to another partition :). I moved it back but wanted to verify that everything was back as it was, which is when I discovered my snapshot was no good.

Note this is on amd64. I have not tried i386.

Fix:

Unknown. Thanks!

How-To-Repeat:

    # dd if=/dev/zero of=snaptest.img bs=1024k count=1000
    # mdconfig -a -t vnode -f snaptest.img
    md0
    # newfs /dev/md0
    # mount /dev/md0 /mnt/md0
    # cd /mnt/md0
    # tar xjf /usr/ports/distfiles/gap/gap4r4p6.tar.bz2
    # mksnap_ffs /mnt/md0 /mnt/md0/.snap/snap1
    # mdconfig -a -t vnode -f .snap/snap1
    WARNING: opening backing store: /mnt/md0/.snap/snap1 readonly
    md1
    # mount -r /dev/md1 /mnt/md1
    ###### inspecting /mnt/md1 reveals the snapshot is apparently okay
    # rm -r gap4r4
    ###### snapshot still apparently okay
    # !tar
    tar xjf /usr/ports/distfiles/gap/gap4r4p6.tar.bz2
    # ls -l /mnt/md1/gap4r4
    ls: Makefile.in: Bad file descriptor
    ls: bin: Bad file descriptor
    ls: cnf: Bad file descriptor
    ls: configure: Bad file descriptor
    ls: doc: Bad file descriptor
    ls: etc: Bad file descriptor
    ls: gap.shi: Bad file descriptor
    ls: grp: Bad file descriptor
    ls: pkg: Bad file descriptor
    ls: prim: Bad file descriptor
    ls: small: Bad file descriptor
    ls: src: Bad file descriptor
    ls: sysinfo.in: Bad file descriptor
    ls: trans: Bad file descriptor
    ls: tst: Bad file descriptor
    total 38
    -rw-r--r--  1 nate  nate   4782 Aug 29 06:19 README
    -rw-r--r--  1 nate  nate   9725 May 11  2005 description4r4p5
    -rw-r--r--  1 nate  nate  11660 Aug 29 06:05 description4r4p6
    drwxr-xr-x  2 nate  nate   9728 Aug 30 06:27 lib

Doing truss on ls reveals that lstat() is returning EBADF on the offending files (which doesn't make any sense as there is no file descriptor involved; EIO might be better).

Also, umounting and then fscking /dev/md1 produces a cornucopia of errors, including as a representative sample:

    PARTIALLY TRUNCATED INODE I=70662
    3689066227402421815 BAD I=70662
    4121129229942796344 BAD I=70662
    3833180345978203193 BAD I=70662
    4051046384641915184 BAD I=70662
    3688509874569295664 BAD I=70662
    3472592161990062385 BAD I=70662
    3906084542581519160 BAD I=70662
    4049637910162848049 BAD I=70662
    4123381021216356400 BAD I=70662
    3979273551213759020 BAD I=70662
    4051327820913194809 BAD I=70662
    EXCESSIVE BAD BLKS I=70662
    INCORRECT BLOCK COUNT I=70662 (960 should be 736)
    PARTIALLY TRUNCATED INODE I=70719
    UNALLOCATED I=23552 OWNER=nate MODE=0
    DIRECTORY CORRUPTED I=70660 OWNER=nate MODE=40755
    MISSING '.' I=71129 OWNER=nate MODE=40755 SIZE=1536 MTIME=Aug 30 06:27 2005
    UNREF DIR I=117760 OWNER=nate MODE=40755 SIZE=512 MTIME=Aug 30 06:27 2005
    LINK COUNT DIR I=2 OWNER=root MODE=40755 SIZE=512 MTIME=Dec 16 10:34 2005 COUNT 4 SHOULD BE 3

The original filesystem /dev/md0 apparently remains okay and fsck reports no errors for it.

There are no kernel error messages this time, though a previous attempt (when the snapshot was on /dev/md0) yielded

    /mnt/md0: bad dir ino 3182535 at offset 0: mangled entry
    /mnt/md0: bad dir ino 2953 at offset 0: mangled entry
    ...4 or 5 more...

Also at that time there were directories which changed to files of size 1 which dumped many, many bytes of garbage when cat'ted.
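For anyone retesting, the lstat() failures that ls reported on the mounted snapshot can be counted over the whole tree rather than spotted by eye. The following is only a rough sketch, not part of the original report: count_unstatable is a hypothetical helper name, and it assumes that every entry which cannot be examined produces one error line on stderr from either find or ls.

```shell
#!/bin/sh
# Sketch only: count_unstatable is a hypothetical helper, not a FreeBSD tool.
# It walks a directory tree and counts the error lines printed while
# examining entries (for example "Bad file descriptor" from a failed lstat).
count_unstatable() {
    dir=$1
    # "2>&1 >/dev/null" sends stderr into the pipe and discards the normal
    # listing, so wc -l sees only the error lines from find and ls.
    # tr strips the padding that BSD wc puts around the number.
    find "$dir" -exec ls -ld {} + 2>&1 >/dev/null | wc -l | tr -d ' '
}
```

Run as `count_unstatable /mnt/md1` once the snapshot is mounted; a healthy snapshot should print 0, while the corrupted one shown above would be expected to print a nonzero count.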
FWIW, I can't seem to reproduce this on my i386/CURRENT box. Since there don't appear to be any significant FFS changes between 6.0-RELEASE and CURRENT, this may be a 64-bit issue.

-- 
Nate Eldredge
nge@cs.hmc.edu
State Changed
From-To: open->feedback

The fix for the problem was committed as rev. 1.103.2.17 on RELENG_6 and will be included in 6.2. Please retest.
State Changed
From-To: feedback->closed

A fix has been committed.