Bug 15065

Summary: fsck can't fix "huge" zero length files
Product: Base System    Reporter: Kevin J. Meehan <kjm>
Component: kern    Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 3.3-STABLE   
Hardware: Any   
OS: Any   

Description Kevin J. Meehan 1999-11-23 21:40:00 UTC
	After a particularly bad week with 3 "double faults", one of 
	the 5 filesystems on the RAID subsystem had some odd files 
	the system pushed to the lost+found directory:

	c--Sr-S--T 1 cheng  si        147, 0x004f00ac Dec 31  1969 #1261947
	lrwxr-S--- 1 demlow ncs   4638043271730144200 Dec 31  1969 #2484071@ -> 
	s--x--S--- 1 root   wheel 4631321269953217064 Dec 31  1969 #2484095=

	The size returned by dump was a large negative number and thus broke
	Amanda. An umount and fsck of the filesystem would not fix the above.

Fix: 

In the end we had to use fsdb to remove the offending inodes. Note that
	while all three inodes show huge sizes, their block counts are 0:

	fsdb (inum: 2)> inode 2484071
	current inode: symlink
	I=2484071 MODE=122740 SIZE=4638043271730144200
	        MTIME=Dec 31 18:00:00 1969 [1 nsec]
	        CTIME=Oct 20 14:23:00 1999 [0 nsec]
	        ATIME=Dec 31 18:00:00 1969 [0 nsec]
	OWNER=demlow GRP=ncs LINKCNT=1 FLAGS=0 BLKCNT=0 GEN=56ef2b95
	
	fsdb (inum: 2484071)> inode 2484095
	current inode: socket
	I=2484095 MODE=142100 SIZE=4631321269953217064
	        MTIME=Dec 31 18:00:00 1969 [1 nsec]
	        CTIME=Apr  9 17:05:23 1999 [0 nsec]
	        ATIME=Dec 31 18:00:00 1969 [0 nsec]
	OWNER=root GRP=wheel LINKCNT=1 FLAGS=0 BLKCNT=0 GEN=25cd617b
	
	fsdb (inum: 2484095)> inode 1261947
	current inode: character special (147,5177516)
	I=1261947 MODE=27040 SIZE=4639318980096568328
	        MTIME=Dec 31 18:00:00 1969 [1 nsec]
	        CTIME=Sep  9 13:30:59 1999 [0 nsec]
	        ATIME=Dec 31 18:00:00 1969 [0 nsec]
	OWNER=cheng GRP=si LINKCNT=1 FLAGS=0 BLKCNT=0 GEN=1552126f
	
	# /sbin/fsck /home1
	** /dev/rda2s1e
	** Last Mounted on /mnt
	** Phase 1 - Check Blocks and Sizes
	** Phase 2 - Check Pathnames
	UNALLOCATED  I=2484071  OWNER=root MODE=0
	SIZE=0 MTIME=Dec 31 18:00 1969 
	NAME=/lost+found/#2484071
	
	REMOVE? [yn] y
	
	UNALLOCATED  I=2484095  OWNER=root MODE=0
	SIZE=0 MTIME=Dec 31 18:00 1969 
	NAME=/lost+found/#2484095
	
	REMOVE? [yn] y
	
	UNALLOCATED  I=1261947  OWNER=root MODE=0
	SIZE=0 MTIME=Dec 31 18:00 1969 
	NAME=/lost+found/#1261947
	
	REMOVE? [yn] y
	
	** Phase 3 - Check Connectivity
	** Phase 4 - Check Reference Counts
	** Phase 5 - Check Cyl groups
	FREE BLK COUNT(S) WRONG IN SUPERBLK
	SALVAGE? [yn] y
	
	SUMMARY INFORMATION BAD
	SALVAGE? [yn] y
	
	BLK(S) MISSING IN BIT MAPS
	SALVAGE? [yn] y
	
	56770 files, 3812612 used, 6350567 free
		(6759 frags, 792976 blocks, 0.1% fragmentation)
	
	***** FILE SYSTEM MARKED CLEAN *****
	
	***** FILE SYSTEM WAS MODIFIED *****
	# /sbin/fsck /home1
	** /dev/rda2s1e
	** Last Mounted on /mnt
	** Phase 1 - Check Blocks and Sizes
	** Phase 2 - Check Pathnames
	** Phase 3 - Check Connectivity
	** Phase 4 - Check Reference Counts
	** Phase 5 - Check Cyl groups
	56770 files, 3812612 used, 6350567 free
	(6759 frags, 792976 blocks, 0.1% fragmentation)

	We finally upgraded the firmware on the controllers from version 1
	to 9 on Nov 3rd and have been trouble-free so far, although the
	lockups usually came about a month apart.

	We were torn as to whether this should be reported. For one
	thing, we had bum hardware, and that is not FreeBSD's fault. On the
	other hand, the reported sizes are ridiculously huge and the block
	counts are zero. In the end we decided to let you know and let you
	decide whether fsck should fix something like this automatically,
	flag it as something that needs to be removed manually, or leave it alone.
How-To-Repeat: 
	Difficult: get hold of a bum controller and have it munge your filesystem.
	(Not recommended!)
Comment 1 Bruce Evans 1999-11-24 13:07:11 UTC
On Tue, 23 Nov 1999, Kevin J. Meehan wrote:

Try this fix.  I wrote it to fix corrupted holey files of size 17TB on ffs
with a block size of 8KB while fixing ffs to support such files.  The fixes
are incomplete and have not been committed.

diff -c2 pass1.c~ pass1.c
*** pass1.c~	Sun Aug 29 11:00:46 1999
--- pass1.c	Sun Aug 29 11:00:57 1999
***************
*** 174,178 ****
  	register struct dinode *dp;
  	struct zlncnt *zlnp;
! 	int ndb, j;
  	mode_t mode;
  	char *symbuf;
--- 174,180 ----
  	register struct dinode *dp;
  	struct zlncnt *zlnp;
! 	u_int64_t bigndb;
! 	ufs_daddr_t j;
! 	u_long ndb;
  	mode_t mode;
  	char *symbuf;
***************
*** 210,220 ****
  		inodirty();
  	}
! 	ndb = howmany(dp->di_size, sblock.fs_bsize);
! 	if (ndb < 0) {
  		if (debug)
! 			printf("bad size %qu ndb %d:",
! 				dp->di_size, ndb);
  		goto unknown;
  	}
  	if (mode == IFBLK || mode == IFCHR)
  		ndb++;
--- 212,223 ----
  		inodirty();
  	}
! 	bigndb = howmany(dp->di_size, sblock.fs_bsize);
! 	if (bigndb != 0 && (ufs_daddr_t)(bigndb - 1) != bigndb - 1) {
  		if (debug)
! 			printf("bad size %qu bigndb %qu:",
! 			    dp->di_size, bigndb);
  		goto unknown;
  	}
+ 	ndb = (u_long)bigndb;
  	if (mode == IFBLK || mode == IFCHR)
  		ndb++;
***************
*** 252,256 ****
  		}
  	}
! 	for (j = ndb; j < NDADDR; j++)
  		if (dp->di_db[j] != 0) {
  			if (debug)
--- 255,260 ----
  		}
  	}
! 	if (ndb < NDADDR)
! 	    for (j = ndb; j < NDADDR; j++)
  		if (dp->di_db[j] != 0) {
  			if (debug)
***************
*** 259,263 ****
  			goto unknown;
  		}
! 	for (j = 0, ndb -= NDADDR; ndb > 0; j++)
  		ndb /= NINDIR(&sblock);
  	for (; j < NIADDR; j++)
--- 263,267 ----
  			goto unknown;
  		}
! 	for (j = 0, ndb -= NDADDR; (ufs_daddr_t)ndb > 0; j++)
  		ndb /= NINDIR(&sblock);
  	for (; j < NIADDR; j++)

Bruce
Comment 2 iedowse freebsd_committer freebsd_triage 2001-01-31 15:19:54 UTC
State Changed
From-To: open->closed

Fixed in revision 1.21 of src/sbin/fsck_ffs/pass1.c. I'll merge this 
into -stable in a few days. Thanks for the bug report!