Bug 111146 - [2tb] fsck(8) fails on 6T filesystem
Summary: [2tb] fsck(8) fails on 6T filesystem
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 6.2-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-fs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-04-02 17:50 UTC by Dan D Niles
Modified: 2018-05-29 14:49 UTC

See Also:


Attachments

Description Dan D Niles 2007-04-02 17:50:00 UTC
     I have a 6T filesystem on a server that crashed.  I cannot fsck 
the filesystem.

# fsck -t ufs -y /dev/da0
fsck_ufs: cannot alloc 1993797728 bytes for inoinfo

I also tried:

# fsck -t ufs -f -p /dev/da0
/dev/da0: UNKNOWN FILE TYPE I=11895232
/dev/da0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

I built a custom kernel with MAXDSIZ and DFLDSIZ just under 3G, and got
the same results.  It was using about 430M when it crashed, so the total
would be 2332 M, which is less than the size allowed (reported by
limits).
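
For reference, that 2332 M figure is just the failed allocation added to the
data size already in use.  A minimal sketch of the arithmetic, assuming 1 M
here means 2^20 bytes:

/* Quick check of the ~2332 M estimate above (assuming 1 M = 2^20 bytes). */
#include <stdio.h>

int
main(void)
{
        unsigned long long failed = 1993797728ULL;  /* bytes fsck_ufs could not allocate */
        unsigned long long in_use = 430;            /* approx. MB in use at the failure */

        /* prints about 2331 MB, in line with the 2332 M figure above */
        printf("~%llu MB total\n", in_use + failed / (1024 * 1024));
        return (0);
}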

NOTE:  I have temporarily replaced the server.  For a short time I have the
crashed filesystem available for testing and debugging code.  I have
a core dump from the fsck.

How-To-Repeat:    On a 6T filesystem that has crashed, run:
	fsck -t ufs -y /dev/da0
Comment 1 Harrison Grundy 2007-04-04 14:13:57 UTC
How much memory do you have in this system? There is a minimum amount of
memory required to fsck large filesystems, I've found.

--- Harrison Grundy
Comment 2 Dan D Niles 2007-04-04 15:29:29 UTC
I only have 3G at the moment, but fsck is failing when the resulting
memory usage would be 2.3G.  I have MAXDSIZ and DFLDSIZ set to 2.8G.
I have 2G of swap space, none of which gets used.

I'm getting a little pressure to reformat the array.  Is there any
debugging you would like me to do?

Thanks for your response,

Dan D Niles
Comment 3 Jan Srzednicki 2007-04-08 20:24:55 UTC
Hi,

First of all, show the output of both "ulimit -Sa" and "ulimit -Ha". It
is possible that you may need to raise the soft limit manually.

If the values are all right, try running fsck with strace/truss and show
the result.

-- 
  Jan Srzednicki  ::  http://wrzask.pl/
  "Remember, remember, the fifth of November"
                                     -- V for Vendetta
Comment 4 Dan D Niles 2007-04-09 17:12:46 UTC
# ulimit -Sa
core file size        (blocks, -c) unlimited
data seg size         (kbytes, -d) 2935808
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) unlimited
max memory size       (kbytes, -m) unlimited
open files                    (-n) 11095
pipe size          (512 bytes, -p) 1
stack size            (kbytes, -s) 65536
cpu time             (seconds, -t) unlimited
max user processes            (-u) 5547
virtual memory        (kbytes, -v) unlimited

# ulimit -Ha
core file size        (blocks, -c) unlimited
data seg size         (kbytes, -d) 2935808
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) unlimited
max memory size       (kbytes, -m) unlimited
open files                    (-n) 11095
pipe size          (512 bytes, -p) 1
stack size            (kbytes, -s) 65536
cpu time             (seconds, -t) unlimited
max user processes            (-u) 5547
virtual memory        (kbytes, -v) unlimited

I've ordered a SCSI card to move the raid device to a server that I can
bring up to 8G of ram.  I'm hoping the card gets here before I need to
give the array back.

I'll run fsck with truss and see what I find out.

Thanks,

Dan
Comment 5 Dan D Niles 2007-04-09 17:27:10 UTC
On Sun, 2007-04-08 at 21:24 +0200, Jan Srzednicki wrote:
> 
> If the values are all right, try running fsck with strace/truss and show
> the result.
> 

I added a debugging print statement to fsck_ffs, and sent it a SIGINFO
every two seconds.   Here is the tail of the output, and the tail of the
truss output.

It seems like it is allocating space for < 10k inodes at a time until it
fails.  When it fails, it is trying to allocate space for 1.5G inodes.
Is that normal?

/dev/da0: phase 1: cyl group 2223 of 33666 (6%)
Trying to calloc space for 2240 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 448 inodes
Trying to calloc space for 6208 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 768 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 448 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 448 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 448 inodes
Trying to calloc space for 4032 inodes
Trying to calloc space for 6208 inodes
Trying to calloc space for 1664 inodes
/dev/da0: phase 1: cyl group 2252 of 33666 (6%)
Trying to calloc space for 3584 inodes
/dev/da0: phase 1: cyl group 2253 of 33666 (6%)
Trying to calloc space for 448 inodes
Trying to calloc space for 3648 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 4352 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 5376 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 448 inodes
Trying to calloc space for 384 inodes
Trying to calloc space for 448 inodes
Trying to calloc space for 1572191256 inodes
fsck_ffs: cannot alloc 1993797728 bytes for inoinfo


 1919: break(0x22ab2000)                         = 0 (0x0)
 1919: break(0x22ab3000)                         = 0 (0x0)
 1919: lseek(4,0x6570640000,SEEK_SET)            = 1885601792 (0x70640000)
 1919: read(4,"\M-mA\^D\0\M-k\^C\0\0\M-j\^C\0\0"...,65536) = 65536 (0x10000)
 1919: lseek(4,0x657bdf0000,SEEK_SET)            = 2078212096 (0x7bdf0000)
 1919: read(4,"\0\0\0\0U\^B\t\0004\^[\^EF\M-V\b"...,16384) = 16384 (0x4000)
 1919: write(1,"Trying to calloc space for 448 i"...,38) = 38 (0x26)
 1919: lseek(4,0x657bdf4000,SEEK_SET)            = 2078228480 (0x7bdf4000)
 1919: read(4,"\M-mA\^B\0\M-k\^C\0\0\M-j\^C\0\0"...,65536) = 65536 (0x10000)
 1919: break(0x22ab4000)                         = 0 (0x0)
 1919: lseek(4,0x657be04000,SEEK_SET)            = 2078294016 (0x7be04000)
 1919: read(4,"\0\0\0\0000\0\0\0000\0\0\0\0\0\0"...,65536) = 65536 (0x10000)
 1919: lseek(4,0x65875b4000,SEEK_SET)            = -2024062976 (0x875b4000)
 1919: read(4,"\0\0\M-'\M-K,\M^H\M-:\M-Q*\^C\0"...,16384) = 16384 (0x4000)
 1919: write(1,"Trying to calloc space for 15721"...,45) = 45 (0x2d)
 1919: write(2,"fsck_ffs: ",10)                  = 10 (0xa)
 1919: write(2,"cannot alloc 1993797728 bytes fo"...,41) = 41 (0x29)
 1919: write(2,"\n",1)                           = 1 (0x1)
 1919: exit(0x8)
 1919: process exit, rval = 2048
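
The allocation pattern in the log above matches how fsck_ffs's pass 1 keeps
per-cylinder-group inode state: each group's table is sized from that group's
own on-disk inode count, so a single bad group can request gigabytes at once.
A minimal, self-contained sketch of that pattern (assumed behaviour, not the
actual pass1.c source):

/*
 * Sketch of the allocation pattern seen in the log above.  This is
 * assumed behaviour of fsck_ffs pass 1, not verbatim FreeBSD source:
 * each cylinder group's inode table is calloc'd from the on-disk
 * cg_initediblk, so one corrupted group can demand a huge block.
 */
#include <err.h>
#include <stdio.h>
#include <stdlib.h>

#define EEXIT 8                         /* fsck's "fatal error" exit status */

/* Assumed layout of fsck_ffs's per-inode record (fsck.h). */
struct inostat {
        char  ino_state;
        char  ino_type;
        short ino_linkcnt;
};

int
main(void)
{
        /* cg_initediblk values copied from the SIGINFO debug output above. */
        unsigned int inosused[] = { 384, 448, 384, 448, 1572191256u };
        size_t ncg = sizeof(inosused) / sizeof(inosused[0]);

        for (size_t c = 0; c < ncg; c++) {
                struct inostat *info;

                printf("Trying to calloc space for %u inodes\n", inosused[c]);
                info = calloc(inosused[c], sizeof(struct inostat));
                if (info == NULL)
                        errx(EEXIT, "cannot alloc %u bytes for inoinfo",
                            (unsigned)(sizeof(struct inostat) * inosused[c]));
                free(info);
        }
        return (0);
}
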
Comment 6 Jan Srzednicki 2007-04-09 20:48:52 UTC
> It seems like it is allocation space for < 10k inodes at a time until it
> fails.  When it fails it is trying to allocate space for 1.5g inodes.
> Is that normal?

Check with dumpfs how many inodes are there in your filesystem.

-- 
  Jan Srzednicki  ::  http://wrzask.pl/
  "Remember, remember, the fifth of November"
                                     -- V for Vendetta
Comment 7 Dan D Niles 2007-04-09 21:09:28 UTC
On Mon, 2007-04-09 at 21:48 +0200, Jan Srzednicki wrote:
> Check with dumpfs how many inodes are there in your filesystem.

dumpfs seg-faulted and dumped core.  It spit out this info before core
dumping:

magic   19540119 (UFS2) time    Wed Mar 28 14:00:00 2007
superblock location     65536   id      [ 43d90071 e579e310 ]
ncg     33666   size    3167475584      blocks  3067823920
bsize   16384   shift   14      mask    0xffffc000
fsize   2048    shift   11      mask    0xfffff800
frag    8       shift   3       fsbtodb 2
minfree 8%      optim   time    symlinklen 120
maxbsize 16384  maxbpg  2048    maxcontig 8     contigsumsize 8
nbfree  159788467       ndir    2581658 nifree  784218256       nffree  1488762
bpg     11761   fpg     94088   ipg     23552
nindir  2048    inopb   64      maxfilesize     140806241583103
sbsize  2048    cgsize  16384   csaddr  3000    cssize  540672
sblkno  40      cblkno  48      iblkno  56      dblkno  3000
cgrotor 28218   fmod    0       ronly   0       clean   0
avgfpdir 64     avgfilesize 16384
flags   unclean 
fsmnt   /LSO
volname         swuid   0

cs[].cs_(nbfree,ndir,nifree,nffree):
        (4606,234,23288,6) (3955,223,23288,24) (80,0,23223,753) (3,226,23298,8)
        (16,87,23338,81) (3,227,23298,7) (2436,185,23340,19) (4330,891,21577,21)
        (3971,170,23288,6) (1967,186,23336,33) (1812,177,23342,48) (6639,199,23324,50)
        (6084,236,23288,16) (5213,224,23300,16) (5211,232,23287,19) (6042,237,23288,8)
        (5213,236,23288,11) (5213,237,23288,10) (6120,237,23288,59) (1363,226,23298,219)
        (5193,235,23288,60) (4,227,23298,8) (3059,197,23298,30) (5218,199,23288,9)
        (6137,363,22338,9) (5221,174,23288,9) (5213,200,23288,48) (4323,199,23288,42)
[clipped]
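
Even clipped, the dumpfs output above is enough to answer the inode-count
question: the filesystem holds ncg * ipg inodes.  A small worked check against
the df numbers quoted in the following comments (the "- 2" is an assumption
that df's total excludes the two reserved inode slots):

/*
 * Worked check of the inode counts, using ncg and ipg from the dumpfs
 * output above together with the df figures quoted later in the thread.
 */
#include <stdio.h>

int
main(void)
{
        unsigned long long ncg = 33666;                 /* cylinder groups (dumpfs) */
        unsigned long long ipg = 23552;                 /* inodes per group (dumpfs) */
        unsigned long long nifree = 784218256;          /* free inodes (dumpfs) */
        unsigned long long total = ncg * ipg;           /* 792,901,632 inodes on disk */

        /* prints: total inodes: 792901632, used per df: 8683374 */
        printf("total inodes: %llu, used per df: %llu\n",
            total, total - 2 - nifree);
        return (0);
}
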
Comment 8 Jan Srzednicki 2007-04-09 21:13:36 UTC
On Mon, Apr 09, 2007 at 03:09:28PM -0500, Dan D Niles wrote:
> On Mon, 2007-04-09 at 21:48 +0200, Jan Srzednicki wrote:
> > Check with dumpfs how many inodes are there in your filesystem.
> 
> dumpfs seg-faulted and dumped core.  It spit out this info before core
> dumping:

That's kinda strange, dumpfs never did that to me. It appears to me that
this filesystem has got quite severely corrupted. Did you try newfs on
it?

And another thing: try tuning up the -i, -f and -b parameters to newfs.
I assume that on such a big filesystem average filesize will be much
bigger than the "UNIX default" (10k), so you can safely set these to
their maximums (and allocate inodes more scarcely).

-- 
  Jan Srzednicki  ::  http://wrzask.pl/
  "Remember, remember, the fifth of November"
                                     -- V for Vendetta
Comment 9 Dan D Niles 2007-04-09 21:30:23 UTC
On Mon, 2007-04-09 at 22:13 +0200, Jan Srzednicki wrote:
> That's kinda strange, dumpfs never did that to me. It appears to me
> that
> this filesystem has got quite severely corrupted. Did you try newfs on
> it?

Not yet.  I'd like to figure out why I can't fsck it first.  Running
newfs on your backup disk is not a viable solution.  There is data I
cannot pull off the disk.  If my primary storage had crashed also, I'd be
hosed.

> And another thing: try tuning up the -i, -f and -b parameters to
> newfs.
> I assume that on such a big filesystem average filesize will be much
> bigger than the "UNIX default" (10k), so you can safely set these to
> their maximums (and allocate inodes more scarcely).

Running df reports 8683374 inodes used and 784218256 free.  This could
be wrong since the filesystem is dirty and mounted ro.

FreeBSD's newfs scales things automatically, though perhaps not enough:

tunefs: maximum blocks per file in a cylinder group: (-e)  2048
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             8%
tunefs: optimization preference: (-o)                      time
Comment 10 Jan Srzednicki 2007-04-09 21:39:52 UTC
On Mon, Apr 09, 2007 at 03:30:23PM -0500, Dan D Niles wrote:
> On Mon, 2007-04-09 at 22:13 +0200, Jan Srzednicki wrote:
> > That's kinda strange, dumpfs never did that to me. It appears to me
> > that
> > this filesystem has got quite severely corrupted. Did you try newfs on
> > it?
> 
> Not yet.  I'd like to figure out why I can't fsck it first.  Running
> newfs on your backup disk is not a viable solution.  There is data I
> cannot pull of the disk.  If my primary storage had crashed also, I'd be
> hosed.

Well, you need to take into the account that your data may be hosed.
Backup your primary storage NOW. :)

> > And another thing: try tuning up the -i, -f and -b parameters to
> > newfs.
> > I assume that on such a big filesystem average filesize will be much
> > bigger than the "UNIX default" (10k), so you can safely set these to
> > their maximums (and allocate inodes more scarcely).
> 
> Running df reports 8683374 inodes used and 784218256 free.  This could
> be wrong since the filesystem is dirty and mounted ro.
> 
> FreeBSD's newfs scales things automatically, though perhaps not enough:

It does not scale anything. Last time I checked (a few years ago), even
the -g option did not make any difference, so I had to tune things up
manually with -i, -f and -b.

> tunefs: maximum blocks per file in a cylinder group: (-e)  2048
> tunefs: average file size: (-f)                            16384
> tunefs: average number of files in a directory: (-s)       64
> tunefs: minimum percentage of free space: (-m)             8%
> tunefs: optimization preference: (-o)                      time

These are the default values for any filesystem, regardless of its size.

-- 
  Jan Srzednicki  ::  http://wrzask.pl/
  "Remember, remember, the fifth of November"
                                     -- V for Vendetta
Comment 11 Dan D Niles 2007-04-16 20:08:57 UTC
I attached the failed raid device to a newer server with 8G of RAM.  I
booted to an amd64 kernel, and set datasize limit to 7G. 

Resource limits (current):
  cputime          infinity secs
  filesize         infinity kB
  datasize          7340032 kB
  stacksize-cur        8192 kB
  coredumpsize     infinity kB
  memoryuse-cur     8093236 kB
  memorylocked-cur  1299644 kB
  maxprocesses         6164
  openfiles           12328
  sbsize           infinity bytes
  vmemoryuse       infinity kB


Now when I run fsck I get:

** /dev/da0
** Last Mounted on /LSO
** Phase 1 - Check Blocks and Sizes
fsck_ffs: bad inode number 53321728 to nextinode

My theory is that some bits got flipped in the meta-data and
cg_initediblk is getting a bad value.  The value of 1,572,191,256 that
it returns just before it fails is greater than the total number of
inodes, which is around 784,218,256.

It is distressing that some bits in the meta-data could get flipped
during normal usage resulting in an unusable filesystem.

I have 19 hours before I need to reformat the array and put it back into
production.  Is there anything else I should try before then?

Thanks,

Dan
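
The numbers are consistent with that theory.  Assuming fsck_ffs tracks each
inode with a 4-byte struct inostat and prints the failed size as a 32-bit
unsigned value, the bogus 1,572,191,256-inode request needs roughly 6.3 GB,
and the 1,993,797,728 figure in the error message is exactly that product
truncated to 32 bits:

/*
 * Consistency check on the reported numbers, assuming a 4-byte
 * struct inostat per inode and a 32-bit unsigned value in the
 * "cannot alloc ... bytes for inoinfo" message.
 */
#include <stdio.h>

int
main(void)
{
        unsigned long long inodes = 1572191256ULL;      /* bogus cg_initediblk */
        unsigned long long bytes = inodes * 4;          /* ~6.3 GB actually needed */

        printf("requested: %llu bytes\n", bytes);          /* 6288765024 */
        printf("as printed: %u bytes\n", (unsigned)bytes); /* 1993797728 */
        return (0);
}
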
Comment 12 Mark Linimon freebsd_committer freebsd_triage 2007-04-25 23:28:39 UTC
State Changed
From-To: open->feedback

To submitter: did the fsck fix this problem? 


Comment 13 Mark Linimon freebsd_committer freebsd_triage 2007-04-25 23:28:39 UTC
Responsible Changed
From-To: freebsd-bugs->linimon
Comment 14 Mark Linimon freebsd_committer freebsd_triage 2007-04-27 00:02:37 UTC
State Changed
From-To: feedback->suspended

Submitter had to format the drive, so we can't duplicate this right now. 
Set this to 'suspended' to note that it is a problem that probably still 
needs investigating.
Comment 15 Mark Linimon freebsd_committer freebsd_triage 2007-06-12 04:40:30 UTC
Responsible Changed
From-To: linimon->freebsd-bugs

Return this one to the pool.
Comment 16 Mark Linimon freebsd_committer freebsd_triage 2009-05-18 05:33:11 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 17 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:49:18 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 18 Kirk McKusick freebsd_committer freebsd_triage 2018-05-29 14:49:55 UTC
The problem reported here was caused by a corruption in the value of cg_initediblks. Extensive checks have been added to fsck_ffs to validate and correct excessive values in this field. Thus, errors of the sort reported here will no longer occur.
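
For anyone hitting similar errors on older releases, the fix described above
amounts to refusing to trust a per-cylinder-group inode count larger than the
group can hold.  A rough illustration of the idea, not the code actually
committed to fsck_ffs:

/*
 * Rough illustration of the kind of check described above: clamp a
 * cg_initediblk value that exceeds the number of inodes a cylinder
 * group can hold.  Sketch only, not the committed fsck_ffs code.
 */
#include <stdint.h>
#include <stdio.h>

/* Minimal stand-ins for the superblock and cylinder-group fields used. */
struct fs_lite { uint32_t fs_ipg; };            /* inodes per cylinder group */
struct cg_lite { uint32_t cg_initediblk; };     /* inodes initialized in this group */

/* Return 1 and correct the field if it is implausibly large. */
static int
check_initediblk(const struct fs_lite *fs, struct cg_lite *cgp)
{
        if (cgp->cg_initediblk <= fs->fs_ipg)
                return (0);                     /* plausible, leave it alone */
        fprintf(stderr, "cg_initediblk %u exceeds ipg %u, correcting\n",
            cgp->cg_initediblk, fs->fs_ipg);
        cgp->cg_initediblk = fs->fs_ipg;        /* fall back to the maximum possible */
        return (1);
}

int
main(void)
{
        struct fs_lite fs = { .fs_ipg = 23552 };                /* from the dumpfs output */
        struct cg_lite cg = { .cg_initediblk = 1572191256u };   /* the corrupted value */

        check_initediblk(&fs, &cg);
        printf("cg_initediblk now %u\n", cg.cg_initediblk);     /* 23552 */
        return (0);
}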