I have a 25TB Dell PERC 6 RAID5 array. When it becomes almost full (10-20GB free), processes that write data to it start eating 100% CPU and the write speed drops below 1MB/sec (normally it gives 400MB/sec). The writing process (dd) at that moment:

  1889 mitya 1 100 0 2058M 1027M CPU12 12 0:47 92.77% dd

systat -vm shows that the disk array is not busy:

  Disks   mfid0
  KB/t    63.71
  tps        65
  MB/s     0.77
  %busy       3

If I delete some files to free space during such a slow write, the same process resumes writing at normal speed.

I had been running this machine with a ~1-year-old 9-STABLE without any problems. The array often overflows, and I always got a "filesystem is full" error without any drop in write speed. The problem appeared after I upgraded to 9.2-BETA2 a few days ago.

Filesystem parameters:

  tunefs: POSIX.1e ACLs: (-a) disabled
  tunefs: NFSv4 ACLs: (-N) disabled
  tunefs: MAC multilabel: (-l) disabled
  tunefs: soft updates: (-n) enabled
  tunefs: soft update journaling: (-j) disabled
  tunefs: gjournal: (-J) disabled
  tunefs: trim: (-t) disabled
  tunefs: maximum blocks per file in a cylinder group: (-e) 4096
  tunefs: average file size: (-f) 16384
  tunefs: average number of files in a directory: (-s) 64
  tunefs: minimum percentage of free space: (-m) 1%
  tunefs: space to hold for metadata blocks: (-k) 0
  tunefs: optimization preference: (-o) space
  tunefs: volume label: (-L)
Responsible Changed From-To: freebsd-bugs->freebsd-fs
Submitter notes this is a recent regression.
I found the exact revision that broke this:

Author: mckusick
Date: Mon Apr 22 23:59:00 2013
New Revision: 249782
URL: http://svnweb.freebsd.org/changeset/base/249782
Responsible Changed From-To: freebsd-fs->mckusick
Over to UFS maintainer.
Author: mckusick
Date: Wed Aug 28 17:38:05 2013
New Revision: 254995
URL: http://svnweb.freebsd.org/changeset/base/254995

Log:
  A performance problem was reported in PR kern/181226:

      I have 25TB Dell PERC 6 RAID5 array. When it becomes almost full (10-20GB free), processes which write data to it start eating 100% CPU and write speed drops below 1MB/sec (normally to gives 400MB/sec).

  The revision at which it first became apparent was http://svnweb.freebsd.org/changeset/base/249782.

  The offending change reserved an area in each cylinder group to store metadata. The new algorithm attempts to save this area for metadata and allows its use for non-metadata only after all the data areas have been exhausted. The size of the reserved area defaults to half of minfree, so the filesystem reports full before the data area can completely fill. However, in this report, the filesystem has had minfree reduced to 1% thus forcing the metadata area to be used for data. As the filesystem approached full, it had only metadata areas left to allocate. The result was that every block allocation had to scan summary data for 30,000 cylinder groups before falling back to searching up to 30,000 metadata areas.

  The fix is to give up on saving the metadata areas once the free space reserve drops below 2%. The effect of this change is to use the old algorithm of just accepting the first available block that we find. Since most filesystems use the default 5% minfree, this will have no effect on their operation. For those that want to push to the limit, they will get their crappy block placements quickly.
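To see why the scan became so expensive, here is a toy model of the two policies the log describes. It is not ffs_alloc.c: the real allocator starts from a preferred cylinder group and consults per-group summary counters. Each cylinder group is reduced to a hypothetical `toy_cg` holding two counters, and the functions (`alloc_save_metadata`, `alloc_first_fit`, both hypothetical) only count how many groups one allocation examines.

```c
#include <stddef.h>

/* Hypothetical, simplified cylinder group: only two free-block counters. */
struct toy_cg {
	long data_free;	/* free blocks outside the metadata reserve */
	long meta_free;	/* free blocks inside the metadata reserve */
};

/*
 * Model of the r249782 behavior described in the log: look through every
 * cylinder group for a data block first, and only then fall back to the
 * metadata reserves. Returns how many groups were examined, or -1 if
 * nothing is free.
 */
static long
alloc_save_metadata(struct toy_cg *cgs, size_t ncg)
{
	long examined = 0;
	size_t cg;

	for (cg = 0; cg < ncg; cg++) {
		examined++;
		if (cgs[cg].data_free > 0) {
			cgs[cg].data_free--;
			return (examined);
		}
	}
	for (cg = 0; cg < ncg; cg++) {
		examined++;
		if (cgs[cg].meta_free > 0) {
			cgs[cg].meta_free--;
			return (examined);
		}
	}
	return (-1);
}

/* Model of "just accepting the first available block that we find". */
static long
alloc_first_fit(struct toy_cg *cgs, size_t ncg)
{
	long examined = 0;
	size_t cg;

	for (cg = 0; cg < ncg; cg++) {
		examined++;
		if (cgs[cg].data_free > 0) {
			cgs[cg].data_free--;
			return (examined);
		}
		if (cgs[cg].meta_free > 0) {
			cgs[cg].meta_free--;
			return (examined);
		}
	}
	return (-1);
}
```

With 30,000 groups whose data areas are exhausted but whose metadata areas still have room, `alloc_save_metadata` examines 30,001 groups for every single block, while `alloc_first_fit` stops at the first group. This matches the log's account of where the CPU time went.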
  Submitted by:	Dmitry Sivachenko
  Fix Tested by:	Dmitry Sivachenko
  PR:		kern/181226
  MFC after:	2 weeks

Modified:
  head/sys/ufs/ffs/ffs_alloc.c

Modified: head/sys/ufs/ffs/ffs_alloc.c
==============================================================================
--- head/sys/ufs/ffs/ffs_alloc.c	Wed Aug 28 16:59:55 2013	(r254994)
+++ head/sys/ufs/ffs/ffs_alloc.c	Wed Aug 28 17:38:05 2013	(r254995)
@@ -516,7 +516,13 @@ ffs_reallocblks_ufs1(ap)
 	ip = VTOI(vp);
 	fs = ip->i_fs;
 	ump = ip->i_ump;
-	if (fs->fs_contigsumsize <= 0)
+	/*
+	 * If we are not tracking block clusters or if we have less than 2%
+	 * free blocks left, then do not attempt to cluster. Running with
+	 * less than 5% free block reserve is not recommended and those that
+	 * choose to do so do not expect to have good file layout.
+	 */
+	if (fs->fs_contigsumsize <= 0 || freespace(fs, 2) < 0)
 		return (ENOSPC);
 	buflist = ap->a_buflist;
 	len = buflist->bs_nchildren;
@@ -737,7 +743,13 @@ ffs_reallocblks_ufs2(ap)
 	ip = VTOI(vp);
 	fs = ip->i_fs;
 	ump = ip->i_ump;
-	if (fs->fs_contigsumsize <= 0)
+	/*
+	 * If we are not tracking block clusters or if we have less than 2%
+	 * free blocks left, then do not attempt to cluster. Running with
+	 * less than 5% free block reserve is not recommended and those that
+	 * choose to do so do not expect to have good file layout.
+	 */
+	if (fs->fs_contigsumsize <= 0 || freespace(fs, 2) < 0)
 		return (ENOSPC);
 	buflist = ap->a_buflist;
 	len = buflist->bs_nchildren;
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
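The new guard depends on `freespace(fs, 2)`. As we understand the macro in sys/ufs/ffs/fs.h, it takes the free fragments (free blocks shifted by `fs_fragshift`, plus loose free fragments) and subtracts the given percentage of `fs_dsize`. The sketch below mirrors that arithmetic on a hypothetical stand-in struct (`toy_fs`, holding only the fields the macro reads) rather than the real `struct fs`. It then applies the same condition that now makes `ffs_reallocblks_ufs1/2` return ENOSPC instead of clustering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the few struct fs fields freespace() reads. */
struct toy_fs {
	int64_t dsize;		/* data area size in fragments (like fs_dsize) */
	int64_t nbfree;		/* free full blocks (like fs_cstotal.cs_nbfree) */
	int64_t nffree;		/* free loose fragments (like fs_cstotal.cs_nffree) */
	int fragshift;		/* log2(fragments per block) (like fs_fragshift) */
};

/* Free fragments remaining after setting aside `percent` of the data area. */
static int64_t
toy_freespace(const struct toy_fs *fs, int percent)
{
	assert(percent >= 0 && percent <= 100);
	return ((fs->nbfree << fs->fragshift) + fs->nffree -
	    fs->dsize * percent / 100);
}

/*
 * Same condition as the r254995 guard: try to cluster only if cluster
 * summaries are kept and at least 2% of the data area is still free.
 */
static int
toy_should_cluster(const struct toy_fs *fs, int contigsumsize)
{
	return (!(contigsumsize <= 0 || toy_freespace(fs, 2) < 0));
}
```

Take 1,000,000 fragments at 8 fragments per block. With 3,750 free blocks (3% free), `toy_freespace(fs, 2)` is 10,000 and clustering is still attempted. With 1,250 free blocks (1% free), it is -10,000 and clustering is skipped. On the 25TB array in this report, the 2% threshold works out to roughly 500GB. Clustering is therefore abandoned long before the 10-20GB point where the slowdown was observed.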
State Changed From-To: open->patched
A working patch has been applied to head. Assuming no problems are reported it will be MFC'ed to 9 in two weeks and this report closed.
Author: mckusick
Date: Thu Sep 12 19:36:04 2013
New Revision: 255494
URL: http://svnweb.freebsd.org/changeset/base/255494

Log:
  MFC of 254995:

  A performance problem was reported in PR kern/181226:

      I have 25TB Dell PERC 6 RAID5 array. When it becomes almost full (10-20GB free), processes which write data to it start eating 100% CPU and write speed drops below 1MB/sec (normally to gives 400MB/sec).

  The revision at which it first became apparent was http://svnweb.freebsd.org/changeset/base/249782.

  The offending change reserved an area in each cylinder group to store metadata. The new algorithm attempts to save this area for metadata and allows its use for non-metadata only after all the data areas have been exhausted. The size of the reserved area defaults to half of minfree, so the filesystem reports full before the data area can completely fill. However, in this report, the filesystem has had minfree reduced to 1% thus forcing the metadata area to be used for data. As the filesystem approached full, it had only metadata areas left to allocate. The result was that every block allocation had to scan summary data for 30,000 cylinder groups before falling back to searching up to 30,000 metadata areas.

  The fix is to give up on saving the metadata areas once the free space reserve drops below 2%. The effect of this change is to use the old algorithm of just accepting the first available block that we find. Since most filesystems use the default 5% minfree, this will have no effect on their operation. For those that want to push to the limit, they will get their crappy block placements quickly.

  Submitted by:	Dmitry Sivachenko
  Fix Tested by:	Dmitry Sivachenko
  PR:		kern/181226

  MFC of 254996:

  In looking at block layouts as part of fixing filesystem block allocations under low free-space conditions (-r254995), determine that old block-preference search order used before -r249782 worked a bit better. This change reverts to that block-preference search order.
Modified:
  stable/9/sys/ufs/ffs/ffs_alloc.c
Directory Properties:
  stable/9/sys/   (props changed)

Modified: stable/9/sys/ufs/ffs/ffs_alloc.c
==============================================================================
--- stable/9/sys/ufs/ffs/ffs_alloc.c	Thu Sep 12 18:08:25 2013	(r255493)
+++ stable/9/sys/ufs/ffs/ffs_alloc.c	Thu Sep 12 19:36:04 2013	(r255494)
@@ -516,7 +516,13 @@ ffs_reallocblks_ufs1(ap)
 	ip = VTOI(vp);
 	fs = ip->i_fs;
 	ump = ip->i_ump;
-	if (fs->fs_contigsumsize <= 0)
+	/*
+	 * If we are not tracking block clusters or if we have less than 2%
+	 * free blocks left, then do not attempt to cluster. Running with
+	 * less than 5% free block reserve is not recommended and those that
+	 * choose to do so do not expect to have good file layout.
+	 */
+	if (fs->fs_contigsumsize <= 0 || freespace(fs, 2) < 0)
 		return (ENOSPC);
 	buflist = ap->a_buflist;
 	len = buflist->bs_nchildren;
@@ -736,7 +742,13 @@ ffs_reallocblks_ufs2(ap)
 	ip = VTOI(vp);
 	fs = ip->i_fs;
 	ump = ip->i_ump;
-	if (fs->fs_contigsumsize <= 0)
+	/*
+	 * If we are not tracking block clusters or if we have less than 2%
+	 * free blocks left, then do not attempt to cluster. Running with
+	 * less than 5% free block reserve is not recommended and those that
+	 * choose to do so do not expect to have good file layout.
+	 */
+	if (fs->fs_contigsumsize <= 0 || freespace(fs, 2) < 0)
 		return (ENOSPC);
 	buflist = ap->a_buflist;
 	len = buflist->bs_nchildren;
@@ -1173,7 +1185,7 @@ ffs_dirpref(pip)
 			if (fs->fs_contigdirs[cg] < maxcontigdirs)
 				return ((ino_t)(fs->fs_ipg * cg));
 		}
-	for (cg = prefcg - 1; cg >= 0; cg--)
+	for (cg = 0; cg < prefcg; cg++)
 		if (fs->fs_cs(fs, cg).cs_ndir < maxndir &&
 		    fs->fs_cs(fs, cg).cs_nifree >= minifree &&
 		    fs->fs_cs(fs, cg).cs_nbfree >= minbfree) {
@@ -1186,7 +1198,7 @@ ffs_dirpref(pip)
 	for (cg = prefcg; cg < fs->fs_ncg; cg++)
 		if (fs->fs_cs(fs, cg).cs_nifree >= avgifree)
 			return ((ino_t)(fs->fs_ipg * cg));
-	for (cg = prefcg - 1; cg >= 0; cg--)
+	for (cg = 0; cg < prefcg; cg++)
 		if (fs->fs_cs(fs, cg).cs_nifree >= avgifree)
 			break;
 	return ((ino_t)(fs->fs_ipg * cg));
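The two `ffs_dirpref` hunks change only one thing: the order in which the cylinder groups below the preferred group `prefcg` are tried once the upward scan from `prefcg` fails. The sketch below lists the candidate order under both loops. `toy_dirpref_order` is a hypothetical helper, not kernel code, and it ignores the per-group suitability tests that stop the real search early.

```c
#include <stdbool.h>

/*
 * Fill `order` with the cylinder groups the two-pass search would try:
 * first prefcg..ncg-1, then the groups below prefcg, either counting down
 * from prefcg-1 (the order introduced with r249782) or up from 0 (the
 * order restored by r254996/r255494). Returns the number of entries
 * written, which is always ncg.
 */
static int
toy_dirpref_order(int prefcg, int ncg, bool restored, int *order)
{
	int n = 0, cg;

	for (cg = prefcg; cg < ncg; cg++)
		order[n++] = cg;
	if (restored) {
		for (cg = 0; cg < prefcg; cg++)
			order[n++] = cg;
	} else {
		for (cg = prefcg - 1; cg >= 0; cg--)
			order[n++] = cg;
	}
	return (n);
}
```

With `prefcg = 3` and five groups, the restored order is 3, 4, 0, 1, 2, while the r249782 order is 3, 4, 2, 1, 0. Both visit every group once. They differ only in which low-numbered group is tried first after the upper groups are rejected.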
State Changed From-To: patched->closed
The fixes have been MFC'ed to 9-stable. They are not relevant to earlier versions of the system.