Bug 227204 - Combination of gmirror and enabled softupdates journalling cause slow filesystem degradation
Summary: Combination of gmirror and enabled softupdates journalling cause slow filesys...
Status: Closed Works As Intended
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 10.3-RELEASE
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-04-02 15:46 UTC by Aleksander Derevianko
Modified: 2018-04-02 22:23 UTC (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Aleksander Derevianko 2018-04-02 15:46:32 UTC
Combination of gmirror and enabled softupdates journalling cause slow filesystem degradation

Hello!

I'm supporting at least 10 freebsd installations in different parts of the country (remotely). All of them configured with gmirror in two-disk configuration, all filesystems with softupdates enabled, and some of them with softupdates journalling enabled.

Most of installations are rather old, binary updated from 8.x-RELEASE version to 10.3-RELEASE

At least 3 different systems after a while (years) of uptime and multiple reboots (sometimes due to power failure) get the following problem: filesystem unconsistency, causing 
a) kernel panics 
b) forever locks of processes accessing some files or file listing.

In all 3 cases, I have it solved by booting in single-user mode, disabling soft-updates journalling (leaving softupdates only) and performing fsck -f -y on all filesystems.

Typical list of errors found by fsck. Most of dublicates lines omitted:

===================================================================
[root@yakutia /usr/home/der]# fsck -f -y /dev/ada0s1a
** /dev/ada0s1a
** Last Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
UNREF FILE I=5858989  OWNER=root MODE=100600
SIZE=380 MTIME=Oct  2 17:50 2017
CLEAR? yes

UNREF FILE I=5858993  OWNER=root MODE=100600
SIZE=427 MTIME=Oct  2 17:54 2017
CLEAR? yes

** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes

SUMMARY INFORMATION BAD
SALVAGE? yes

BLK(S) MISSING IN BIT MAPS
SALVAGE? yes

149860 files, 918866 used, 31578821 free (15053 frags, 3945471 blocks, 0.0% fragmentation)

***** FILE SYSTEM MARKED CLEAN *****

***** FILE SYSTEM WAS MODIFIED *****
[root@yakutia /usr/home/der]# fsck -f -y /dev/ada0s1d
** /dev/ada0s1d
** Last Mounted on /usr
** Phase 1 - Check Blocks and Sizes
26629536 DUP I=13324783
UNEXPECTED SOFT UPDATE INCONSISTENCY

26629537 DUP I=13324783
UNEXPECTED SOFT UPDATE INCONSISTENCY

...

26629543 DUP I=13324783
UNEXPECTED SOFT UPDATE INCONSISTENCY

26629544 DUP I=13324783
UNEXPECTED SOFT UPDATE INCONSISTENCY

26629545 DUP I=13324783
UNEXPECTED SOFT UPDATE INCONSISTENCY

26629546 DUP I=13324783
UNEXPECTED SOFT UPDATE INCONSISTENCY

EXCESSIVE DUP BLKS I=13324783
CONTINUE? yes

INCORRECT BLOCK COUNT I=13324783 (11200 should be 80)
CORRECT? yes

INCORRECT BLOCK COUNT I=28090681 (8 should be 0)
CORRECT? yes

...

INCORRECT BLOCK COUNT I=28100016 (8 should be 0)
CORRECT? yes

INCORRECT BLOCK COUNT I=28100017 (8 should be 0)
CORRECT? yes

INCORRECT BLOCK COUNT I=28100020 (8 should be 0)
CORRECT? yes

INTERNAL ERROR: dups with softupdates
UNEXPECTED SOFT UPDATE INCONSISTENCY
** Phase 1b - Rescan For More DUPS
26629536 DUP I=13322839
UNEXPECTED SOFT UPDATE INCONSISTENCY

26629537 DUP I=13322839
UNEXPECTED SOFT UPDATE INCONSISTENCY

...

26629545 DUP I=13322839
UNEXPECTED SOFT UPDATE INCONSISTENCY

** Phase 2 - Check Pathnames
DUP/BAD  I=13322839  OWNER=root MODE=100644
SIZE=946634 MTIME=Oct 30 17:43 2017
FILE=/local/lib/libfreetype.a

UNEXPECTED SOFT UPDATE INCONSISTENCY

REMOVE? yes

** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
BAD/DUP FILE I=13322839  OWNER=root MODE=100644
SIZE=946634 MTIME=Oct 30 17:43 2017
CLEAR? yes

BAD/DUP FILE I=13324783  OWNER=root MODE=100555
SIZE=5691264 MTIME=Jun 26 22:35 2017
CLEAR? yes

...

UNREF FILE  I=28098948  OWNER=root MODE=100644
SIZE=0 MTIME=Oct 23 20:32 2017
RECONNECT? yes

ZERO LENGTH DIR I=28098949  OWNER=root MODE=40755
SIZE=0 MTIME=Apr  2 23:33 2018
CLEAR? yes

UNREF FILE  I=28098950  OWNER=root MODE=100644
SIZE=0 MTIME=Nov 13 07:12 2016
RECONNECT? yes

UNREF FILE  I=28098951  OWNER=root MODE=100644
SIZE=0 MTIME=Nov 13 07:12 2016
RECONNECT? yes

...
...

UNREF FILE  I=28100020  OWNER=root MODE=100644
SIZE=0 MTIME=Apr  2 23:33 2018
RECONNECT? yes

UNREF FILE  I=28100049  OWNER=root MODE=120755
SIZE=0 MTIME=Apr  2 23:33 2018
RECONNECT? yes

** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes

SUMMARY INFORMATION BAD
SALVAGE? yes

BLK(S) MISSING IN BIT MAPS
SALVAGE? yes

370932 files, 2653812 used, 82406856 free (23056 frags, 10297975 blocks, 0.0% fragmentation)

***** FILE SYSTEM STILL DIRTY *****

***** FILE SYSTEM WAS MODIFIED *****

***** PLEASE RERUN FSCK *****
[root@yakutia /usr/home/der]# fsck -f -y /dev/ada0s1d
** /dev/ada0s1d
** Last Mounted on /usr
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
370932 files, 2653812 used, 82406856 free (23056 frags, 10297975 blocks, 0.0% fragmentation)

***** FILE SYSTEM MARKED CLEAN *****
==========================================================

I can't find direct information in handbook, if journalling softupdates is not compatible with gmirror, so decided to create this bug.

I'm sure that it's not hardware problem - smartctl show nothing on those hard drives, and we actually replace disks ~every 3 years using gmirror.

From the last case I can save broken filesystem image (mounts OK, but cause kernel panic if attempting to write some of the files), if it can be used to find root cause. Due to security reasons I can't give it to anyone outside my company, but I can try to analyse it if proper questions was given.
Comment 1 Kirk McKusick freebsd_committer freebsd_triage 2018-04-02 22:23:39 UTC
This is a problem that is endemic to all overwriting  filesystems that use journalling. Specifically, the journal only checks and corrects things that it knows need to be fixed. Under normal circumstances it knows about everything that might be wrong. Unfortunately most disks are run with `write cache enabled' which means that they can lie about completing writes to stable store. Specifically they report that a write is on the platter (or in the flash) when in fact it is only in the disk's volatile cache. If there is a power-fail event, they are usually able to flush their cache, but not always. Since the journal has been told that the write completed, it does not check for the missed write and the corresponding corruption of the filesystem remains until a full fsck is run (which checks all of the metadata integrity). If the missed write was an update to a cylinder-group map, then you can end up double-allocating a block (such as you see in your example). When an attempt is made to free a double-allocated block you will get a system panic with "freeing free block".

Some systems have tried periodically forcing a full fsck (on the order of every month or so) to catch these types of errors, but the disruption if the reboot happened during a busy period led them to drop this practice. Still it is a good idea to periodically run a full fsck just to ensure that your filesystems stay healthy. If this is not practical you should consider using ZFS which provides a great deal more redundancy and integrity though requires considerably more resources (disk + CPU + memory) for a given storage load than does UFS.

I am closing this report with "Works as Intended" as that is the closest to "This is a known shortcoming of journalled overwriting filesystems".