Bug 269261 - data corruption with fspacectl and mmap
Summary: data corruption with fspacectl and mmap
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Many People
Assignee: Konstantin Belousov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-01-31 00:35 UTC by Alan Somers
Modified: 2024-01-02 03:25 UTC (History)
1 user (show)

See Also:
asomers: mfc-stable13?
asomers: mfc-stable12?


Attachments
Add a test case for fusefs (2.35 KB, patch)
2023-01-31 02:27 UTC, Alan Somers
no flags Details | Diff
standalone test case for fusefs or UFS (2.89 KB, text/plain)
2023-02-03 23:05 UTC, Alan Somers
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Alan Somers freebsd_committer freebsd_triage 2023-01-31 00:35:01 UTC
When using mmap to read and write to a file, intermixed with fspacectl, data corruption can occur.  It seems like a cacheing bug, as though the data written via mmap doesn't get evicted from the cache during fspacectl, and is subsequently returned via mmap reads.  I can reliably reproduce this bug on UFS and fusefs, but not ZFS, tmpfs or fusefs-ext2.

Steps to reproduce:
0) Install a Rust toolchain
1) checkout https://github.com/asomers/fsx-rs.git at rev c3e726d
2) cd fsx-rs
3) cargo build
4) truncate -s 1g /tmp/ufs.img
5) sudo mdconfig -a -t vnode -f /tmp/ufs.img
6) sudo newfs /dev/md0
7) sudo mount /dev/md0 /mnt
8) sudo mkdir /mnt/tmp
9) sudo chmod 1777 /mnt/tmp
10) cat <<HERE > fsx.toml
  nomsyncafterwrite = true
  [weights]
  close_open = 0.1
  invalidate = 0.2
  truncate = 1
  fsync = 1
  fdatasync = 1
  punch_hole = 100
HERE
10) env RUST_LOG=debug cargo run -- -f fsx.toml -N 1000 -P /tmp -S 2153242284826767701 /mnt/tmp/fsx.bin

The output will look like this:
[INFO  fsx] Using seed 2153242284826767701
[DEBUG fsx]   1 skipping zero size hole punch
[DEBUG fsx]   2 skipping zero size hole punch
[DEBUG fsx]   3 skipping zero size hole punch
[DEBUG fsx]   4 skipping zero size hole punch
[DEBUG fsx]   5 skipping zero size read
[DEBUG fsx]   6 skipping zero size hole punch
[DEBUG fsx]   7 skipping zero size hole punch
[DEBUG fsx]   8 skipping zero size hole punch
[DEBUG fsx]   9 skipping zero size hole punch
[DEBUG fsx]  10 skipping zero size hole punch
[DEBUG fsx]  11 skipping zero size hole punch
[INFO  fsx]  12 mapwrite 0x1ffb4 .. 0x280a4 ( 0x80f1 bytes)
[INFO  fsx]  13 punch_hole  0xe4e5 .. 0x185c2 ( 0xa0de bytes)
[INFO  fsx]  14 punch_hole 0x27eae .. 0x280a4 (  0x1f7 bytes)
[INFO  fsx]  15 punch_hole  0xee3c .. 0x17541 ( 0x8706 bytes)
[INFO  fsx]  16 punch_hole 0x24fc5 .. 0x280a4 ( 0x30e0 bytes)
[INFO  fsx]  17 punch_hole 0x27dc2 .. 0x280a4 (  0x2e3 bytes)
[INFO  fsx]  18 punch_hole 0x14efa .. 0x16be0 ( 0x1ce7 bytes)
[INFO  fsx]  19 mapread   0xe210 .. 0x1153c ( 0x332d bytes)
[INFO  fsx]  20 mapread  0x1159f .. 0x1cb3d ( 0xb59f bytes)
[INFO  fsx]  21 mapread  0x16252 .. 0x21bd3 ( 0xb982 bytes)
[INFO  fsx]  22 punch_hole  0x2c14 ..  0x2d44 (  0x131 bytes)
[INFO  fsx]  23 punch_hole  0xc1b4 .. 0x18eed ( 0xcd3a bytes)
[INFO  fsx]  24 mapwrite 0x36f14 .. 0x3ffff ( 0x90ec bytes)
[INFO  fsx]  25 read      0xe4a9 .. 0x16bf9 ( 0x8751 bytes)
[INFO  fsx]  26 punch_hole  0xeedd .. 0x13904 ( 0x4a28 bytes)
[INFO  fsx]  27 mapwrite 0x2a9e0 .. 0x2c675 ( 0x1c96 bytes)
[INFO  fsx]  28 punch_hole 0x13374 .. 0x1f95e ( 0xc5eb bytes)
[INFO  fsx]  29 mapread   0xff83 .. 0x1bcb8 ( 0xbd36 bytes)
[INFO  fsx]  30 mapwrite 0x3cc44 .. 0x3ffff ( 0x33bc bytes)
[INFO  fsx]  31 mapwrite 0x14b65 .. 0x1969b ( 0x4b37 bytes)
[INFO  fsx]  32 write     0xcc6e .. 0x152f6 ( 0x8689 bytes)
[INFO  fsx]  33 write    0x30da5 .. 0x340ae ( 0x330a bytes)
[INFO  fsx]  34 punch_hole 0x3b300 .. 0x3ffff ( 0x4d00 bytes)
[INFO  fsx]  35 read     0x3d33c .. 0x3ffff ( 0x2cc4 bytes)
[INFO  fsx]  36 punch_hole 0x279cf .. 0x30304 ( 0x8936 bytes)
[INFO  fsx]  37 mapread  0x2441c .. 0x2d04e ( 0x8c33 bytes)
[ERROR fsx] miscompare: offset= 0x2441c, size = 0x8c33
[ERROR fsx] OFFSET  GOOD  BAD  RANGE  
[ERROR fsx] 0x24fc5 0x00 0x4f  0x3024
[ERROR fsx] Step# (mod 256) for a misdirected write may be 12
[ERROR fsx] LOG DUMP
[ERROR fsx]   0 SKIPPED  (punch_hole)
[ERROR fsx]   1 SKIPPED  (punch_hole)
[ERROR fsx]   2 SKIPPED  (punch_hole)
[ERROR fsx]   3 SKIPPED  (punch_hole)
[ERROR fsx]   4 SKIPPED  (read)
[ERROR fsx]   5 SKIPPED  (punch_hole)
[ERROR fsx]   6 SKIPPED  (punch_hole)
[ERROR fsx]   7 SKIPPED  (punch_hole)
[ERROR fsx]   8 SKIPPED  (punch_hole)
[ERROR fsx]   9 SKIPPED  (punch_hole)
[ERROR fsx]  10 SKIPPED  (punch_hole)
[ERROR fsx]  11 MAPWRITE 0x1ffb4 => 0x280a5 ( 0x80f1 bytes) HOLE
[ERROR fsx]  12 PUNCH_HOLE  0xe4e5 => 0x185c2 ( 0xa0de bytes)
[ERROR fsx]  13 PUNCH_HOLE 0x27eae => 0x280a4 (  0x1f7 bytes)
[ERROR fsx]  14 PUNCH_HOLE  0xee3c => 0x17541 ( 0x8706 bytes)
[ERROR fsx]  15 PUNCH_HOLE 0x24fc5 => 0x280a4 ( 0x30e0 bytes)
[ERROR fsx]  16 PUNCH_HOLE 0x27dc2 => 0x280a4 (  0x2e3 bytes)
[ERROR fsx]  17 PUNCH_HOLE 0x14efa => 0x16be0 ( 0x1ce7 bytes)
[ERROR fsx]  18 MAPREAD   0xe210 => 0x1153d ( 0x332d bytes)
[ERROR fsx]  19 MAPREAD  0x1159f => 0x1cb3e ( 0xb59f bytes)
[ERROR fsx]  20 MAPREAD  0x16252 => 0x21bd4 ( 0xb982 bytes)
[ERROR fsx]  21 PUNCH_HOLE  0x2c14 =>  0x2d44 (  0x131 bytes)
[ERROR fsx]  22 PUNCH_HOLE  0xc1b4 => 0x18eed ( 0xcd3a bytes)
[ERROR fsx]  23 MAPWRITE 0x36f14 => 0x40000 ( 0x90ec bytes) HOLE
[ERROR fsx]  24 READ      0xe4a9 => 0x16bfa ( 0x8751 bytes)
[ERROR fsx]  25 PUNCH_HOLE  0xeedd => 0x13904 ( 0x4a28 bytes)
[ERROR fsx]  26 MAPWRITE 0x2a9e0 => 0x2c676 ( 0x1c96 bytes)
[ERROR fsx]  27 PUNCH_HOLE 0x13374 => 0x1f95e ( 0xc5eb bytes)
[ERROR fsx]  28 MAPREAD   0xff83 => 0x1bcb9 ( 0xbd36 bytes)
[ERROR fsx]  29 MAPWRITE 0x3cc44 => 0x40000 ( 0x33bc bytes)
[ERROR fsx]  30 MAPWRITE 0x14b65 => 0x1969c ( 0x4b37 bytes)
[ERROR fsx]  31 WRITE     0xcc6e => 0x152f7 ( 0x8689 bytes)
[ERROR fsx]  32 WRITE    0x30da5 => 0x340af ( 0x330a bytes)
[ERROR fsx]  33 PUNCH_HOLE 0x3b300 => 0x3ffff ( 0x4d00 bytes)
[ERROR fsx]  34 READ     0x3d33c => 0x40000 ( 0x2cc4 bytes)
[ERROR fsx]  35 PUNCH_HOLE 0x279cf => 0x30304 ( 0x8936 bytes)
[ERROR fsx]  36 MAPREAD  0x2441c => 0x2d04f ( 0x8c33 bytes)

I also have a unit test that can reproduce this with fusefs in 3 operations.
Comment 1 Konstantin Belousov freebsd_committer freebsd_triage 2023-01-31 00:37:37 UTC
Could you please provide a simple standalone reproducer for UFS?
Comment 2 Alan Somers freebsd_committer freebsd_triage 2023-01-31 02:27:40 UTC
Created attachment 239822 [details]
Add a test case for fusefs

Here is a minimal test case for fusefs.  I can probably come up with a standalone program tomorrow.  Also, I discovered that the bug goes away if I msync() after doing the mmap write.
Comment 3 Alan Somers freebsd_committer freebsd_triage 2023-02-03 23:05:26 UTC
Created attachment 239891 [details]
standalone test case for fusefs or UFS

This is a standalone test case that reproduces the problem for either UFS or fusefs.  just compile it, and execute with "./fspace-and-mmap /path/to/filesystem/x.bin".

Looking at the debug output of a fusefs file system, the fuse daemon gets the FUSE_FALLOCATE operation before it ever sees the FUSE_WRITE, which seems wrong.  I think the VFS needs to flush mapped pages in vn_deallocate_impl.
Comment 4 Konstantin Belousov freebsd_committer freebsd_triage 2023-02-04 01:33:34 UTC
https://reviews.freebsd.org/D38379 for UFS
No idea about fuse
Comment 5 Alan Somers freebsd_committer freebsd_triage 2023-02-04 17:05:42 UTC
(In reply to Konstantin Belousov from comment #4)
Kib's patch looks good for UFS.  However, I've determined that the corruption in fusefs has a different root cause.  And it's more general, too.  It can be triggered by mmap()ing a file, then setting O_DIRECT and doing a write.  It doesn't require fspacectl.  Going forward, let's restrict this bug to discussion of UFS.
Comment 6 commit-hook freebsd_committer freebsd_triage 2023-02-04 18:33:16 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3b6056204dd80cc866b7998ef0776247ebc42ce4

commit 3b6056204dd80cc866b7998ef0776247ebc42ce4
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-02-04 01:20:19 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-02-04 18:32:07 +0000

    FIOSEEKHOLE/FIOSEEKDATA: correct consistency for bmap-based implementation

    Writes on UFS through a mapped region do not allocate disk blocks in
    holes immediately. The blocks are allocated when the pages are paged out
    first time.

    This breaks the algorithm in vn_bmap_seekhole() and ufs_bmap_seekdata(),
    because VOP_BMAP() reports hole for the place which already contains a
    valid data.

    Clean the pages before doing VOP_BMAP() in the affected functions.  In
    principle, we could clean less by only requesting clean starting from
    the offset, but it is probably not very important.

    PR:     269261
    Reported by:    asomers
    Reviewed by:    asomers, markj
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week
    Differential revision:  https://reviews.freebsd.org/D38379

 sys/kern/vfs_vnops.c    | 14 ++++++++++++--
 sys/ufs/ufs/ufs_bmap.c  | 18 ++++++++++++++++++
 sys/ufs/ufs/ufs_vnops.c |  2 +-
 3 files changed, 31 insertions(+), 3 deletions(-)
Comment 7 Alan Somers freebsd_committer freebsd_triage 2023-02-04 18:46:41 UTC
assign to committer
Comment 8 commit-hook freebsd_committer freebsd_triage 2023-02-11 00:36:49 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=51485f81b01a03752ffaf530abfb570ae4593fae

commit 51485f81b01a03752ffaf530abfb570ae4593fae
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-02-04 01:20:19 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-02-11 00:25:02 +0000

    FIOSEEKHOLE/FIOSEEKDATA: correct consistency for bmap-based implementation

    PR:     269261

    (cherry picked from commit 3b6056204dd80cc866b7998ef0776247ebc42ce4)

 sys/kern/vfs_vnops.c    | 12 +++++++++++-
 sys/ufs/ufs/ufs_bmap.c  | 18 ++++++++++++++++++
 sys/ufs/ufs/ufs_vnops.c |  2 +-
 3 files changed, 30 insertions(+), 2 deletions(-)
Comment 9 Mark Linimon freebsd_committer freebsd_triage 2024-01-02 03:25:40 UTC
^Triage: the 12 branch is now out of support.