When using mmap to read and write to a file, intermixed with fspacectl, data corruption can occur. It seems like a cacheing bug, as though the data written via mmap doesn't get evicted from the cache during fspacectl, and is subsequently returned via mmap reads. I can reliably reproduce this bug on UFS and fusefs, but not ZFS, tmpfs or fusefs-ext2. Steps to reproduce: 0) Install a Rust toolchain 1) checkout https://github.com/asomers/fsx-rs.git at rev c3e726d 2) cd fsx-rs 3) cargo build 4) truncate -s 1g /tmp/ufs.img 5) sudo mdconfig -a -t vnode -f /tmp/ufs.img 6) sudo newfs /dev/md0 7) sudo mount /dev/md0 /mnt 8) sudo mkdir /mnt/tmp 9) sudo chmod 1777 /mnt/tmp 10) cat <<HERE > fsx.toml nomsyncafterwrite = true [weights] close_open = 0.1 invalidate = 0.2 truncate = 1 fsync = 1 fdatasync = 1 punch_hole = 100 HERE 10) env RUST_LOG=debug cargo run -- -f fsx.toml -N 1000 -P /tmp -S 2153242284826767701 /mnt/tmp/fsx.bin The output will look like this: [INFO fsx] Using seed 2153242284826767701 [DEBUG fsx] 1 skipping zero size hole punch [DEBUG fsx] 2 skipping zero size hole punch [DEBUG fsx] 3 skipping zero size hole punch [DEBUG fsx] 4 skipping zero size hole punch [DEBUG fsx] 5 skipping zero size read [DEBUG fsx] 6 skipping zero size hole punch [DEBUG fsx] 7 skipping zero size hole punch [DEBUG fsx] 8 skipping zero size hole punch [DEBUG fsx] 9 skipping zero size hole punch [DEBUG fsx] 10 skipping zero size hole punch [DEBUG fsx] 11 skipping zero size hole punch [INFO fsx] 12 mapwrite 0x1ffb4 .. 0x280a4 ( 0x80f1 bytes) [INFO fsx] 13 punch_hole 0xe4e5 .. 0x185c2 ( 0xa0de bytes) [INFO fsx] 14 punch_hole 0x27eae .. 0x280a4 ( 0x1f7 bytes) [INFO fsx] 15 punch_hole 0xee3c .. 0x17541 ( 0x8706 bytes) [INFO fsx] 16 punch_hole 0x24fc5 .. 0x280a4 ( 0x30e0 bytes) [INFO fsx] 17 punch_hole 0x27dc2 .. 0x280a4 ( 0x2e3 bytes) [INFO fsx] 18 punch_hole 0x14efa .. 0x16be0 ( 0x1ce7 bytes) [INFO fsx] 19 mapread 0xe210 .. 0x1153c ( 0x332d bytes) [INFO fsx] 20 mapread 0x1159f .. 0x1cb3d ( 0xb59f bytes) [INFO fsx] 21 mapread 0x16252 .. 0x21bd3 ( 0xb982 bytes) [INFO fsx] 22 punch_hole 0x2c14 .. 0x2d44 ( 0x131 bytes) [INFO fsx] 23 punch_hole 0xc1b4 .. 0x18eed ( 0xcd3a bytes) [INFO fsx] 24 mapwrite 0x36f14 .. 0x3ffff ( 0x90ec bytes) [INFO fsx] 25 read 0xe4a9 .. 0x16bf9 ( 0x8751 bytes) [INFO fsx] 26 punch_hole 0xeedd .. 0x13904 ( 0x4a28 bytes) [INFO fsx] 27 mapwrite 0x2a9e0 .. 0x2c675 ( 0x1c96 bytes) [INFO fsx] 28 punch_hole 0x13374 .. 0x1f95e ( 0xc5eb bytes) [INFO fsx] 29 mapread 0xff83 .. 0x1bcb8 ( 0xbd36 bytes) [INFO fsx] 30 mapwrite 0x3cc44 .. 0x3ffff ( 0x33bc bytes) [INFO fsx] 31 mapwrite 0x14b65 .. 0x1969b ( 0x4b37 bytes) [INFO fsx] 32 write 0xcc6e .. 0x152f6 ( 0x8689 bytes) [INFO fsx] 33 write 0x30da5 .. 0x340ae ( 0x330a bytes) [INFO fsx] 34 punch_hole 0x3b300 .. 0x3ffff ( 0x4d00 bytes) [INFO fsx] 35 read 0x3d33c .. 0x3ffff ( 0x2cc4 bytes) [INFO fsx] 36 punch_hole 0x279cf .. 0x30304 ( 0x8936 bytes) [INFO fsx] 37 mapread 0x2441c .. 0x2d04e ( 0x8c33 bytes) [ERROR fsx] miscompare: offset= 0x2441c, size = 0x8c33 [ERROR fsx] OFFSET GOOD BAD RANGE [ERROR fsx] 0x24fc5 0x00 0x4f 0x3024 [ERROR fsx] Step# (mod 256) for a misdirected write may be 12 [ERROR fsx] LOG DUMP [ERROR fsx] 0 SKIPPED (punch_hole) [ERROR fsx] 1 SKIPPED (punch_hole) [ERROR fsx] 2 SKIPPED (punch_hole) [ERROR fsx] 3 SKIPPED (punch_hole) [ERROR fsx] 4 SKIPPED (read) [ERROR fsx] 5 SKIPPED (punch_hole) [ERROR fsx] 6 SKIPPED (punch_hole) [ERROR fsx] 7 SKIPPED (punch_hole) [ERROR fsx] 8 SKIPPED (punch_hole) [ERROR fsx] 9 SKIPPED (punch_hole) [ERROR fsx] 10 SKIPPED (punch_hole) [ERROR fsx] 11 MAPWRITE 0x1ffb4 => 0x280a5 ( 0x80f1 bytes) HOLE [ERROR fsx] 12 PUNCH_HOLE 0xe4e5 => 0x185c2 ( 0xa0de bytes) [ERROR fsx] 13 PUNCH_HOLE 0x27eae => 0x280a4 ( 0x1f7 bytes) [ERROR fsx] 14 PUNCH_HOLE 0xee3c => 0x17541 ( 0x8706 bytes) [ERROR fsx] 15 PUNCH_HOLE 0x24fc5 => 0x280a4 ( 0x30e0 bytes) [ERROR fsx] 16 PUNCH_HOLE 0x27dc2 => 0x280a4 ( 0x2e3 bytes) [ERROR fsx] 17 PUNCH_HOLE 0x14efa => 0x16be0 ( 0x1ce7 bytes) [ERROR fsx] 18 MAPREAD 0xe210 => 0x1153d ( 0x332d bytes) [ERROR fsx] 19 MAPREAD 0x1159f => 0x1cb3e ( 0xb59f bytes) [ERROR fsx] 20 MAPREAD 0x16252 => 0x21bd4 ( 0xb982 bytes) [ERROR fsx] 21 PUNCH_HOLE 0x2c14 => 0x2d44 ( 0x131 bytes) [ERROR fsx] 22 PUNCH_HOLE 0xc1b4 => 0x18eed ( 0xcd3a bytes) [ERROR fsx] 23 MAPWRITE 0x36f14 => 0x40000 ( 0x90ec bytes) HOLE [ERROR fsx] 24 READ 0xe4a9 => 0x16bfa ( 0x8751 bytes) [ERROR fsx] 25 PUNCH_HOLE 0xeedd => 0x13904 ( 0x4a28 bytes) [ERROR fsx] 26 MAPWRITE 0x2a9e0 => 0x2c676 ( 0x1c96 bytes) [ERROR fsx] 27 PUNCH_HOLE 0x13374 => 0x1f95e ( 0xc5eb bytes) [ERROR fsx] 28 MAPREAD 0xff83 => 0x1bcb9 ( 0xbd36 bytes) [ERROR fsx] 29 MAPWRITE 0x3cc44 => 0x40000 ( 0x33bc bytes) [ERROR fsx] 30 MAPWRITE 0x14b65 => 0x1969c ( 0x4b37 bytes) [ERROR fsx] 31 WRITE 0xcc6e => 0x152f7 ( 0x8689 bytes) [ERROR fsx] 32 WRITE 0x30da5 => 0x340af ( 0x330a bytes) [ERROR fsx] 33 PUNCH_HOLE 0x3b300 => 0x3ffff ( 0x4d00 bytes) [ERROR fsx] 34 READ 0x3d33c => 0x40000 ( 0x2cc4 bytes) [ERROR fsx] 35 PUNCH_HOLE 0x279cf => 0x30304 ( 0x8936 bytes) [ERROR fsx] 36 MAPREAD 0x2441c => 0x2d04f ( 0x8c33 bytes) I also have a unit test that can reproduce this with fusefs in 3 operations.
Could you please provide a simple standalone reproducer for UFS?
Created attachment 239822 [details] Add a test case for fusefs Here is a minimal test case for fusefs. I can probably come up with a standalone program tomorrow. Also, I discovered that the bug goes away if I msync() after doing the mmap write.
Created attachment 239891 [details] standalone test case for fusefs or UFS This is a standalone test case that reproduces the problem for either UFS or fusefs. just compile it, and execute with "./fspace-and-mmap /path/to/filesystem/x.bin". Looking at the debug output of a fusefs file system, the fuse daemon gets the FUSE_FALLOCATE operation before it ever sees the FUSE_WRITE, which seems wrong. I think the VFS needs to flush mapped pages in vn_deallocate_impl.
https://reviews.freebsd.org/D38379 for UFS No idea about fuse
(In reply to Konstantin Belousov from comment #4) Kib's patch looks good for UFS. However, I've determined that the corruption in fusefs has a different root cause. And it's more general, too. It can be triggered by mmap()ing a file, then setting O_DIRECT and doing a write. It doesn't require fspacectl. Going forward, let's restrict this bug to discussion of UFS.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=3b6056204dd80cc866b7998ef0776247ebc42ce4 commit 3b6056204dd80cc866b7998ef0776247ebc42ce4 Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2023-02-04 01:20:19 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2023-02-04 18:32:07 +0000 FIOSEEKHOLE/FIOSEEKDATA: correct consistency for bmap-based implementation Writes on UFS through a mapped region do not allocate disk blocks in holes immediately. The blocks are allocated when the pages are paged out first time. This breaks the algorithm in vn_bmap_seekhole() and ufs_bmap_seekdata(), because VOP_BMAP() reports hole for the place which already contains a valid data. Clean the pages before doing VOP_BMAP() in the affected functions. In principle, we could clean less by only requesting clean starting from the offset, but it is probably not very important. PR: 269261 Reported by: asomers Reviewed by: asomers, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D38379 sys/kern/vfs_vnops.c | 14 ++++++++++++-- sys/ufs/ufs/ufs_bmap.c | 18 ++++++++++++++++++ sys/ufs/ufs/ufs_vnops.c | 2 +- 3 files changed, 31 insertions(+), 3 deletions(-)
assign to committer
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=51485f81b01a03752ffaf530abfb570ae4593fae commit 51485f81b01a03752ffaf530abfb570ae4593fae Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2023-02-04 01:20:19 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2023-02-11 00:25:02 +0000 FIOSEEKHOLE/FIOSEEKDATA: correct consistency for bmap-based implementation PR: 269261 (cherry picked from commit 3b6056204dd80cc866b7998ef0776247ebc42ce4) sys/kern/vfs_vnops.c | 12 +++++++++++- sys/ufs/ufs/ufs_bmap.c | 18 ++++++++++++++++++ sys/ufs/ufs/ufs_vnops.c | 2 +- 3 files changed, 30 insertions(+), 2 deletions(-)
^Triage: the 12 branch is now out of support.