Summary: A file is opened for O_APPEND writev(2) is called with 480 bytes from a mmap'ed neighbor file. writev(2) returns 480 The file only grows by 194 bytes. Seen on: 13.0p5, emaste reproduced on ref13-amd64 ZFS seems to be a requirement for the failure. To reproduce: git clone https://github.com/bsdphk/AardWARC cd AardWARC make test (The action happens in /tmp) The bug triggers the following assert: Assert error in silo_attempt_append(), wsilo.c line 436: Condition((off_t)s == (after - before)) not true.
I dug into this a little bit. The program creates a file and writes some data via write(2) (so data does not appear in the page cache, only in the DMU). Immediately after, the file is mapped for reading and data from the mapping is written to a different file, and I see: 1988 aardwarc CALL lseek(0x6,0,SEEK_END) 1988 aardwarc RET lseek 2378/0x94a 1988 aardwarc CALL writev(0x6,0x7fffffffe360,0x2) 1988 aardwarc PFLT 0x8002634cb 0x1<VM_PROT_READ> 1988 aardwarc PRET KERN_PROTECTION_FAILURE 1988 aardwarc GIO fd 6 wrote 479 bytes <file data> 1988 aardwarc RET writev 479/0x1df 1988 aardwarc CALL lseek(0x6,0,SEEK_END) 1988 aardwarc RET lseek 2570/0xa0a So we get a page fault while reading from the mapping, which is expected, and dmu_write_uio_dbuf() returns EFAULT, which is the magic signal for vn_io_fault1() to retry after wiring the mapping. Some tracing indicates that dmu_write_uio_dbuf() does manage to write some data to the file before hitting EFAULT. In fact, the amount of data written in the first try is exactly 479 - (2570 - 2378) bytes. I think the bug is that the EFAULT causes this bit of code in zfs_write() to be skipped: /* * Update the file size (zp_size) if it has changed; * account for possible concurrent updates. */ while ((end_size = zp->z_size) < zfs_uio_offset(uio)) { (void) atomic_cas_64(&zp->z_size, end_size, zfs_uio_offset(uio)); ASSERT(error == 0); } z_size contains the file size returned by VOP_GETATTR(), used to provide the return value for lseek(SEEK_END). But... this bug appears to exist in main too. So how does this work at all? Note, part of the weirdness here comes from the fact that some of the input file data is written in the first try. I would expect dmu_write_uio_dbuf() to return EFAULT without having written anything.
This reminds me of the bug https://github.com/openzfs/zfs/commit/c23803be84cb5cc9d98186221f4106a9962dfc45 and its predecessor commit https://github.com/openzfs/zfs/commit/de198f2d9507b6dcf3d0d8f037ba33940208733e were intended to fix. Obviously, this isn't SEEK_DATA/SEEK_HOLE, and in that case, you wound up with correctly sized files with zeroes for the last $WHATEVER. But I still thought I'd mention it in case the logic changes there might help.
Note that in the "old" FreeBSD ZFS (e.g., in stable/12) there wasn't a break after an error from dmu_write_uio_dbuf() and there was some FreeBSD specific code to handle / tolerate EFAULT. In the new OpenZFS that logic is gone. There is some Linux specific code for EFAULT, but for FreeBSD there seems to be nothing. FWIW, I think that tools/regression/fsx should / could have caught this kind of an issue.
"Old" ZFS reference: https://cgit.freebsd.org/src/tree/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c?h=stable/12#n1236
https://github.com/openzfs/zfs/pull/12964
Fixed in the latest OpenZFS merge: https://cgit.freebsd.org/src/commit/?id=e92ffd9b626833ebdbf2742c8ffddc6cd94b963e https://cgit.freebsd.org/src/commit/?id=063daa8350d4a78f96d1ee6550910363fd3756fb
Now fixed in stable/13: https://cgit.FreeBSD.org/src/commit/?id=b55a7f3422d76a6765716b2b6e78967bd75199c9 We'll release an EN with the patch this week.
Fixed in releng/13.0 as EN-22:10.zfs.