Bug 260453 - ZFS truncated write to O_APPEND file from mmap'ed memory
Summary: ZFS truncated write to O_APPEND file from mmap'ed memory
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 13.0-STABLE
Hardware: Any Any
: --- Affects Only Me
Assignee: Mark Johnston
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-12-15 21:44 UTC by Poul-Henning Kamp
Modified: 2022-03-15 19:59 UTC (History)
5 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Poul-Henning Kamp freebsd_committer freebsd_triage 2021-12-15 21:44:30 UTC
Summary:
        A file is opened for O_APPEND
        writev(2) is called with 480 bytes from a mmap'ed neighbor file.
        writev(2) returns 480
        The file only grows by 194 bytes.

Seen on: 13.0p5, emaste reproduced on ref13-amd64

ZFS seems to be a requirement for the failure.

To reproduce:

        git clone https://github.com/bsdphk/AardWARC
        cd AardWARC
        make test  (The action happens in /tmp)

The bug triggers the following assert:

        Assert error in silo_attempt_append(), wsilo.c line 436:
          Condition((off_t)s == (after - before)) not true.
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2021-12-15 22:09:55 UTC
I dug into this a little bit.  The program creates a file and writes some data via write(2) (so data does not appear in the page cache, only in the DMU).  Immediately after, the file is mapped for reading and data from the mapping is written to a different file, and I see:

  1988 aardwarc CALL  lseek(0x6,0,SEEK_END)
  1988 aardwarc RET   lseek 2378/0x94a
  1988 aardwarc CALL  writev(0x6,0x7fffffffe360,0x2)
  1988 aardwarc PFLT  0x8002634cb 0x1<VM_PROT_READ>
  1988 aardwarc PRET  KERN_PROTECTION_FAILURE
  1988 aardwarc GIO   fd 6 wrote 479 bytes
  <file data>
  1988 aardwarc RET   writev 479/0x1df
  1988 aardwarc CALL  lseek(0x6,0,SEEK_END)
  1988 aardwarc RET   lseek 2570/0xa0a

So we get a page fault while reading from the mapping, which is expected, and dmu_write_uio_dbuf() returns EFAULT, which is the magic signal for vn_io_fault1() to retry after wiring the mapping.

Some tracing indicates that dmu_write_uio_dbuf() does manage to write some data to the file before hitting EFAULT.  In fact, the amount of data written in the first try is exactly 479 - (2570 - 2378) bytes.

I think the bug is that the EFAULT causes this bit of code in zfs_write() to be skipped:

		/*
		 * Update the file size (zp_size) if it has changed;
		 * account for possible concurrent updates.
		 */
		while ((end_size = zp->z_size) < zfs_uio_offset(uio)) {
			(void) atomic_cas_64(&zp->z_size, end_size,
			    zfs_uio_offset(uio));
			ASSERT(error == 0);
		}

z_size contains the file size returned by VOP_GETATTR(), used to provide the return value for lseek(SEEK_END).  But... this bug appears to exist in main too.  So how does this work at all?

Note, part of the weirdness here comes from the fact that some of the input file data is written in the first try.  I would expect dmu_write_uio_dbuf() to return EFAULT without having written anything.
Comment 2 Rich Ercolani 2021-12-16 04:44:52 UTC
This reminds me of the bug https://github.com/openzfs/zfs/commit/c23803be84cb5cc9d98186221f4106a9962dfc45 and its predecessor commit https://github.com/openzfs/zfs/commit/de198f2d9507b6dcf3d0d8f037ba33940208733e were intended to fix.

Obviously, this isn't SEEK_DATA/SEEK_HOLE, and in that case, you wound up with correctly sized files with zeroes for the last $WHATEVER. But I still thought I'd mention it in case the logic changes there might help.
Comment 3 Andriy Gapon freebsd_committer freebsd_triage 2021-12-16 07:40:21 UTC
Note that in the "old" FreeBSD ZFS (e.g., in stable/12) there wasn't a break after an error from dmu_write_uio_dbuf() and there was some FreeBSD specific code to handle / tolerate EFAULT.

In the new OpenZFS that logic is gone.
There is some Linux specific code for EFAULT, but for FreeBSD there seems to be nothing.

FWIW, I think that tools/regression/fsx should / could have caught this kind of an issue.
Comment 5 Mark Johnston freebsd_committer freebsd_triage 2022-01-12 14:27:35 UTC
https://github.com/openzfs/zfs/pull/12964
Comment 7 Mark Johnston freebsd_committer freebsd_triage 2022-02-21 15:15:04 UTC
Now fixed in stable/13: https://cgit.FreeBSD.org/src/commit/?id=b55a7f3422d76a6765716b2b6e78967bd75199c9

We'll release an EN with the patch this week.
Comment 8 Mark Johnston freebsd_committer freebsd_triage 2022-03-15 19:59:15 UTC
Fixed in releng/13.0 as EN-22:10.zfs.