Bug 255523 - vn_generic_copy_file_range copies holes to EOF slowly
Summary: vn_generic_copy_file_range copies holes to EOF slowly
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 13.0-RELEASE
Hardware: Any Any
: --- Affects Only Me
Assignee: Rick Macklem
Depends on:
Reported: 2021-05-01 04:23 UTC by Alan Somers
Modified: 2021-05-02 23:11 UTC (History)
2 users (show)

See Also:
rmacklem: mfc-stable13?

add support for a hole to eof to vn_generic_copy_range() (3.66 KB, patch)
2021-05-01 22:43 UTC, Rick Macklem
no flags Details | Diff
add support for a hole to eof to vn_generic_copy_range() (2.87 KB, patch)
2021-05-02 01:56 UTC, Rick Macklem
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alan Somers freebsd_committer 2021-05-01 04:23:30 UTC
vn_generic_copy_file_range can copy both data and holes.  However, it copies holes much more slowly than theoretically possible.  In my experiments using NFS 4.2 with FreeBSD 13.0-RELEASE servers and clients and ZFS local file systems, I can only achieve about 300 MBps when copying a file from NFS to local disk, when that file consists of a single giant hole.  In theory, the maximum speed should be infinite, given a sufficiently "large" sparse file.

Steps to Reproduce
1) Mount a ZFS-backed file system using nfsv4 with minorversion=2
2) On the server, create a giant sparse file with "truncate -s 1t sparsefile"
3) On the client, copy it to local storage with "cp sparsefile /tmp"


dtrace shows that vn_rdwr gets called repeatedly with len=64k.  Basically, it extends the file by 64kB at a time.  That's due to the "xfer = blksize" assignment in vn_generic_copy_file_range.  vn_generic_copy_file_range proceeds to loop xfer bytes at a time.  Instead, it should write the entire hole at once.

Note that this only happens when the hole extends to EOF.  If instead there is a small data region at EOF, then vn_generic_copy_file_range does indeed handle the hole efficiently.
Comment 1 Mark Millard 2021-05-01 08:40:42 UTC
Would this deal with:


Titled: "Make vn_generic_copy_file_range() interruptible via a signal."

This patch adds checks for signals that need to be
processed after each successful data copy cycle.

Might making the "data copy cycle" huge in some contexts
reintroduce such problems by making the sig_intr() calls
too infrequent? In other words: might vn_rdwr(. . .)  and/or
vn_write_outvp(. . .) sometimes take too long relative to
checking sig_intr() frequently enough?

There is also the cantseek related mem_iszero(dat,xfer) if
xfer were huge.

Note: I've not done a deep analyzis. I'm just asking based on
what I see in the vn_generic_copy_file_range(. . .) code.
Comment 2 Rick Macklem freebsd_committer 2021-05-01 22:43:14 UTC
Created attachment 224599 [details]
add support for a hole to eof to vn_generic_copy_range()

You can try this patch. I cannot verify if it helps,
because I only use UFS
I left assorted debug printfs in it.
Comment 3 Rick Macklem freebsd_committer 2021-05-01 22:48:59 UTC
vn_generic_copy_file_range() was coded the way it is,
since UFS (the only file system type I use/test on)
always allocates a data block (all 0 bytes) at the
end of the file.
--> Never has a hole extending to EOF.

I've attached a patch that tries to handle the
hole to EOF case.

Maybe you can test it?

Btw, the data size in the copy loop is pegged
at whatever the file system uses as a block size,
capped at 1Mbyte. This implies that interrupting
signals will be handled quickly.
--> Recall that read(2)/write(2) ignore signals,
    except for the weird NFS "intr" mount option case.
Comment 4 Alan Somers freebsd_committer 2021-05-01 23:43:07 UTC
Works for me!  I just have two comments about the patch:

* The (error == 0) check at line 3118 can be removed if you just move VOP_GETATTR two lines lower.
* Assigning false to holetoeof doesn't need to be done inside the loop.  You can initialize it in its declaration.
Comment 5 Rick Macklem freebsd_committer 2021-05-02 01:56:04 UTC
Created attachment 224603 [details]
add support for a hole to eof to vn_generic_copy_range()

Update patch as suggested by asomers@.
Moving the VOP_GETATTR() is actually
a fix, since "goto out;" assumes an
unlocked invp. It also is slightly optimized,
since the inva are only needed if holein > 0.

I moved the initialization of holetoeof to
above the loop, since it easier to spot there
(and I thoght style(9) frowned on initializations
 in declarations).

No semntics change unless VOP_GETATTR() fails
for some reason.
Comment 6 Alan Somers freebsd_committer 2021-05-02 02:19:29 UTC
Latest patch LGTM.
Comment 7 commit-hook freebsd_committer 2021-05-02 23:09:24 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4f592683c356379c5bac56b52807ed4ad54ee647

commit 4f592683c356379c5bac56b52807ed4ad54ee647
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2021-05-02 23:04:27 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2021-05-02 23:04:27 +0000

    copy_file_range(2): improve copying of a large hole to EOF

    PR#255523 reported that a file copy for a file with a large hole
    to EOF on ZFS ran slowly over NFSv4.2.
    The problem was that vn_generic_copy_file_range() would
    loop around reading the hole's data and then see it is all
    0s. It was coded this way since UFS always allocates a data
    block near the end of the file, such that a hole to EOF never exists.

    This patch modifies vn_generic_copy_file_range() to check for a
    ENXIO returned from VOP_IOCTL(..FIOSEEKDATA..) and handle that
    case as a hole to EOF. asomers@ confirms that it works for his
    ZFS test case.

    PR:     255523
    Tested by:      asomers
    Reviewed by:    asomers
    MFC after:      2 weeks
    Differential Revision:  https://reviews.freebsd.org/D30076

 sys/kern/vfs_vnops.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)
Comment 8 Rick Macklem freebsd_committer 2021-05-02 23:11:41 UTC
Patch has been tested by asomers@ and committed
to main. It will be MFC'd in 2 weeks.