While using cp(1) to copy large files (seen with 36GByte) on UFS file systems with something like: # cp -p guru-20210102.tar.gz xxx the CPU utilization is 100% while the copied amount of data is poor (some 168MByte per minute). The cp(1) process shows in truss(1) only copy_file_range(2) sys calls with low frequency: # truss -p 37655 copy_file_range(0x3,0x0,0x4,0x0,0x200000,0x0) = 2097152 (0x200000) copy_file_range(0x3,0x0,0x4,0x0,0x200000,0x0) = 2097152 (0x200000) copy_file_range(0x3,0x0,0x4,0x0,0x200000,0x0) = 2097152 (0x200000) copy_file_range(0x3,0x0,0x4,0x0,0x200000,0x0) = 2097152 (0x200000) copy_file_range(0x3,0x0,0x4,0x0,0x200000,0x0) = 2097152 (0x200000)
Created attachment 221214 [details] fix the problem by using a large "len" argument for copy_file_range(2) This patch modifies "cp" so that it uses a large "len" argument to copy_file_range(2). This results in far fewer copy_file_range(2) syscalls, each of which will do a SEEK_DATA and SEEK_HOLE. I suspect that these SEEK_DATA/SEEK_HOLE operations are slow for large non-sparse files and doing a large number of them caused the slowdown.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=c98a764c681f8b70812a9f13a6e61c96aa1a69d2 commit c98a764c681f8b70812a9f13a6e61c96aa1a69d2 Author: Rick Macklem <rmacklem@FreeBSD.org> AuthorDate: 2021-01-03 00:58:43 +0000 Commit: Rick Macklem <rmacklem@FreeBSD.org> CommitDate: 2021-01-03 00:58:43 +0000 cp(1): fix performance issue for large non-sparse file copies PR252358 reported a serious performance problem when copying a large non-sparse file on a UFS file system. This problem seems to have been caused by a large number of SEEK_HOLE operations, with one done for each copy_file_range(2) call. This patch modifies cp(1) to use a large (SSIZE_MAX) len argument, reducing the number of system calls and resolving the performance issue. While here, convert the type of the "rcount" from "int" to "ssize_t" so that it is consistent with that returned by both read(2) and copy_file_range(2). PR: 252358 Reviewed by: asomers Differential Revision: https://reviews.freebsd.org/D27937 bin/cp/utils.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
I have committed a variant of the patch to head. Since this only applies to head, I will close this PR once I hear that Matthias has tested the patch and it has resolved the problem for him.
I did with the file from Rick's mail: # cd /usr/src # patch < ~guru/r368166/patch.252358 # cd bin/cp # make # make install and tested the cp(1) as: # cd /home/backups # ls -lh guru-20210102.tar.gz -r-------- 1 root wheel 36G 2 ene. 16:02 guru-20210102.tar.gz # cp -p guru-20210102.tar.gz xxx cp: guru-20210102.tar.gz: No such file or directory The target file 'xxx' has the same size of the source and the time of 301 secs seems to be ok; but what does the message mean? I run the cp(1) with truss(1) and the messages comes out of the process: # truss -o tr cp -p guru-20210102.tar.gz xxx cp: guru-20210102.tar.gz: No error: 0 # grep 'write(2' tr write(2,"cp: ",4) = 4 (0x4) write(2,"guru-20210102.tar.gz",20) = 20 (0x14) write(2,": ",2) = 2 (0x2) write(2,"No error: 0\n",12) = 12 (0xc) When the target file 'xxx' does not exist, the message is even like a real error message: # rm xxx # truss -o tr cp -p guru-20210102.tar.gz xxx cp: guru-20210102.tar.gz: No such file or directory # grep 'write(2' tr write(2,"cp: ",4) = 4 (0x4) write(2,"guru-20210102.tar.gz",20) = 20 (0x14) write(2,": ",2) = 2 (0x2) write(2,"No such file or directory\n",26) = 26 (0x1a) The message has to do with the call to copy_file_range(2), there is only one single syscall which tries to copy the full file at once: copy_file_range(0x3,0x0,0x4,0x0,0x7fffffffffffffff,0x0) = 38603862368 (0x8fcf83160) write(2,"cp: ",4) = 4 (0x4) write(2,"guru-20210102.tar.gz",20) = 20 (0x14) write(2,": ",2) = 2 (0x2) It looks like the message comes out of cp.c around like 290: ... for (badcp = rval = 0; (curr = fts_read(ftsp)) != NULL; badcp = 0) { switch (curr->fts_info) { case FTS_NS: case FTS_DNR: case FTS_ERR: warnx("%s: %s", curr->fts_path, strerror(curr->fts_errno)); badcp = rval = 1; continue; ... I will attach the complete file 'tr';
Created attachment 221230 [details] the truss file of running: truss -o tr cp -p guru-20210102.tar.gz xxx
Created attachment 221271 [details] fix the problem by using a large "len" argument for copy_file_range(2) The old patch didn't change the type of rcount to ssize_t.
I've tested the new patch from the PR. This gives: # cd /usr/src/bin/cp # patch < ~guru/cp.patch # make # make install # cd /home/backups # time cp -p guru-20210102.tar.gz xxx real 4m54,233s user 0m0,005s sys 1m8,588s # ls -lh guru-20210102.tar.gz xxx -r-------- 1 root wheel 36G 2 ene. 16:02 guru-20210102.tar.gz -r-------- 1 root wheel 36G 2 ene. 16:02 xxx Seems fine to me now.
Thanks. This is a relief. I didn't understand how it was broken. Then I saw that the patch I put here was a bogus early edit, where I had changed "rcount" to ssize_t in the wrong function. --> 38603862368, which was truncated to 32bits when assigned to the "int" and appeared negative, generating the error. Thanks for reporting this and the quick testing. I'll close the PR.
It would be good to have this bugfix also on stable/12.
copy_file_range(2) is only in head/FreeBSD13.