The cp(1) command produces surprising behavior when copying multiple files to a destinationd directory. The files are copied in reverse order. This is of consequence when the order of files in a directory has meaning; e.g., in an mp3 player appliance the sequentially plays files in an MS-DOS filesystem directory. This is very counter-intuitive to the user. Fix: BTFOM. Sneaking suspicion that mastercmp() and related callers are implicated in this. How-To-Repeat: mkdir /tmp/foo /tmp/bar touch /tmp/foo/1 /tmp/foo/2 /tmp/foo/3 /tmp/foo/4 /tmp/foo/5 /tmp/foo/6 cp -v /tmp/foo/1 /tmp/foo/2 /tmp/foo/3 /tmp/foo/4 /tmp/foo/5 /tmp/foo/6 /tmp/bar
--- cp.c_orig Sun Apr 25 06:32:27 2004 +++ cp.c Sun Apr 25 06:33:50 2004 @@ -94,7 +94,6 @@ enum op { FILE_TO_FILE, FILE_TO_DIR, DIR_TO_DNE }; static int copy(char *[], enum op, int); -static int mastercmp(const FTSENT * const *, const FTSENT * const *); static void siginfo(int __unused); int @@ -274,7 +273,7 @@ mask = ~umask(0777); umask(~mask); - if ((ftsp = fts_open(argv, fts_options, mastercmp)) == NULL) + if ((ftsp = fts_open(argv, fts_options, NULL)) == NULL) err(1, "fts_open"); for (badcp = rval = 0; (curr = fts_read(ftsp)) != NULL; badcp = 0) { switch (curr->fts_info) { @@ -478,32 +477,6 @@ if (errno) err(1, "fts_read"); return (rval); -} - -/* - * mastercmp -- - * The comparison function for the copy order. The order is to copy - * non-directory files before directory files. The reason for this - * is because files tend to be in the same cylinder group as their - * parent directory, whereas directories tend not to be. Copying the - * files first reduces seeking. - */ -static int -mastercmp(const FTSENT * const *a, const FTSENT * const *b) -{ - int a_info, b_info; - - a_info = (*a)->fts_info; - if (a_info == FTS_ERR || a_info == FTS_NS || a_info == FTS_DNR) - return (0); - b_info = (*b)->fts_info; - if (b_info == FTS_ERR || b_info == FTS_NS || b_info == FTS_DNR) - return (0); - if (a_info == FTS_D) - return (-1); - if (b_info == FTS_D) - return (1); - return (0); } static void As quoted above, the comments in cp.c tell us the function mastercmp() is an attempt to improve performance based on knowing something about physical disks. This is an old optimization strategy (it's in the original version of cp.c). AFAIK, in the updated BSD filesystem, when we copy a file, we don't actually move the physical data block of the file but change the information in its inode such as the address of its data block and owner. Deleting mastercmp() and setting the comparison paramter to NULL for the function fts_open() suppresses the behavior in the bug. The next question is whether deleting mastercmp eliminates an optimization. Our testing shows the exact opposite, mastercmp is degrading performance. We did several experiments with cp -R to measure elapsed time on transfers between devices of differing file system types (to avoid UFS2 optimizations). Our results show removing mastercmp yields a small performance gain (note: we had no SCSI devices available, and second note: variability in file system performance seems dominated by other factors). M. K. McKusick has indicated in seminars that modern disk drives lie to the driver about their physical layouts. The use of mastercmp in cp.c is a legacy optimization from a different era of disk technology. We recommend removing this call from cp.c to address 53475. Ting Hui, engineer Dorr H. Clark, advisor Graduate School of Engineering Santa Clara University
On Tue, 27 Apr 2004, Dorr H. Clark wrote: > ... > -/* > - * mastercmp -- > - * The comparison function for the copy order. The order is to > copy > - * non-directory files before directory files. The reason for this > - * is because files tend to be in the same cylinder group as their > - * parent directory, whereas directories tend not to be. Copying > the > - * files first reduces seeking. > - */ According to cp -pRv, mastercmp() gets this perfectly backwards: cp actually copies directories first. It seems to just randomize the order of regular files; this is presumably because mastercmp() doesn't distinguish between all pairs of different files and qsort() doesn't preserve the original order. > ... > As quoted above, the comments in cp.c tell us the function > mastercmp() is an attempt to improve performance based on > knowing something about physical disks. > > This is an old optimization strategy (it's in the original > version of cp.c). AFAIK, in the updated BSD filesystem, > when we copy a file, we don't actually move the > physical data block of the file but change the information in its > inode such as the address of its data block and owner. Copying still involves lots of physical i/o. The difference in relatively recent versions of ffs is that it doesn't scatter the files so much by switching the cylinder group too often. IIRC, it switched for every directory. > The next question is whether deleting mastercmp eliminates > an optimization. Our testing shows the exact opposite, > mastercmp is degrading performance. We did several experiments > with cp -R to measure elapsed time on transfers between devices > of differing file system types (to avoid UFS2 optimizations). > Our results show removing mastercmp yields a small performance > gain (note: we had no SCSI devices available, and second note: > variability in file system performance seems dominated > by other factors). It would be interesting to know if mastercmp() works better if it does what its comment says it does. I suspect that the backwardsness doesn't make much difference, but is worse than it used to be because there is now more competition for space in the same cylinder group. I think benchmarks that don't descend into subdirs would show that using mastercmp really is an optimization for that access pattern, but I think that access pattern is relatively unusual. Optimizing for the default fts order seems as good as anything. > M. K. McKusick has indicated in seminars that modern disk drives > lie to the driver about their physical layouts. The use of > mastercmp in cp.c is a legacy optimization from a different > era of disk technology. We recommend removing this call > from cp.c to address 53475. Large seeks (especially ones larger than the drive's cache) still matter, and I think drivers rarely lie about these. cp's attempted optimization is more about second-guessing what ffs does. I agree that it shouldn't do this. The file system might not even be ffs. Bruce
A commit references this bug: Author: jilles Date: Tue May 5 13:23:04 UTC 2015 New revision: 282482 URL: https://svnweb.freebsd.org/changeset/base/282482 Log: cp: Remove fts sorting. In an attempt to improve performance, cp reordered directories first (although the comment says directories last). This is not effective with new UFS layout policies. The sorting reorders multiple arguments passed to cp, which may be undesirable. Additionally, the comparison function does not induce a total order. Per POSIX, this causes undefined behaviour in qsort(). NetBSD removed the sorting in 2009. On filesystems that return directory entries in hash/btree order, sorting by d_fileno before statting improves performance on large directories. However, this can only be implemented in fts(3). PR: 53475 Reviewed by: bde (in 2004) MFC after: 1 week Changes: head/bin/cp/cp.c
A commit references this bug: Author: jilles Date: Thu May 14 10:46:21 UTC 2015 New revision: 282890 URL: https://svnweb.freebsd.org/changeset/base/282890 Log: MFC r282482: cp: Remove fts sorting. In an attempt to improve performance, cp reordered directories first (although the comment says directories last). This is not effective with new UFS layout policies. The sorting reorders multiple arguments passed to cp, which may be undesirable. Additionally, the comparison function does not induce a total order. Per POSIX, this causes undefined behaviour in qsort(). NetBSD removed the sorting in 2009. On filesystems that return directory entries in hash/btree order, sorting by d_fileno before statting improves performance on large directories. However, this can only be implemented in fts(3). PR: 53475 Reviewed by: bde (in 2004) Changes: _U stable/10/ stable/10/bin/cp/cp.c
A commit references this bug: Author: jilles Date: Thu May 14 19:32:14 UTC 2015 New revision: 282917 URL: https://svnweb.freebsd.org/changeset/base/282917 Log: MFC r282482: cp: Remove fts sorting. In an attempt to improve performance, cp reordered directories first (although the comment says directories last). This is not effective with new UFS layout policies. The sorting reorders multiple arguments passed to cp, which may be undesirable. Additionally, the comparison function does not induce a total order. Per POSIX, this causes undefined behaviour in qsort(). NetBSD removed the sorting in 2009. On filesystems that return directory entries in hash/btree order, sorting by d_fileno before statting improves performance on large directories. However, this can only be implemented in fts(3). PR: 53475 Reviewed by: bde (in 2004) Changes: _U stable/9/bin/cp/ stable/9/bin/cp/cp.c