Bug 53475 - cp(1) copies files in reverse order to destination
Summary: cp(1) copies files in reverse order to destination
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: 4.8-STABLE
Hardware: Any Any
: Normal Affects Only Me
Assignee: Jilles Tjoelker
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-06-19 03:10 UTC by Louis Mamakos
Modified: 2015-05-14 19:34 UTC (History)
1 user (show)

See Also:
jilles: mfc-stable10+
jilles: mfc-stable9+
jilles: mfc-stable8-


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Louis Mamakos 2003-06-19 03:10:10 UTC
The cp(1) command produces surprising behavior when copying multiple
files to a destinationd directory.  The files are copied in reverse
order.  This is of consequence when the order of files in a directory
has meaning; e.g., in an mp3 player appliance the sequentially plays
files in an MS-DOS filesystem directory.

This is very counter-intuitive to the user.

Fix: 

BTFOM.  Sneaking suspicion that mastercmp() and related callers are
implicated in this.
How-To-Repeat:  mkdir /tmp/foo /tmp/bar
 touch /tmp/foo/1 /tmp/foo/2 /tmp/foo/3 /tmp/foo/4 /tmp/foo/5 /tmp/foo/6
 cp -v /tmp/foo/1 /tmp/foo/2 /tmp/foo/3 /tmp/foo/4 /tmp/foo/5 /tmp/foo/6 /tmp/bar
Comment 1 Dorr H. Clark 2004-04-28 03:03:14 UTC
--- cp.c_orig   Sun Apr 25 06:32:27 2004
+++ cp.c        Sun Apr 25 06:33:50 2004
@@ -94,7 +94,6 @@
 enum op { FILE_TO_FILE, FILE_TO_DIR, DIR_TO_DNE };
 
 static int copy(char *[], enum op, int);
-static int mastercmp(const FTSENT * const *, const FTSENT * const *);
 static void siginfo(int __unused);
 
 int
@@ -274,7 +273,7 @@
        mask = ~umask(0777);
        umask(~mask);
 
-       if ((ftsp = fts_open(argv, fts_options, mastercmp)) == NULL)
+       if ((ftsp = fts_open(argv, fts_options, NULL)) == NULL)
                err(1, "fts_open");
        for (badcp = rval = 0; (curr = fts_read(ftsp)) != NULL; badcp =
0) {
                switch (curr->fts_info) {
@@ -478,32 +477,6 @@
        if (errno)
                err(1, "fts_read");
        return (rval);
-}
-
-/*
- * mastercmp --
- *     The comparison function for the copy order.  The order is to
copy
- *     non-directory files before directory files.  The reason for this
- *     is because files tend to be in the same cylinder group as their
- *     parent directory, whereas directories tend not to be.  Copying
the
- *     files first reduces seeking.
- */
-static int
-mastercmp(const FTSENT * const *a, const FTSENT * const *b)
-{
-       int a_info, b_info;
-
-       a_info = (*a)->fts_info;
-       if (a_info == FTS_ERR || a_info == FTS_NS || a_info == FTS_DNR)
-               return (0);
-       b_info = (*b)->fts_info;
-       if (b_info == FTS_ERR || b_info == FTS_NS || b_info == FTS_DNR)
-               return (0);
-       if (a_info == FTS_D)
-               return (-1);
-       if (b_info == FTS_D)
-               return (1);
-       return (0);
 }
 
 static void


As quoted above, the comments in cp.c tell us the function 
mastercmp() is an attempt to improve performance based on 
knowing something about physical disks.
 
This is an old optimization strategy (it's in the original 
version of cp.c).  AFAIK, in the updated BSD filesystem, 
when we copy a file, we don't actually move the
physical data block of the file but change the information in its
inode such as the address of its data block and owner.  

Deleting mastercmp() and setting the comparison paramter to NULL 
for the function fts_open() suppresses the behavior 
in the bug.

The next question is whether deleting mastercmp eliminates
an optimization.  Our testing shows the exact opposite,
mastercmp is degrading performance.  We did several experiments
with cp -R to measure elapsed time on transfers between devices
of differing file system types (to avoid UFS2 optimizations).
Our results show removing mastercmp yields a small performance
gain (note: we had no SCSI devices available, and second note:
variability in file system performance seems dominated 
by other factors).

M. K. McKusick has indicated in seminars that modern disk drives
lie to the driver about their physical layouts.  The use of
mastercmp in cp.c is a legacy optimization from a different
era of disk technology.  We recommend removing this call
from cp.c to address 53475.

Ting Hui, engineer
Dorr H. Clark, advisor
Graduate School of Engineering
Santa Clara University
Comment 2 Bruce Evans 2004-04-28 08:37:25 UTC
On Tue, 27 Apr 2004, Dorr H. Clark wrote:

> ...
>  -/*
>  - * mastercmp --
>  - *     The comparison function for the copy order.  The order is to
>  copy
>  - *     non-directory files before directory files.  The reason for this
>  - *     is because files tend to be in the same cylinder group as their
>  - *     parent directory, whereas directories tend not to be.  Copying
>  the
>  - *     files first reduces seeking.
>  - */

According to cp -pRv, mastercmp() gets this perfectly backwards: cp
actually copies directories first.  It seems to just randomize the
order of regular files; this is presumably because mastercmp() doesn't
distinguish between all pairs of different files and qsort() doesn't
preserve the original order.

> ...
>  As quoted above, the comments in cp.c tell us the function
>  mastercmp() is an attempt to improve performance based on
>  knowing something about physical disks.
>
>  This is an old optimization strategy (it's in the original
>  version of cp.c).  AFAIK, in the updated BSD filesystem,
>  when we copy a file, we don't actually move the
>  physical data block of the file but change the information in its
>  inode such as the address of its data block and owner.

Copying still involves lots of physical i/o.  The difference in
relatively recent versions of ffs is that it doesn't scatter the files
so much by switching the cylinder group too often.  IIRC, it switched
for every directory.

>  The next question is whether deleting mastercmp eliminates
>  an optimization.  Our testing shows the exact opposite,
>  mastercmp is degrading performance.  We did several experiments
>  with cp -R to measure elapsed time on transfers between devices
>  of differing file system types (to avoid UFS2 optimizations).
>  Our results show removing mastercmp yields a small performance
>  gain (note: we had no SCSI devices available, and second note:
>  variability in file system performance seems dominated
>  by other factors).

It would be interesting to know if mastercmp() works better if it does
what its comment says it does.  I suspect that the backwardsness doesn't
make much difference, but is worse than it used to be because there
is now more competition for space in the same cylinder group.  I think
benchmarks that don't descend into subdirs would show that using
mastercmp really is an optimization for that access pattern, but I
think that access pattern is relatively unusual.  Optimizing for the
default fts order seems as good as anything.

>  M. K. McKusick has indicated in seminars that modern disk drives
>  lie to the driver about their physical layouts.  The use of
>  mastercmp in cp.c is a legacy optimization from a different
>  era of disk technology.  We recommend removing this call
>  from cp.c to address 53475.

Large seeks (especially ones larger than the drive's cache) still
matter, and I think drivers rarely lie about these.  cp's attempted
optimization is more about second-guessing what ffs does.  I agree
that it shouldn't do this.  The file system might not even be ffs.

Bruce
Comment 3 commit-hook freebsd_committer freebsd_triage 2015-05-05 13:23:07 UTC
A commit references this bug:

Author: jilles
Date: Tue May  5 13:23:04 UTC 2015
New revision: 282482
URL: https://svnweb.freebsd.org/changeset/base/282482

Log:
  cp: Remove fts sorting.

  In an attempt to improve performance, cp reordered directories first
  (although the comment says directories last). This is not effective with new
  UFS layout policies.

  The sorting reorders multiple arguments passed to cp, which may be
  undesirable.

  Additionally, the comparison function does not induce a total order. Per
  POSIX, this causes undefined behaviour in qsort().

  NetBSD removed the sorting in 2009.

  On filesystems that return directory entries in hash/btree order, sorting by
  d_fileno before statting improves performance on large directories. However,
  this can only be implemented in fts(3).

  PR:		53475
  Reviewed by:	bde (in 2004)
  MFC after:	1 week

Changes:
  head/bin/cp/cp.c
Comment 4 commit-hook freebsd_committer freebsd_triage 2015-05-14 10:46:26 UTC
A commit references this bug:

Author: jilles
Date: Thu May 14 10:46:21 UTC 2015
New revision: 282890
URL: https://svnweb.freebsd.org/changeset/base/282890

Log:
  MFC r282482: cp: Remove fts sorting.

  In an attempt to improve performance, cp reordered directories first
  (although the comment says directories last). This is not effective with new
  UFS layout policies.

  The sorting reorders multiple arguments passed to cp, which may be
  undesirable.

  Additionally, the comparison function does not induce a total order. Per
  POSIX, this causes undefined behaviour in qsort().

  NetBSD removed the sorting in 2009.

  On filesystems that return directory entries in hash/btree order, sorting by
  d_fileno before statting improves performance on large directories. However,
  this can only be implemented in fts(3).

  PR:		53475
  Reviewed by:	bde (in 2004)

Changes:
_U  stable/10/
  stable/10/bin/cp/cp.c
Comment 5 commit-hook freebsd_committer freebsd_triage 2015-05-14 19:32:17 UTC
A commit references this bug:

Author: jilles
Date: Thu May 14 19:32:14 UTC 2015
New revision: 282917
URL: https://svnweb.freebsd.org/changeset/base/282917

Log:
  MFC r282482: cp: Remove fts sorting.

  In an attempt to improve performance, cp reordered directories first
  (although the comment says directories last). This is not effective with new
  UFS layout policies.

  The sorting reorders multiple arguments passed to cp, which may be
  undesirable.

  Additionally, the comparison function does not induce a total order. Per
  POSIX, this causes undefined behaviour in qsort().

  NetBSD removed the sorting in 2009.

  On filesystems that return directory entries in hash/btree order, sorting by
  d_fileno before statting improves performance on large directories. However,
  this can only be implemented in fts(3).

  PR:		53475
  Reviewed by:	bde (in 2004)

Changes:
_U  stable/9/bin/cp/
  stable/9/bin/cp/cp.c