Bug 35646

Summary: cp(1) page needs a "Bugs" section.
Product: Documentation
Reporter: Gary W. Swearingen <swear>
Component: Books & Articles
Assignee: freebsd-doc (Nobody) <doc>
Status: Closed FIXED
Severity: Affects Only Me
Priority: Normal
Version: Latest
Hardware: Any
OS: Any

Description Gary W. Swearingen 2002-03-07 21:00:05 UTC
The cp(1) program has a feature that should be documented in a "Bugs"
section.  (Or can there be a "Warnings" section?)

The "cp" program removes "holes" from "sparse" files while copying,
resulting in a non-exact copy, in some sense.
================

Fix: 

Add a "Bugs" section explaining "sparse" files and "holes"
and how "cp" handles them.
How-To-Repeat: 
$ df .
Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
/dev/ad0s2f   4530961  1477900  2690585    35%    /u
$ dd if=/dev/zero of=zeros-sparse oseek=1000 count=2
2+0 records in
2+0 records out
1024 bytes transferred in 0.000205 secs (4994148 bytes/sec)
$ df .
Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
/dev/ad0s2f    4530961  1477916  2690569    35%    /u
$ cp zeros-sparse zeros-cp
$ df .
Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
/dev/ad0s2f    4530961  1478428  2690057    35%    /u
$ l zero*
-rw-r-----  1 root  wheel  513024 Mar  7 12:42 zeros-cp
-rw-r-----  1 root  wheel  513024 Mar  7 12:42 zeros-sparse
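The df(1) figures above measure the whole filesystem. A per-file check compares a file's apparent size (st_size) with the space actually allocated to it (st_blocks). The Python sketch below is only an illustration, not code from cp(1) or dd(1): it recreates the dd(1) step by seeking past the end of a new file, then copies it with a plain read/write loop, which is what a copier that ignores holes does and is consistent with the cp(1) result above. It assumes a Unix-like system where os.stat() reports st_blocks in 512-byte units and a filesystem that supports holes; the function names are hypothetical.

```python
import os
import tempfile

BLOCK = 512  # dd(1)'s default block size, and the unit of st_blocks


def make_sparse(path, skip_blocks=1000, data_blocks=2):
    """Roughly mimic: dd if=/dev/zero of=PATH oseek=1000 count=2"""
    with open(path, "wb") as f:
        f.seek(skip_blocks * BLOCK)  # leave a hole; nothing is written here
        f.write(b"\0" * (data_blocks * BLOCK))
        f.flush()
        os.fsync(f.fileno())


def naive_copy(src, dst, bufsize=64 * 1024):
    """Copy every byte, zeros included, the way a hole-unaware copier does."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(bufsize)
            if not buf:
                break
            fout.write(buf)  # holes read back as zeros and get written out
        fout.flush()
        os.fsync(fout.fileno())


def sizes(path):
    """Return (apparent size in bytes, allocated bytes)."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    src = os.path.join(d, "zeros-sparse")
    dst = os.path.join(d, "zeros-cp")
    make_sparse(src)
    naive_copy(src, dst)
    for p in (src, dst):
        size, alloc = sizes(p)
        print(f"{os.path.basename(p)}: size={size} allocated={alloc}")
```

Both files report the same size, 513024 bytes, as in the ls(1) output above; on a filesystem that supports holes, the copy's allocated figure is typically far larger than the original's.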

================
Comment 1 Giorgos Keramidas freebsd_committer freebsd_triage 2002-03-08 00:07:24 UTC
Gary W. Swearingen wrote:

> The cp(1) program has a feature that should be documented in a "Bugs"
> section.  (Or can there be a "Warnings" section?)

Are you sure this should be documented in the manual page of cp(1) ?
Any program that copies data and doesn't take special care of 'holes' will
show similar behavior.  Should we modify their manual pages too?

	[ See what dd(1) does instead of cp(1) below. ]

	hades:~> cd /tmp
	hades:/tmp> df .
	Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
	/dev/ad0s1a     194548    65916   113069    37%    /
	hades:/tmp> dd if=/dev/zero of=zeros-sparse oseek=1000 count=2
	2+0 records in
	2+0 records out
	1024 bytes transferred in 0.000685 secs (1494682 bytes/sec)
	hades:/tmp> df .
	Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
	/dev/ad0s1a     194548    65932   113053    37%    /
	hades:/tmp> dd if=zeros-sparse of=zeros-dd
	1002+0 records in
	1002+0 records out
	513024 bytes transferred in 0.286582 secs (1790147 bytes/sec)

	^^^ Many blocks copied.

	hades:/tmp> df .
	Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
	/dev/ad0s1a     194548    66444   112541    37%    /
	hades:/tmp> ls -l zeros-*
	-rw-r--r--  1 charon  wheel  - 513024 Mar  8 02:03 zeros-dd
	-rw-r--r--  1 charon  wheel  - 513024 Mar  8 02:03 zeros-sparse

Note that I'm not opposing the change.  I'm only asking for ideas about all the
possible programs that will behave exactly like cp(1) and dd(1) do, when they
find files with 'holes'.

Giorgos Keramidas                       FreeBSD Documentation Project
keramida@{freebsd.org,ceid.upatras.gr}  http://www.FreeBSD.org/docproj/
Comment 2 Gary W. Swearingen 2002-03-08 03:43:33 UTC
Giorgos Keramidas <keramida@freebsd.org> writes:

> Are you sure this should be documented in the manual page of cp(1) ?
> Any program that copies data and doesn't take special care of 'holes' will
> show similar behavior.  Should we modify their manual pages too?
> 
> 	[ See what dd(1) does instead of cp(1) below. ]
[snip...]
> Note that I'm not opposing the change.  I'm only asking for ideas about all the
> possible programs that will behave exactly like cp(1) and dd(1) do, when they
> find files with 'holes'.

You're clever to think of such things.  If the OS could always hide the
fact that it was compressing or uncompressing files like this, then it
would never need mentioning outside the filesystem documentation.  But it
doesn't.  A user of "cp" or "dd" should be able to predict, based on his
reading of the man page or maybe some handbook, whether his use of the
command will over-fill his filesystem.  Currently, he must resort to
trial and error, a method dear to many UNIX users, but not to many
others. (Of course, many will not read about it until being bitten.)

Such knowledge probably should also be available to users of ">", "|",
"cat", and probably some others.  Probably less important for "vi",
"sed", "awk", because few have expectations as to the size of their
outputs.  It's going to go undocumented in many cases, but I think
"cp" and "dd" are special cases as one often cares much about their
outputs.  One expects a copy to be identical to the original for all
purposes, not just most purposes.  I've seen the issue discussed before
and it would have been nice to be able to point to documentation on it.

But then, I didn't provide such documentation...
Comment 3 Giorgos Keramidas freebsd_committer freebsd_triage 2002-03-08 03:59:18 UTC
Gary W. Swearingen wrote:

> Giorgos Keramidas <keramida@freebsd.org> writes:
>
> > Are you sure this should be documented in the manual page of cp(1) ?
> > Any program that copies data and doesn't take special care of 'holes' will
> > show similar behavior.  Should we modify their manual pages too?
> >
> > 	[ See what dd(1) does instead of cp(1) below. ]
> [snip...]
> > Note that I'm not opposing the change.  I'm only asking for ideas about all the
> > possible programs that will behave exactly like cp(1) and dd(1) do, when they
> > find files with 'holes'.
>
> You're clever to think of such things.  If the OS could always hide the
> fact that it was compressing or uncompressing files like this, then it
> would never need mentioning outside the filesystem documentation.  But it
> doesn't.  A user of "cp" or "dd" should be able to predict, based on his
> reading of the man page or maybe some handbook, whether his use of the
> command will over-fill his filesystem.  Currently, he must resort to
> trial and error, a method dear to many UNIX users, but not to many
> others. (Of course, many will not read about it until being bitten.)

A more general solution is needed.  This is what I was trying to point out.
Many commands will do strange things with files that have holes.  A few
that I could think of off the top of my head were:

	cat file1 > file2
	cat < file1 > file2
	cp file1 file2
	awk scripts
	sed scripts
	perl filters

Practically, any command that does not have knowledge of the underlying
filesystem data-structures will copy the 'wrong' amount of data.  AFAIK,
only dump(8) and restore(8) handle files with holes correctly; but these
commands work directly on the filesystem device.
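Short of reading the filesystem's own data structures the way dump(8) does, a copier can only guess where the holes were. One heuristic is to read the source in fixed-size chunks and, whenever a chunk is entirely zero, seek forward in the output instead of writing it, then set the final length explicitly so a trailing hole is not lost. The Python sketch below illustrates that idea under stated assumptions: it is not how cp(1) or dump(8) work, the function names and the 4096-byte chunk size are hypothetical, and it relies on the filesystem creating holes for regions that are seeked over. It also turns runs of genuine zero bytes into holes, so the copy's allocation can end up smaller than the source's rather than identical to it.

```python
import os
import tempfile

CHUNK = 4096  # assumed chunk size; ideally the filesystem block size


def sparse_copy(src, dst, chunk=CHUNK):
    """Copy src to dst, leaving a hole wherever a whole chunk is zero."""
    zero = bytes(chunk)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            if buf == zero[:len(buf)]:
                # All zeros: skip ahead instead of writing, so no blocks are allocated.
                fout.seek(len(buf), os.SEEK_CUR)
            else:
                fout.write(buf)
        # Seeking past the end does not extend the file; set the length explicitly.
        fout.truncate(fout.tell())
        fout.flush()
        os.fsync(fout.fileno())


def allocated(path):
    """Bytes actually allocated (st_blocks is in 512-byte units)."""
    return os.stat(path).st_blocks * 512


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    src = os.path.join(d, "zeros-sparse")
    dst = os.path.join(d, "zeros-copy")
    with open(src, "wb") as f:  # same layout as the dd(1) example
        f.seek(1000 * 512)
        f.write(bytes(1024))
    sparse_copy(src, dst)
    print("size:", os.path.getsize(src), os.path.getsize(dst))
    print("allocated:", allocated(src), allocated(dst))
```

With the dd(1) layout used in this report, even the 1024 bytes that dd(1) wrote are zeros, so the copy may end up with fewer allocated blocks than the original, which shows why this is still not an exact copy "for all purposes".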

I'll have to think about this a bit more.  I'll get back to you soon.

Comment 4 Giorgos Keramidas freebsd_committer freebsd_triage 2006-05-16 21:56:23 UTC
State Changed
From-To: open->closed

I don't think there is a general way to document sparse files in
all the possible places where they may come up as something 
resembling a "surprise" for users who don't know their internals. 

Documentation about sparse files doesn't belong in all the manpages 
but in introductory UNIX documentation, IMHO.