Bug 223955

Summary: cpio needs a --block-size= option
Product: Base System
Reporter: Ronald F. Guilmette <rfg>
Component: bin
Assignee: freebsd-bugs (Nobody) <bugs>
Status: New ---
Severity: Affects Many People
CC: cem, rfg-freebsd
Priority: ---
Version: 9.1-RELEASE
Hardware: Any
OS: Any

Description Ronald F. Guilmette 2017-11-29 00:26:39 UTC
I didn't get the memo, so I'm only just now finding out that "cpio -p"
	can be WAAAAAY slower than "cp" for the same file, at least on my
	Ubuntu/Linux system.  I have some reason to believe that this will be
	true if I test it also on FreeBSD.  (The issue seems like it will
	be the same on both systems.)

	The problem seems to be that the default I/O block size used by cpio...
	left over from ancient times (e.g. 1970's) is 512 bytes.  This can
	be increased somewhat (to 5120 bytes) using the -B option, however
	in the current era of large media files and multi-terabyte disk
	drives... many of which themselves have a native hardware block size
	of 4 KiB... even a block size of 5120 bytes is ludicrously small,
	and causes a massive performance hit.
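
To put rough numbers on the block sizes mentioned above, here is a minimal arithmetic sketch. It assumes one read call per block, which is a simplification and not a statement about how any particular cpio implementation behaves; `calls_needed` is a hypothetical helper name.

```python
def calls_needed(total_bytes, block_size):
    """Blocks (and so read calls, under the one-call-per-block assumption)
    needed to cover total_bytes, rounding up for a final partial block."""
    return (total_bytes + block_size - 1) // block_size


ONE_GIB = 1 << 30

if __name__ == "__main__":
    # 512 is the default block size cited above, 5120 is what -B selects,
    # and 1 MiB is an arbitrary larger size chosen for comparison.
    for label, block_size in (("default", 512), ("-B", 5120), ("1 MiB", 1 << 20)):
        calls = calls_needed(ONE_GIB, block_size)
        print(f"{label:>8}: {block_size:>8} bytes/block -> {calls:>8} reads per GiB")
```

Under that assumption a 1 GiB file takes 2,097,152 reads at 512 bytes, 209,716 at 5120 bytes, and 1,024 at 1 MiB. A lower call count does not by itself guarantee a faster copy.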

	The Linux people, at least, seem to have recognized the issue, and
	have added a --block-size= option to their version of cpio.  I think
	it's well past time that FreeBSD followed suit and did likewise.

System: FreeBSD segfault.tristatelogic.com 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 09:23:10 UTC 2012 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64

Just try copying a large multi-gigabyte file using "cpio -p"
	and time the command.  Then, using a different file of about the
	same size, just do the same thing using "cp" and time that also.
	On Ubuntu, at least, there is a stunning 6.92x speed difference.
	I know.  I measured it.
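
The measurement described above could be scripted roughly as follows. This is only a sketch: `time_command` and `compare_cp_and_cpio` are hypothetical helper names, the file paths are supplied by the caller, and it assumes `cp` and `cpio` are on PATH. In pass-through mode (-p), cpio reads the names of the files to copy from standard input, so the sketch runs it from the source file's directory and passes a bare file name.

```python
import os
import subprocess
import time


def time_command(argv, stdin_text=None, cwd=None):
    """Run argv (optionally feeding stdin_text) and return elapsed wall-clock seconds.

    Raises subprocess.CalledProcessError if the command exits with a nonzero status.
    """
    start = time.perf_counter()
    subprocess.run(argv, input=stdin_text, text=True, check=True, cwd=cwd)
    return time.perf_counter() - start


def compare_cp_and_cpio(file_for_cp, file_for_cpio, dest_dir):
    """Time "cp" and "cpio -p" on two different files of about the same size.

    Two different files are used, as in the report, presumably so that
    neither run benefits from data cached by the other.
    """
    dest_dir = os.path.abspath(dest_dir)
    cp_secs = time_command(["cp", file_for_cp, dest_dir])

    # cpio -p appends each name it reads to dest_dir, so pass a bare file
    # name and run the command from the directory that contains the file.
    src_dir, name = os.path.split(os.path.abspath(file_for_cpio))
    cpio_secs = time_command(["cpio", "-p", dest_dir], stdin_text=name + "\n", cwd=src_dir)
    return cp_secs, cpio_secs
```

Dividing the second returned value by the first gives the slowdown factor for that pair of files. A single run is noisy, so repeating the comparison a few times would give a more trustworthy figure.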

Sorry.  No.  I don't have the code patches to implement this suggestion,
	but I felt that it was worth making the suggestion anyway.
Comment 1 Conrad Meyer freebsd_committer 2017-11-29 00:56:55 UTC
Interesting.  I wasn't aware cpio archives even supported multi-gigabyte files.  (They don't run into a 32-bit size limit?)

That said, why continue to use cpio at all instead of cp -R?
Comment 2 Conrad Meyer freebsd_committer 2017-11-29 00:59:25 UTC
FWIW, on CURRENT cpio (from contrib/libarchive/cpio/cpio.c) uses a hard-coded 16 KiB buffer on the stack.
Comment 3 Conrad Meyer freebsd_committer 2017-11-29 01:00:25 UTC
Please forward this to the upstream libarchive project by filing an issue here: https://github.com/libarchive/libarchive/issues
Comment 4 Ronald F. Guilmette 2017-11-29 20:47:09 UTC
Oy vey!  I think that I may perhaps want to withdraw this PR entirely, now that I have some additional (and troubling) data.

After I filed this PR, just for laughs, I tried using that --block-size cpio option on my Ubuntu system and then tried again to time a multi-gigabyte file copy (cpio -p) which I had already timed using other methods (e.g. "cp" and "cpio -p" without any --block-size= option).  I did my test with --block-size=1M.

To my amazement and horror, adding the --block-size=1M option didn't really make any huge difference.  The bloody file copying was -still- running at least 5x slower than copying of a similar sized file using good old "cp".
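
One way to check, independently of cpio, how much block size alone matters on a given machine is to time the same plain read/write loop at several block sizes. The sketch below is an illustration under stated assumptions, not cpio's code: `timed_copy`, `write_all`, and `sweep_block_sizes` are hypothetical names, and the default block sizes are simply the ones mentioned in this thread.

```python
import time


def write_all(fout, data):
    """Write all of data to an unbuffered file, retrying after short writes."""
    view = memoryview(data)
    while view:
        written = fout.write(view)
        view = view[written:]


def timed_copy(src, dst, block_size):
    """Copy src to dst block_size bytes at a time; return elapsed seconds."""
    start = time.perf_counter()
    # buffering=0 gives raw file objects, so each read() below asks the
    # operating system for at most block_size bytes in a single call.
    with open(src, "rb", buffering=0) as fin, open(dst, "wb", buffering=0) as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:
                break
            write_all(fout, chunk)
    return time.perf_counter() - start


def sweep_block_sizes(src, dst, block_sizes=(512, 5120, 16 * 1024, 1024 * 1024)):
    """Return {block_size: seconds} for copying src to dst at each block size."""
    return {block_size: timed_copy(src, dst, block_size) for block_size in block_sizes}
```

The first pass may read the source from disk while later passes read it from cache, so the order of the block sizes and repeated runs both affect the numbers. If the timings flatten out well before 1 MiB, block size is unlikely to account for a 5x gap by itself.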

Moral of the story:  I guess I'm the only one on the planet who is still even trying to use good old cpio.  It appears that all of the work and tuning and optimizations have gone into cp and/or rsync instead... both of which are fast as snot... while leaving poor old cpio to wallow in the backwaters of virtual abandonment.

Sigh.  I find this rather a pity, because, given that cpio is MUCH simpler than rsync, in theory it -should- be able to do file copies at least as fast, or perhaps even a bit faster.  (It doesn't have the added burden of all the network awareness and all that fancy schmancy differential file comparison stuff to deal with, unlike rsync.)

But it seems that I'm the only one in the universe who has even noticed, in all the years of this century so far, that poor old cpio just hasn't been keeping up.  Thus, it is silly of me to try to swim against the tide.  I'll just use cp and rsync from now on and be done with it.