Bug 86485

Summary: [patch] hexdump(1): hexdump -s speedup on /dev
Product: Base System Reporter: Toby Peterson <toby>
Component: binAssignee: Kyle Evans <kevans>
Status: Closed FIXED    
Severity: Affects Only Me CC: emaste, kevans
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
hexdump-fseeko.diff
none
hexdump-stat.h.diff
none
ministat.log
none
hexdump.diff none

Description Toby Peterson 2005-09-23 05:40:09 UTC
hexdump -s is very slow on devices when offset is large, because it doesn't try to fseek them.

Fix: 

Patch: http://lamancha.opendarwin.org/~toby/freebsd/hexdump.diff
How-To-Repeat: Run hexdump -s <large offset> on a readable device.
Comment 1 Garrett Cooper 2008-06-21 21:47:13 UTC
Hi,
Could you please resubmit the patch (link is broken)?
Thanks,
-Garrett
Comment 2 Toby Peterson 2008-06-22 03:05:21 UTC
On Jun 21, 2008, at 1:47 PM, Garrett Cooper wrote:

> Hi,
> Could you please resubmit the patch (link is broken)?
> Thanks,
> -Garrett


That webserver is long defunct, but the attached patch should resolve  
the issue.

- Toby
Comment 3 Enji Cooper freebsd_committer freebsd_triage 2008-12-27 10:37:53 UTC
I'm actually seeing a 20 ~ 30 fold performance _decrease_ after  
applying this patch with hexump ok Tiger when accessing /dev/zero.  
Could you submit better reproduction steps / criteria?
Thanks!
-Garrett
Comment 4 Alexander Best 2010-02-28 16:29:51 UTC
i can't verify the performance decrease running HEAD (r204383) garrett
mentioned. here are some benchmark (hexdump being the unpatched binary and
=2E/hexdump including toby's patches):

time hexdump -n 2 -s 999999999 /dev/zero  4,45s user 0,43s system 98% cpu
4,938 total

time ./hexdump -n 2 -s 999999999 /dev/zero  0,00s user 0,00s system 89% cpu
0,005 total

however while the unpatched hexdump binary succeeds doing

hexdump -n 2 -s 99999999 /dev/ada0  0,52s user 0,52s system 19% cpu 5,418
total

the patched binary outputs a warning

hexdump: /dev/ada0: Invalid argument
5f5e0ff
=2E/hexdump -n 2 -s 99999999 /dev/ada0  0,00s user 0,00s system 89% cpu 0,0=
06
total

to me the patch doesn't look right however, because

1. if a file is not seekable, fseeko() shouldn't be used. so the

"if (S_ISREG(sb.st_mode))"

statement should stay. removing it causes the warning i got during
benchmarking, because fseeko() itself outputs a warning if it's being run o=
n a
non-seekable file.
2. the real cause for the slowdown on non-seekable files is the use of
getchar() which is testing whether EOF has been reached using a blocksize o=
f 1
byte.

dd is much faster when dealing with non-seekable files. the difference howe=
ver
is that dd won't accept a seek value which is bigger than the filesize.
hexdump on the other hand will accept any seek value. if it's bigger than t=
he
filesize it outputs the last byte(s) before the EOF.

the dd code dealing with non-seekable files can be found in
/usr/src/bin/dd/position.c:pos_in()

maybe it's possible to use some of it to get rid of getchar().

cheers.
alex
Comment 5 Alexander Best freebsd_committer freebsd_triage 2010-08-29 13:01:55 UTC
State Changed
From-To: open->analyzed

The cause for this issue is the use of getchar() which tests every character 
against EOF. This causes huge overhead as can be seen in this comparison 
between the BSD and Linux hexdump versions: 

FreeBSD:	Linux: 
real 44,85	real 0.00 
user 4,51	user 0.00 
sys 38,76	sys 0.00 

The command used for this was 
'time -p hexdump -n 100 -s 1000000000 /dev/random'. Higher values for -s would 
simply take too much time on FreeBSD. ;) 


Comment 6 Alexander Best freebsd_committer freebsd_triage 2010-08-29 13:01:55 UTC
Responsible Changed
From-To: freebsd-bugs->arundel

Assign to me. Although i don't have commit rights to src i'm working on this 
issue atm.
Comment 7 Alexander Best freebsd_committer freebsd_triage 2011-11-16 19:20:45 UTC
the following patch should fix the issue correctly and without any hacks.

cheers.
alex
Comment 8 Alexander Best freebsd_committer freebsd_triage 2011-11-16 22:04:18 UTC
here are some statistics i did. the output was gathered from running 5 times

'/usr/bin/time -p hexdump -n 100 -s 100m /dev/random'

both with the unpatched and patched version of hexdump(1). as you can see the
improvement in speed is quite dramatic.

cheers.
alex
Comment 9 Alexander Best freebsd_committer freebsd_triage 2011-11-16 22:26:37 UTC
... and the rusage stats. ;)

patched:

real 0,00
user 0,00
sys  0,00
         0  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
       125  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
         2  voluntary context switches
         0  involuntary context switches

unpatched:

real 2,98
user 0,47
sys  2,47
      1080  maximum resident set size
        24  average shared memory size
      2038  average unshared data size
       128  average unshared stack size
       131  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
        16  voluntary context switches
       341  involuntary context switches

cheers.
alex
Comment 10 Alexander Best freebsd_committer freebsd_triage 2011-11-20 16:18:14 UTC
the real issue is that lseek() (and thus fseeko()), don't guarantee that a seek
has been successful. for devices such as tape drives, zero gets returned,
although the seek did not succeed.

so although seeking is not possible on fifos, pipes and sockets, that doesn't
mean that lseek() will work on all other file types.

this means that the provided patch is too simple. hexdump(1) along with all
other userspace applications that want to use lseek() (e.g. via fseeko), need
to handle all possible cases themselves.

a revised patch will be submitted shortly.

cheers.
alex
Comment 11 Alexander Best freebsd_committer freebsd_triage 2011-11-21 21:17:45 UTC
here's a revised patch. basically the new logic, when to seek and when to use
getchar() is:

1) if the file argument is a fifo, pipe or socket   --  goto 4)
2) if the file argument is a tape drive             --  goto 4)
3) for all other cases try fseeko(), if that fails  --  goto 4)

4) use getchar()

it might also be a good idea to mention that hexdump will not fail in case it
is being run against a device without a medium (DVD or Blue-ray) inserted.

it's questionable, whether this behavior is correct or not. strictly speaking,
hexdump does what it has been asked for: skip over some amount of data and
then print what's there.

cheers.
alex
Comment 12 Eitan Adler freebsd_committer freebsd_triage 2012-01-07 05:05:07 UTC
Responsible Changed
From-To: arundel->eadler

arundel has a patch but can't commit. I'll take this as a reminder to 
bug someone about this (and maybe commit it)
Comment 13 Eitan Adler freebsd_committer freebsd_triage 2012-02-13 05:20:48 UTC
State Changed
From-To: analyzed->open

don't like this state
Comment 14 Eitan Adler freebsd_committer freebsd_triage 2012-06-24 21:57:54 UTC
Responsible Changed
From-To: eadler->freebsd-bugs

not dealing with this for a while
Comment 15 commit-hook freebsd_committer freebsd_triage 2018-01-05 01:47:34 UTC
A commit references this bug:

Author: kevans
Date: Fri Jan  5 01:46:41 UTC 2018
New revision: 327567
URL: https://svnweb.freebsd.org/changeset/base/327567

Log:
  hexdump(1): Speed up -s flag on devices

  Using the -s flag on devices is extraordinarily slow due to using fseek(3) a
  little too conservatively. Address this by using fseek on character/block
  devices as well, falling back to getchar(3) only if we fail to seek or we're
  operating on tape drives, where fseek may succeed while not actually being
  supported.

  PR:		86485
  Submitted by:	arundel (originally; modified since then)
  Reviewed by:	cem
  Differential Revision:	https://reviews.freebsd.org/D10939

Changes:
  head/usr.bin/hexdump/display.c
Comment 16 commit-hook freebsd_committer freebsd_triage 2018-01-18 21:59:39 UTC
A commit references this bug:

Author: kevans
Date: Thu Jan 18 21:59:14 UTC 2018
New revision: 328149
URL: https://svnweb.freebsd.org/changeset/base/328149

Log:
  MFC r327567: hexdump(1): Speed up -s flag on devices

  Using the -s flag on devices is extraordinarily slow due to using fseek(3) a
  little too conservatively. Address this by using fseek on character/block
  devices as well, falling back to getchar(3) only if we fail to seek or we're
  operating on tape drives, where fseek may succeed while not actually being
  supported.

  PR:		86485

Changes:
_U  stable/11/
  stable/11/usr.bin/hexdump/display.c