Bug 118723 - [patch] od(1)/hexdump(1) truncates last partial repeat line
Summary: [patch] od(1)/hexdump(1) truncates last partial repeat line
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: Unspecified
Hardware: Any Any
: Normal Affects Only Me
Assignee: Xin LI
URL: https://reviews.freebsd.org/D40471
Keywords: patch
Depends on:
Blocks:
 
Reported: 2007-12-15 18:20 UTC by Mark Adler
Modified: 2023-06-09 01:40 UTC (History)
1 user (show)

See Also:


Attachments
file.diff (986 bytes, patch)
2007-12-15 18:20 UTC, Mark Adler
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Adler 2007-12-15 18:20:01 UTC
od and hexdump truncates the last line of the dump when that line
partially repeats the previous line.

Fix: Patch relative to latest version (1.22):
How-To-Repeat: Create this 40-byte file (including five newline characters):

abcdefg 
0123456 
abcdefg 
0123456 
abcdefg 

(Just the first three lines, 24 bytes, shows the same problem, but I
included five lines to make the structure more evident.)

od on that file incorrectly gives:
0000000    060542  061544  062546  063412  030061  031063  032065 033012
*
0000040

which leaves off the last line "abcdefg\n", and incorrectly shows the
final address as 40 octal (32 decimal).

Similarly, hexdump gives the following with the same issues:

0000000 6162 6364 6566 670a 3031 3233 3435 360a
*
0000020

With the patch below, od then correctly displays:

0000000    060542  061544  062546  063412  030061  031063  032065 033012
*
0000040    060542  061544  062546 063412
0000050

and hexdump then correctly displays:

0000000 6162 6364 6566 670a 3031 3233 3435 360a
*
0000020 6162 6364 6566 670a
0000028
Comment 1 Gavin Atkinson freebsd_committer freebsd_triage 2008-06-23 18:52:43 UTC
Responsible Changed
From-To: freebsd-bugs->gcooper

gcooper is looking into this
Comment 2 Enji Cooper freebsd_committer freebsd_triage 2008-12-27 09:41:12 UTC
Can't reproduce on 7.1rc2 / amd64 nor OSX 10.5.5 / x86. May be ppc or  
an endiam specific issue.

Will verify on a G5 with OSX 10.4.11 when I get back to SJC.

-Garrett
Comment 3 Mark Adler 2008-12-27 17:40:46 UTC
On Dec 27, 2008, at 1:41 AM, Garrett Cooper wrote:
> Can't reproduce on 7.1rc2 / amd64 nor OSX 10.5.5 / x86. May be ppc  
> or an endiam specific issue.
> Will verify on a G5 with OSX 10.4.11 when I get back to SJC.

Garrett,

I see the bug on my x86 with OS X 10.5.6.  See below.  (The version of  
od in my ~/bin directory has my patch.)  It is not a PPC-unique or  
endian issue.

Make sure that your test file is exactly as below, with no trailing  
blanks on the lines and only a single newline character per line.  To  
see the bug, the file must repeat every 16 bytes, and terminate in the  
middle of the last 16 bytes.  The bug is that the last partial repeat  
is not displayed.

Mark


gromit% cat xx
abcdefg
0123456
abcdefg
0123456
abcdefg
gromit% /usr/bin/od xx
0000000    061141  062143  063145  005147  030460  031462  032464   
005066
*
0000040
gromit% ~/bin/od xx
0000000    061141  062143  063145  005147  030460  031462  032464   
005066
*
0000040    061141  062143  063145  005147
0000050
gromit% /usr/bin/od -x xx
0000000      6261    6463    6665    0a67    3130    3332    3534     
0a36
*
0000040
gromit% ~/bin/od -x xx
0000000      6261    6463    6665    0a67    3130    3332    3534     
0a36
*
0000040      6261    6463    6665    0a67
0000050
gromit% uname -a
Darwin gromit 9.6.0 Darwin Kernel Version 9.6.0: Mon Nov 24 17:37:00  
PST 2008; root:xnu-1228.9.59~1/RELEASE_I386 i386
Comment 4 Garrett Cooper 2010-02-08 04:25:10 UTC
Ok... I've gotten back to this bug and here are my comments on the
issue at hand:

1. The item you're seeing is a valid *cosmetic* bug -- the address
isn't being incremented properly, so it gives you the impression that
the data being represented is invalid. See the following for a valid
representation of the text file with %_u:

[root@optimus /scratch/src/head/usr.bin/hexdump]# hexdump -e '"%_u\n"'
-v test_118723.txt
a
b
c
d
e
f
g
lf
0
1
2
3
4
5
6
lf
a
b
c
d
e
f
g
lf
0
1
2
3
4
5
6
lf
a
b
c
d
e
f
g
lf

versus the non- -v output:

[root@optimus /scratch/src/head/usr.bin/hexdump]# hexdump -e '"%_u\n"'
test_118723.txt
a
b
c
d
e
f
g
lf
0
1
2
3
4
5
6
lf
a
b
c
d
e
f
g
lf
0
1
2
3
4
5
6
lf
a
b
c
d
e
f
g
lf

Please note that the data represented in both forms is equivalent.
2. The patch you provide breaks -v output, which isn't desirable...

Here's what should be done to fix the cosmetic issue:

Index: display.c
===================================================================
--- display.c	(revision 203332)
+++ display.c	(working copy)
@@ -264,18 +264,20 @@
 			if (need == blocksize)
 				return((u_char *)NULL);
 			/*
-			 * XXX bcmp() is not quite right in the presence
-			 * of multibyte characters.
+			 * We need to explicitly bump the address displayed even
+			 * if -v isn't specified, or the address doesn't
+			 * increment with repeated output...
+			 * Please see bin/118723 for more details.
 			 */
+			eaddress = address + nread;
 			if (vflag != ALL &&
 			    valid_save &&
-			    bcmp(curp, savp, nread) == 0) {
+			    memcmp(curp, savp, nread) == 0) {
 				if (vflag != DUP)
 					(void)printf("*\n");
 				return((u_char *)NULL);
 			}
 			bzero((char *)curp + nread, need);
-			eaddress = address + nread;
 			return(curp);
 		}
 		n = fread((char *)curp + nread, sizeof(u_char),
@@ -290,13 +292,9 @@
 		if (length != -1)
 			length -= n;
 		if (!(need -= n)) {
-			/*
-			 * XXX bcmp() is not quite right in the presence
-			 * of multibyte characters.
-			 */
 			if (vflag == ALL || vflag == FIRST ||
 			    valid_save == 0 ||
-			    bcmp(curp, savp, blocksize) != 0) {
+			    memcmp(curp, savp, blocksize) != 0) {
 				if (vflag == DUP || vflag == FIRST)
 					vflag = WAIT;
 				return(curp);

Please let me know if you agree to the change.

Thanks,
-Garrett
Comment 5 Mark Adler 2010-02-08 05:29:51 UTC
Garrett,

Thanks for looking into the bug.  I had to fire up the way-back machine =
to remember what you were talking about!

On Feb 7, 2010, at 8:25 PM, Garrett Cooper wrote:
> 1. The item you're seeing is a valid *cosmetic* bug --

I don't know what you mean by "cosmetic", but without the patch you can =
have two different input files of different lengths producing exactly =
the same output.  I'd call that simply a bug.

> Please let me know if you agree to the change.

Yes, I agree to the change.  Moving the "eaddress =3D address + nread;" =
before the "if" is the salient portion of the patch and does the trick.  =
The final address now correctly indicates the length of the input, which =
allows the original input to be reconstructed from the dump.

I don't actually have your current source (I noticed some differences in =
the unchanged portion of the diff from the version I have).  However I =
suspect that those changes would not affect this behavior.

Mark
Comment 6 Garrett Cooper 2010-02-08 06:33:47 UTC
On Sun, Feb 7, 2010 at 9:29 PM, Mark Adler <madler@alumni.caltech.edu> wrot=
e:
> Garrett,
>
> Thanks for looking into the bug. =A0I had to fire up the way-back machine=
 to remember what you were talking about!

:)...

> On Feb 7, 2010, at 8:25 PM, Garrett Cooper wrote:
>> 1. The item you're seeing is a valid *cosmetic* bug --
>
> I don't know what you mean by "cosmetic", but without the patch you can h=
ave two different input files of different lengths producing exactly the sa=
me output. =A0I'd call that simply a bug.

Sorry it's a Cisco-ism -- cosmetic bugs are bugs which are
functionally correct, but the data is misrepresented in some manner.
This might be considered a `minor' bug as well.

>> Please let me know if you agree to the change.
>
> Yes, I agree to the change. =A0Moving the "eaddress =3D address + nread;"=
 before the "if" is the salient portion of the patch and does the trick. =
=A0The final address now correctly indicates the length of the input, which=
 allows the original input to be reconstructed from the dump.

Brilliant -- that's what I figured it was after thinking through the
process a bit more thoroughly.

> I don't actually have your current source (I noticed some differences in =
the unchanged portion of the diff from the version I have). =A0However I su=
spect that those changes would not affect this behavior.

Right. I have a few items in the pipeline that I'm working on for this
utility that I'm trying to close out in one commit. The other items in
this file are benign as they affect different pieces of hexdump.

Thanks for checking and I'll add a regression test for this as well in
the final commit :).

Cheers!
-Garrett
Comment 7 Chris Rees freebsd_committer freebsd_triage 2012-07-02 21:45:22 UTC
Responsible Changed
From-To: gcooper->freebsd-bugs

gcooper is not a developer
Comment 8 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 08:00:52 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 9 Graham Perrin freebsd_committer freebsd_triage 2022-10-17 12:38:35 UTC
Keyword: 

    patch
or  patch-ready

– in lieu of summary line prefix: 

    [patch]

* bulk change for the keyword
* summary lines may be edited manually (not in bulk). 

Keyword descriptions and search interface: 

    <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>
Comment 10 Xin LI freebsd_committer freebsd_triage 2023-06-08 07:13:16 UTC
I think the partial line should never be considered as a match of earlier line. So it seems that we should just check if "need" is a non-zero value:

diff --git a/usr.bin/hexdump/display.c b/usr.bin/hexdump/display.c
index 36306ededfc6..0bf1f1878eda 100644
--- a/usr.bin/hexdump/display.c
+++ b/usr.bin/hexdump/display.c
@@ -271,7 +271,7 @@ get(void)
 			 * XXX bcmp() is not quite right in the presence
 			 * of multibyte characters.
 			 */
-			if (vflag != ALL && 
+			if (!need && vflag != ALL &&
 			    valid_save && 
 			    bcmp(curp, savp, nread) == 0) {
 				if (vflag != DUP) {


I've also created some test cases to verify the behavior for hexdump -C and hexdump -Cv.
Comment 11 commit-hook freebsd_committer freebsd_triage 2023-06-09 01:40:31 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d1016568185ddacbf1148acbe26b27258981b4f0

commit d1016568185ddacbf1148acbe26b27258981b4f0
Author:     Xin LI <delphij@FreeBSD.org>
AuthorDate: 2023-06-09 01:38:47 +0000
Commit:     Xin LI <delphij@FreeBSD.org>
CommitDate: 2023-06-09 01:39:05 +0000

    hexdump: Partial lines cannot be repetitions of earlier lines.

    When checking for repetitions of earlier lines, we compare the
    first nread bytes of the line against the saved line. However,
    when we read a partial line, it should never be treated as a
    repetition of an earlier line, even if the first bytes match.

    This change fixes a bug where a partial line could be
    incorrectly identified as a repetition of an earlier line.

    Reported-by:    Mark Adler <madler@alumni.caltech.edu>
    PR:             118723
    Reviewed-by:    emaste
    MFC-after:      2 weeks
    Differential Revision: https://reviews.freebsd.org/D40471

 usr.bin/hexdump/display.c                                   | 2 +-
 usr.bin/hexdump/tests/Makefile                              | 3 +++
 usr.bin/hexdump/tests/d_hexdump_UCflag_bug118723.out (new)  | 4 ++++
 usr.bin/hexdump/tests/d_hexdump_UCvflag_bug118723.out (new) | 4 ++++
 usr.bin/hexdump/tests/d_hexdump_bug118723.in (new)          | 5 +++++
 usr.bin/hexdump/tests/hexdump_test.sh                       | 8 ++++++++
 6 files changed, 25 insertions(+), 1 deletion(-)