od and hexdump truncates the last line of the dump when that line partially repeats the previous line. Fix: Patch relative to latest version (1.22): How-To-Repeat: Create this 40-byte file (including five newline characters): abcdefg 0123456 abcdefg 0123456 abcdefg (Just the first three lines, 24 bytes, shows the same problem, but I included five lines to make the structure more evident.) od on that file incorrectly gives: 0000000 060542 061544 062546 063412 030061 031063 032065 033012 * 0000040 which leaves off the last line "abcdefg\n", and incorrectly shows the final address as 40 octal (32 decimal). Similarly, hexdump gives the following with the same issues: 0000000 6162 6364 6566 670a 3031 3233 3435 360a * 0000020 With the patch below, od then correctly displays: 0000000 060542 061544 062546 063412 030061 031063 032065 033012 * 0000040 060542 061544 062546 063412 0000050 and hexdump then correctly displays: 0000000 6162 6364 6566 670a 3031 3233 3435 360a * 0000020 6162 6364 6566 670a 0000028
Responsible Changed From-To: freebsd-bugs->gcooper gcooper is looking into this
Can't reproduce on 7.1rc2 / amd64 nor OSX 10.5.5 / x86. May be ppc or an endiam specific issue. Will verify on a G5 with OSX 10.4.11 when I get back to SJC. -Garrett
On Dec 27, 2008, at 1:41 AM, Garrett Cooper wrote: > Can't reproduce on 7.1rc2 / amd64 nor OSX 10.5.5 / x86. May be ppc > or an endiam specific issue. > Will verify on a G5 with OSX 10.4.11 when I get back to SJC. Garrett, I see the bug on my x86 with OS X 10.5.6. See below. (The version of od in my ~/bin directory has my patch.) It is not a PPC-unique or endian issue. Make sure that your test file is exactly as below, with no trailing blanks on the lines and only a single newline character per line. To see the bug, the file must repeat every 16 bytes, and terminate in the middle of the last 16 bytes. The bug is that the last partial repeat is not displayed. Mark gromit% cat xx abcdefg 0123456 abcdefg 0123456 abcdefg gromit% /usr/bin/od xx 0000000 061141 062143 063145 005147 030460 031462 032464 005066 * 0000040 gromit% ~/bin/od xx 0000000 061141 062143 063145 005147 030460 031462 032464 005066 * 0000040 061141 062143 063145 005147 0000050 gromit% /usr/bin/od -x xx 0000000 6261 6463 6665 0a67 3130 3332 3534 0a36 * 0000040 gromit% ~/bin/od -x xx 0000000 6261 6463 6665 0a67 3130 3332 3534 0a36 * 0000040 6261 6463 6665 0a67 0000050 gromit% uname -a Darwin gromit 9.6.0 Darwin Kernel Version 9.6.0: Mon Nov 24 17:37:00 PST 2008; root:xnu-1228.9.59~1/RELEASE_I386 i386
Ok... I've gotten back to this bug and here are my comments on the issue at hand: 1. The item you're seeing is a valid *cosmetic* bug -- the address isn't being incremented properly, so it gives you the impression that the data being represented is invalid. See the following for a valid representation of the text file with %_u: [root@optimus /scratch/src/head/usr.bin/hexdump]# hexdump -e '"%_u\n"' -v test_118723.txt a b c d e f g lf 0 1 2 3 4 5 6 lf a b c d e f g lf 0 1 2 3 4 5 6 lf a b c d e f g lf versus the non- -v output: [root@optimus /scratch/src/head/usr.bin/hexdump]# hexdump -e '"%_u\n"' test_118723.txt a b c d e f g lf 0 1 2 3 4 5 6 lf a b c d e f g lf 0 1 2 3 4 5 6 lf a b c d e f g lf Please note that the data represented in both forms is equivalent. 2. The patch you provide breaks -v output, which isn't desirable... Here's what should be done to fix the cosmetic issue: Index: display.c =================================================================== --- display.c (revision 203332) +++ display.c (working copy) @@ -264,18 +264,20 @@ if (need == blocksize) return((u_char *)NULL); /* - * XXX bcmp() is not quite right in the presence - * of multibyte characters. + * We need to explicitly bump the address displayed even + * if -v isn't specified, or the address doesn't + * increment with repeated output... + * Please see bin/118723 for more details. */ + eaddress = address + nread; if (vflag != ALL && valid_save && - bcmp(curp, savp, nread) == 0) { + memcmp(curp, savp, nread) == 0) { if (vflag != DUP) (void)printf("*\n"); return((u_char *)NULL); } bzero((char *)curp + nread, need); - eaddress = address + nread; return(curp); } n = fread((char *)curp + nread, sizeof(u_char), @@ -290,13 +292,9 @@ if (length != -1) length -= n; if (!(need -= n)) { - /* - * XXX bcmp() is not quite right in the presence - * of multibyte characters. - */ if (vflag == ALL || vflag == FIRST || valid_save == 0 || - bcmp(curp, savp, blocksize) != 0) { + memcmp(curp, savp, blocksize) != 0) { if (vflag == DUP || vflag == FIRST) vflag = WAIT; return(curp); Please let me know if you agree to the change. Thanks, -Garrett
Garrett, Thanks for looking into the bug. I had to fire up the way-back machine = to remember what you were talking about! On Feb 7, 2010, at 8:25 PM, Garrett Cooper wrote: > 1. The item you're seeing is a valid *cosmetic* bug -- I don't know what you mean by "cosmetic", but without the patch you can = have two different input files of different lengths producing exactly = the same output. I'd call that simply a bug. > Please let me know if you agree to the change. Yes, I agree to the change. Moving the "eaddress =3D address + nread;" = before the "if" is the salient portion of the patch and does the trick. = The final address now correctly indicates the length of the input, which = allows the original input to be reconstructed from the dump. I don't actually have your current source (I noticed some differences in = the unchanged portion of the diff from the version I have). However I = suspect that those changes would not affect this behavior. Mark
On Sun, Feb 7, 2010 at 9:29 PM, Mark Adler <madler@alumni.caltech.edu> wrot= e: > Garrett, > > Thanks for looking into the bug. =A0I had to fire up the way-back machine= to remember what you were talking about! :)... > On Feb 7, 2010, at 8:25 PM, Garrett Cooper wrote: >> 1. The item you're seeing is a valid *cosmetic* bug -- > > I don't know what you mean by "cosmetic", but without the patch you can h= ave two different input files of different lengths producing exactly the sa= me output. =A0I'd call that simply a bug. Sorry it's a Cisco-ism -- cosmetic bugs are bugs which are functionally correct, but the data is misrepresented in some manner. This might be considered a `minor' bug as well. >> Please let me know if you agree to the change. > > Yes, I agree to the change. =A0Moving the "eaddress =3D address + nread;"= before the "if" is the salient portion of the patch and does the trick. = =A0The final address now correctly indicates the length of the input, which= allows the original input to be reconstructed from the dump. Brilliant -- that's what I figured it was after thinking through the process a bit more thoroughly. > I don't actually have your current source (I noticed some differences in = the unchanged portion of the diff from the version I have). =A0However I su= spect that those changes would not affect this behavior. Right. I have a few items in the pipeline that I'm working on for this utility that I'm trying to close out in one commit. The other items in this file are benign as they affect different pieces of hexdump. Thanks for checking and I'll add a regression test for this as well in the final commit :). Cheers! -Garrett
Responsible Changed From-To: gcooper->freebsd-bugs gcooper is not a developer
For bugs matching the following criteria: Status: In Progress Changed: (is less than) 2014-06-01 Reset to default assignee and clear in-progress tags. Mail being skipped
Keyword: patch or patch-ready – in lieu of summary line prefix: [patch] * bulk change for the keyword * summary lines may be edited manually (not in bulk). Keyword descriptions and search interface: <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>
I think the partial line should never be considered as a match of earlier line. So it seems that we should just check if "need" is a non-zero value: diff --git a/usr.bin/hexdump/display.c b/usr.bin/hexdump/display.c index 36306ededfc6..0bf1f1878eda 100644 --- a/usr.bin/hexdump/display.c +++ b/usr.bin/hexdump/display.c @@ -271,7 +271,7 @@ get(void) * XXX bcmp() is not quite right in the presence * of multibyte characters. */ - if (vflag != ALL && + if (!need && vflag != ALL && valid_save && bcmp(curp, savp, nread) == 0) { if (vflag != DUP) { I've also created some test cases to verify the behavior for hexdump -C and hexdump -Cv.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=d1016568185ddacbf1148acbe26b27258981b4f0 commit d1016568185ddacbf1148acbe26b27258981b4f0 Author: Xin LI <delphij@FreeBSD.org> AuthorDate: 2023-06-09 01:38:47 +0000 Commit: Xin LI <delphij@FreeBSD.org> CommitDate: 2023-06-09 01:39:05 +0000 hexdump: Partial lines cannot be repetitions of earlier lines. When checking for repetitions of earlier lines, we compare the first nread bytes of the line against the saved line. However, when we read a partial line, it should never be treated as a repetition of an earlier line, even if the first bytes match. This change fixes a bug where a partial line could be incorrectly identified as a repetition of an earlier line. Reported-by: Mark Adler <madler@alumni.caltech.edu> PR: 118723 Reviewed-by: emaste MFC-after: 2 weeks Differential Revision: https://reviews.freebsd.org/D40471 usr.bin/hexdump/display.c | 2 +- usr.bin/hexdump/tests/Makefile | 3 +++ usr.bin/hexdump/tests/d_hexdump_UCflag_bug118723.out (new) | 4 ++++ usr.bin/hexdump/tests/d_hexdump_UCvflag_bug118723.out (new) | 4 ++++ usr.bin/hexdump/tests/d_hexdump_bug118723.in (new) | 5 +++++ usr.bin/hexdump/tests/hexdump_test.sh | 8 ++++++++ 6 files changed, 25 insertions(+), 1 deletion(-)