The command 'od -c' shows the wrong last character on each output line if that character is non-printable in the current locale (LANG).

Tested with LANG=sv_SE.UTF-8 on:

FreeBSD fsbd1 10.3-RELEASE-p24 FreeBSD 10.3-RELEASE-p24 #0: Wed Nov 15 04:57:40 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD fbsd2 11.1-RELEASE-p4 FreeBSD 11.1-RELEASE-p4 #0: Tue Nov 14 06:05:10 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC i386
FreeBSD rpi1 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r320146M: Tue Jun 20 09:59:03 MDT 2017 raspberry@hive.raspbsd.org:/usr/home/brd/rpi3/crochet/work/obj/arm64.aarch64/usr/src/sys/GENERIC arm64

Steps to reproduce:

This is OK:

[root]# printf 'abcde\345\344\366\374\351\305\304\326\334\311\n' | od -c
0000000    a   b   c   d   e 345 344 366 374 351 305 304 326 334 311  \n

When the string gets longer and 'od' starts a new line, the last octet on the first line gets scrambled. The '012' should be '311':

[root]# printf 'Xabcde\345\344\366\374\351\305\304\326\334\311\n' | od -c
0000000    X   a   b   c   d   e 345 344 366 374 351 305 304 326 334 012
0000020   \n

When the string gets even longer, it becomes obvious that 'od' is copying the first char on the second line over the last char on the first line. (The last char on the first line, shown as '311', should be '334'.)

[root]# printf 'XYabcde\345\344\366\374\351\305\304\326\334\311\n' | od -c
0000000    X   Y   a   b   c   d   e 345 344 366 374 351 305 304 326 311
0000020  311  \n

If your system does not use a UTF-8 locale, you can reproduce the same symptoms by setting LC_ALL:

printf 'XYabcde\345\344\366\374\351\305\304\326\334\311\n' | LC_ALL=en_US.UTF-8 od -c
Created attachment 189747
Restore the original character if we peeked ahead, but still can't complete
The problem here is that we forget to restore the original character to print when we encounter what we think is an incomplete multibyte sequence at the end of the line buffer or at EOF. We peek ahead to try to complete the character, but if the conversion still fails, the character pointer is left pointing into the look-ahead buffer instead of at the original byte, so the first look-ahead byte gets printed in its place.
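To make the failure mode concrete, here is a minimal, self-contained sketch of the peek-ahead-and-restore pattern. It is not the code in usr.bin/hexdump/conv.c: the names utf8_trail_count(), octal_byte_for(), PRINT_AS_CHAR and the restore flag are hypothetical, and a simplified hand-rolled UTF-8 check (no overlong-encoding checks) stands in for the real multibyte conversion so that the example does not depend on the installed locales.

```c
#include <assert.h>
#include <stddef.h>

#define PRINT_AS_CHAR	(-1)	/* sequence completed; caller prints the character */

/*
 * Number of continuation bytes that a UTF-8 lead byte calls for,
 * 0 for plain ASCII, or -1 if the byte cannot start a sequence.
 * Simplified stand-in for the real conversion (assumption).
 */
static int
utf8_trail_count(unsigned char lead)
{
	if (lead < 0x80)
		return (0);
	if ((lead & 0xE0) == 0xC0)
		return (1);
	if ((lead & 0xF0) == 0xE0)
		return (2);
	if ((lead & 0xF8) == 0xF0)
		return (3);
	return (-1);
}

/*
 * Decide what to print for `last`, the final byte of a line buffer,
 * given `peeklen` bytes of look-ahead in `peek` (the start of the next
 * line).  Returns PRINT_AS_CHAR if a character was decoded, otherwise
 * the byte value to print as an octal escape.  With restore == 0 the
 * pointer is left in the look-ahead buffer on failure (the reported
 * behaviour); with restore != 0 it is reset to the original byte.
 */
static int
octal_byte_for(unsigned char last, const unsigned char *peek,
    size_t peeklen, int restore)
{
	const unsigned char *origp = &last;
	const unsigned char *p = origp;
	int need = utf8_trail_count(*p);
	size_t i;

	if (need == 0)
		return (PRINT_AS_CHAR);
	if (need < 0 || peeklen == 0)
		return (*p);		/* illegal lead byte, or EOF */

	assert(peek != NULL);
	p = peek;			/* switch to the look-ahead buffer */
	for (i = 0; i < (size_t)need; i++) {
		if (i >= peeklen || (p[i] & 0xC0) != 0x80) {
			/* Still illegal or incomplete after peeking. */
			if (restore)
				p = origp;
			return (*p);
		}
	}
	return (PRINT_AS_CHAR);
}
```

With the bytes from the report, octal_byte_for(0311, "\n", 1, 0) yields 012 (the scrambled output), while the same call with restore set to 1 yields 0311, which is the byte that should have been printed.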
Created attachment 189748
Restore the original character if we peeked ahead, but still can't complete
Taking this; Yuri has opened a review as D13963 [1].

[1] https://reviews.freebsd.org/D13963
A commit references this bug:

Author: kevans
Date: Sat Jan 20 02:49:33 UTC 2018
New revision: 328188
URL: https://svnweb.freebsd.org/changeset/base/328188

Log:
  od(1): Fix wrong output for some corner cases in multibyte locales.

  Restore the original character to print if we used the look-ahead buffer,
  but that didn't help -- we either got an illegal sequence or still can't
  complete.

  PR: 224552
  Submitted by: Yuri Pankov
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D13963

Changes:
  head/usr.bin/hexdump/conv.c
  head/usr.bin/hexdump/tests/Makefile
  head/usr.bin/hexdump/tests/d_od_cflag_a.out
  head/usr.bin/hexdump/tests/d_od_cflag_b.out
  head/usr.bin/hexdump/tests/od_test.sh
A commit references this bug:

Author: kevans
Date: Sat Jan 27 23:20:02 UTC 2018
New revision: 328500
URL: https://svnweb.freebsd.org/changeset/base/328500

Log:
  MFC r328188,r328189,r328200: Fix wrong output for multibyte corner cases

  MFC r328188: od(1): Fix wrong output for corner cases in multibyte locales.

  Restore the original character to print if we used the look-ahead buffer,
  but that didn't help -- we either got an illegal sequence or still can't
  complete.

  MFC r328189: od(1): Fix mis-patch from r328188

  od_test.sh got duplicated erroneously when it was added in r328188. Dedup.

  MFC r328200: Silence the gcc warning: 'op' may be used uninitialized in
  this function

  PR: 224552

Changes:
_U  stable/11/
  stable/11/usr.bin/hexdump/conv.c
  stable/11/usr.bin/hexdump/tests/Makefile
  stable/11/usr.bin/hexdump/tests/d_od_cflag_a.out
  stable/11/usr.bin/hexdump/tests/d_od_cflag_b.out
  stable/11/usr.bin/hexdump/tests/od_test.sh