Bug 224552 - 'od -c' show wrong char when it is a non-printable
Summary: 'od -c' show wrong char when it is a non-printable
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 11.1-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Kyle Evans
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-12-23 19:14 UTC by pru13allan
Modified: 2018-01-27 23:24 UTC (History)

See Also:
kevans: mfc-stable11?


Attachments
Restore the original character if we peeked ahead, but still can't complete (1.76 KB, patch)
2018-01-15 13:11 UTC, Yuri Pankov
Restore the original character if we peeked ahead, but still can't complete (1.51 KB, patch)
2018-01-15 13:58 UTC, Yuri Pankov

Description pru13allan 2017-12-23 19:14:49 UTC
The command 'od -c' shows the wrong last character on each line if that character is non-printable in the current locale (LANG).

Tested with LANG=sv_SE.UTF-8 on:

FreeBSD fsbd1 10.3-RELEASE-p24 FreeBSD 10.3-RELEASE-p24 #0: Wed Nov 15 04:57:40 UTC 2017
 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

FreeBSD fbsd2 11.1-RELEASE-p4 FreeBSD 11.1-RELEASE-p4 #0: Tue Nov 14 06:05:10 UTC 2017
 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  i386

FreeBSD rpi1 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r320146M: Tue Jun 20 09:59:03 MDT 2017
 raspberry@hive.raspbsd.org:/usr/home/brd/rpi3/crochet/work/obj/arm64.aarch64/usr/src/sys/GENERIC  arm64

Steps to reproduce:

This is OK
[root]# printf 'abcde\345\344\366\374\351\305\304\326\334\311\n' | od -c
0000000    a   b   c   d   e 345 344 366 374 351 305 304 326 334 311  \n

When the string gets longer and 'od' starts a new line, the last octet of the first line gets scrambled.
The '012' should be '311':
[root]# printf 'Xabcde\345\344\366\374\351\305\304\326\334\311\n' | od -c
0000000    X   a   b   c   d   e 345 344 366 374 351 305 304 326 334 012
0000020   \n                                                            

When the string gets even longer, it becomes obvious that 'od' is copying the first char of the
second line into the last char of the first line:
(The last char on the first line ('311') should be '334'.)
[root]# printf 'XYabcde\345\344\366\374\351\305\304\326\334\311\n' | od -c
0000000    X   Y   a   b   c   d   e 345 344 366 374 351 305 304 326 311
0000020  311  \n

If your system is not using a UTF-8 locale, you can reproduce the same symptoms by setting LC_ALL:
printf 'XYabcde\345\344\366\374\351\305\304\326\334\311\n' | LC_ALL=en_US.UTF-8 od -c
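
As a locale-independent cross-check of the expected values above, the following minimal sketch lists the byte that falls in the 16th column of each full output line. It assumes od's default of 16 bytes per line; the helper name last_column_octal is hypothetical and not part of od.

```python
def last_column_octal(data, width=16):
    """Return the octal value of the last byte of every full output line."""
    return ["%03o" % data[start + width - 1]
            for start in range(0, len(data) - width + 1, width)]


if __name__ == "__main__":
    # The two failing inputs from the report, written with the same octal escapes.
    tail = b"\345\344\366\374\351\305\304\326\334\311\n"
    for prefix in (b"Xabcde", b"XYabcde"):
        print(prefix.decode("ascii"), last_column_octal(prefix + tail))
```

For the two failing inputs this gives 311 and 334 respectively, which are the values the report says 'od -c' should have printed.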
Comment 1 Yuri Pankov 2018-01-15 13:11:56 UTC
Created attachment 189747
Restore the original character if we peeked ahead, but still can't complete
Comment 2 Yuri Pankov 2018-01-15 13:41:24 UTC
The problem is that we forget to restore the original character to print when we encounter what looks like an incomplete multibyte sequence at the end of the line buffer or at EOF. We peek ahead to try to complete the character, but if the conversion still fails, the character pointer is left pointing into the look-ahead buffer instead of at the original byte.
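
The effect described above can be illustrated with a small model. This is a rough sketch, not the code in conv.c: the names utf8_status, od_c_cells and od_c_rows are hypothetical, the UTF-8 check is simplified (it ignores overlong and surrogate encodings and does no printability test), and the look-ahead size of 4 bytes is an assumption. The restore flag switches between leaving the pointer in the look-ahead data (restore=False) and going back to the original byte (restore=True).

```python
BLOCK = 16        # bytes per od output line
MB_CUR_MAX = 4    # assumed look-ahead size for a UTF-8 locale

ESCAPES = {0: "\\0", 7: "\\a", 8: "\\b", 9: "\\t",
           10: "\\n", 11: "\\v", 12: "\\f", 13: "\\r"}


def utf8_status(seq):
    """Classify the character starting at seq[0] as (status, length).

    Simplified: only lead-byte ranges and continuation bytes are checked.
    """
    lead = seq[0]
    if lead < 0x80:
        return "ok", 1
    if 0xC2 <= lead <= 0xDF:
        need = 2
    elif 0xE0 <= lead <= 0xEF:
        need = 3
    elif 0xF0 <= lead <= 0xF4:
        need = 4
    else:
        return "illegal", 1
    if any(not 0x80 <= b <= 0xBF for b in seq[1:need]):
        return "illegal", 1
    return ("ok" if len(seq) >= need else "incomplete"), need


def od_c_cells(data, restore):
    """Render one output cell per input byte, roughly in the style of 'od -c'."""
    cells, mbleft = [], 0
    for i, byte in enumerate(data):
        if mbleft:                      # trailing byte of a multibyte character
            cells.append("**")
            mbleft -= 1
            continue
        block_end = min(len(data), (i // BLOCK + 1) * BLOCK)
        shown = byte                    # the byte printed if conversion fails
        status, need = utf8_status(data[i:block_end])
        if status == "incomplete":
            # Peek past the end of the line buffer to try to complete it.
            peeked = data[block_end:block_end + MB_CUR_MAX]
            status, need = utf8_status(data[i:block_end] + peeked)
            if status != "ok" and peeked and not restore:
                shown = peeked[0]       # pointer left in the look-ahead data
        if status != "ok":
            cells.append("%03o" % shown)
        elif need > 1:
            cells.append(data[i:i + need].decode("utf-8", errors="replace"))
            mbleft = need - 1
        elif byte in ESCAPES:
            cells.append(ESCAPES[byte])
        elif 0x20 <= byte <= 0x7E:
            cells.append(chr(byte))
        else:
            cells.append("%03o" % byte)
    return cells


def od_c_rows(data, restore):
    """Split the rendered cells into output lines of BLOCK cells each."""
    cells = od_c_cells(data, restore)
    return [cells[i:i + BLOCK] for i in range(0, len(cells), BLOCK)]


if __name__ == "__main__":
    sample = b"XYabcde\345\344\366\374\351\305\304\326\334\311\n"
    for restore in (False, True):
        print("restore =", restore)
        for row in od_c_rows(sample, restore):
            print("  " + " ".join("%3s" % cell for cell in row))
```

With restore=False the model puts 311 in the last column of the first line for the 'XYabcde...' input, as in the report; with restore=True it puts 334 there. The model does not cover the case where the look-ahead returns nothing at EOF.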
Comment 3 Yuri Pankov 2018-01-15 13:58:35 UTC
Created attachment 189748
Restore the original character if we peeked ahead, but still can't complete
Comment 4 Kyle Evans freebsd_committer freebsd_triage 2018-01-18 03:20:41 UTC
Taking this; Yuri has opened a review as D13963 [1].

[1] https://reviews.freebsd.org/D13963
Comment 5 commit-hook freebsd_committer freebsd_triage 2018-01-20 02:50:34 UTC
A commit references this bug:

Author: kevans
Date: Sat Jan 20 02:49:33 UTC 2018
New revision: 328188
URL: https://svnweb.freebsd.org/changeset/base/328188

Log:
  od(1): Fix wrong output for some corner cases in multibyte locales.

  Restore the original character to print if we used the look-ahead
  buffer, but that didn't help -- we either got an illegal sequence
  or still can't complete.

  PR:		224552
  Submitted by:	Yuri Pankov
  MFC after:	1 week
  Differential Revision:	https://reviews.freebsd.org/D13963

Changes:
  head/usr.bin/hexdump/conv.c
  head/usr.bin/hexdump/tests/Makefile
  head/usr.bin/hexdump/tests/d_od_cflag_a.out
  head/usr.bin/hexdump/tests/d_od_cflag_b.out
  head/usr.bin/hexdump/tests/od_test.sh
Comment 6 commit-hook freebsd_committer freebsd_triage 2018-01-27 23:20:41 UTC
A commit references this bug:

Author: kevans
Date: Sat Jan 27 23:20:02 UTC 2018
New revision: 328500
URL: https://svnweb.freebsd.org/changeset/base/328500

Log:
  MFC r328188,r328189,r328200: Fix wrong output for multibyte corner cases

  MFC r328188: od(1): Fix wrong output for corner cases in multibyte locales.

  Restore the original character to print if we used the look-ahead
  buffer, but that didn't help -- we either got an illegal sequence
  or still can't complete.

  MFC r328189: od(1): Fix mis-patch from r328188

  od_test.sh got duplicated erroneously when it was added in r328188. Dedup.

  MFC r328200: Silence the gcc warning: 'op' may be used uninitialized in this
  function

  PR:		224552

Changes:
_U  stable/11/
  stable/11/usr.bin/hexdump/conv.c
  stable/11/usr.bin/hexdump/tests/Makefile
  stable/11/usr.bin/hexdump/tests/d_od_cflag_a.out
  stable/11/usr.bin/hexdump/tests/d_od_cflag_b.out
  stable/11/usr.bin/hexdump/tests/od_test.sh