Bug 264559 - unexpand: nonconformantly (to both POSIX and heirloom) replaces single spaces with tabs, sometimes breaking reversibility
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Many People
Assignee: freebsd-bugs (Nobody)
Reported: 2022-06-08 23:15 UTC by наб
Modified: 2022-06-09 17:24 UTC (History)
Description наб 2022-06-08 23:15:45 UTC
In many ways this report mirrors https://bugs.debian.org/1012545:

  printf 'a  b' | unexpand -t1 | cat -A
  printf 'a  b' | unexpand -t2 | cat -A
both yield
  a^I b
and
  printf 'a  b' | unexpand -t2,3 | cat -A
yields
  a^Ib

According to 4.2BSD:
  If the -a option is given, then tabs are inserted whenever they would
  compress the resultant file by replacing two or more characters.
Of course, heirloom unexpand doesn't take tab lists,
but this is still wrong according to Issue 7
(quoth IEEE Std 1003.1-2017 (Revision of IEEE Std 1003.1-2008)):
  In addition to translating <blank> characters at the beginning of each
  line, translate all sequences of two or more <blank> characters
  immediately preceding a tab stop to the maximum number of <tab>
  characters followed by the minimum number of <space> characters needed
  to fill the same column positions originally filled by the translated
  <blank> characters.

The correct output for all three is, of course:
  a  b
(NetBSD and the illumos gate agree;
 coreutils is broken differently, but that's unrelated).

Best,
наб
Comment 1 наб 2022-06-08 23:17:58 UTC
Well, this is worse than "nonconformantly" in the -t2,3 case: re-expanding the output gives "a b", which is a different string from the input "a  b"!
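The reversibility claim can be checked without running unexpand at all, by feeding the outputs quoted in the description back through expand with the same tab list. This is only a sketch: same_after_expand is a hypothetical helper name, the candidate strings are copied from the outputs reported above, and it assumes an expand that accepts a comma-separated list for -t.

```shell
# same_after_expand is a hypothetical helper, not part of any tool.
# It expands two strings under the same tab list and reports whether
# the expanded forms are identical.
same_after_expand() {
    tabs=$1
    original=$2
    candidate=$3
    a=$(printf '%s\n' "$original"  | expand -t "$tabs")
    b=$(printf '%s\n' "$candidate" | expand -t "$tabs")
    if [ "$a" = "$b" ]; then
        echo same
    else
        echo different
    fi
}

tab=$(printf '\t')
# Output reported for unexpand -t2,3: a^Ib
same_after_expand 2,3 'a  b' "a${tab}b"
# Output reported for unexpand -t2: a^I b
same_after_expand 2 'a  b' "a${tab} b"
```

The first call prints "different" and the second prints "same": the -t2 output is merely nonconformant, while the -t2,3 output no longer round-trips.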
Comment 2 наб 2022-06-09 17:24:17 UTC
Some more, this time /exorbitantly/ broken:
  printf ' ermrxsmg \tjrjc ngsoo\n' | unexpand -t2          | cat -A
  printf ' ermrxsmg \tjrjc ngsoo\n' | unexpand -t4,55,68,78 | cat -A
give
  | ermrxsmg^I^Ijrjc ngsoo$
  |^Iermrxsmg^Ijrjc^Ingsoo$
whereas the correct output is
  | ermrxsmg ^Ijrjc ngsoo$
  | ermrxsmg ^Ijrjc ngsoo$

After reexpansion, the first one is fine:
  | ermrxsmg   jrjc ngsoo$
but the second one very obviously isn't:
  |    ermrxsmg                                           jrjc         ngsoo$
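To see which fields move in the second case, one option is to expand both the input and the reported output under -t4,55,68,78 and print the column where each word starts. This is a sketch under assumptions: word_columns is a hypothetical helper, the second string is the output quoted above rather than the result of a live unexpand run, and columns are 1-based as returned by awk's index().

```shell
# word_columns is a hypothetical helper: it expands one line under the
# given tab list and prints the 1-based start column of each named word.
word_columns() {
    tabs=$1
    line=$2
    shift 2
    printf '%s\n' "$line" | expand -t "$tabs" | awk -v words="$*" '
        {
            n = split(words, w, " ")
            for (i = 1; i <= n; i++)
                printf "%s%s=%d", (i > 1 ? " " : ""), w[i], index($0, w[i])
            print ""
        }'
}

tab=$(printf '\t')
# Original input line
word_columns 4,55,68,78 " ermrxsmg ${tab}jrjc ngsoo" ermrxsmg jrjc ngsoo
# Output reported for unexpand -t4,55,68,78
word_columns 4,55,68,78 "${tab}ermrxsmg${tab}jrjc${tab}ngsoo" ermrxsmg jrjc ngsoo
```

The first call prints "ermrxsmg=2 jrjc=56 ngsoo=61" and the second "ermrxsmg=5 jrjc=56 ngsoo=69", so the first and last words are displaced while the middle one happens to stay put.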