Bug 252896 - usr.bin/cut when used with -w will replace first delimiter with \t delimiter
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 12.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
Reported: 2021-01-21 22:18 UTC by antispam007
Modified: 2021-08-20 11:46 UTC
Description antispam007 2021-01-21 22:18:55 UTC
It seems that cut replaces a space character with a tab character when the -w option is used to select whitespace as the delimiter.

A quick way to reproduce this is
     ls -al | cut -w -f x- (where x is the field number)
which produces something like
     field-x <TAB> field-x+1 <SPC> field-x+2 <SPC> ...
instead of
     field-x <SPC> field-x+1 <SPC> field-x+2 <SPC> ...

You can verify this with hd(1), which shows 09 for <TAB> and 20 for <SPC>.

The expected behaviour is that cut does not touch or change the field delimiter.
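
For anyone without hd(1) at hand, the same check can be sketched in a few lines of Python that label each whitespace character. The helper name `mark_whitespace` and the sample line are illustrative only; the sample merely mimics the shape of the reported output and is not captured cut output.

```python
def mark_whitespace(line: str) -> str:
    """Replace each tab and space with a visible label."""
    labels = {"\t": "<TAB>", " ": "<SPC>"}
    return "".join(labels.get(ch, ch) for ch in line)


# Hypothetical line shaped like the reported output: a tab after the
# first selected field, the original spaces after that.
sample = "field3\tfield4 field5"
print(mark_whitespace(sample))  # field3<TAB>field4<SPC>field5
```

Piping real `cut -w -f x-` output through such a helper should show whether only the first delimiter is a tab.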
Comment 1 Yuri 2021-08-20 10:54:48 UTC
The question is what to use as the output field delimiter in the -w case, since -w consumes a sequence of spaces and tabs as the input delimiter. -w is not standard and is not supported by the GNU version of cut, so we have nothing to compare the behavior with.

What you are showing seems to be a corner case (and possibly a real bug) of using an open-ended 'x-' range; with any other field list, e.g. 1,2,3,... or 1-10, all output seems to be delimited by tab characters.

Moreover, the example output you provide is not correct: fields x+1, x+2, ..., x+n are not delimited by a single space, but rather keep the original amount of whitespace between them.
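
The tab-delimited output observed for explicit field lists can be modelled roughly as follows. This is only a sketch of the behavior described above, not the actual cut(1) implementation; the function name `cut_w_model` is made up, and the handling of leading whitespace is an assumption that the example deliberately avoids.

```python
import re


def cut_w_model(line, fields):
    """Model of `cut -w -f LIST` for an explicit list of 1-based fields.

    Assumption: input is split on runs of spaces/tabs and the selected
    fields are written in input order, joined by a single tab.
    """
    parts = re.split(r"[ \t]+", line.rstrip("\n"))
    wanted = sorted(set(fields))
    selected = [parts[i - 1] for i in wanted if 1 <= i <= len(parts)]
    return "\t".join(selected)


print(repr(cut_w_model("a  b \t c d", [2, 3])))  # 'b\tc'
```

Under this model every output delimiter is a tab, which is what makes the mixed tab/space output of the 'x-' form stand out.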
Comment 2 antispam007 2021-08-20 11:46:00 UTC
The expectation comes from the man page: "...Consecutive spaces and tabs count as one single field separator", so there should be no change in the type of whitespace. If the first character in a run of whitespace is a tab, the remaining whitespace characters should be deleted, regardless of their type. The same applies if the first character is a space. This would also be logical: reduce a run of whitespace, regardless of type, to a single character.
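
As a minimal sketch of that expectation (in Python, purely to pin down the semantics; `collapse_whitespace` is a hypothetical name and not anything in cut), each run of spaces and tabs is reduced to its first character:

```python
import re


def collapse_whitespace(line: str) -> str:
    """Reduce every run of spaces/tabs to the first character of the run."""
    return re.sub(r"([ \t])[ \t]*", r"\1", line)


# The run " \t " starts with a space, the run "\t  " starts with a tab.
print(repr(collapse_whitespace("a \t b\t  c")))  # 'a b\tc'
```

With this rule the type of the delimiter is never changed, only the length of the run.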

Since the behaviour also differs depending on how the field numbers are given, I am pretty sure that it is a bug. The bug has two sub-bugs: first, I would not expect the type of the leading whitespace to change (though that could be a feature described in the man page); second, the behaviour is different when the end of the field range is not given.

At the very least, a useful workaround would be to update the man page: "-w Use whitespace (spaces and tabs) as the delimiter.  Consecutive spaces and tabs count as one single field separator. All consecutive field separators will be replaced with a single tab when used with a known range of fields"