Bug 241679 - /usr/bin/sort fails if UTF-8 input is received from stdin
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 12.0-RELEASE
Hardware: Any Any
Importance: --- Affects Many People
Assignee: freebsd-bugs mailing list
Reported: 2019-11-02 21:09 UTC by Ronald F. Guilmette
Modified: 2019-11-04 14:22 UTC
Description Ronald F. Guilmette 2019-11-02 21:09:18 UTC
If /usr/bin/sort is given the name of a file on the command line, and that file contains lines with UTF-8 encoded data, then the sort utility functions normally and as expected.  If, on the other hand, the same file content is supplied to /usr/bin/sort via stdin, then sort fails completely and issues the following error message:

sort: Illegal byte sequence

This can be verified by placing the following line into a file called "test" and
then attempting to sort that file's content in two different ways:

zürich.email

Example #1:
sort test

The above works just fine.

Example #2:
sort < test

The above fails and issues the error:
sort: Illegal byte sequence
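
The two examples above can be combined into one script so that both exit statuses are visible side by side. This is only a sketch: the temporary directory and the variable names (workdir, file_status, stdin_status) are illustrative assumptions, not part of the original report, and the "ü" is written as octal escapes so that the script itself stays pure ASCII.

```shell
#!/bin/sh
# Recreate the one-line test file from the report.
# \303\274 is the UTF-8 encoding of "ü" (bytes c3 bc).
# Assumption: a scratch directory is used instead of the current directory.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/sorttest.XXXXXX") || exit 1
printf 'z\303\274rich.email\n' > "$workdir/test"

# Example #1: input file named on the command line.
sort "$workdir/test" > /dev/null 2>&1
file_status=$?

# Example #2: the same bytes supplied on stdin.
sort < "$workdir/test" > /dev/null 2>&1
stdin_status=$?

echo "sort test   -> exit status $file_status"
echo "sort < test -> exit status $stdin_status"

rm -rf "$workdir"
```

On an affected system the second status would be expected to be non-zero while the first is zero; on an unaffected system both should be zero.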
Comment 1 Conrad Meyer freebsd_committer 2019-11-03 18:05:59 UTC
I can't reproduce the issue on 13-CURRENT with the following environment:

LANG=en_US.UTF-8

I also cannot reproduce it on 13-CURRENT with unset LANG or LANG=C or C.US-ASCII, or LC_CTYPE set to any weird non-UTF8 charset I know of.
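
For anyone who wants to repeat this check, here is a rough sketch of a loop over the LANG values mentioned above. It makes some assumptions that are not stated in the comment: LC_ALL and LC_CTYPE are cleared so that LANG alone decides the locale, the stdin form of the command is the one being tested, and the names input, runs and status are made up for illustration.

```shell
#!/bin/sh
# One-line input with a UTF-8 "ü" (bytes c3 bc), written with octal escapes.
input=$(mktemp "${TMPDIR:-/tmp}/sorttest.XXXXXX") || exit 1
printf 'z\303\274rich.email\n' > "$input"

runs=0
# "unset" is a placeholder meaning LANG is removed from the environment.
for lang in en_US.UTF-8 C C.US-ASCII unset; do
    if [ "$lang" = unset ]; then
        # Subshell keeps the caller's environment untouched.
        ( unset LANG LC_ALL LC_CTYPE
          sort < "$input" > /dev/null 2>&1 )
    else
        # Assumption: LC_ALL and LC_CTYPE are cleared so only LANG applies.
        ( unset LC_ALL LC_CTYPE
          LANG=$lang; export LANG
          sort < "$input" > /dev/null 2>&1 )
    fi
    status=$?
    runs=$((runs + 1))
    echo "LANG=$lang: sort < test exited with status $status"
done

rm -f "$input"
```

Any non-zero status in the output would point at a locale setting under which the stdin case fails.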
Comment 2 Ed Maste freebsd_committer 2019-11-03 19:27:07 UTC
I could not reproduce it on ref12-amd64.freebsd.org either; perhaps this was fixed on stable/12 after 12.0 was released.
Comment 3 David Demelier 2019-11-04 14:22:54 UTC
Can you please tell us which shell you are using, too? And of course, the output of:

file test
hexdump -c test
hexdump -x test
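
In addition to the commands above, it may help to confirm that the file really is well-formed UTF-8, since a Latin-1 "ü" (single byte fc) and a UTF-8 "ü" (bytes c3 bc) look the same in a terminal. The helper below is a hypothetical sketch, not something requested in this thread; it assumes iconv(1) is installed and relies on iconv exiting non-zero when the input cannot be decoded.

```shell
#!/bin/sh
# is_valid_utf8 FILE
# Hypothetical helper: succeeds only if FILE decodes cleanly as UTF-8.
# Converting UTF-8 to UTF-8 is a no-op for valid input, and iconv
# reports an error (non-zero exit) on an invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

Once the function is defined, something like `is_valid_utf8 test && echo "valid UTF-8" || echo "not valid UTF-8"` would show which case applies to the reporter's file.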