If /usr/bin/sort is given the name of a file on the command line, and that file contains lines with UTF-8 encoded data, sort works as expected. If the same file content is instead supplied to /usr/bin/sort on stdin, sort fails and prints the following error message:

    sort: Illegal byte sequence

This can be verified by placing the following line into a file called "test" and then sorting that file in two different ways:

    zürich.email

Example #1:

    sort test

The above works just fine.

Example #2:

    sort < test

The above fails with the error:

    sort: Illegal byte sequence
I can't reproduce the issue on 13-CURRENT with the following environment:

    LANG=en_US.UTF-8

I also cannot reproduce it on 13-CURRENT with LANG unset, with LANG=C or C.US-ASCII, or with LC_CTYPE set to any weird non-UTF-8 charset I know of.
Could not reproduce on ref12-amd64.freebsd.org either; perhaps this was fixed on stable/12 after 12.0 was released.
Can you please tell us which shell you are using, too? And of course the output of:

    file test
    hexdump -c test
    hexdump -x test
Please reopen if it's still a problem and there's a way to reproduce it.
Here is how to reproduce the issue:

FreeBSD 12.3-RELEASE-p2
FreeBSD 12.3-RELEASE-p5

host: /tmp# echo $LANG
en_US.UTF-8
host: /tmp# cat a | sort
sort: Illegal byte sequence
host: /tmp# cut -d" " -f 1 a
cut: a: Illegal byte sequence
host: /tmp# sort a
gelöscht
host: /tmp# sort a > b
host: /tmp# diff a b
1c1
< gel▒scht
---
> gelöscht
host: /tmp# file a b
a: ISO-8859 text
b: UTF-8 Unicode text
host: /tmp# hexdump -c a
0000000   g   e   l   ▒   s   c   h   t  \n
0000009
host: /tmp# hexdump -c b
0000000   g   e   l   ▒   ▒   s   c   h   t  \n
000000a
host: /tmp# hexdump -x a
0000000    6567    f66c    6373    7468    000a
0000009
host: /tmp# hexdump -x b
0000000    6567    c36c    73b6    6863    0a74
000000a
host: /tmp# ls -l a b
-rw-r--r--  1 root  wheel   9 Jun  7 20:11 a
-rw-r--r--  1 root  wheel  10 Jun  7 20:20 b
host: /tmp#

My initial issue was with a directory (shown here in reverse, to reproduce):

host: /tmp# mkdir `cat a`
host: /tmp# find . -type d | sort
sort: Illegal byte sequence
host: /tmp# rmdir gel\366scht/
host: /tmp#

I'm unable to re-open the bug from here.
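For anyone following along: the hexdump above shows that file a holds a single 0xf6 byte (ISO-8859 "ö") followed by ASCII, which is not a well-formed UTF-8 sequence, while file b holds the two-byte form 0xc3 0xb6. The sketch below is only an illustration of that difference. The function name is_valid_utf8 is hypothetical, and this is not code from sort(1) or libc.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative helper (hypothetical name, not part of sort(1)):
 * return true if buf[0..len) is well-formed UTF-8.
 */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;            /* continuation bytes expected */
        unsigned long cp, min;  /* code point, smallest legal value */

        if (c < 0x80) {         /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;       /* stray continuation or invalid lead byte */
        }

        if (len - i - 1 < need)
            return false;       /* sequence truncated at end of buffer */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += need + 1;
    }
    return true;
}
```

Fed the nine bytes of file a, this returns false at the 0xf6 byte; fed the ten bytes of file b, it returns true. That matches what file(1) reports for the two files.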
Re-open after steps to reproduce the issue have been added to the PR.
https://reviews.freebsd.org/D36948
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b58094c0d98e5d0ab8abdcdb655ac902ae8ad66e

commit b58094c0d98e5d0ab8abdcdb655ac902ae8ad66e
Author:     Baptiste Daroussin <bapt@FreeBSD.org>
AuthorDate: 2022-10-12 13:31:50 +0000
Commit:     Baptiste Daroussin <bapt@FreeBSD.org>
CommitDate: 2022-10-12 15:37:33 +0000

    sort: replace home made line reader by getdelim(3)

    The previous code had bug when reading lines with an unexpected
    encoding, returning without the full line being captured.
    This result in sort complaining with "sort: Illegal byte sequence"

    Using getdelim(3) instead of the home made code, fixes the situation.

    PR: 241679
    Reported by: Ronald F. Guilmette <rfg-freebsd@tristatelogic.com>
    MFC After: 1 week
    Reviewed by: markj, imp
    Differential Revision: https://reviews.freebsd.org/D36948

 usr.bin/sort/bwstring.c | 109 ------------------------------------------------
 usr.bin/sort/file.c     |  18 +++++---
 2 files changed, 12 insertions(+), 115 deletions(-)
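For readers unfamiliar with getdelim(3), here is a minimal sketch of the general technique the commit message names: reading delimiter-terminated records as raw bytes, so that an unexpected encoding cannot cut a line short. This is not the code from usr.bin/sort/file.c. The helper name read_records and its parameters are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* for getdelim(3) and ssize_t */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not taken from sort(1): read every
 * delim-terminated record from fp as raw bytes and report how many
 * records and how many bytes were seen.
 * Returns 0 on success, -1 on a read error.
 */
int read_records(FILE *fp, int delim, size_t *nrecords, size_t *nbytes)
{
    char *line = NULL;  /* buffer allocated and grown by getdelim() */
    size_t cap = 0;     /* current capacity of that buffer */
    ssize_t len;        /* bytes in the record, delimiter included */

    *nrecords = 0;
    *nbytes = 0;

    /* getdelim() does no character decoding; every byte is kept. */
    while ((len = getdelim(&line, &cap, delim, fp)) != -1) {
        (*nrecords)++;
        *nbytes += (size_t)len;
    }

    free(line);
    return ferror(fp) ? -1 : 0;
}
```

Because the record comes back as a byte count plus a buffer, any conversion to wide characters can be attempted afterwards on the complete line, and a failed conversion does not lose the rest of the input.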
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=489c9df19851aa052553a79de65e37ea1471014e

commit 489c9df19851aa052553a79de65e37ea1471014e
Author:     Baptiste Daroussin <bapt@FreeBSD.org>
AuthorDate: 2022-10-12 13:31:50 +0000
Commit:     Baptiste Daroussin <bapt@FreeBSD.org>
CommitDate: 2022-10-19 07:59:28 +0000

    sort: replace home made line reader by getdelim(3)

    The previous code had bug when reading lines with an unexpected
    encoding, returning without the full line being captured.
    This result in sort complaining with "sort: Illegal byte sequence"

    Using getdelim(3) instead of the home made code, fixes the situation.

    PR: 241679
    Reported by: Ronald F. Guilmette <rfg-freebsd@tristatelogic.com>
    MFC After: 1 week
    Reviewed by: markj, imp
    Differential Revision: https://reviews.freebsd.org/D36948

    (cherry picked from commit b58094c0d98e5d0ab8abdcdb655ac902ae8ad66e)

 usr.bin/sort/bwstring.c | 109 ------------------------------------------------
 usr.bin/sort/file.c     |  18 +++++---
 2 files changed, 12 insertions(+), 115 deletions(-)