Summary: | tr class and unicode collation | | |
---|---|---|---|
Product: | Base System | Reporter: | Emmanuel Vadot <manu> |
Component: | bin | Assignee: | freebsd-bugs (Nobody) <bugs> |
Status: | New --- | | |
Severity: | Affects Only Me | CC: | lme, yuripv |
Priority: | --- | | |
Version: | CURRENT | | |
Hardware: | Any | | |
OS: | Any | | |
Description
Emmanuel Vadot
2017-06-10 03:15:34 UTC
A similar issue occurs with grep and awk:

    > grep '^[A-Z]' foo
    foo
    Bar
    BAZ

The same happens with egrep, grep -E and awk; sed(1) works as expected.

    > grep -V
    grep (GNU grep) 2.5.1-FreeBSD
    > locale
    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_ALL=en_US.UTF-8

With LANG=C, all three tools work as expected.

It looks like at least the grep issue is not present (or has already been fixed) in bsdgrep:

    loki:yuri:~$ locale
    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_ALL=
    loki:yuri:~$ grep '^[A-Z]' foo
    Bar
    BAZ
    loki:yuri:~$ grep --version
    grep (BSD grep) 2.6.0-FreeBSD

Regarding the tr issue, this is what tr on Debian GNU/Linux has to say about it:

    $ echo test | tr '[:alpha:]' '[:upper:]'
    tr: misaligned [:upper:] and/or [:lower:] construct

As for the original problem, I don't think any conversion other than [:lower:] <-> [:upper:] makes sense in the tr(1) context. This is also noted in tr(1) itself: "With the exception of case conversion, characters in the classes are in unspecified order."

In other words, I don't think this is a collation problem; it is simply unspecified behavior, which doesn't need fixing. If there is a real-world use case that you think is related to this, please provide examples.
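One plausible reading of the GNU grep 2.5.1 output reported above is that the bracket range [A-Z] was interpreted by collation order rather than by code point. The Python sketch below models that difference under a loudly stated assumption: a hypothetical, simplified collation sequence in which each lowercase letter sorts immediately before its uppercase counterpart (a A b B ... z Z). This is not the real en_US.UTF-8 collation table, and the helper names are illustrative only.

```python
import string

# ASSUMPTION: hypothetical, simplified collation sequence where each lowercase
# letter sorts immediately before its uppercase counterpart: a A b B ... z Z.
# This is NOT the actual en_US.UTF-8 collation data.
COLLATION_ORDER = [
    c for pair in zip(string.ascii_lowercase, string.ascii_uppercase) for c in pair
]


def in_range_codepoint(ch, lo, hi):
    """C/POSIX-locale style: a range is interpreted by code point value."""
    return ord(lo) <= ord(ch) <= ord(hi)


def in_range_collation(ch, lo, hi, order=COLLATION_ORDER):
    """Range interpreted by position in the (hypothetical) collation sequence."""
    if ch not in order:
        return False
    return order.index(lo) <= order.index(ch) <= order.index(hi)


def grep_leading_range(lines, lo, hi, in_range):
    """Emulate grep '^[lo-hi]': keep lines whose first character is in range."""
    return [line for line in lines if line and in_range(line[0], lo, hi)]


lines = ["foo", "Bar", "BAZ"]
print(grep_leading_range(lines, "A", "Z", in_range_codepoint))  # ['Bar', 'BAZ']
print(grep_leading_range(lines, "A", "Z", in_range_collation))  # ['foo', 'Bar', 'BAZ']
```

Under this assumed ordering, "f" falls between "A" and "Z", so "foo" matches, while a line starting with "a" would still be excluded because "a" sorts before "A". What a real tool does depends on the locale's actual collation data and on whether it uses collation order for ranges at all.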
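To illustrate why only [:lower:] <-> [:upper:] has a well-defined result, here is a minimal Python model of a tr-style positional mapping, where the i-th character of the first set maps to the i-th character of the second. Assumptions: only ASCII letters are considered, the `translate` helper is hypothetical (not tr's actual implementation), and a shorter second set is padded with its last character, a convention some tr implementations follow. The two [:alpha:] expansions are made up to stand in for the unspecified class order.

```python
import string


def translate(text, set1, set2):
    """Hypothetical tr-style mapping: the i-th char of set1 maps to the i-th
    char of set2. ASSUMPTION: a shorter set2 is padded with its last char."""
    if set2 and len(set2) < len(set1):
        set2 = set2 + set2[-1] * (len(set1) - len(set2))
    table = dict(zip(set1, set2))  # characters not in set1 pass through unchanged
    return "".join(table.get(ch, ch) for ch in text)


LOWER = string.ascii_lowercase
UPPER = string.ascii_uppercase

# Case conversion: both sets have 26 members in matching order, so each
# lowercase letter pairs with its uppercase counterpart.
print(translate("test", LOWER, UPPER))  # TEST

# [:alpha:] -> [:upper:]: the result depends on the order in which [:alpha:]
# is expanded. These two expansions are hypothetical stand-ins.
ALPHA_UPPER_FIRST = UPPER + LOWER
ALPHA_LOWER_FIRST = LOWER + UPPER
print(translate("test", ALPHA_UPPER_FIRST, UPPER))  # ZZZZ
print(translate("test", ALPHA_LOWER_FIRST, UPPER))  # TEST
```

Case conversion gives the same answer however the sets are built, while the [:alpha:] -> [:upper:] result changes with the expansion order, which is the unspecified behavior the tr(1) note describes.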