$ env | grep LC_
$ env | grep LANG
$ env LANG=zh_TW.Big5 ps axuwww
root 1251 0.0 0.0 10988 2576 v0 Is+ 14:36 0:00.00 -\---/usr-\---/libexec-\---/getty Pc ttyv0
(...)
$ env LANG=zh_TW.Big5 LC_CTYPE=C ps axuwww
root 1251 0.0 0.0 10988 2576 v0 Is+ 14:36 0:00.00 /usr/libexec/getty Pc ttyv0
$ env LANG=zh_TW.UTF-8 ps axuwww
root 1251 0.0 0.0 10988 2576 v0 Is+ 14:36 0:00.00 /usr/libexec/getty Pc ttyv0
(...)
$ env LANG=C ps axuwww | grep getty
root 1251 0.0 0.0 10988 2576 v0 Is+ 14:36 0:00.00 /usr/libexec/getty Pc ttyv0
(...)
$ env LANG=en_US.UTF-8 ps axuwww | grep getty
root 1251 0.0 0.0 10988 2576 v0 Is+ 14:36 0:00.00 /usr/libexec/getty Pc ttyv0
(...)
$ env LANG=en_US.ISO8859-1 ps axuwww | grep getty
root 1251 0.0 0.0 10988 2576 v0 Is+ 14:36 0:00.00 /usr/libexec/getty Pc ttyv0
(...)
I suspect this is libxo-related.
(ISO, UTF-8, and ASCII all share the same single-byte encoding of '/'; ps uses setlocale(); libxo assumes all input is UTF-8. When a non-utf8 encoding is used, ps just passes through those strings to libxo, which probably attempts to encode them again as Big5 or something like that.)
Big5 is 7-bit ASCII compatible, so there should be no reason to encode '/' as anything other than just '/'.
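To illustrate the ASCII-compatibility claim, here is a minimal sketch that uses Python's built-in big5 codec as a stand-in. This is an assumption for illustration only: it is not the FreeBSD libc locale data, and the helper name ascii_mismatches is hypothetical. It checks that each of the 128 ASCII code points encodes to the same single byte.

```python
def ascii_mismatches(codec):
    """Return the ASCII code points that do NOT encode to themselves
    as a single byte under the given Python codec."""
    bad = []
    for cp in range(0x80):
        # An ASCII-compatible encoding maps code point N to the byte N.
        if chr(cp).encode(codec) != bytes([cp]):
            bad.append(cp)
    return bad


if __name__ == "__main__":
    # Python's codec, used here only as a reference point for Big5.
    print(ascii_mismatches("big5"))
    print("/".encode("big5").hex())
```

If the encoding is ASCII compatible in this sense, the returned list is empty and '/' comes out as the single byte 0x2f.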
(In reply to Chen-Yu Tsai from comment #3) If you run 'LANG=zh_TW.Big5 ls / | hd', is '/' encoded as just the 7-bit ASCII '/'?
Hm, 'ls' seems to encode as the usual 0x2f. I still think this is something xo-related :-).
One other detail: '-' also seems to get butchered, becoming '-\----'.
',' also gets prefixed with the same string ('-\---'). It suggests to me some kind of escape sequence that is then getting converted at least one more time.
I just found that the file /usr/src/tools/tools/locale/etc/final-maps/map.Big5 does not include definitions for ASCII characters. If I copy the charmap of map.US-ASCII and paste it into map.Big5, everything seems to be fixed.
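A rough way to confirm this kind of gap is to scan a charmap for single-byte entries in the 0-127 range. The sketch below assumes POSIX-charmap-style lines of the form "<NAME> \xHH[\xHH...]"; the function name missing_ascii and the sample data (including its symbolic names) are hypothetical, and the real map.Big5 may use other conventions that this does not handle.

```python
import re

# Assumed entry shape: "<SYMBOLIC_NAME>  \xHH[\xHH...]" optionally followed
# by more whitespace-separated text. Other encodings (\d, octal) are ignored.
ENTRY = re.compile(r"^<[^>]+>\s+((?:\\x[0-9A-Fa-f]{2})+)(?:\s|$)")


def missing_ascii(charmap_text):
    """Return the code points in 0x00-0x7f with no single-byte entry."""
    seen = set()
    for line in charmap_text.splitlines():
        m = ENTRY.match(line.strip())
        if not m:
            continue  # header, comment, CHARMAP/END CHARMAP, or unsupported form
        encoded = bytes.fromhex(m.group(1).replace("\\x", ""))
        if len(encoded) == 1 and encoded[0] < 0x80:
            seen.add(encoded[0])
    return [cp for cp in range(0x80) if cp not in seen]


# Hypothetical sample: one ASCII entry and one two-byte entry.
sample = r"""CHARMAP
<SOLIDUS> \x2f
<SOME_CJK_CHARACTER> \xa4\x40
END CHARMAP
"""

if __name__ == "__main__":
    print(len(missing_ascii(sample)))  # 127: only 0x2f is defined
```

To try it on a real file, read the file's contents as text and pass them to missing_ascii; an empty result means every code point in 0-127 has a single-byte definition.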
(In reply to Ting-Wei Lan from comment #8) Ting-Wei: could you provide a patch for that?
(In reply to Li-Wen Hsu from comment #9) I haven't found a way to properly fix it. map.Big5 looks like a generated file.
The latest mapping data (ftp://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT) has no real changes since the 1994 version that we currently have; moreover, it's marked as obsolete. Let me check if the one compiled from CLDR is any better.
Created attachment 207808 [details] Patch for map.Big5 This is unlikely to be the correct patch, but at least it fixed the Big5 locale problem for me.
Could you please try https://people.freebsd.org/~yuripv/Big5.cm as map.Big5? This one is compiled from the CLDR data we currently use, and might be better than patching the obsolete one.
(In reply to Yuri Pankov from comment #13) Yes, the Big5.cm file is the same as my patched map.Big5 except for the order, comments and white spaces. The generated LC_CTYPE file is also the same.
(In reply to Ting-Wei Lan from comment #14) Thank you, let me take care of it.
A commit references this bug:

Author: yuripv
Date: Sat Oct 5 21:28:46 UTC 2019
New revision: 353127
URL: https://svnweb.freebsd.org/changeset/base/353127

Log:
  Pre-generate Big5 charmap from CLDR data.

  The one used previously was missing the characters in 0-127 range, making various tools try to escape them in output.

  PR: 235100
  Reviewed by: bapt
  Tested by: Ting-Wei Lan <lantw44@gmail.com>
  Differential Revision: https://reviews.freebsd.org/D21794

Changes:
  head/tools/tools/locale/etc/final-maps/map.Big5
  head/tools/tools/locale/tools/finalize
Can we merge this into stable/{11,12} and even releng/12.1?
A commit references this bug:

Author: yuripv
Date: Mon Jun 15 15:59:44 UTC 2020
New revision: 362200
URL: https://svnweb.freebsd.org/changeset/base/362200

Log:
  MFC r353127:

  Pre-generate Big5 charmap from CLDR data.

  The one used previously was missing the characters in 0-127 range, making various tools try to escape them in output.

  PR: 235100
  Reviewed by: bapt
  Tested by: Ting-Wei Lan <lantw44@gmail.com>
  Differential Revision: https://reviews.freebsd.org/D21794

Changes:
_U  stable/12/
  stable/12/tools/tools/locale/etc/final-maps/map.Big5
  stable/12/tools/tools/locale/tools/finalize
(In reply to Li-Wen Hsu from comment #17) I'm really sorry for the delay here, does it still make sense to merge this to stable/11?
(In reply to Yuri Pankov from comment #19) Although I no longer use FreeBSD 11 except for CI environments, I still think it should be merged to all supported releases. The issue has caused a lot of csh crashes for me.
A commit references this bug:

Author: lwhsu
Date: Mon Nov 2 01:05:42 UTC 2020
New revision: 367262
URL: https://svnweb.freebsd.org/changeset/base/367262

Log:
  MFC r353127:

  Pre-generate Big5 charmap from CLDR data.

  The one used previously was missing the characters in 0-127 range, making various tools try to escape them in output.

  PR: 235100
  Reviewed by: bapt
  Tested by: Ting-Wei Lan <lantw44@gmail.com>
  Differential Revision: https://reviews.freebsd.org/D21794

Changes:
_U  stable/11/
  stable/11/tools/tools/locale/etc/final-maps/map.Big5
  stable/11/tools/tools/locale/tools/finalize