Bug 235100 - Setting LANG=zh_TW.Big5 expands `/` to "-\---/"
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Yuri Pankov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-01-21 08:32 UTC by Li-Wen Hsu
Modified: 2019-10-06 08:02 UTC (History)
CC List: 5 users

See Also:
lwhsu: mfc-stable11?
lwhsu: mfc-stable12?


Attachments
Patch for map.Big5 (4.73 KB, patch)
2019-09-25 16:31 UTC, Ting-Wei Lan
no flags

Description Li-Wen Hsu freebsd_committer 2019-01-21 08:32:03 UTC
$ env | grep LC_
$ env | grep LANG
$ env LANG=zh_TW.Big5 ps axuwww
root       1251   0.0  0.0   10988    2576 v0  Is+  14:36     0:00.00 -\---/usr-\---/libexec-\---/getty Pc ttyv0
(...)
$ env LANG=zh_TW.Big5 LC_CTYPE=C ps axuwww
root       1251   0.0  0.0   10988    2576 v0  Is+  14:36     0:00.00 /usr/libexec/getty Pc ttyv0
$ env LANG=zh_TW.UTF-8 ps axuwww
root       1251   0.0  0.0   10988    2576 v0  Is+  14:36     0:00.00 /usr/libexec/getty Pc ttyv0
(...)
$ env LANG=C ps axuwww | grep getty
root       1251   0.0  0.0   10988    2576 v0  Is+  14:36     0:00.00 /usr/libexec/getty Pc ttyv0
(...)
$ env LANG=en_US.UTF-8 ps axuwww | grep getty
root       1251   0.0  0.0   10988    2576 v0  Is+  14:36     0:00.00 /usr/libexec/getty Pc ttyv0
(...)
$ env LANG=en_US.ISO8859-1 ps axuwww | grep getty 
root       1251   0.0  0.0   10988    2576 v0  Is+  14:36     0:00.00 /usr/libexec/getty Pc ttyv0
(...)
Comment 1 Conrad Meyer freebsd_committer 2019-01-21 16:32:53 UTC
I suspect this is libxo-related.
Comment 2 Conrad Meyer freebsd_committer 2019-01-21 16:37:17 UTC
(ISO 8859-1, UTF-8, and ASCII all share the same single-byte encoding of '/'. ps calls setlocale(), but libxo assumes all input is UTF-8. When a non-UTF-8 encoding is in use, ps passes those strings straight through to libxo, which probably attempts to encode them again as Big5 or something like that.)
Comment 3 Chen-Yu Tsai 2019-01-21 17:04:18 UTC
Big5 is 7-bit ASCII compatible, so there should be no reason to encode '/' as anything other than just '/'.
Comment 4 Conrad Meyer freebsd_committer 2019-01-21 17:06:27 UTC
(In reply to Chen-Yu Tsai from comment #3)
If you run 'LANG=zh_TW.Big5 ls / | hd', is '/' encoded as just the 7-bit ASCII '/'?
Comment 5 Conrad Meyer freebsd_committer 2019-01-21 17:07:33 UTC
Hm, 'ls' seems to encode as the usual 0x2f.  I still think this is something xo-related :-).
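Comments 3 to 5 turn on whether Big5 encodes ASCII punctuation differently from the other encodings in the description. A minimal sketch for checking this without a Big5 locale installed: it converts '/', '-' and ',' (the characters reported as mangled) into each charset and prints the resulting bytes. It assumes an iconv(1) that accepts the charset names ASCII, ISO-8859-1, UTF-8 and BIG5 (glibc's and FreeBSD's iconv should); the helper name char_bytes is hypothetical.

```shell
#!/bin/sh
# char_bytes CHAR CHARSET
# Print, as hex, the byte(s) that CHAR is encoded to in CHARSET.
# Hypothetical helper; relies on iconv(1) knowing the charset name.
char_bytes() {
	printf '%s' "$1" | iconv -f UTF-8 -t "$2" | od -An -tx1 | tr -d ' \n'
}

# Compare the mangled characters across the encodings from the report.
for enc in ASCII ISO-8859-1 UTF-8 BIG5; do
	printf '%-11s' "$enc"
	for ch in / - ,; do
		printf ' %s' "$(char_bytes "$ch" "$enc")"
	done
	printf '\n'
done
```

Every row should show the same single bytes (0x2f, 0x2d, 0x2c), consistent with the 0x2f that `ls / | hd` produced, which points away from the byte encoding itself and toward how the locale classifies those bytes.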
Comment 6 Conrad Meyer freebsd_committer 2019-01-21 17:09:44 UTC
One other detail: '-' also seems to get butchered, becoming '-\----'.
Comment 7 Conrad Meyer freebsd_committer 2019-01-21 17:11:35 UTC
',' also gets prefixed with the same string ('-\---').  It suggests to me some kind of escape sequence that is then getting converted at least one more time.
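One way to test the escaping hypothesis is to ask the locale which characters it classifies as printable, since tools that escape their output generally do so for characters they consider non-printable. A minimal sketch, assuming a tr(1) that honours LC_CTYPE for the `[:print:]` class; printable_subset is a hypothetical helper name. With a healthy locale the output equals the input; if a locale's LC_CTYPE had no entries for '/', '-' and ',', one would expect them to be dropped here.

```shell
#!/bin/sh
# printable_subset LOCALE STRING
# Echo only the characters of STRING that LOCALE's LC_CTYPE classifies
# as printable ([:print:]); tr -c -d deletes everything else.
# If LOCALE is not installed, tr typically falls back to the C locale,
# so confirm it appears in `locale -a` before trusting the result.
printable_subset() {
	printf '%s' "$2" | LC_ALL="$1" tr -cd '[:print:]'
}

# Compare a path taken from the ps output in the report under two locales.
for loc in C zh_TW.Big5; do
	printf '%-11s %s\n' "$loc" "$(printable_subset "$loc" '/usr/libexec/getty')"
done
```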
Comment 8 Ting-Wei Lan 2019-09-25 15:45:00 UTC
I just found that the file /usr/src/tools/tools/locale/etc/final-maps/map.Big5 does not include definitions for ASCII characters. If I copy the charmap of map.US-ASCII and paste it into map.Big5, everything seems to be fixed.
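A rough way to check any charmap for the gap described in comment 8 is to list the ASCII code points it does not define. This sketch assumes the POSIX charmap layout, with one `<UXXXX> encoding [name]` entry per line between `CHARMAP` and `END CHARMAP`; whether the FreeBSD final-maps files follow exactly this layout is an assumption, and range entries written with `...` are not expanded. missing_ascii is a hypothetical helper name.

```shell
#!/bin/sh
# missing_ascii CHARMAP
# Print each code point U+0000..U+007F that has no <UXXXX> entry inside
# the CHARMAP ... END CHARMAP section of the given file.
# Assumes one entry per line; "<Uaaaa>...<Ubbbb>" ranges are not expanded.
missing_ascii() {
	awk '
		/^END CHARMAP/ { in_map = 0; next }
		/^CHARMAP/     { in_map = 1; next }
		in_map && $1 ~ /^<U00[0-7][0-9A-Fa-f]>$/ {
			seen[toupper(substr($1, 2, 5))] = 1
		}
		END {
			for (i = 0; i < 128; i++) {
				name = sprintf("U%04X", i)
				if (!(name in seen))
					print name
			}
		}
	' "$1"
}

# Example: an empty result means all 128 ASCII code points are mapped.
# missing_ascii /usr/src/tools/tools/locale/etc/final-maps/map.Big5
```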
Comment 9 Li-Wen Hsu freebsd_committer 2019-09-25 16:09:01 UTC
(In reply to Ting-Wei Lan from comment #8)
Ting-Wei: could you provide a patch for that?
Comment 10 Ting-Wei Lan 2019-09-25 16:13:47 UTC
(In reply to Li-Wen Hsu from comment #9)
I haven't found a proper way to fix it; map.Big5 looks like a generated file.
Comment 11 Yuri Pankov freebsd_committer 2019-09-25 16:23:56 UTC
The latest mapping data (ftp://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT) has no real changes since the 1994 version that we currently have; moreover, it is marked as obsolete. Let me check whether the one compiled from CLDR is any better.
Comment 12 Ting-Wei Lan 2019-09-25 16:31:36 UTC
Created attachment 207808
Patch for map.Big5

This is unlikely to be the correct patch, but at least it fixed the Big5 locale problem for me.
Comment 13 Yuri Pankov freebsd_committer 2019-09-25 16:58:08 UTC
Could you please try https://people.freebsd.org/~yuripv/Big5.cm as map.Big5?  This one is compiled from current CLDR data we use, and might be better than patching obsolete one.
Comment 14 Ting-Wei Lan 2019-09-25 17:39:48 UTC
(In reply to Yuri Pankov from comment #13)
Yes, the Big5.cm file is the same as my patched map.Big5 except for ordering, comments, and whitespace. The generated LC_CTYPE file is also identical.
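The comparison in comment 14 (same mapping, different order, comments and whitespace) can be made mechanical by reducing each charmap to sorted "symbol encoding" pairs. A minimal sketch, assuming the POSIX charmap layout with entries `<symbol> <encoding> [name]` between `CHARMAP` and `END CHARMAP`; the encoding field is lowercased so that `\x2F` and `\x2f` compare equal. normalize_charmap and same_charmap are hypothetical helper names.

```shell
#!/bin/sh
# normalize_charmap FILE
# Reduce the CHARMAP section to sorted "symbol encoding" pairs, dropping
# comment lines, trailing character names, ordering and spacing.
normalize_charmap() {
	awk '
		/^END CHARMAP/ { in_map = 0; next }
		/^CHARMAP/     { in_map = 1; next }
		in_map && $1 ~ /^</ { print $1, tolower($2) }
	' "$1" | LC_ALL=C sort
}

# same_charmap A B
# Succeed (exit status 0) if A and B define the same symbol-to-bytes mapping.
same_charmap() {
	[ "$(normalize_charmap "$1")" = "$(normalize_charmap "$2")" ]
}
```

For example, `same_charmap map.Big5 Big5.cm && echo identical` would confirm the observation without comparing the generated LC_CTYPE files.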
Comment 15 Yuri Pankov freebsd_committer 2019-09-25 17:48:15 UTC
(In reply to Ting-Wei Lan from comment #14)
Thank you, let me take care of it.
Comment 16 commit-hook freebsd_committer 2019-10-05 21:29:29 UTC
A commit references this bug:

Author: yuripv
Date: Sat Oct  5 21:28:46 UTC 2019
New revision: 353127
URL: https://svnweb.freebsd.org/changeset/base/353127

Log:
  Pre-generate Big5 charmap from CLDR data.

  The one used previously was missing the characters in 0-127 range,
  making various tools try to escape them in output.

  PR:		235100
  Reviewed by:	bapt
  Tested by:	Ting-Wei Lan <lantw44@gmail.com>
  Differential Revision:	https://reviews.freebsd.org/D21794

Changes:
  head/tools/tools/locale/etc/final-maps/map.Big5
  head/tools/tools/locale/tools/finalize
Comment 17 Li-Wen Hsu freebsd_committer 2019-10-06 08:02:06 UTC
Can we merge this into stable/{11,12} and even releng/12.1?