| Summary: | iswprint() wrong for some FULL WIDTH characters in UTF-8 locale | | |
|---|---|---|---|
| Product: | Base System | Reporter: | Daniel Ponte <amigan> |
| Component: | bin | Assignee: | Yuri Pankov <yuripv> |
| Status: | Closed FIXED | | |
| Severity: | Affects Many People | CC: | cem, yuripv |
| Priority: | --- | | |
| Version: | CURRENT | | |
| Hardware: | Any | | |
| OS: | Any | | |
| Attachments: | Modified reproducer (attachment 207943) | | |

Description
Daniel Ponte
2019-09-29 03:26:30 UTC
*** This bug has been marked as a duplicate of bug 225692 ***

Not a dupe per comment #30 in the other bug. In the future, please investigate on the new bug first to confirm duplicate before marking so.

> FWIW, I tried the reproducer attached to this issue, and all of the characters
> that were originally reported as unprintable are still OK (i.e., printable),
> running 13.0-CURRENT r352495.

Copying request for information from other bug:

- Could you please provide the UTF-8 or wide character codes for the ones that are not rendered correctly?
- Running CURRENT? Can you provide the working and non-working revisions?

I apologize for the confusion and lack of detail. This is specifically for Powerline characters. They are not rendering correctly when the mosh server is FreeBSD. The work in #225692 most certainly appeared to correct this, previously.

I have modified the test program slightly to demonstrate these characters, and also to emit the tested character.

% uname -v ; ./wcw
FreeBSD 13.0-CURRENT #0 r352860: Sat Sep 28 21:19:24 EDT 2019 root@argon.h.c907:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
alnum:0x400100, cntrl:0x200, ideogram:0x80000, print:0x40000, space:0x4000, xdigit:0x10000, alpha:0x100, digit:0x400, lower:0x1000, punct:0x2000, special:0x100000, blank:0x20000, graph:0x800, phonogram:0x200000, rune:0xffffff00, upper:0x8000,
Default Locale is: C
Character d 0x64 is in classes: alnum print xdigit alpha lower graph rune
in C locale, iswprint(0x64) = 1
in en_US.UTF-8 locale, iswprint(0x64) = 1
in ja_JP.UTF-8 locale, iswprint(0x64) = 1
Character 0xe0b1 is in classes: cntrl rune
in C locale, iswprint(0xe0b1) = 0
in en_US.UTF-8 locale, iswprint(0xe0b1) = 0
in ja_JP.UTF-8 locale, iswprint(0xe0b1) = 0
Character 0xe0b2 is in classes: cntrl rune
in C locale, iswprint(0xe0b2) = 0
in en_US.UTF-8 locale, iswprint(0xe0b2) = 0
in ja_JP.UTF-8 locale, iswprint(0xe0b2) = 0
Character 0xe0b3 is in classes: cntrl rune
in C locale, iswprint(0xe0b3) = 0
in en_US.UTF-8 locale, iswprint(0xe0b3) = 0
in ja_JP.UTF-8 locale, iswprint(0xe0b3) = 0

versus

% uname -a ; ./wcw
FreeBSD dtvax.dynatron.me 12.0-BETA4 FreeBSD 12.0-BETA4 r340285 GENERIC amd64
alnum:0x400100, cntrl:0x200, ideogram:0x80000, print:0x40000, space:0x4000, xdigit:0x10000, alpha:0x100, digit:0x400, lower:0x1000, punct:0x2000, special:0x100000, blank:0x20000, graph:0x800, phonogram:0x200000, rune:0xffffff00, upper:0x8000,
Default Locale is: C
Character d 0x64 is in classes: alnum print xdigit alpha lower graph rune
in C locale, iswprint(0x64) = 1
in en_US.UTF-8 locale, iswprint(0x64) = 1
in ja_JP.UTF-8 locale, iswprint(0x64) = 1
Character 0xe0b1 is in classes: print graph rune
in C locale, iswprint(0xe0b1) = 0
in en_US.UTF-8 locale, iswprint(0xe0b1) = 1
in ja_JP.UTF-8 locale, iswprint(0xe0b1) = 1
Character 0xe0b2 is in classes: print graph rune
in C locale, iswprint(0xe0b2) = 0
in en_US.UTF-8 locale, iswprint(0xe0b2) = 1
in ja_JP.UTF-8 locale, iswprint(0xe0b2) = 1
Character 0xe0b3 is in classes: print graph rune
in C locale, iswprint(0xe0b3) = 0
in en_US.UTF-8 locale, iswprint(0xe0b3) = 1
in ja_JP.UTF-8 locale, iswprint(0xe0b3) = 1
Character 0xe0b0 is in classes: print graph rune
in C locale, iswprint(0xe0b0) = 0
in en_US.UTF-8 locale, iswprint(0xe0b0) = 1
in ja_JP.UTF-8 locale, iswprint(0xe0b0) = 1

Created attachment 207943
Modified reproducer

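For readers without access to the attachment, the per-locale check reported above can be approximated with a few lines of C. This is only a rough sketch and not the attached wcw program: the names probe_iswprint and report_powerline are made up for illustration, and it assumes wchar_t values are Unicode code points in UTF-8 locales (which appears to hold on FreeBSD and glibc) and that the named locales are installed.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not taken from the attached reproducer).
 * Returns 1 if wc is printable under the named LC_CTYPE locale,
 * 0 if it is not, and -1 if the locale cannot be selected.
 * Assumes wide character values are Unicode code points.
 */
int probe_iswprint(const char *locale_name, wint_t wc)
{
    char saved[256];
    const char *cur = setlocale(LC_CTYPE, NULL);
    int result;

    /* Remember the current LC_CTYPE so the caller's locale is restored. */
    snprintf(saved, sizeof(saved), "%s", cur != NULL ? cur : "C");

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    result = iswprint(wc) != 0;
    setlocale(LC_CTYPE, saved);
    return result;
}

/* Print the result for the Powerline code points named in this report. */
void report_powerline(void)
{
    static const char *locales[] = { "C", "en_US.UTF-8", "ja_JP.UTF-8" };
    static const wint_t points[] = { 0xe0b0, 0xe0b1, 0xe0b2, 0xe0b3 };

    for (size_t i = 0; i < sizeof(points) / sizeof(points[0]); i++)
        for (size_t j = 0; j < sizeof(locales) / sizeof(locales[0]); j++)
            printf("in %s locale, iswprint(0x%x) = %d\n", locales[j],
                (unsigned int)points[i],
                probe_iswprint(locales[j], points[i]));
}
```

Calling report_powerline() from main() prints one line per code point and locale, in the same shape as the output above. A value of -1 means the locale is not available on that system, which is different from the character being unprintable.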
Thanks for the update. It's not the CLDR34/Unicode11 update itself, but rather a followup in base r340491. As the commit message says, there is no direct mapping between UnicodeData.txt and POSIX character classes, so I used my best judgement there :-)

The characters you are after fall in the following range (UnicodeData.txt):

E000;<Private Use, First>;Co;0;L;;;;;N;;;;;
F8FF;<Private Use, Last>;Co;0;L;;;;;N;;;;;

"Co" there means "Other, Private Use". I *think* we could mark all those characters as printable, it won't hurt anything.

I agree. I will admit that Unicode is one of those dungeon-dwellers-only-but-fundamental things I don't fully understand :), but I do know that these Private Use Area characters are widely used as printables, and this does seem to be a regression.

Could you please try replacing src/share/ctypedef/C.UTF-8.src with https://people.freebsd.org/~yuripv/C.UTF-8.src and rebuilding?

(In reply to Yuri Pankov from comment #8)
This resolves the issue on both CURRENT and 12.1-STABLE. Thank you.

A commit references this bug:

Author: yuripv
Date: Sat Oct 5 22:17:55 UTC 2019
New revision: 353130
URL: https://svnweb.freebsd.org/changeset/base/353130

Log:
  Mark "private use area" characters as printable.

  At least some of the characters in E000-F8FF range are used by Powerline fonts, and having no attributes for these ranges in UnicodeData.txt other than "Other, Private Use" it should be safe to mark all of them as printable.

  Some actually were before r340491, so this fixes the regression introduced there as well.

  PR: 240911
  Reviewed by: bapt
  Tested by: Daniel Ponte <amigan@gmail.com>
  Differential Revision: https://reviews.freebsd.org/D21850

Changes:
  head/share/ctypedef/C.UTF-8.src
  head/tools/tools/locale/tools/utf8-rollup.pl

A commit references this bug:

Author: yuripv
Date: Wed Dec 2 22:44:41 UTC 2020
New revision: 368288
URL: https://svnweb.freebsd.org/changeset/base/368288

Log:
  MFC r353130:

  Mark "private use area" characters as printable.

  At least some of the characters in E000-F8FF range are used by Powerline fonts, and having no attributes for these ranges in UnicodeData.txt other than "Other, Private Use" it should be safe to mark all of them as printable.

  Some actually were before r340491, so this fixes the regression introduced there as well.

  PR: 240911
  Reviewed by: bapt
  Tested by: Daniel Ponte <amigan@gmail.com>
  Differential Revision: https://reviews.freebsd.org/D21850

Changes:
  _U stable/12/
  stable/12/share/ctypedef/C.UTF-8.src
  stable/12/tools/tools/locale/tools/utf8-rollup.pl

Better late than never, sorry for the delay, completely forgot about MFC.
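
For anyone wanting to check whether a given code point falls in the range covered by r353130, the test is a simple bounds comparison on E000-F8FF. A minimal sketch follows; the helper name is_bmp_private_use is made up for illustration and is not part of the commit, which changes the locale source data (C.UTF-8.src and utf8-rollup.pl) rather than adding any C code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the committed change: true when cp
 * lies in the Private Use Area range named in the commit log,
 * U+E000 through U+F8FF inclusive.
 */
bool is_bmp_private_use(uint32_t cp)
{
    return cp >= 0xE000u && cp <= 0xF8FFu;
}
```

The Powerline code points from this report (0xe0b0 through 0xe0b3) all fall inside that range, while an ordinary character such as 0x64 does not.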