Basically any emoji character encoded in UTF-8, like "👌", is incorrectly reported as having length "1" by wcswidth(const wchar_t *pwcs, size_t n). It seems that libc in FreeBSD supports a version of the Unicode standard older than 9?
It's not exactly about Unicode support, but rather about the character width data being very outdated. I have (hopefully) fixed this in -CURRENT, see base r368390. I'll check shortly whether it's possible to MFC. As a really quick fix, try replacing tools/tools/locale/etc/final-maps/widths.txt with the one from -CURRENT (https://svnweb.freebsd.org/base/head/tools/tools/locale/etc/final-maps/widths.txt?view=log) and rebuilding/installing the ctype data in share/ctypedef.
Would be awesome to MFC that to 12-stable :) Will try manual patching for now. Thanks!
Ok, it's not that easy. I rebuilt my system from stable/12 with widths.txt from CURRENT as you suggested… but it didn't help. Here's my example C code that prints "length 1" for a multibyte long char:

```C
#include <stdio.h>
#include <stddef.h>
#include <wchar.h>
#include <locale.h>

int main () {
    setlocale(LC_ALL, "en_US.UTF-8");
    const wchar_t* wc = L"👌";
    int length = wcswidth(wc, 1);
    printf("%ls, length: %d\n", wc, length);
}
```

outputs:

👌, length: 1
Your code snippet shows 2 for me on -CURRENT, so it looks like we need a proper MFC to fix this.
(In reply to dmilith from comment #3) I just installed a jail using 12.2-RELEASE base.txz, checked out stable/12 to usr/src, and replaced widths.txt with the one from head:

$ svnlite status
M tools/tools/locale/etc/final-maps/widths.txt

And after buildworld/installworld/reboot, I see the following:

$ ./w
👌, length: 2

That's what I expected when I suggested trying it (as it would confirm my expectations about exactly which changes need to be MFCed). I wonder why it did not work for you.
Funny, I did just that… just from the stable/12 branch.
I double-checked my src.conf and I have "WITH_LOCALES=1" there… so I'm unsure what could have gone wrong… Will investigate. Thanks for checking!
Sorry. I did a second build with the patched widths.txt (but on a native system, no jails) and still get "length: 1". I have no clue why it works for you.
(In reply to dmilith from comment #8) Apparently the Makefile in share/ctypedef does not properly record the dependency on widths.txt; try cleaning up the built objects first, i.e.:

cd share/ctypedef && make clean && make && make install
That shouldn't matter in my case. I use ramdisks for building the system. So there's something else.
Maybe a ccache build issue?
(In reply to dmilith from comment #11) Unlikely, the objects are built using localedef, not the compiler. What are the full contents of your src.conf? And is share/ctypedef built/installed at all?
My src.conf: https://gist.github.com/dmilith/6668e4ab62d55256cfeff9f14606c4c9 Unsure which option could disable that. Maybe GPL/GNU stuff?
To sum it up, we found that the problem is not in the data, but rather in having NO_CLEAN defined together with the absence of an explicit dependency on widths.txt (this will be fixed separately).
A commit references this bug:

Author: yuripv
Date: Sun Dec 13 22:25:55 UTC 2020
New revision: 368619
URL: https://svnweb.freebsd.org/changeset/base/368619

Log:
  MFC r368390: update wcwidth data from utf8proc

  Character width data being out of date is a constant source of weird
  rendering issues and wasted time trying to diagnose those, e.g. as
  reported by Jeremy Chadwick: https://gitlab.com/muttmua/mutt/-/issues/67

  Sadly, there is no real ("standard") wcwidth data source, so this tries
  to rectify the problem using the utf8proc one (through its C API) which
  would hopefully benefit both FreeBSD and utf8proc through bug reports
  (if any).

  PR: 251767

Changes:
  _U stable/12/
  stable/12/tools/tools/locale/Makefile
  stable/12/tools/tools/locale/README
  stable/12/tools/tools/locale/etc/final-maps/widths.txt
  stable/12/tools/tools/locale/tools/getwidths.c
  stable/12/tools/tools/locale/tools/mkwidths.pl
I'm going to close this PR and take care of the dependency problem as part of other work I'm planning. Thanks for the help figuring out what was going on here!