Bug 251767 - wcswidth() from libc reports wrong string length (no Unicode 9 support)
Summary: wcswidth() from libc reports wrong string length (no Unicode 9 support)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 12.2-STABLE
Hardware: Any Any
Importance: --- Affects Many People
Assignee: Yuri Pankov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-12-11 22:40 UTC by dmilith
Modified: 2020-12-13 22:30 UTC

Description dmilith 2020-12-11 22:40:45 UTC
Basically, any emoji character encoded in UTF-8, such as "👌", is incorrectly reported as having a width ("length") of 1 by wcswidth(const wchar_t *pwcs, size_t n);

It seems that libc in FreeBSD supports a version of the Unicode standard older than 9?
Comment 1 Yuri Pankov freebsd_committer 2020-12-12 00:18:45 UTC
It's not exactly about Unicode support, rather about the character width data being very outdated.  I have (hopefully) fixed this in -CURRENT, see base r368390.  I'll check if it's possible to MFC shortly.  As a really quick fix, try replacing tools/tools/locale/etc/final-maps/widths.txt with the one from -CURRENT (https://svnweb.freebsd.org/base/head/tools/tools/locale/etc/final-maps/widths.txt?view=log) and rebuilding/installing the ctype data in share/ctypedef.
Comment 2 dmilith 2020-12-12 09:36:41 UTC
Would be awesome to MFC that to 12-stable :) Will try manual patching for now. Thanks!
Comment 3 dmilith 2020-12-12 14:23:40 UTC
OK, it's not that easy.
I rebuilt my system from stable/12 with widths.txt from CURRENT as you suggested… but it didn't help. Here's my example C code that prints "length: 1" for a multibyte character:

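```C
#include <stdio.h>
#include <stddef.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
  /* Select a UTF-8 locale so that its character width data is used. */
  setlocale(LC_ALL, "en_US.UTF-8");
  const wchar_t* wc = L"👌";
  /* Display width, in columns, of the first wide character. */
  int length = wcswidth(wc, 1);
  printf("%ls, length: %d\n", wc, length);
  return 0;
}
```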

outputs:

👌, length: 1
Comment 4 Yuri Pankov freebsd_committer 2020-12-12 14:29:40 UTC
Your code snippet shows 2 for me on -CURRENT, so it looks like we need a proper MFC to fix this.
Comment 5 Yuri Pankov freebsd_committer 2020-12-12 18:33:05 UTC
(In reply to dmilith from comment #3)
I just installed a jail using 12.2-RELEASE base.txz, checked out stable/12 to usr/src, replaced widths.txt with the one from head:

$ svnlite status
M       tools/tools/locale/etc/final-maps/widths.txt

And after buildworld/installworld/reboot, I see the following:
$ ./w
👌, length: 2

That's what I expected when I suggested trying it (as it would confirm my expectations about exactly which changes need to be MFCed); I wonder why it did not work for you.
Comment 6 dmilith 2020-12-12 19:23:26 UTC
Funny, I did just that… Just from stable/12 branch.
Comment 7 dmilith 2020-12-12 20:04:05 UTC
I double-checked my src.conf and I have "WITH_LOCALES=1" there… so I'm unsure what could have gone wrong… Will investigate. Thanks for checking!
Comment 8 dmilith 2020-12-12 22:20:00 UTC
Sorry. I did a second build with the patched widths.txt (but on a native system, no jails) and I still get "length: 1". I have no clue why it works for you.
Comment 9 Yuri Pankov freebsd_committer 2020-12-12 23:52:05 UTC
(In reply to dmilith from comment #8)
Apparently the Makefile in share/ctypedef does not properly record a dependency on widths.txt; try cleaning up the built objects first, i.e.:

cd share/ctypedef && make clean && make && make install
Comment 10 dmilith 2020-12-13 11:45:10 UTC
That shouldn't matter in my case. I use ramdisks for building the system, so there must be something else.
Comment 11 dmilith 2020-12-13 11:54:35 UTC
Maybe a ccache build issue?
Comment 12 Yuri Pankov freebsd_committer 2020-12-13 11:57:12 UTC
(In reply to dmilith from comment #11)
Unlikely, the objects are built using localedef, not the compiler.  I wonder what the full contents of your src.conf are?  And whether share/ctypedef is built/installed at all?
Comment 13 dmilith 2020-12-13 13:55:15 UTC
My src.conf: https://gist.github.com/dmilith/6668e4ab62d55256cfeff9f14606c4c9
Unsure which option could disable that. Maybe GPL/GNU stuff?
Comment 14 Yuri Pankov freebsd_committer 2020-12-13 22:19:18 UTC
To sum it up, we found that the problem is not in the data, but rather in having NO_CLEAN defined together with the absence of an explicit dependency on widths.txt (this will be fixed separately).
Comment 15 commit-hook freebsd_committer 2020-12-13 22:27:54 UTC
A commit references this bug:

Author: yuripv
Date: Sun Dec 13 22:25:55 UTC 2020
New revision: 368619
URL: https://svnweb.freebsd.org/changeset/base/368619

Log:
  MFC r368390:

  update wcwidth data from utf8proc

  Character width data being out of date is a constant source
  of weird rendering issues and wasted time trying to diagnose
  those, e.g. as reported by Jeremy Chadwick:

  https://gitlab.com/muttmua/mutt/-/issues/67

  Sadly, there is no real ("standard") wcwidth data source, so
  this tries to rectify the problem using the utf8proc one (through
  its C API) which would hopefully benefit both FreeBSD and
  utf8proc through bug reports (if any).

  PR:		251767

Changes:
_U  stable/12/
  stable/12/tools/tools/locale/Makefile
  stable/12/tools/tools/locale/README
  stable/12/tools/tools/locale/etc/final-maps/widths.txt
  stable/12/tools/tools/locale/tools/getwidths.c
  stable/12/tools/tools/locale/tools/mkwidths.pl
Comment 16 Yuri Pankov freebsd_committer 2020-12-13 22:30:11 UTC
I'm going to close this PR and take care of the dependency problem as part of other work I'm planning.  Thanks for the help figuring out what was going on here!