Bug 211762

Summary: Some locale definitions have defects
Product: Base System
Component: conf
Status: New
Severity: Affects Many People
Priority: ---
Version: 10.3-STABLE
Hardware: Any
OS: Any
Reporter: Karl Williamson <khw>
Assignee: freebsd-bugs (Nobody) <bugs>
CC: bapt, emaste, jkeenan, yuripv

Description Karl Williamson 2016-08-12 01:09:44 UTC
Some of the locale definitions that come with 10.3 have defects.  These include:
[:lower:] matches 0xDF but [:alpha:] doesn't with locales

0xBD should be [:upper:] but isn't in 'lv_LV.ISO8859-13'  

In case you are not aware of it, you can pretty much get out of the business of supporting UTF-8 locale definitions by using the freely available POSIX ones supplied by Unicode.  For the recent releases, you have to generate them yourself from the CLDR DB.  I can't seem to find the link to the tool that does it just now.

Earlier versions had them pre-computed:
http://unicode.org/Public/cldr/2.0.1/  posix.zip
Comment 1 Baptiste Daroussin freebsd_committer 2016-10-22 20:10:04 UTC
This should be fixed in FreeBSD 11, where they have been completely reworked (unfortunately it won't be mergeable to 10).  Can you confirm?
Comment 2 Karl Williamson 2016-11-01 17:17:35 UTC
(In reply to Baptiste Daroussin from comment #1)

Yes, these are no longer showing as bugs.

However, the UTF-8 locales in this release show some collation weirdness for some controls.  If we actually find this is a probable bug, we will file a new bug report.
Comment 3 Baptiste Daroussin freebsd_committer 2016-11-01 20:31:59 UTC
Yes, there is a bug, and I will request an errata for it.  I don't know if that fixes the one you found, but I will merge my fixes into 11-STABLE next week and request an errata.
Comment 4 Yuri Pankov freebsd_committer 2018-11-15 16:02:28 UTC
Looking at UnicodeData.txt, 0xbd is "VULGAR FRACTION ONE HALF"; are you sure it should be "upper"?
Comment 5 Karl Williamson 2018-11-15 18:48:44 UTC
(In reply to Yuri Pankov from comment #4)

This bug is about Lithuanian locales where 0xBD is not the same as it is in Unicode
Comment 6 Yuri Pankov freebsd_committer 2018-11-15 20:12:19 UTC
(a bit of ugly code)

$ cat t.c
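#include <err.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int
main(void)
{
        /* Single byte under test: 0xBD in the ISO8859-13 charmap. */
        char mbc[] = { (char)0xBD };
        wchar_t wc;

        if (setlocale(LC_ALL, "lv_LV.ISO8859-13") == NULL)
                err(1, "setlocale");
        if (mbtowc(&wc, mbc, 1) == -1)
                err(1, "mbtowc");
        /* Print the wide character that the byte converts to. */
        printf("%#x\n", (unsigned int)wc);

        return (0);
}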
$ cc -o t t.c
$ ./t

So it looks like it *is* the same.  What character did you mean exactly?
Comment 7 Yuri Pankov freebsd_committer 2018-11-15 20:32:32 UTC
OK, sorry, I'm really starting to forget what single byte locales are, and that this code isn't applicable here.  But, luckily for me, ISO8859-13 charmap shows that 0xBD maps to 0x00BD wide character.
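
For a single-byte locale, the class membership questions from the description can be asked directly through <ctype.h> rather than through a wide-character conversion. The following is only a minimal sketch of such a check, not something posted in this bug: the helper name classify_byte and the CLS_* flags are made up for illustration, and the result depends entirely on which locale definitions are installed on the system.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical flag names, used only for this illustration. */
enum { CLS_ALPHA = 1, CLS_LOWER = 2, CLS_UPPER = 4 };

/*
 * Switch LC_CTYPE to `locale` and report which of the three classes the
 * byte `c` belongs to, as a bitmask of the CLS_* flags.  Returns -1 if
 * the locale cannot be selected.  Note that this changes the
 * process-wide LC_CTYPE setting as a side effect.
 */
int
classify_byte(const char *locale, unsigned char c)
{
        int mask = 0;

        if (setlocale(LC_CTYPE, locale) == NULL)
                return (-1);
        if (isalpha(c))
                mask |= CLS_ALPHA;
        if (islower(c))
                mask |= CLS_LOWER;
        if (isupper(c))
                mask |= CLS_UPPER;
        return (mask);
}
```

Calling classify_byte("lv_LV.ISO8859-13", 0xBD) and testing the CLS_UPPER bit would exercise the second item in the description. A result for 0xDF that has CLS_LOWER set but CLS_ALPHA clear would correspond to the first item.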