Bug 209617

Summary: Confusing behavior of newlocale() and its incomplete manual page caused wctomb() to fail
Product: Base System
Component: bin
Status: Closed Not A Bug
Severity: Affects Only Me
Priority: ---
Version: 10.3-RELEASE
Hardware: Any
OS: Any
Reporter: jau
Assignee: freebsd-bugs (Nobody) <bugs>
CC: yuripv

Description jau 2016-05-18 18:07:47 UTC
For any and all Unicode code points larger than 0xff, wctomb() returns -1
and sets errno to EILSEQ when LC_CTYPE is set to a UTF-8 locale.

Another symptom of something going wrong: the proper UTF-8 encoding
of code point 0xff is the two bytes 11000011 10111111, but wctomb()
passes it through as the single byte 11111111 with no conversion.
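To make the expected result concrete, here is a minimal hand-written UTF-8 encoder. The helper name utf8_encode is hypothetical, and this only illustrates the encoding rules; it is not how FreeBSD's wctomb() is implemented. Code point 0xff falls in the two-byte range, so its 11-bit pattern 00011 111111 is split into 110xxxxx 10xxxxxx, giving 0xc3 0xbf.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Returns the number of bytes written to out (1..4), or 0 for values
 * that are not Unicode scalar values (surrogates, or above 0x10FFFF). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)   /* UTF-16 surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* beyond the Unicode range */
}
```

Calling utf8_encode(0xFF, buf) yields 2 with buf holding 0xC3 0xBF, which is what a correctly configured UTF-8 wctomb() should produce for the same code point.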

I noticed this on FreeBSD 10.3-STABLE r299892 on amd64.
Comment 1 jau 2016-05-20 06:56:14 UTC
Though the problem was originally found on amd64, the same
thing seems to happen on ppc as well.
Since wctomb() misbehaves in exactly the same manner on two
hardware types with different byte orders and different word
sizes, I think it is fair to say the problem is architecture
independent.
Comment 2 jau 2016-05-20 16:48:14 UTC
Right, it was not wctomb() that was at fault after all.
The reason for the misbehavior was this call:

loc = newlocale (LC_CTYPE_MASK, "fi_FI.UTF-8", LC_GLOBAL_LOCALE);

Other xlocale manual pages mention LC_GLOBAL_LOCALE
more or less as a generic handle to the process's default
locale. The man page for newlocale() doesn't explain
the base locale argument at all: neither LC_GLOBAL_LOCALE
nor NULL as the base is explained, so it is simply left
to the programmer's best guess.

The following modified call seems to work just fine,
and wctomb() also behaves correctly after this change:

loc = newlocale (LC_CTYPE_MASK, "fi_FI.UTF-8", NULL);

First of all, the meaning of NULL and LC_GLOBAL_LOCALE
should be documented.
Secondly, I assume that NULL is taken as a reference to
the current thread's locale, whatever it is, rather than
the global locale (unless the current thread's locale
happens to be the global locale). In that case it would
seem reasonable for LC_GLOBAL_LOCALE to mean exactly
what one would expect: the process's global locale,
independent of what the current thread's locale is.
The current setup is so confusing that I cannot be the
only one getting into trouble with this.
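Where the goal is the one described here, a new LC_CTYPE layered on top of the process's global locale, one portable route is sketched below. It relies on POSIX.1-2008 rather than on FreeBSD's xlocale(3) wording: POSIX leaves the behavior undefined when newlocale() is given LC_GLOBAL_LOCALE as base, whereas duplocale(LC_GLOBAL_LOCALE) returns a copy of the global locale that can then serve as the base. The helper name ctype_over_global is hypothetical; this is a sketch under those POSIX assumptions, not the reporter's or FreeBSD's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>

/* Hypothetical helper: return a new locale object whose LC_CTYPE comes
 * from `name` and whose other categories are copied from the process's
 * global locale (as set by setlocale()). Returns (locale_t)0 on failure.
 * The caller owns the result and must release it with freelocale(). */
static locale_t ctype_over_global(const char *name)
{
    /* duplocale() accepts LC_GLOBAL_LOCALE and copies the global locale;
     * newlocale() is not required to, so pass the copy as base instead. */
    locale_t base = duplocale(LC_GLOBAL_LOCALE);
    if (base == (locale_t)0)
        return (locale_t)0;

    locale_t loc = newlocale(LC_CTYPE_MASK, name, base);
    if (loc == (locale_t)0) {
        /* On failure newlocale() leaves base valid and unchanged. */
        freelocale(base);
        return (locale_t)0;
    }
    /* On success base may have been reused or freed; only loc is valid. */
    return loc;
}
```

For example, ctype_over_global("fi_FI.UTF-8") would return an object suitable for uselocale(3) or the *_l functions, provided that locale is installed on the system.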
Comment 3 Yuri Pankov 2017-12-02 20:05:12 UTC
Given that you didn't provide the code snippet and results, I'll guess that you are misunderstanding what newlocale(3) is about: it simply creates a locale object to be used by the *_l versions of the functions, i.e. you should have used wctomb_l(3) with the locale object you created, or uselocale(3) to set the current thread's locale. The meaning of NULL and LC_GLOBAL_LOCALE is described in xlocale(3).
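A minimal sketch of the uselocale(3) route suggested here, assuming a POSIX.1-2008 system: the locale object is made current for the calling thread, so plain wctomb() converts according to it, and the previous thread locale is restored afterwards. On FreeBSD, wctomb_l(3) with the locale object would avoid switching the thread locale; it is left out because it is an xlocale extension rather than POSIX. The helper name encode_in_locale and its -1 error convention are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>   /* MB_LEN_MAX */
#include <locale.h>
#include <stdlib.h>   /* wctomb(), wchar_t */

/* Hypothetical helper: encode wc into buf (at least MB_LEN_MAX bytes)
 * using the LC_CTYPE of the locale called `name`, by temporarily making
 * that locale current for the calling thread. Returns the byte count,
 * or -1 if the locale cannot be used or wc cannot be encoded. */
static int encode_in_locale(const char *name, wchar_t wc, char *buf)
{
    /* Base (locale_t)0: build a fresh object, never LC_GLOBAL_LOCALE. */
    locale_t loc = newlocale(LC_CTYPE_MASK, name, (locale_t)0);
    if (loc == (locale_t)0)
        return -1;

    locale_t old = uselocale(loc);   /* wctomb() now follows loc */
    if (old == (locale_t)0) {
        freelocale(loc);
        return -1;
    }
    int n = wctomb(buf, wc);
    uselocale(old);                  /* restore the previous thread locale */
    freelocale(loc);                 /* loc is no longer current, safe to free */
    return n;
}
```

With a UTF-8 locale available, encode_in_locale("fi_FI.UTF-8", 0xFF, buf) should return 2 with buf holding 0xC3 0xBF; UTF-8 locale names differ between systems, so the name is an assumption.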
Comment 4 Yuri Pankov freebsd_committer freebsd_triage 2018-11-17 11:26:24 UTC
I'm going to close this as NULL and LC_GLOBAL_LOCALE are documented in xlocale(3), and uselocale(3) has references to it in SEE ALSO.  Other than that, everything is working as intended and documented.  Please reopen if you think there are other actionable items in this report that I missed.