Bug 251674

Summary: libc++: std::wcout does not use global locale set via setlocale()
Product: Base System
Reporter: Yuri Victorovich <yuri>
Component: bin
Assignee: freebsd-bugs (Nobody) <bugs>
Status: New ---    
Severity: Affects Only Me CC: dim, emaste, yuripv
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   
URL: https://bugs.llvm.org/show_bug.cgi?id=48444

Description Yuri Victorovich freebsd_committer 2020-12-07 22:45:27 UTC
> #include <iostream>
> #include <locale.h>
> int main() {
>         //setlocale(LC_ALL, "C.UTF-8");
>         setlocale(LC_ALL, "en_US.UTF-8");
>         std::wcout << L'>' << L'◯' << L'<' << std::endl;
> }

> $ c++ test-wchar.cpp && ./a.out
> >$ 

It prints neither the "large circle" Unicode character U+25EF nor the character after it.

Replacing the function body with:
> printf("◯\n");
prints the circle, so the problem is in std::wcout.

Console's locale is en_US.UTF-8:
> $ env | grep LC
> LC_ALL=en_US.UTF-8

12.2-STABLE r366720
Comment 1 Yuri Pankov freebsd_committer 2020-12-08 01:35:55 UTC
Reproduced this on current as well.  Note that it seems to be clang (libc++?) specific -- compiling with gcc9 from ports shows correct behavior.
Comment 2 Yuri Pankov freebsd_committer 2020-12-08 01:43:33 UTC
Searching the web a bit for this turned up the following, which apparently "fixes" the issue:

        std::locale mylocale("");

As I know next to nothing about C++, I wonder why clang (libc++?) and gcc (libstdc++?) differ in what they require.
Comment 3 Yuri Victorovich freebsd_committer 2020-12-08 01:48:00 UTC
(In reply to Yuri Pankov from comment #2)

I don't think "std::wcout.imbue(mylocale);" should be required. It should be initialized with the currently chosen locale.
Comment 4 Yuri Victorovich freebsd_committer 2020-12-08 03:28:05 UTC
This works with clang-10:
> int main() {
>         std::locale mylocale("");
>         std::wcout.imbue(mylocale);
>         std::wcout << L'>' << L'◯' << L'<' << std::endl;
> }

but with gcc-9 and gcc-10 it fails:
> $ ./a.out 
> terminate called after throwing an instance of 'std::runtime_error'
>   what():  locale::facet::_S_create_c_locale name not valid
> Abort trap
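One possible way to keep the imbue approach from aborting as shown above is to catch the exception and fall back to the classic "C" locale. This is only a sketch: the helper names environment_locale_or_classic and imbue_from_environment are made up for illustration and are not part of libc++ or libstdc++.

```cpp
#include <cassert>
#include <iostream>
#include <locale>
#include <stdexcept>

// Hypothetical helper (not from this report): returns the locale named by
// the environment, or the classic "C" locale if that name is rejected.
std::locale environment_locale_or_classic() {
    try {
        return std::locale("");
    } catch (const std::runtime_error&) {
        // std::locale("") throws std::runtime_error when the name is not
        // valid, which is the failure seen with gcc in this comment.
        return std::locale::classic();
    }
}

// Hypothetical helper: imbues the given wide stream with that locale and
// returns the locale that was actually installed.
std::locale imbue_from_environment(std::wostream& out) {
    std::locale loc = environment_locale_or_classic();
    out.imbue(loc);
    assert(out.getloc() == loc);  // the stream now reports the new locale
    return loc;
}
```

A program would call imbue_from_environment(std::wcout) once at startup, before writing any wide characters. Whether the fallback is acceptable depends on the program: with the classic locale, non-ASCII wide characters may still fail to print.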
Comment 5 Yuri Pankov freebsd_committer 2020-12-08 04:14:50 UTC
(In reply to Yuri Victorovich from comment #4)
So it's full of wonders, for clang you need:


...and for gcc you need:

    setlocale(LC_ALL, "");

Reproduced the same with clang/libc++ 10/11 on Debian, so it does not seem to be FreeBSD specific.
Comment 6 Yuri Victorovich freebsd_committer 2020-12-08 05:09:49 UTC
With both clang and gcc this line
> std::cout << std::wcout.getloc().name() << std::endl;
shows the locale in std::wcout defaults to "C" when it should default to the current user's locale.

Without this, std::wcout isn't usable from libraries: a library has to use std::wcout in its default state, and that state does not correspond to the user's locale unless the top-level program sets it in std::wcout.
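The check described in this comment can be wrapped into small helpers that a library could use to see which locale a stream carries. This is a sketch under assumptions: the function names below are invented for illustration, and the only library calls used are getloc(), name(), std::locale() and std::setlocale().

```cpp
#include <cassert>
#include <clocale>
#include <iostream>
#include <locale>
#include <string>

// Hypothetical helper: name of the locale currently held by a wide stream.
std::string stream_locale_name(const std::wostream& out) {
    return out.getloc().name();
}

// Hypothetical helper: name of the C++ global locale, i.e. what a default
// constructed std::locale reports.
std::string global_locale_name() {
    return std::locale().name();
}

// Hypothetical check: calling setlocale() does not change the locale object
// stored in the stream, so the reported name stays the same.
void check_setlocale_leaves_stream_alone(std::wostream& out) {
    const std::string before = stream_locale_name(out);
    std::setlocale(LC_ALL, "");
    assert(stream_locale_name(out) == before);
}
```

Called at program start, stream_locale_name(std::wcout) and global_locale_name() should both report "C", matching the observation above.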
Comment 7 Yuri Pankov freebsd_committer 2020-12-08 06:46:59 UTC
(In reply to Yuri Victorovich from comment #6)
Everything (well, almost) defaults to C locale, including printf(), e.g. the following will fail without setlocale() call:

    printf("printf=%C\n", L'◯');

And it looks like the problem is that libc++'s wcout does NOT use the global locale set via that call, while libstdc++'s one does.  Whether it is a bug or deliberate choice, I have no idea.

Dimitry, any thoughts?
Comment 8 Dimitry Andric freebsd_committer 2020-12-08 10:27:54 UTC
See e.g.:

which says:

> From Josuttis, p. 697-698, which says, that "there is only *one*
> relation (of the C++ locale mechanism) to the C locale mechanism: the
> global C locale is modified if a named C++ locale object is set as
> the global locale" (emphasis Paolo), that is:
> std::locale::global(std::locale(""));
> affects the C functions as if the following call was made:
> std::setlocale(LC_ALL, "");
> On the other hand, there is *no* vice versa, that is, calling
> setlocale has *no* effect whatsoever on the C++ locale mechanism, in
> particular on the working of locale(""), which constructs the locale
> object from the environment of the running program, that is, in
> practice, the set of LC_ALL, LANG, etc. variables of the shell.

The above wording is also found in e.g. the C++11 standard, in

> static locale global(const locale& loc);
> 1. Sets the global locale to its argument.
> 2. Effects: Causes future calls to the constructor locale() to return
>    a copy of the argument. If the argument has a name, does
>      std::setlocale(LC_ALL, loc.name().c_str());
>    otherwise, the effect on the C locale, if any, is
>    implementation-defined. No library function other than
>    locale::global() shall affect the value returned by locale().
>    [Note: See 22.6 for data race considerations when setlocale is
>    invoked.]
> 3. Returns: The previous value of locale().
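The one-way relationship in the wording quoted above can be probed with a few lines of code. This is a minimal sketch, not a statement about any particular implementation: the helper names are hypothetical, and only the classic "C" locale is used because it is the one name that can be assumed to exist everywhere.

```cpp
#include <cassert>
#include <clocale>
#include <locale>
#include <string>

// Name the C library currently reports for LC_ALL. Passing a null locale
// argument to setlocale() only queries, it does not change anything.
std::string c_locale_name() {
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name ? name : "";
}

// C++ to C: installing a named C++ locale as the global one also calls
// setlocale(LC_ALL, loc.name().c_str()), per the quoted paragraph 2.
std::string c_name_after_global(const std::locale& loc) {
    std::locale::global(loc);
    assert(std::locale() == loc);  // locale() now returns a copy of loc
    return c_locale_name();
}

// C to C++: setlocale() must not affect the value returned by locale().
std::string cxx_name_after_setlocale(const char* name) {
    std::setlocale(LC_ALL, name);
    return std::locale().name();
}
```

In a fresh program, cxx_name_after_setlocale("") should still report "C" whatever the environment says, while c_name_after_global(std::locale::classic()) should put the C side back to "C".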
Comment 9 Yuri Pankov freebsd_committer 2020-12-08 13:46:05 UTC
(In reply to Dimitry Andric from comment #8)
So libstdc++'s wcout being affected by setlocale() call is just an implementation choice, the one that libc++ didn't make?
Comment 10 Yuri Victorovich freebsd_committer 2020-12-08 17:31:56 UTC
I asked a similar question in the libc++ bugtracker. Maybe they would have some insight about std::wcout's locale default.
Comment 11 Dimitry Andric freebsd_committer 2020-12-08 19:18:40 UTC
(In reply to Yuri Pankov from comment #9)
> So libstdc++'s wcout being affected by setlocale() call is just an
> implementation choice, the one that libc++ didn't make?

Apparently, although that documentation link from libstdc++ that I pasted doesn't really tell anything about it, except maybe the part:

> Locale initialization: at what point does _S_classic, _S_global get
> initialized? Can named locales assume this initialization has already taken
> place? 

but it seems this doc article is very old. Looking at libstdc++'s implementation, it appears they initialize a default locale() object here: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/src/c%2B%2B98/ios_locale.cc#l44

  // Called only by basic_ios<>::init.
  ios_base::_M_init() throw()
  {
    // NB: May be called more than once
    _M_precision = 6;
    _M_width = 0;
    _M_flags = skipws | dec;
    _M_ios_locale = locale();
  }

This default locale() object is constructed in https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/src/c%2B%2B98/locale_init.cc, but by default it seems to be a separate copy of a C-like locale, e.g. it has:

  locale::_S_initialize_once() throw()
  {
    // 2 references.
    // One reference for _S_classic, one for _S_global
    _S_classic = new (&c_locale_impl) _Impl(2);
    _S_global = _S_classic;
    new (&c_locale) locale(_S_classic);
  }

and the _Impl constructor is:

  // Construct "C" _Impl.
  _Impl(size_t __refs) throw()
  : _M_refcount(__refs), _M_facets(0), _M_facets_size(num_facets),
  _M_caches(0), _M_names(0)
  {
    _M_facets = new (&facet_vec) const facet*[_M_facets_size]();
    _M_caches = new (&cache_vec) const facet*[_M_facets_size]();

    // Name the categories.
    _M_names = new (&name_vec) char*[_S_categories_size]();
    _M_names[0] = new (&name_c[0]) char[2];
    std::memcpy(_M_names[0], locale::facet::_S_get_c_name(), 2);

    // This is needed as presently the C++ version of "C" locales
    // != data in the underlying locale model for __timepunct,
    // numpunct, and moneypunct. Also, the "C" locales must be
    // constructed in a way such that they are pre-allocated.
    // NB: Set locale::facets(ref) count to one so that each individual
    // facet is not destroyed when the locale (and thus locale::_Impl) is
    // destroyed.
    _M_init_facet(new (&ctype_c) std::ctype<char>(0, false, 1));
    _M_init_facet(new (&codecvt_c) codecvt<char, char, mbstate_t>(1));
... much more of this ...

So I think what you're seeing with libstdc++ is intentional, in the sense that they have a default locale which is sort-of the same as the default C locale (or even C.UTF-8).

The only call to setlocale() in that .cc file is when you call std::locale::global(), as indicated in the docs.