Bug 265950 - POSIX 2008 locale failures when global locale not C
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Many People
Assignee: freebsd-bugs (Nobody)
Reported: 2022-08-19 21:33 UTC by Karl Williamson
Modified: 2023-11-13 00:08 UTC

Attachments
A short C reproducer program (718 bytes, text/plain)
2022-08-19 21:33 UTC, Karl Williamson
Another reproducer C program (1.04 KB, text/plain)
2023-11-13 00:08 UTC, Karl Williamson

Description Karl Williamson 2022-08-19 21:33:29 UTC
Created attachment 236019
A short C reproducer program

See attached program.

setlocale() is used to set the global locale to C.UTF-8.  The per-thread calls are then used with that as a basis: newlocale() creates a Latin1 locale object, and uselocale() makes it the thread locale.  The resulting thread locale doesn't have Latin1 characteristics.

This bug does not happen if the global locale is C instead of C.UTF-8.
Comment 1 Yuri Pankov freebsd_committer freebsd_triage 2023-09-19 02:31:08 UTC
This one is interesting and is actually an issue with isupper() (and other is* functions) via __sbistype()->__sbmaskrune().

__sbmaskrune() looks like the following:

static __inline int
__sbmaskrune(__ct_rune_t _c, unsigned long _f)
{
        return (_c < 0 || _c >= __mb_sb_limit) ? 0 :
               _CurrentRuneLocale->__runetype[_c] & _f;
}

The culprit here is __mb_sb_limit, which is NOT tied to the thread locale; it specifies the limit for the global locale.

When the global locale is set to one with UTF-8 encoding (also true for EUC and other encodings), __mb_sb_limit goes down to 128 (from the initial 256, which is also the limit for ISO8859-1 and other single-byte locales), so the 0xC0 test fails early, at the range check.
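
To illustrate the mechanism in isolation, here is a toy model of the check. It is not libc code: model_sb_limit, struct model_locale, model_maskrune, model_init_latin1 and MODEL_UPPER are all hypothetical stand-ins. It assumes only what is stated above, namely that the range bound is a single process-wide value while the classification table belongs to the locale being queried.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the libc definitions. */
#define MODEL_UPPER 0x1UL

struct model_locale {
    unsigned long runetype[256];    /* per-locale classification table */
};

/* Process-wide bound, modelled on the role __mb_sb_limit plays. */
int model_sb_limit = 256;

/* Same shape as __sbmaskrune(): global bound, per-locale table. */
int
model_maskrune(const struct model_locale *loc, int c, unsigned long f)
{
    return (c < 0 || c >= model_sb_limit) ? 0 :
        (int)(loc->runetype[c] & f);
}

/* Fill a table that marks A-Z and 0xC0 as upper case, as Latin1 does. */
void
model_init_latin1(struct model_locale *loc)
{
    size_t i;

    for (i = 0; i < 256; i++)
        loc->runetype[i] = 0;
    for (i = 'A'; i <= 'Z'; i++)
        loc->runetype[i] = MODEL_UPPER;
    loc->runetype[0xC0] = MODEL_UPPER;
}
```

With model_sb_limit at 256, model_maskrune() on 0xC0 returns nonzero for a table built by model_init_latin1(). After lowering model_sb_limit to 128, as a UTF-8 global locale would, the same lookup returns 0 even though the table is unchanged.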

This behavior seems to have been introduced in base 367ed4e13d697ceb415183d8a7acddf5f707667c, long before the xlocale work was integrated, so it wasn't really an issue back then.
Comment 2 Karl Williamson 2023-11-13 00:08:36 UTC
Created attachment 246266
Another reproducer C program

This has bitten us again.  I came up with the attached reproducer.  It's so close to the original that I imagine it is the same bug.