Bug 258360 - race between setlocale() and iconv_open() causes segfault
Summary: race between setlocale() and iconv_open() causes segfault
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: threads (show other bugs)
Version: 13.0-STABLE
Hardware: Any Any
: --- Affects Some People
Assignee: Mark Johnston
URL:
Keywords:
Depends on:
Blocks: 258212
  Show dependency treegraph
 
Reported: 2021-09-08 04:25 UTC by Henry Hu
Modified: 2021-09-17 19:20 UTC (History)
8 users (show)

See Also:


Attachments
patch (2.59 KB, patch)
2021-09-08 13:39 UTC, Mark Johnston
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Henry Hu 2021-09-08 04:25:07 UTC
This was discovered by Dmitry Wagin in PR 258212. I'm able to reproduce the crash with a very simple program, which simply calls iconv_open() and setlocale() concurrently:

#include <iconv.h>
#include <pthread.h>
#include <locale.h>

void* iconv_thread(void* arg) {
    iconv_open("UTF-8", "UTF-8");
    return NULL;
}

void* locale_thread(void* arg) {
    setlocale(LC_ALL, "en_US.UTF-8");
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, iconv_thread, NULL);
    pthread_create(&t2, NULL, locale_thread, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
}

On my machine it has a 7/10 chance of crashing. On another user's machine it only crashes 1/15, so repeated testing is needed.

The stack can be seen at https://pastebin.com/raw/isvMUDRd
Comment 1 Xin LI freebsd_committer 2021-09-08 05:07:00 UTC
Add a few committers who would have better insights.
Comment 2 Mark Johnston freebsd_committer 2021-09-08 13:37:32 UTC
(In reply to Xin LI from comment #1)
The basic problem is that localeconv_l() updates the global lconv lazily.  When setlocale() updates the process-global locale, it sets a couple of flags, monetary_locale_changed and numeric_locale_changed, to indicate that the lconv needs to be reloaded.  But setlocale() sets those flags before it's actually finished doing some swizzling.

Note that the problem doesn't directly involve iconv_open(), it just happens to be triggered because iconv calls snprintf() at some point.  I'm having a hard time understanding what guarantees we're supposed to provide here, POSIX doesn't seem to say much about thread-safety and setlocale(), but it seems pretty reasonable to expect localeconv_l() to behave correctly wrt a concurrent setlocale() call, though.
Comment 3 Mark Johnston freebsd_committer 2021-09-08 13:39:18 UTC
Created attachment 227762 [details]
patch

This fixes the segfaults for me.  I suspect there are other similar races lurking, but I don't know this code well enough to spot them.  This should probably also be using C11 atomics, for now I just used sys/atomic.h.
Comment 4 Oleh Vinichenko 2021-09-08 16:29:38 UTC
patch fixes telegram-desktop segfault, described in 258212
Comment 5 Oleh Vinichenko 2021-09-08 16:34:43 UTC
on 13.0 for matter of base system version
Comment 6 Henry Hu 2021-09-09 02:29:24 UTC
(In reply to Mark Johnston from comment #3)
Yes, this patch fixes the issue for me.
Comment 7 Dmitry Wagin 2021-09-09 15:55:42 UTC
JFY, I check three different CPU with non-patched sources:

 1. Intel(R) Core(TM) i7-9700T - the problem exists
 2. Intel(R) Core(TM) i7-10700T - the problem exists
 3. Intel(R) Xeon(R) CPU E5-2690 - telegram-desktop works fine
Comment 8 Mark Johnston freebsd_committer 2021-09-10 14:33:12 UTC
https://reviews.freebsd.org/D31899
Comment 9 commit-hook freebsd_committer 2021-09-17 15:01:45 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7eb138a9e53636366e615bdf04062fedc044bcea

commit 7eb138a9e53636366e615bdf04062fedc044bcea
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-09-17 14:44:23 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-09-17 14:47:46 +0000

    libc/locale: Fix races between localeconv(3) and setlocale(3)

    Each locale embeds a lazily initialized lconv which is populated by
    localeconv(3) and localeconv_l(3).  When setlocale(3) updates the global
    locale, the lconv needs to be (lazily) reinitialized.  To signal this,
    we set flag variables in the locale structure.  There are two problems:

    - The flags are set before the locale is fully updated, so a concurrent
      localeconv() call can observe partially initialized locale data.
    - No barriers ensure that localeconv() observes a fully initialized
      locale if a flag is set.

    So, move the flag update appropriately, and use acq/rel barriers to
    provide some synchronization.  Note that this is inadequate in the face
    of multiple concurrent calls to setlocale(3), but this is not expected
    to work regardless.

    Thanks to Henry Hu <henry.hu.sh@gmail.com> for providing a test case
    demonstrating the race.

    PR:             258360
    MFC after:      3 weeks
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D31899

 lib/libc/locale/lmonetary.c  | 4 ++--
 lib/libc/locale/lnumeric.c   | 4 ++--
 lib/libc/locale/localeconv.c | 8 ++++----
 3 files changed, 8 insertions(+), 8 deletions(-)
Comment 10 Gleb Smirnoff freebsd_committer 2021-09-17 18:24:13 UTC
I found out that setlocale() is not supposed to be thread-safe, so fixed the race on the side of telegram-desktop.

https://stackoverflow.com/questions/4057319/is-setlocale-thread-safe-function/35583206#35583206
Comment 11 Mark Johnston freebsd_committer 2021-09-17 18:39:30 UTC
(In reply to Gleb Smirnoff from comment #10)
Hmm, yes, I assumed that it should be safe to concurrently call snprintf() and set the global locale, but perhaps not...

Did your patch get applied?  If so it may be best to revert my commit.
Comment 12 Gleb Smirnoff freebsd_committer 2021-09-17 18:42:19 UTC
Yes, they accepted it.

Is your patch adding atomics only to setlocale() or to any other function that uses locale as well?
Comment 13 Mark Johnston freebsd_committer 2021-09-17 19:07:32 UTC
(In reply to Gleb Smirnoff from comment #12)
localeconv_l() now issues an acquire barrier, it is not just setlocale() that is modified.  This function is used internally within libc, e.g., in stdio functions.  There is no extra overhead on amd64, but in principle there is on arm64 and other weakly ordered arches.  I was not able to measure it on an arm64 test system though, and I suspect it is not consequential.
Comment 14 Gleb Smirnoff freebsd_committer 2021-09-17 19:20:13 UTC
You know better whether it needs to be reverted or not. localeconv_l() can be called implicitly by many functions, so if it is slowed down on arm64, that's unfortunate.

According to 258212 it was glib update that caused tdesktop to crash. My joint research with Telegram guys confirms that. We found out that glib has commented out pthread_join() in Gio object destructor. I believe as more Linux distros will pickup buggy glib, tdesktop will race there, too. There could be more applications affected. So, very likely this isn't going to be only our FreeBSD problem. Of course on glibc it might not crash, could even by luck avoid any undefined behavior.