Bug 258360

Summary: race between setlocale() and iconv_open() causes segfault
Product: Base System Reporter: Henry Hu <henry.hu.sh>
Component: threadsAssignee: Mark Johnston <markj>
Status: Open ---    
Severity: Affects Some People CC: delphij, dmitry.wagin, glebius, lwhsu, markj, moonlapse81, theraven, tmunro
Priority: ---    
Version: 13.0-STABLE   
Hardware: Any   
OS: Any   
Bug Depends on:    
Bug Blocks: 258212    
Attachments:
Description Flags
patch none

Description Henry Hu 2021-09-08 04:25:07 UTC
This was discovered by Dmitry Wagin in PR 258212. I'm able to reproduce the crash with a very simple program, which simply calls iconv_open() and setlocale() concurrently:

#include <iconv.h>
#include <pthread.h>
#include <locale.h>

void* iconv_thread(void* arg) {
    iconv_open("UTF-8", "UTF-8");
    return NULL;
}

void* locale_thread(void* arg) {
    setlocale(LC_ALL, "en_US.UTF-8");
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, iconv_thread, NULL);
    pthread_create(&t2, NULL, locale_thread, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
}

On my machine it has a 7/10 chance of crashing. On another user's machine it only crashes 1/15, so repeated testing is needed.

The stack can be seen at https://pastebin.com/raw/isvMUDRd
Comment 1 Xin LI freebsd_committer 2021-09-08 05:07:00 UTC
Add a few committers who would have better insights.
Comment 2 Mark Johnston freebsd_committer 2021-09-08 13:37:32 UTC
(In reply to Xin LI from comment #1)
The basic problem is that localeconv_l() updates the global lconv lazily.  When setlocale() updates the process-global locale, it sets a couple of flags, monetary_locale_changed and numeric_locale_changed, to indicate that the lconv needs to be reloaded.  But setlocale() sets those flags before it's actually finished doing some swizzling.

Note that the problem doesn't directly involve iconv_open(), it just happens to be triggered because iconv calls snprintf() at some point.  I'm having a hard time understanding what guarantees we're supposed to provide here, POSIX doesn't seem to say much about thread-safety and setlocale(), but it seems pretty reasonable to expect localeconv_l() to behave correctly wrt a concurrent setlocale() call, though.
Comment 3 Mark Johnston freebsd_committer 2021-09-08 13:39:18 UTC
Created attachment 227762 [details]
patch

This fixes the segfaults for me.  I suspect there are other similar races lurking, but I don't know this code well enough to spot them.  This should probably also be using C11 atomics, for now I just used sys/atomic.h.
Comment 4 Oleh Vinichenko 2021-09-08 16:29:38 UTC
patch fixes telegram-desktop segfault, described in 258212
Comment 5 Oleh Vinichenko 2021-09-08 16:34:43 UTC
on 13.0 for matter of base system version
Comment 6 Henry Hu 2021-09-09 02:29:24 UTC
(In reply to Mark Johnston from comment #3)
Yes, this patch fixes the issue for me.
Comment 7 Dmitry Wagin 2021-09-09 15:55:42 UTC
JFY, I check three different CPU with non-patched sources:

 1. Intel(R) Core(TM) i7-9700T - the problem exists
 2. Intel(R) Core(TM) i7-10700T - the problem exists
 3. Intel(R) Xeon(R) CPU E5-2690 - telegram-desktop works fine
Comment 8 Mark Johnston freebsd_committer 2021-09-10 14:33:12 UTC
https://reviews.freebsd.org/D31899
Comment 9 commit-hook freebsd_committer 2021-09-17 15:01:45 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7eb138a9e53636366e615bdf04062fedc044bcea

commit 7eb138a9e53636366e615bdf04062fedc044bcea
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-09-17 14:44:23 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-09-17 14:47:46 +0000

    libc/locale: Fix races between localeconv(3) and setlocale(3)

    Each locale embeds a lazily initialized lconv which is populated by
    localeconv(3) and localeconv_l(3).  When setlocale(3) updates the global
    locale, the lconv needs to be (lazily) reinitialized.  To signal this,
    we set flag variables in the locale structure.  There are two problems:

    - The flags are set before the locale is fully updated, so a concurrent
      localeconv() call can observe partially initialized locale data.
    - No barriers ensure that localeconv() observes a fully initialized
      locale if a flag is set.

    So, move the flag update appropriately, and use acq/rel barriers to
    provide some synchronization.  Note that this is inadequate in the face
    of multiple concurrent calls to setlocale(3), but this is not expected
    to work regardless.

    Thanks to Henry Hu <henry.hu.sh@gmail.com> for providing a test case
    demonstrating the race.

    PR:             258360
    MFC after:      3 weeks
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D31899

 lib/libc/locale/lmonetary.c  | 4 ++--
 lib/libc/locale/lnumeric.c   | 4 ++--
 lib/libc/locale/localeconv.c | 8 ++++----
 3 files changed, 8 insertions(+), 8 deletions(-)
Comment 10 Gleb Smirnoff freebsd_committer 2021-09-17 18:24:13 UTC
I found out that setlocale() is not supposed to be thread-safe, so fixed the race on the side of telegram-desktop.

https://stackoverflow.com/questions/4057319/is-setlocale-thread-safe-function/35583206#35583206
Comment 11 Mark Johnston freebsd_committer 2021-09-17 18:39:30 UTC
(In reply to Gleb Smirnoff from comment #10)
Hmm, yes, I assumed that it should be safe to concurrently call snprintf() and set the global locale, but perhaps not...

Did your patch get applied?  If so it may be best to revert my commit.
Comment 12 Gleb Smirnoff freebsd_committer 2021-09-17 18:42:19 UTC
Yes, they accepted it.

Is your patch adding atomics only to setlocale() or to any other function that uses locale as well?
Comment 13 Mark Johnston freebsd_committer 2021-09-17 19:07:32 UTC
(In reply to Gleb Smirnoff from comment #12)
localeconv_l() now issues an acquire barrier, it is not just setlocale() that is modified.  This function is used internally within libc, e.g., in stdio functions.  There is no extra overhead on amd64, but in principle there is on arm64 and other weakly ordered arches.  I was not able to measure it on an arm64 test system though, and I suspect it is not consequential.
Comment 14 Gleb Smirnoff freebsd_committer 2021-09-17 19:20:13 UTC
You know better whether it needs to be reverted or not. localeconv_l() can be called implicitly by many functions, so if it is slowed down on arm64, that's unfortunate.

According to 258212 it was glib update that caused tdesktop to crash. My joint research with Telegram guys confirms that. We found out that glib has commented out pthread_join() in Gio object destructor. I believe as more Linux distros will pickup buggy glib, tdesktop will race there, too. There could be more applications affected. So, very likely this isn't going to be only our FreeBSD problem. Of course on glibc it might not crash, could even by luck avoid any undefined behavior.
Comment 15 commit-hook freebsd_committer 2021-10-20 00:54:40 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f89204d6b99d11aa1f67722e8c1d33b0fc4d61d7

commit f89204d6b99d11aa1f67722e8c1d33b0fc4d61d7
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-09-17 14:44:23 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-10-20 00:53:33 +0000

    libc/locale: Fix races between localeconv(3) and setlocale(3)

    Each locale embeds a lazily initialized lconv which is populated by
    localeconv(3) and localeconv_l(3).  When setlocale(3) updates the global
    locale, the lconv needs to be (lazily) reinitialized.  To signal this,
    we set flag variables in the locale structure.  There are two problems:

    - The flags are set before the locale is fully updated, so a concurrent
      localeconv() call can observe partially initialized locale data.
    - No barriers ensure that localeconv() observes a fully initialized
      locale if a flag is set.

    So, move the flag update appropriately, and use acq/rel barriers to
    provide some synchronization.  Note that this is inadequate in the face
    of multiple concurrent calls to setlocale(3), but this is not expected
    to work regardless.

    Thanks to Henry Hu <henry.hu.sh@gmail.com> for providing a test case
    demonstrating the race.

    PR:             258360
    Sponsored by:   The FreeBSD Foundation

    (cherry picked from commit 7eb138a9e53636366e615bdf04062fedc044bcea)

 lib/libc/locale/lmonetary.c  | 4 ++--
 lib/libc/locale/lnumeric.c   | 4 ++--
 lib/libc/locale/localeconv.c | 8 ++++----
 3 files changed, 8 insertions(+), 8 deletions(-)
Comment 16 commit-hook freebsd_committer 2021-10-20 01:14:44 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=589daa05be6f390bed6fdeb820c47f8db114c5ec

commit 589daa05be6f390bed6fdeb820c47f8db114c5ec
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-10-20 01:13:12 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-10-20 01:13:12 +0000

    Revert "libc/locale: Fix races between localeconv(3) and setlocale(3)"

    This reverts commit f89204d6b99d11aa1f67722e8c1d33b0fc4d61d7.

    I didn't intend to push this commit yet, pending discussion on PR
    258360.

    PR:     258360

 lib/libc/locale/lmonetary.c  | 4 ++--
 lib/libc/locale/lnumeric.c   | 4 ++--
 lib/libc/locale/localeconv.c | 8 ++++----
 3 files changed, 8 insertions(+), 8 deletions(-)
Comment 17 Mark Johnston freebsd_committer 2021-10-20 01:18:34 UTC
(In reply to Gleb Smirnoff from comment #14)
Sorry for the delayed followup.  As I mentioned I believe the overhead is quite minimal and checked it with a test program that calls localeconv() repeatedly on an espressobin and a graviton EC2 instance.  I had also looked at glibc and noted that it is not susceptible to segfaulting in the face of this kind of race.  So I don't see any real reason not to try and provide sensible behaviour there.  If you still prefer a revert, I am happy to do it, I don't feel very strongly either way so long as the telegram bug is gone one way or another.