This was discovered by Dmitry Wagin in PR 258212. I'm able to reproduce the crash with a very simple program, which simply calls iconv_open() and setlocale() concurrently: #include <iconv.h> #include <pthread.h> #include <locale.h> void* iconv_thread(void* arg) { iconv_open("UTF-8", "UTF-8"); return NULL; } void* locale_thread(void* arg) { setlocale(LC_ALL, "en_US.UTF-8"); return NULL; } int main() { pthread_t t1, t2; pthread_create(&t1, NULL, iconv_thread, NULL); pthread_create(&t2, NULL, locale_thread, NULL); pthread_join(t1, NULL); pthread_join(t2, NULL); } On my machine it has a 7/10 chance of crashing. On another user's machine it only crashes 1/15, so repeated testing is needed. The stack can be seen at https://pastebin.com/raw/isvMUDRd
Add a few committers who would have better insights.
(In reply to Xin LI from comment #1) The basic problem is that localeconv_l() updates the global lconv lazily. When setlocale() updates the process-global locale, it sets a couple of flags, monetary_locale_changed and numeric_locale_changed, to indicate that the lconv needs to be reloaded. But setlocale() sets those flags before it's actually finished doing some swizzling. Note that the problem doesn't directly involve iconv_open(), it just happens to be triggered because iconv calls snprintf() at some point. I'm having a hard time understanding what guarantees we're supposed to provide here, POSIX doesn't seem to say much about thread-safety and setlocale(), but it seems pretty reasonable to expect localeconv_l() to behave correctly wrt a concurrent setlocale() call, though.
Created attachment 227762 [details] patch This fixes the segfaults for me. I suspect there are other similar races lurking, but I don't know this code well enough to spot them. This should probably also be using C11 atomics, for now I just used sys/atomic.h.
patch fixes telegram-desktop segfault, described in 258212
on 13.0 for matter of base system version
(In reply to Mark Johnston from comment #3) Yes, this patch fixes the issue for me.
JFY, I check three different CPU with non-patched sources: 1. Intel(R) Core(TM) i7-9700T - the problem exists 2. Intel(R) Core(TM) i7-10700T - the problem exists 3. Intel(R) Xeon(R) CPU E5-2690 - telegram-desktop works fine
https://reviews.freebsd.org/D31899
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=7eb138a9e53636366e615bdf04062fedc044bcea commit 7eb138a9e53636366e615bdf04062fedc044bcea Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2021-09-17 14:44:23 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2021-09-17 14:47:46 +0000 libc/locale: Fix races between localeconv(3) and setlocale(3) Each locale embeds a lazily initialized lconv which is populated by localeconv(3) and localeconv_l(3). When setlocale(3) updates the global locale, the lconv needs to be (lazily) reinitialized. To signal this, we set flag variables in the locale structure. There are two problems: - The flags are set before the locale is fully updated, so a concurrent localeconv() call can observe partially initialized locale data. - No barriers ensure that localeconv() observes a fully initialized locale if a flag is set. So, move the flag update appropriately, and use acq/rel barriers to provide some synchronization. Note that this is inadequate in the face of multiple concurrent calls to setlocale(3), but this is not expected to work regardless. Thanks to Henry Hu <henry.hu.sh@gmail.com> for providing a test case demonstrating the race. PR: 258360 MFC after: 3 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31899 lib/libc/locale/lmonetary.c | 4 ++-- lib/libc/locale/lnumeric.c | 4 ++-- lib/libc/locale/localeconv.c | 8 ++++---- 3 files changed, 8 insertions(+), 8 deletions(-)
I found out that setlocale() is not supposed to be thread-safe, so fixed the race on the side of telegram-desktop. https://stackoverflow.com/questions/4057319/is-setlocale-thread-safe-function/35583206#35583206
(In reply to Gleb Smirnoff from comment #10) Hmm, yes, I assumed that it should be safe to concurrently call snprintf() and set the global locale, but perhaps not... Did your patch get applied? If so it may be best to revert my commit.
Yes, they accepted it. Is your patch adding atomics only to setlocale() or to any other function that uses locale as well?
(In reply to Gleb Smirnoff from comment #12) localeconv_l() now issues an acquire barrier, it is not just setlocale() that is modified. This function is used internally within libc, e.g., in stdio functions. There is no extra overhead on amd64, but in principle there is on arm64 and other weakly ordered arches. I was not able to measure it on an arm64 test system though, and I suspect it is not consequential.
You know better whether it needs to be reverted or not. localeconv_l() can be called implicitly by many functions, so if it is slowed down on arm64, that's unfortunate. According to 258212 it was glib update that caused tdesktop to crash. My joint research with Telegram guys confirms that. We found out that glib has commented out pthread_join() in Gio object destructor. I believe as more Linux distros will pickup buggy glib, tdesktop will race there, too. There could be more applications affected. So, very likely this isn't going to be only our FreeBSD problem. Of course on glibc it might not crash, could even by luck avoid any undefined behavior.
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=f89204d6b99d11aa1f67722e8c1d33b0fc4d61d7 commit f89204d6b99d11aa1f67722e8c1d33b0fc4d61d7 Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2021-09-17 14:44:23 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2021-10-20 00:53:33 +0000 libc/locale: Fix races between localeconv(3) and setlocale(3) Each locale embeds a lazily initialized lconv which is populated by localeconv(3) and localeconv_l(3). When setlocale(3) updates the global locale, the lconv needs to be (lazily) reinitialized. To signal this, we set flag variables in the locale structure. There are two problems: - The flags are set before the locale is fully updated, so a concurrent localeconv() call can observe partially initialized locale data. - No barriers ensure that localeconv() observes a fully initialized locale if a flag is set. So, move the flag update appropriately, and use acq/rel barriers to provide some synchronization. Note that this is inadequate in the face of multiple concurrent calls to setlocale(3), but this is not expected to work regardless. Thanks to Henry Hu <henry.hu.sh@gmail.com> for providing a test case demonstrating the race. PR: 258360 Sponsored by: The FreeBSD Foundation (cherry picked from commit 7eb138a9e53636366e615bdf04062fedc044bcea) lib/libc/locale/lmonetary.c | 4 ++-- lib/libc/locale/lnumeric.c | 4 ++-- lib/libc/locale/localeconv.c | 8 ++++---- 3 files changed, 8 insertions(+), 8 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=589daa05be6f390bed6fdeb820c47f8db114c5ec commit 589daa05be6f390bed6fdeb820c47f8db114c5ec Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2021-10-20 01:13:12 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2021-10-20 01:13:12 +0000 Revert "libc/locale: Fix races between localeconv(3) and setlocale(3)" This reverts commit f89204d6b99d11aa1f67722e8c1d33b0fc4d61d7. I didn't intend to push this commit yet, pending discussion on PR 258360. PR: 258360 lib/libc/locale/lmonetary.c | 4 ++-- lib/libc/locale/lnumeric.c | 4 ++-- lib/libc/locale/localeconv.c | 8 ++++---- 3 files changed, 8 insertions(+), 8 deletions(-)
(In reply to Gleb Smirnoff from comment #14) Sorry for the delayed followup. As I mentioned I believe the overhead is quite minimal and checked it with a test program that calls localeconv() repeatedly on an espressobin and a graviton EC2 instance. I had also looked at glibc and noted that it is not susceptible to segfaulting in the face of this kind of race. So I don't see any real reason not to try and provide sensible behaviour there. If you still prefer a revert, I am happy to do it, I don't feel very strongly either way so long as the telegram bug is gone one way or another.