Affected applications include, for example, net-im/qTox and gwenview5. More information:
http://bugs.icu-project.org/trac/ticket/12886
https://bugreports.qt.io/browse/QTBUG-57522
https://github.com/qTox/qTox/issues/4012#issuecomment-273962027

This patch was tested for net-im/qTox:

--- common/putil.cpp.orig 2016-10-19 17:20:56 UTC
+++ common/putil.cpp
@@ -1813,6 +1813,31 @@
         /* Remap CP949 to a similar codepage to avoid issues with backslash and won symbol. */
         name = "EUC-KR";
     }
+    if (locale != NULL && uprv_strcmp(name, "euc") == 0) {
+        /* Linux underspecifies the "EUC" name. */
+        if (uprv_strcmp(locale, "korean") == 0) {
+            name = "EUC-KR";
+        }
+        else if (uprv_strcmp(locale, "japanese") == 0) {
+            /* See comment below about eucJP */
+            name = "eucjis";
+        }
+    }
+    else if (uprv_strcmp(name, "eucjp") == 0) {
+        /*
+        ibm-1350 is the best match, but unavailable.
+        ibm-954 is mostly a superset of ibm-1350.
+        ibm-33722 is the default for eucJP (similar to Windows).
+        */
+        name = "eucjis";
+    }
+    else if (locale != NULL && uprv_strcmp(locale, "en_US_POSIX") != 0 &&
+            (uprv_strcmp(name, "ANSI_X3.4-1968") == 0 || uprv_strcmp(name, "US-ASCII") == 0)) {
+        /*
+         * For non C/POSIX locale, default the code page to UTF-8 instead of US-ASCII.
+         */
+        name = "UTF-8";
+    }
 #elif U_PLATFORM == U_PF_HPUX
     if (locale != NULL && uprv_strcmp(locale, "zh_HK") == 0 && uprv_strcmp(name, "big5") == 0) {
         /* HP decided to extend big5 as hkbig5 even though it's not compatible :-( */
@@ -1942,7 +1967,7 @@
        nl_langinfo may use the same buffer as setlocale. */
     {
         const char *codeset = nl_langinfo(U_NL_LANGINFO_CODESET);
-#if U_PLATFORM_IS_DARWIN_BASED || U_PLATFORM_IS_LINUX_BASED
+#if U_PLATFORM_IS_DARWIN_BASED || U_PLATFORM_IS_LINUX_BASED || U_PLATFORM == U_PF_BSD
         /*
          * On Linux and MacOSX, ensure that default codepage for non C/POSIX locale is UTF-8
          * instead of ASCII.
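The ASCII branch at the end of the first hunk is the part that addresses the non-ASCII file name symptom. A minimal standalone sketch of that check follows; it assumes plain strcmp() in place of ICU's uprv_strcmp(), and remap_ascii_to_utf8() and is_ascii_codeset() are hypothetical names, not ICU APIs.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if `codeset` is one of the two ASCII spellings the patch checks:
   "ANSI_X3.4-1968" (the name glibc reports) or "US-ASCII". */
static int is_ascii_codeset(const char *codeset) {
    return strcmp(codeset, "ANSI_X3.4-1968") == 0 ||
           strcmp(codeset, "US-ASCII") == 0;
}

/* Hypothetical sketch of the patch's last branch: outside the C/POSIX
   locale (identified by the locale ID "en_US_POSIX"), an ASCII codeset
   is replaced by UTF-8. Any other codeset is returned unchanged. */
const char *remap_ascii_to_utf8(const char *locale, const char *codeset) {
    if (locale != NULL && strcmp(locale, "en_US_POSIX") != 0 &&
        is_ascii_codeset(codeset)) {
        return "UTF-8";
    }
    return codeset;
}
```

Only a NULL locale or the en_US_POSIX locale keeps the ASCII codeset in this sketch; every other locale ID that arrives with an ASCII codeset is switched to UTF-8.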
Could you please add the patch as a proper attachment? :) I can confirm that this fixes the issue shown here: https://people.freebsd.org/~tcberner/icu_problem.png (gwenview refusing to open files with non-ASCII names).
Created attachment 179228: devel/icu/files/patch-common_putil.cpp
Any input from office@ on this?
Created attachment 181127: convert ASCII to UTF-8 outside C/POSIX locale

Clearly we should handle the ASCII case the same way Linux and OS X do. However, I do not think it wise to copy the Linux section wholesale, as changing the handling of Korean and Japanese codesets may have unintended consequences. Instead, I have taken the approach of making BSD behave the same as Darwin. The CP949 handling was already identical, and Darwin already remaps ASCII to UTF-8, so we can simply add BSD to the existing #if instead of copying code. I have verified that this change fixes the issue observed in qTox.
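The "add BSD to the existing #if" approach can be sketched as below. The MY_PF_* and MY_PLATFORM macros are hypothetical stand-ins for ICU's U_PLATFORM values, and the branch body is a simplified model of the Darwin handling described above (CP949 remap, plus US-ASCII to UTF-8 outside en_US_POSIX), not a copy of ICU's code. It assumes, as the Darwin branch's check implies, that the C locale reports its codeset as "US-ASCII" on these systems.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical platform IDs standing in for ICU's U_PF_* values. */
#define MY_PF_LINUX  1
#define MY_PF_DARWIN 2
#define MY_PF_BSD    3

/* Select the platform at build time; this sketch defaults to BSD. */
#ifndef MY_PLATFORM
#define MY_PLATFORM MY_PF_BSD
#endif

/* Hypothetical remap function; `name` must be non-NULL. */
const char *remap_codepage(const char *locale, const char *name) {
#if MY_PLATFORM == MY_PF_DARWIN || MY_PLATFORM == MY_PF_BSD
    /* One shared branch: BSD joins Darwin's condition instead of
       receiving its own copy of the code. */
    if (strcmp(name, "CP949") == 0) {
        /* Same CP949 remap that both platforms already had. */
        name = "EUC-KR";
    } else if (locale != NULL && strcmp(locale, "en_US_POSIX") != 0 &&
               strcmp(name, "US-ASCII") == 0) {
        /* Outside the C/POSIX locale, prefer UTF-8 over ASCII. */
        name = "UTF-8";
    }
#elif MY_PLATFORM == MY_PF_LINUX
    /* Linux-only handling (EUC variants, "ANSI_X3.4-1968") would live
       here and is never compiled into the Darwin/BSD build. */
    (void)locale;
#endif
    return name;
}
```

Because BSD only joins an existing condition, the Linux-specific Korean and Japanese remapping is never compiled in on BSD, which is the risk the wholesale copy would have introduced.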
A commit references this bug:

Author: rezny
Date: Fri Apr 7 22:06:08 UTC 2017
New revision: 437961
URL: https://svnweb.freebsd.org/changeset/ports/437961

Log:
  Behave the same on BSDs as on Darwin, in that UTF-8 shall be used instead of ASCII outside the POSIX 'C' locale and UTF-8 is the default in case anything should call ucnv_getDefaultName() prior to calling setlocale().

  This change fixes problems that occur in multiple Qt5 applications when handling files with names containing non-ASCII characters.

  PR: 216372
  Reported by: vvd@unislabs.com
  Approved by: bapt (office@), swills (mentor)
  Differential Revision: https://reviews.freebsd.org/D10128

Changes:
  head/devel/icu/Makefile
  head/devel/icu/files/patch-common_putil.cpp
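The "prior to calling setlocale()" case arises because every C program starts in the "C" locale, so nl_langinfo(CODESET) reports ASCII even when the environment requests a UTF-8 locale. A sketch of how a library can still see the requested locale follows. It assumes the library falls back to LC_ALL, then LC_CTYPE, then LANG (POSIX precedence order) when the current LC_CTYPE locale is C or POSIX; resolve_ctype_locale() and query_ctype_state() are hypothetical helpers that model, rather than reproduce, ICU's lookup.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* POSIX treats an unset or empty locale variable as absent. */
static int is_set(const char *s) {
    return s != NULL && s[0] != '\0';
}

static int is_c_or_posix(const char *s) {
    return strcmp(s, "C") == 0 || strcmp(s, "POSIX") == 0;
}

/* Hypothetical helper: if the process is still in the C/POSIX locale
   (for example because setlocale() was never called), fall back to the
   environment in POSIX precedence order: LC_ALL, LC_CTYPE, LANG. */
const char *resolve_ctype_locale(const char *current, const char *lc_all,
                                 const char *lc_ctype, const char *lang) {
    if (is_set(current) && !is_c_or_posix(current))
        return current;
    if (is_set(lc_all))
        return lc_all;
    if (is_set(lc_ctype))
        return lc_ctype;
    if (is_set(lang))
        return lang;
    return "C";
}

/* Reports the locale ID inferred above together with the codeset the C
   library currently reports. Before setlocale() these can disagree. */
void query_ctype_state(const char **locale_id, const char **codeset) {
    *locale_id = resolve_ctype_locale(setlocale(LC_CTYPE, NULL),
                                      getenv("LC_ALL"), getenv("LC_CTYPE"),
                                      getenv("LANG"));
    *codeset = nl_langinfo(CODESET);
}
```

With LANG=en_US.UTF-8 and no setlocale() call, query_ctype_state() would typically yield the locale ID "en_US.UTF-8" alongside an ASCII codeset, which is exactly the pair the ASCII-to-UTF-8 remap turns into UTF-8. One trade-off of this design: a program that explicitly calls setlocale(LC_ALL, "C") looks the same here as one that never called setlocale().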