For example, the following JavaScript code may produce unintended results.

    (function () {
        var i, d = [], s = [];
        d[0] = new Date(0);
        s[0] = d[0].toLocaleString("en-US");
        d[1] = new Date(s[0]);
        console.log(d[0], s[0], d[1]);
        for (i = 0; i < s[0].length; ++i) {
            console.log(s[0].charAt(i), s[0].charCodeAt(i).toString(16));
        }
    })();

d[1] is expected to be the same as d[0], but it is "Invalid Date" in ICU-dependent web browsers (firefox-esr-102.7.0,1, chromium-109.0.5414.74 and... seamonkey-2.49.4_27 :) ).

The reason is that the string produced by toLocaleString contains U+202F. Part of the problem is that the en and en-* locales are widely assumed to contain no multibyte characters, particularly in regions whose own languages use multibyte characters (e.g. Japan :) ). That is probably why sites choose this approach. In fact, there are sites that display "Invalid Date" because of this.

The other part of the problem is that browsers that do not use ICU behave differently. As far as I have tried, Windows10+ChromeEdge and Android+Edge return a locale string without multibyte characters, which works as expected.

I think the port's distribution file already ships a prebuilt database for this data, but the source is here:
https://github.com/unicode-org/icu/blob/bb0e745e25c99cc57055caf45c81b95ef63b25d4/icu4c/source/data/locales/en.txt

What should be done about this?
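As a stopgap on the website side, the Unicode spaces could be folded back to ASCII before the string reaches the Date constructor. This is only a minimal sketch, assuming the space character is the sole obstacle; `normalizeSpaces` is a hypothetical helper name, and parsing a locale-formatted string remains implementation-defined, so it may still fail in some engines.

```javascript
// Hypothetical helper (not part of any library): fold the Unicode spaces
// that newer CLDR data can put into en-US times back to an ASCII space.
function normalizeSpaces(s) {
  // U+00A0 NO-BREAK SPACE, U+2009 THIN SPACE, U+202F NARROW NO-BREAK SPACE
  return s.replace(/[\u00a0\u2009\u202f]/g, " ");
}

var d0 = new Date(0);
var s0 = normalizeSpaces(d0.toLocaleString("en-US"));

// Parsing this format is still implementation-defined, so d1 may differ
// between engines even after normalization.
var d1 = new Date(s0);
console.log(s0, d1);
```

The normalization only removes the one known trigger; it does not make the round trip portable.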
Created attachment 240299
Port intended only for use in overlays, etc.

This creates an icudt*.dat that replaces some multibyte characters in the en locale. Replacing ${LOCALBASE}/share/icu/72.1/icudt72l.dat with it eliminates the problem above. For example, it becomes easier to see weather forecasts for different times of day and the extent of rainfall :)
Has this finding been reported to ICU upstream? (I actually just hit this "Invalid Date" problem myself)
(In reply to Charlie Li from comment #2)
No, I have not done that yet.

As for my thoughts on this issue:

This is not a problem for people who actually use the en, en-US, or en-* locales; for them the output is correct. It does not seem to be a problem on the Linux side, which appears to use ICU in the same way. At the very least it is not a problem on Android (+ MS Edge browser).

This is more of a problem on the website production side. Feeding the time string produced by the newer toLocaleString feature into the old-fashioned Date.parse function is a strange approach. So shouldn't we be reporting this to the websites where the problem occurs?

Anyway, I posted here because it did not seem to be much of an issue on the Linux side, but has Linux outside of Android disappeared? :)
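To illustrate what a website could do instead of parsing locale output, here is a minimal sketch that keeps a machine-readable value for the round trip and uses toLocaleString for display only. The variable names are illustrative, not taken from any affected site.

```javascript
var d0 = new Date(0);

// Machine-readable forms: ECMAScript specifies that both round-trip exactly.
var iso = d0.toISOString(); // ISO 8601 string in UTC
var ms = d0.getTime();      // milliseconds since the epoch

var fromIso = new Date(iso);
var fromMs = new Date(ms);

// Locale output is kept for display only and is never parsed again,
// so it does not matter which space character it contains.
var label = fromIso.toLocaleString("en-US");
console.log(iso, ms, label);
```

With this pattern the result does not depend on which CLDR/ICU data the browser ships.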
Some application consumers like Mozilla bundle libraries like ICU, which may not be the latest version. I've been hitting this with the en locales myself.
The space character was changed to a multibyte character by this CLDR commit:
https://github.com/unicode-org/cldr/commit/a83026ab8c8fa6ed88f1047c4d0c6089f88b7e5d

This is where the change was picked up in ICU:
https://github.com/unicode-org/icu/commit/64b35481263ac4df37a28a9c549553ecc9710db2
(In reply to Charlie Li from comment #4) Chromium 110 and Firefox 110 bundle ICU 72.
Created attachment 240577
Experimental patch for devel/icu

It does not use the bundled icudt72l.dat but rebuilds it instead. This can be toggled with a port option. As is, it may not be usable in a big-endian environment.
Created attachment 240578
Experimental patch for devel/icu

This one only builds ICU so that the ICU_DATA environment variable can be used. Running

    env ICU_DATA=/usr/local/share/icudt seamonkey

makes the browser use different data, such as the data built by the port in attachment 240299. This may carry a risk similar to LD_PRELOAD.

It seems browsers have built in their own behavior to convert these whitespace characters, so would that still be the case if we were to take action here?
For me the problem seems to have disappeared starting with Firefox 110; 109 still exhibited the issue.
I think Chromium also stopped showing the problem as of chromium-110. I don't know what kind of fix was applied, but it may be that a bug like the one below was put in deliberately :)

Mar/14/2023 10:49 PM
ICU 73.2 seems to have changed this for compatibility reasons.
https://github.com/unicode-org/icu/releases/tag/release-73-2