All schilytools contain copyright information, with the following output:

freebsd% smake --version
Smake release 1.5 2021/05/14 (amd64-unknown-freebsd13.0)
Copyright (C) 1985, 87, 88, 91, 1995-2021 J�rg Schilling

He means "Jörg" instead of "J�rg".
$ find /tmp/usr/ports/devel/schilybase/work/schily-2021-07-29/ -type f -name \*.c | xargs file | grep -c ISO-8859
117

That's quite a lot of files, and that's just the C source code, without READMEs et al. Have you tried to convince upstream to switch to UTF-8? I mean, while we could certainly fix the copyright string, we should not carry non-FreeBSD-specific, cosmetic-only patches in the ports without upstreaming them first.
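As a rough cross-check of the count above, the following sketch counts source files that are not valid UTF-8. It is not equivalent to file(1), which applies its own heuristics to guess ISO-8859; it only tests whether each file decodes as UTF-8. The function name `count_non_utf8` is made up for illustration.

```python
from pathlib import Path


def count_non_utf8(root: str, pattern: str = "*.c") -> int:
    """Count files under root matching pattern that are not valid UTF-8."""
    count = 0
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        try:
            # Pure ASCII files decode fine, so only files containing
            # non-UTF-8 high bytes are counted.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            count += 1
    return count
```

Called on the extracted work directory, this should give a number in the same range as the file(1) pipeline, though the two methods may disagree on individual files.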
I have reported this issue upstream. As it is purely cosmetic (cf. comment #1), the only thing I'm going to do is report this upstream. The author has some custom locale handling and I'm not sure if changing character sets would fix anything. Most likely, it would just make the situation worse as the code base expects string constants to be encoded in ISO-8859-1 (IIRC).
If you set up a correct locale or use better software, this problem does not exist. The files use the central Europe standard encoding ISO8859-1, which is identical to the low 8 bits of UNICODE. The problem is UTF-8 software with insufficient behavior.

If you e.g. use a better pager software (like the pager "p" from the schilytools), it will correctly display my name. If your pager does not do that correctly, did you consider making a bug report against that pager? BTW: Similar support could be added to the rendering machines in e.g. xterm.

Another way to deal with your locale rendering problem would be to install *.mo files. If you would help with creating translation files, I would be willing to add gettext() support to the tools that do not yet have it.
(In reply to Jörg Schilling from comment #3)

Jörg,

The error is reproducible in any standard UTF-8 locale. I am not sure what your pager does, but this sounds a lot like it takes input and interprets it in a different locale than it is supposed to be. Do you really consider not implementing UTF-8 in an intentionally defective manner to be “insufficient behaviour?”

Also note that Germany is not the navel of the world and people do use 8-bit character sets other than ISO 8859-1. So for any country other than Germany, even the hack you have implemented in p would yield defective results in the general case and be plain useless. For example, consider a Japanese user whose native 8-bit encoding is Shift-JIS. What do you think this user thinks if your p turns his Shift-JIS documents into umlauts?

There's really no excuse for you not adapting the copyright string to the correct locale, or at least transliterating with oe if the encoding is not ISO-8859-1.

As for translation files, GNU gettext supports automatically generating missing strings from strings of other character sets for the same language. So just internationalising the software without any .mo files might already solve the problem, I think.
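The suggestion to adapt the copyright string to the locale, or to transliterate as a last resort, could look roughly like this. It is a sketch only: the function `copyright_name` is hypothetical, it falls back to "oe" whenever the charset cannot represent ö (slightly broader than "not ISO-8859-1"), and nothing here is taken from the schilytools sources.

```python
def copyright_name(charset: str) -> bytes:
    """Return the author's name encoded for the given output charset."""
    name = "Jörg Schilling"
    try:
        # Use the real spelling whenever the charset can represent it.
        return name.encode(charset)
    except (UnicodeEncodeError, LookupError):
        # Charset lacks ö (or is unknown): transliterate to plain ASCII.
        return name.replace("ö", "oe").encode("ascii")
```

For example, `copyright_name("utf-8")` produces the two-byte UTF-8 form of ö, `copyright_name("latin-1")` the single byte 0xF6, and `copyright_name("ascii")` the transliterated "Joerg Schilling".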
I have everything set properly, it's the standard FreeBSD VirtualBox image and I didn't touch anything: https://download.freebsd.org/ftp/releases/VM-IMAGES/13.0-RELEASE/amd64/Latest/ Attached is a screenshot of the default install. I did not change any locale.
Created attachment 227276
ö letter totally omitted
Re: comment #6: It seems that someone did set up UTF-8 on a hardware that only supports 7-Bit ASCII. I guess you could report a related bug to the creator of the filesystem image.
(In reply to Jörg Schilling from comment #7) Jörg, I'm not sure how you get this idea. The FreeBSD console uses software rendering and is thus perfectly capable of displaying Unicode. And in fact, it does so just fine. However, as your code produces an ISO-8859-1 encoded ö instead of the expected UTF-8 encoded ö, it is not displayed. This is a defect in your code, not the vt(4) or sc(4).
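The byte-level difference described above can be illustrated as follows. This is a sketch of the encoding mismatch only; how a given console or terminal renders an invalid byte (replacement character, or dropping it as in the screenshot) is up to that program.

```python
name = "Jörg"

# ISO-8859-1 stores ö as the single byte 0xF6.
latin1_bytes = name.encode("latin-1")   # b'J\xf6rg'

# UTF-8 stores ö as the two-byte sequence 0xC3 0xB6.
utf8_bytes = name.encode("utf-8")       # b'J\xc3\xb6rg'

# A UTF-8 consumer fed the ISO-8859-1 bytes sees an invalid sequence.
# With errors="replace", Python substitutes U+FFFD for the bad byte.
shown = latin1_bytes.decode("utf-8", errors="replace")  # 'J\ufffdrg'
```

The value of `shown` corresponds to the "J�rg" in the original report.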
GNU gettext definitely does not do iconv() conversion for the from string; it could not do that, since it cannot know the related locale for the from string. iconv() is only called in case there was a language translation from an .mo file before and if that .mo file uses GNU mime enhancements to mark the encoding of the .mo file. See: http://web.archive.org/web/20030608111824/http://www.li18nux.org/docs/pdf/LI18NUX-2000-amd4.pdf for the Sun/GNU agreement from the year 2000, which applies as long as the upcoming POSIX standard text for gettext() is not yet ready.

The problem here is that we have standards that have been defined by US people who (because they are from the US) do not suffer from the pitfalls of the insufficient UTF-8 standard.

Shift JIS is a really bad encoding for the command line, since it is stateful. Stateful encodings are broken by design. If, however, you are in a UTF-8 based locale, you need, contrary to your claims, to assume that the person is using UNICODE encoding, and ISO8859-1 is identical to the range 0..255 of UNICODE.

What p(1) does is just to implement a more intelligent handling for bytes that result in an EILSEQ error. There is no "official" way for such a case and what p(1) does just results in what people expect. This intelligent handling could be made part of the rendering machines that are e.g. used by xterm...
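The fallback described for p(1) can be modelled as follows: decode as UTF-8 and, for bytes that would otherwise raise an illegal-sequence error, map each offending byte to the Unicode code point with the same value (that is, treat it as ISO-8859-1). This is a sketch of the idea only, not p(1)'s actual code; the handler name `latin1_fallback` and the function `decode_lenient` are made up for illustration.

```python
import codecs


def latin1_fallback(err):
    """Codec error handler: reinterpret undecodable bytes as ISO-8859-1."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Each offending byte becomes the code point with the same value;
    # decoding resumes right after the offending bytes.
    return bad.decode("latin-1"), err.end


codecs.register_error("latin1_fallback", latin1_fallback)


def decode_lenient(raw: bytes) -> str:
    """Decode as UTF-8, falling back to ISO-8859-1 for invalid bytes."""
    return raw.decode("utf-8", errors="latin1_fallback")
```

With this handler, both `b"J\xf6rg"` and the proper UTF-8 bytes for "Jörg" decode to the same text. The objection raised earlier in the thread also applies: input in any other 8-bit encoding, such as Shift-JIS, is silently reinterpreted as ISO-8859-1 characters rather than reported as an error.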
(In reply to Jörg Schilling from comment #7)
> It seems that someone did set up UTF-8 on a hardware that only supports 7-Bit
> ASCII.

Ha, that'd be funny. I mean, seeing 7-bit-only capable hardware in common use in the XXI century.

> The problem is UTF-8 software with insufficient behavior.

No, the problem is that you're still using the central Europe standard encoding ISO8859-1 while the rest of the world has moved to UTF-8.
Internationalization is based on support for various different encodings. It is a mistake to assume only a single "globally" used encoding, regardless of which encoding this is. So assuming UTF-8 is also a mistake.

Maybe my conclusion is not yet understood: UNICODE is an extension to the range of ISO-8859-1 and software that supports UNICODE should be written in a way to support that. My pager p(1) does this.

The only other method that currently could help is to install gettext() based translations to the English language, even in case the source language is English as well. This is currently the only other widely supported method to permit a hidden iconv() call on the output.

BTW: other authors seem to have given up on internationalization and just do not write their names correctly to avoid problems in similar cases.
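Both sides of this argument rest on facts that can be checked directly: the code points of ISO-8859-1 do coincide with Unicode code points 0..255, but the UTF-8 byte encoding of those code points matches the ISO-8859-1 byte only for the ASCII range. A minimal check:

```python
all_bytes = bytes(range(256))

# Decoding every byte as ISO-8859-1 yields the code point of the same value.
as_text = all_bytes.decode("latin-1")
code_points_match = [ord(ch) for ch in as_text] == list(range(256))

# The UTF-8 serialisation of a code point equals the single ISO-8859-1
# byte only for the 7-bit ASCII range; 128..255 become two bytes.
same_bytes = [b for b in range(256) if chr(b).encode("utf-8") == bytes([b])]
ascii_only = same_bytes == list(range(128))
```

Both `code_points_match` and `ascii_only` evaluate to `True`, which is why a byte stream in ISO-8859-1 is not valid UTF-8 as soon as it contains a character such as ö.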
Created attachment 227609
devel/schilybase: update to 2021-09-01 and fix PR #257905

This patch updates the schilytools to 2021-09-01 and fixes this bug by installing dummy message catalogues.

This patch contains an ISO-8859-1 encoded file. When applying it, make sure that cksum(1) reports for devel/schilybase/files/SCHILY_utils.po:

4279375820 364 devel/schilybase/files/SCHILY_utils.po

The patch also fixes a number of other unrelated issues:

- add WWW to misc/schilytools/pkg-descr
- use make loops and conditionals instead of shell loops
- regenerate some patches to make them apply cleanly
- add an NLS option to devel/schilybase (toggles dummy message catalogue generation)

Changes: https://sourceforge.net/projects/schilytools/files/AN-2021-09-01
Reported by: portscout

Tested with Poudriere on arm64 FreeBSD 13.
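How a dummy message catalogue can fix the output may be easier to see in a toy model. The sketch below assumes, as the thread describes, that the program passes an ISO-8859-1 encoded msgid to gettext() and that a found translation is converted to the locale's charset on output. The dictionary `catalogue` and the function `lookup` are hypothetical stand-ins; they do not reproduce the contents of SCHILY_utils.po or the .mo file format.

```python
# Hypothetical msgid as it sits in the binary: ISO-8859-1 bytes.
MSGID = "Jörg Schilling".encode("latin-1")

# Toy catalogue: the "translation" is the same text, stored as text so
# that it can be re-encoded for whatever charset the locale uses.
catalogue = {MSGID: "Jörg Schilling"}


def lookup(msgid: bytes, locale_charset: str) -> bytes:
    """Return the message bytes to print for the given locale charset."""
    text = catalogue.get(msgid)
    if text is None:
        # No catalogue entry: the raw bytes pass through unconverted,
        # which is the situation in the original report.
        return msgid
    return text.encode(locale_charset)
```

In this model, `lookup(MSGID, "utf-8")` returns valid UTF-8, whereas a msgid that is missing from the catalogue is emitted byte for byte.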
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=82ebe485f60306d21adec8fa5765522b3490df22

commit 82ebe485f60306d21adec8fa5765522b3490df22
Author:     Robert Clausecker <fuz@fuz.su>
AuthorDate: 2021-09-02 16:23:15 +0000
Commit:     Juraj Lutter <otis@FreeBSD.org>
CommitDate: 2021-09-02 20:42:20 +0000

    devel/schilybase: Update to 2021-09-01

    Update to 2021-09-01 and while here, also:

    - add WWW to misc/schilytools/pkg-descr
    - use make loops and conditionals instead of shell loops
    - regenerate some patches to make them apply cleanly
    - add an NLS option to devel/schilybase (toggles dummy message catalogue generation)

    PR: 257905
    Reported by: bugzeo <kiboto6933@eyeremind.com>
    Differential Revision: https://reviews.freebsd.org/D31808

 devel/schilybase/Makefile                     | 23 ++++++++++++--
 devel/schilybase/Makefile.master              | 30 ++++++++----------
 devel/schilybase/distinfo                     |  6 ++--
 devel/schilybase/files/SCHILY_utils.po (new)  | 14 +++++++++
 devel/schilybase/files/patch-compare_Makefile |  4 +--
 devel/schilybase/files/patch-mt_Makefile      |  8 ++---
 devel/schilybase/pkg-plist                    | 45 +++++++++++++++++++++++++++
 misc/schilytools/pkg-descr                    |  2 ++
 8 files changed, 104 insertions(+), 28 deletions(-)
The issue is still present. Jörg has died, so it's up to you to fix all his packages/ports.

root@freebsd:~ # pkg install smake
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
The following 1 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
        smake: 2021.09.18

Number of packages to be installed: 1

56 KiB to be downloaded.

Proceed with this action? [y/N]: y
[1/1] Fetching smake-2021.09.18.pkg: 100% 56 KiB 57.4kB/s 00:01
Checking integrity... done (0 conflicting)
[1/1] Installing smake-2021.09.18...
[1/1] Extracting smake-2021.09.18: 100%
root@freebsd:~ # smake --version
Smake release 1.6 2021/08/12 (amd64-unknown-freebsd13.0)
Copyright (C) 1985, 87, 88, 91, 1995-2021 J�rg Schilling
(In reply to bugzeo from comment #14) Hi bugzeo, Actually the issue should have been fixed by now by adding a set of dummy message catalogues. Can you please let me know what (a) your locale (output of locale(1)) and (b) the character encoding configured in your terminal emulator is? Also, have you built the package with option NLS enabled?
I don't know if I set any NLS option, how do I check that?

freebsd% locale
LANG=C.UTF-8
LC_CTYPE="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_TIME="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=

I just ran pkg update && pkg upgrade; the rest is up to you. Just fix it if you like to.
Found the package schilytools-utf8 which I presume should work better, what about switching to it? https://sourceforge.net/projects/schilytools-utf8/
(In reply to bugzeo from comment #17) Hi bugzeo, I will not switch to that fork. Please show the output of "pkg show schilybase" so I can see how you have configured the package. Have you perhaps disabled NLS support?
Created attachment 231804
devel/schilybase: fix umlauts in locale C.UTF-8

Okay, I think I found the issue. The attached patch should fix it. Please MFH if possible.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=965979bfb06f5fb3c8f3a10ef52656f7c37f76fa

commit 965979bfb06f5fb3c8f3a10ef52656f7c37f76fa
Author:     Robert Clausecker <fuz@fuz.su>
AuthorDate: 2022-02-14 07:40:06 +0000
Commit:     Juraj Lutter <otis@FreeBSD.org>
CommitDate: 2022-02-14 07:46:55 +0000

    devel/schilybase: Fix umlauts in locale C.UTF-8

    PR: 257905
    Reported by: bugzeo <kiboto6933@eyeremind.com>
    MFH: 2022Q1

 devel/schilybase/Makefile        | 6 +++---
 devel/schilybase/Makefile.master | 2 +-
 devel/schilybase/pkg-plist       | 1 +
 3 files changed, 5 insertions(+), 4 deletions(-)
A commit in branch 2022Q1 references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=acd895cd1115ff46b441e66298b5164e9e280c29

commit acd895cd1115ff46b441e66298b5164e9e280c29
Author:     Robert Clausecker <fuz@fuz.su>
AuthorDate: 2022-02-14 07:40:06 +0000
Commit:     Juraj Lutter <otis@FreeBSD.org>
CommitDate: 2022-02-14 07:47:44 +0000

    devel/schilybase: Fix umlauts in locale C.UTF-8

    PR: 257905
    Reported by: bugzeo <kiboto6933@eyeremind.com>
    MFH: 2022Q1

    (cherry picked from commit 965979bfb06f5fb3c8f3a10ef52656f7c37f76fa)

 devel/schilybase/Makefile        | 6 +++---
 devel/schilybase/Makefile.master | 2 +-
 devel/schilybase/pkg-plist       | 1 +
 3 files changed, 5 insertions(+), 4 deletions(-)
Committed, thanks.
Now I confirm it's fixed. Thanks a lot. Did you check that C.UTF-8 is fixed in all the other FreeBSD packages?
(In reply to bugzeo from comment #23) No, but all packages from the schilytools set make use of the same message catalogue, so there's no reason why it shouldn't work for all of them.
(In reply to Robert Clausecker from comment #24) I mean for _ALL_ FreeBSD packages and ports, not only *schily*
(In reply to bugzeo from comment #25) I am not sure what you expect me to do. The problem boiled down to a missing message catalogue for the schilytools. If other packages were broken before, they will continue to be broken. If you find such an issue elsewhere, make a new bug report about the software in question as that would be out of scope for this issue.
Thanks issue closed.
It's happening again with star 1.7.0 (2023/01/11).
freebsd 14.0-RELEASE
Thank you for the renewed report. This looks like a regression in gettext-tools as we didn't change anything about this translation file. Let me investigate in detail please.
I have filed a bug report with the gettext project and will push a workaround shortly. I am sorry for the inconvenience.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=e8c2558f623d87d044e0dec460f606a40a46d359

commit e8c2558f623d87d044e0dec460f606a40a46d359
Author:     Robert Clausecker <fuz@FreeBSD.org>
AuthorDate: 2023-11-27 20:31:58 +0000
Commit:     Robert Clausecker <fuz@FreeBSD.org>
CommitDate: 2023-12-03 10:18:35 +0000

    devel/schilybase: work around bug in gettext-0.22

    gettext 0.22 started to transcode PO files to UTF-8 before processing
    them, converting all msgids and messages to Unicode in the process.
    This breaks schilytools which assumes msgids are in ISO-8859-1 encoding.
    Work around the breakage using the new --no-convert option.

    A bug report was filed with upstream in the hope that they may fix the bug.

    Reported by: bugzeo <kiboto6933@eyeremind.com>
    See also: https://savannah.gnu.org/news/?id=10378
    PR: 257905
    MFH: 2023Q4

 devel/schilybase/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
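The regression described in this commit message can be pictured with a toy lookup table. The sketch assumes that catalogue lookup is an exact byte comparison on the msgid; the dictionaries and the function `gettext_like` are hypothetical and do not model the real .mo format or msgfmt's behaviour in detail.

```python
# The msgid as compiled into the program: ISO-8859-1 bytes.
msgid_in_binary = "Jörg Schilling".encode("latin-1")

# Catalogue built without conversion: keys keep their ISO-8859-1 bytes.
catalogue_unconverted = {msgid_in_binary: "Jörg Schilling"}

# Catalogue whose msgids were transcoded to UTF-8 while being built.
catalogue_transcoded = {"Jörg Schilling".encode("utf-8"): "Jörg Schilling"}


def gettext_like(catalogue: dict, msgid: bytes, charset: str = "utf-8") -> bytes:
    """Exact-match lookup; unknown msgids are returned unchanged."""
    text = catalogue.get(msgid)
    return msgid if text is None else text.encode(charset)


fixed = gettext_like(catalogue_unconverted, msgid_in_binary)
broken = gettext_like(catalogue_transcoded, msgid_in_binary)
```

Here `fixed` holds valid UTF-8, while `broken` is the untouched ISO-8859-1 msgid, because the transcoded key no longer matches the bytes the program asks for. Keeping the catalogue unconverted, as the workaround does, restores the match.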
A commit in branch 2023Q4 references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=2b8cc87a75d6684adaa5cc8a57c8834db42fc812

commit 2b8cc87a75d6684adaa5cc8a57c8834db42fc812
Author:     Robert Clausecker <fuz@FreeBSD.org>
AuthorDate: 2023-11-27 20:31:58 +0000
Commit:     Robert Clausecker <fuz@FreeBSD.org>
CommitDate: 2023-12-03 10:19:52 +0000

    devel/schilybase: work around bug in gettext-0.22

    gettext 0.22 started to transcode PO files to UTF-8 before processing
    them, converting all msgids and messages to Unicode in the process.
    This breaks schilytools which assumes msgids are in ISO-8859-1 encoding.
    Work around the breakage using the new --no-convert option.

    A bug report was filed with upstream in the hope that they may fix the bug.

    Reported by: bugzeo <kiboto6933@eyeremind.com>
    See also: https://savannah.gnu.org/news/?id=10378
    PR: 257905
    MFH: 2023Q4

    (cherry picked from commit e8c2558f623d87d044e0dec460f606a40a46d359)

 devel/schilybase/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Thank you for being so vigilant about this issue. It should be addressed now.
It's happening again in:

star: star 1.7.0 (amd64-unknown-freebsd14.0) 2023/01/11

FreeBSD 14.2-RELEASE
(In reply to bugzeo from comment #35) Please open a separate PR for this. Thanks.
(In reply to Mark Linimon from comment #36) > Please open a separate PR for this. Thanks. Why? Having one PR is easier to track, I always found numerous bugs about the same or identical problems a nuisance, unless Robert confirms that this is a different problem now.
(In reply to Alexey Dokuchaev from comment #37) Sorry, I didn't get around to working on this one yet. It's okay to reopen this bug report.
Removing Schily from CC.
(In reply to Juraj Lutter from comment #39) Schily is dead, I don't think he'll mind.
@linimon What patch do you plan to commit? I would appreciate it if you could post it for review first.