Bug 257905

Summary: misc/schilytools Character � displayed instead of ö
Product: Ports & Packages Reporter: bugzeo <kiboto6933>
Component: Individual Port(s) Assignee: Robert Clausecker <fuz>
Status: Closed FIXED    
Severity: Affects Many People CC: danfe, fuz, otis, schily
Priority: --- Flags: danfe: maintainer-feedback+
fuz: merge-quarterly?
Version: Latest   
Hardware: Any   
OS: Any   
URL: https://cgit.freebsd.org/ports/tree/misc/schilytools
Attachments:
Description | Flags
ö letter totally omitted | none
devel/schilybase: update to 2021-09-01 and fix PR #257905 | fuz: maintainer-approval+
devel/schilybase: fix umlauts in locale C.UTF-8 | fuz: maintainer-approval+

Description bugzeo 2021-08-17 05:43:42 UTC
All schilytools contain copyright information, with the following output:

freebsd% smake --version
Smake release 1.5 2021/05/14 (amd64-unknown-freebsd13.0) Copyright (C) 1985, 87, 88, 91, 1995-2021 J�rg Schilling

The name should read "Jörg", not "J�rg".
Comment 1 Alexey Dokuchaev freebsd_committer freebsd_triage 2021-08-17 07:51:18 UTC
$ find /tmp/usr/ports/devel/schilybase/work/schily-2021-07-29/ -type f -name \*.c | xargs file | grep -c ISO-8859
117

That's quite a lot of files, and that's just the C source code, without READMEs et al.  Have you tried to convince upstream to switch to UTF-8?

I mean, while we could certainly fix the copyright string, we should not carry non-FreeBSD-specific, cosmetic-only patches in the ports without upstreaming them first.
Comment 2 Robert Clausecker freebsd_committer freebsd_triage 2021-08-17 08:29:41 UTC
I have reported this issue upstream.  As it is purely cosmetic (cf. comment #1), that is the only thing I'm going to do.  The author has some custom locale handling and I'm not sure whether changing character sets would fix anything.  Most likely, it would just make the situation worse, as the code base expects string constants to be encoded in ISO-8859-1 (IIRC).
Comment 3 Jörg Schilling 2021-08-17 08:52:27 UTC
If you set up a correct locale or use better software, this problem does not exist.

The files use the central Europe standard encoding ISO8859-1, which is identical to the low 8 bits of UNICODE.
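
The claim above concerns code points, not bytes on the wire. As a minimal sketch (an illustration, not taken from the schilytools sources), the following checks that ISO-8859-1 maps every byte to the Unicode code point of the same value, while UTF-8 stores only the first 128 of those code points as the same single byte:

```python
# Code points 0..255 of Unicode coincide with ISO-8859-1 byte values ...
for value in range(256):
    assert bytes([value]).decode("iso-8859-1") == chr(value)

# ... but UTF-8 keeps the single-byte form only for code points 0..127.
same_bytes = [v for v in range(256) if chr(v).encode("utf-8") == bytes([v])]
print(len(same_bytes))             # 128
print(chr(0xF6).encode("utf-8"))   # b'\xc3\xb6'
```

So an ö written as the single byte 0xF6 is a valid ISO-8859-1 character but not a valid UTF-8 sequence.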

The problem is UTF-8 software with insufficient behavior.

If you e.g. use a better pager software (like the pager "p" from the schilytools), it will correctly display my name. If your pager does not do that correctly, did you consider making a bug report against that pager?

BTW: Similar support could be added to the rendering machines in e.g. xterm.

Another way to deal with your locale rendering problem would be to install *.mo files. If you would help with creating translation files, I would be willing to add gettext() support to the tools that do not yet have it.
Comment 4 Robert Clausecker freebsd_committer freebsd_triage 2021-08-17 09:45:30 UTC
(In reply to Jörg Schilling from comment #3)

Jörg,

The error is reproducible in any standard UTF-8 locale.  I am not sure what your pager does, but this sounds a lot like it takes input and interprets it in a different locale than it is supposed to.

Do you really consider not implementing UTF-8 in an intentionally defective manner to be “insufficient behaviour?”  Also note that Germany is not the navel of the world and people do use 8-bit character sets other than ISO 8859-1.  So for any country other than Germany, even the hack you have implemented in p would yield defective results in the general case and be plain useless.

For example, consider a Japanese user whose native 8-bit encoding is Shift-JIS.  What do you think this user thinks if your p turns his Shift-JIS documents into umlauts?

There's really no excuse for you not adapting the copyright string to the correct locale, or at least transliterating with oe if the encoding is not ISO-8859-1.
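
A minimal sketch of that suggestion, under the assumption that the program knows the codeset of the active locale. The names `AUTHOR`, `ASCII_FALLBACK` and `author_bytes` are hypothetical and do not come from schilytools. This variant encodes the name in whatever codeset the locale uses and falls back to the "oe" transliteration when the codeset is unknown or cannot represent ö:

```python
AUTHOR = "Jörg Schilling"
ASCII_FALLBACK = "Joerg Schilling"  # transliteration used when ö is unavailable


def author_bytes(codeset: str) -> bytes:
    """Encode the author name for the given codeset, or transliterate to ASCII."""
    try:
        return AUTHOR.encode(codeset)
    except (LookupError, UnicodeEncodeError):
        # Unknown codeset name, or a codeset that has no ö.
        return ASCII_FALLBACK.encode("ascii")


print(author_bytes("UTF-8"))       # b'J\xc3\xb6rg Schilling'
print(author_bytes("ISO-8859-1"))  # b'J\xf6rg Schilling'
print(author_bytes("ASCII"))       # b'Joerg Schilling'
```

In a real program the codeset argument would come from the locale, for example from `nl_langinfo(CODESET)`.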

As for translation files, GNU gettext supports automatically generating missing strings from strings of other character sets for the same language.  So just internationalising the software without any .mo files might already solve the problem I think.
Comment 5 bugzeo 2021-08-17 10:07:33 UTC
I have everything set properly. It's the standard FreeBSD VirtualBox image and I didn't touch anything:
https://download.freebsd.org/ftp/releases/VM-IMAGES/13.0-RELEASE/amd64/Latest/

Attached is a screenshot of the default install. I did not change any locale.
Comment 6 bugzeo 2021-08-17 10:08:11 UTC
Created attachment 227276 [details]
ö letter totally omitted
Comment 7 Jörg Schilling 2021-08-17 13:56:15 UTC
Re: comment #6:

It seems that someone did set up UTF-8 on hardware that only supports 7-Bit ASCII.

I guess you could report a related bug to the creator of the filesystem image.
Comment 8 Robert Clausecker freebsd_committer freebsd_triage 2021-08-17 13:59:44 UTC
(In reply to Jörg Schilling from comment #7)

Jörg, I'm not sure how you get this idea.  The FreeBSD console uses software rendering and is thus perfectly capable of displaying Unicode.  And in fact, it does so just fine.  However, as your code produces an ISO-8859-1 encoded ö instead of the expected UTF-8 encoded ö, it is not displayed.  This is a defect in your code, not in vt(4) or sc(4).
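
The byte-level difference can be shown with a short sketch. It is an illustration only: `render_as_utf8` is a hypothetical stand-in for a UTF-8 terminal that substitutes U+FFFD for invalid input, which is one common behaviour, not a description of vt(4) itself.

```python
NAME = "Jörg"

latin1_bytes = NAME.encode("iso-8859-1")  # ö is the single byte 0xF6
utf8_bytes = NAME.encode("utf-8")         # ö is the two bytes 0xC3 0xB6


def render_as_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


print(latin1_bytes)                  # b'J\xf6rg'
print(utf8_bytes)                    # b'J\xc3\xb6rg'
print(render_as_utf8(latin1_bytes))  # J�rg
print(render_as_utf8(utf8_bytes))    # Jörg
```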
Comment 9 Jörg Schilling 2021-08-17 18:59:25 UTC
GNU gettext definitely does not do iconv() conversion for the from string; it could not do that, since it cannot know the related locale for the from string.

iconv() is only called in case there was a language translation from an .mo file before and if that .mo file uses GNU mime enhancements to mark the encoding of the .mo file.

See: http://web.archive.org/web/20030608111824/http://www.li18nux.org/docs/pdf/LI18NUX-2000-amd4.pdf for the Sun/GNU agreement from Y2000 as long as the upcoming POSIX standard text is not yet ready for gettext().

The problem here is that we have standards that have been defined by US people that (because they are from the US)  do not suffer from the pitfalls of the insufficient UTF-8 standard.

Shift JIS is a really bad encoding for the command line, since it is stateful. Stateful encodings are broken by design.

If, however, you are in a UTF-8 based locale, then contrary to your claims you need to assume that the person is using UNICODE encoding, and ISO8859-1 is identical to the range 0..255 of UNICODE.

What p(1) does is just to implement a more intelligent handling for bytes that result in an EILSEQ error. There is no "official" way for such a case and what p(1) does just results in what people expect.
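
The kind of handling described here can be sketched as follows. This is an assumption about the general approach, not the actual p(1) code: bytes that fail UTF-8 decoding are reinterpreted as ISO-8859-1 characters. The handler name `latin1_fallback` and the function `lenient_decode` are made up for this example.

```python
import codecs


def latin1_fallback(error):
    """Error handler: reinterpret undecodable bytes as ISO-8859-1 characters."""
    if not isinstance(error, UnicodeDecodeError):
        raise error
    bad = error.object[error.start:error.end]
    return bad.decode("iso-8859-1"), error.end


codecs.register_error("latin1_fallback", latin1_fallback)


def lenient_decode(raw: bytes) -> str:
    """Decode as UTF-8, falling back to ISO-8859-1 for invalid bytes."""
    return raw.decode("utf-8", errors="latin1_fallback")


print(lenient_decode(b"J\xf6rg"))               # Jörg
print(lenient_decode("Jörg".encode("utf-8")))   # Jörg

# The fallback guesses ISO-8859-1; input in another legacy encoding is
# silently turned into unrelated Latin-1 characters.
shift_jis_text = "日本".encode("shift_jis")
print(repr(lenient_decode(shift_jis_text)))
```

The last call illustrates the objection from comment #4: the fallback only gives the expected result when the stray bytes really are ISO-8859-1.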

This intelligent handling could be made part of the rendering machines that are e.g. used by xterm...
Comment 10 Alexey Dokuchaev freebsd_committer freebsd_triage 2021-08-18 06:18:04 UTC
(In reply to Jörg Schilling from comment #7)
> It seems that someone did set up UTF-8 on a hardware that only supports 7-Bit
> ASCII.
Ha, that'd be funny.  I mean, seeing 7-bit-only capable hardware in common use in the 21st century.

> The problem is UTF-8 software with insufficient behavior.
No, the problem is that you're still using the central Europe standard encoding ISO8859-1 while the rest of the world has moved to UTF-8.
Comment 11 Jörg Schilling 2021-08-18 08:02:34 UTC
Internationalization is based on support for various different encodings.

It is a mistake to assume only a single "globally" used encoding, regardless of which encoding this is. So assuming UTF-8 is also a mistake.

Maybe my conclusion is not yet understood:

UNICODE is an extension to the range of ISO-8859-1 and software that supports UNICODE should be written in a way to support that. My pager p(1) does this. 

The only other method that currently could help is to install gettext()-based translations to the English language, even if the source language is English as well. This is currently the only other widely supported method to permit a hidden iconv() call on the output.

BTW: other authors seem to have given up on internationalization and just do not write their names correctly to avoid problems in similar cases.
Comment 12 Robert Clausecker freebsd_committer freebsd_triage 2021-09-02 15:47:11 UTC
Created attachment 227609 [details]
devel/schilybase: update to 2021-09-01 and fix PR #257905

This patch updates the schilytools to 2021-09-01 and fixes this bug by installing dummy message catalogues.
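
As a rough model of why a dummy catalogue helps (a sketch under stated assumptions, not the gettext implementation): the catalogue maps each msgid to an identical msgstr and declares its charset, so a successful lookup gives the library the opportunity to convert the result to the locale's codeset. The names `CATALOGUE`, `CATALOGUE_CHARSET` and `translate` are hypothetical.

```python
# Dummy catalogue: msgid and msgstr are the same text, stored as ISO-8859-1.
CATALOGUE_CHARSET = "iso-8859-1"
CATALOGUE = {b"J\xf6rg Schilling": b"J\xf6rg Schilling"}


def translate(msgid: bytes, locale_codeset: str) -> bytes:
    """Look up msgid; on a hit, convert msgstr to the locale's codeset."""
    msgstr = CATALOGUE.get(msgid)
    if msgstr is None:
        return msgid  # strings without a catalogue entry pass through unchanged
    return msgstr.decode(CATALOGUE_CHARSET).encode(locale_codeset)


print(translate(b"J\xf6rg Schilling", "utf-8"))  # b'J\xc3\xb6rg Schilling'
print(translate(b"no such message", "utf-8"))    # b'no such message'
```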

This patch contains an ISO-8859-1 encoded file.  When applying it, make sure that cksum(1) reports for devel/schilybase/files/SCHILY_utils.po:

    4279375820 364 devel/schilybase/files/SCHILY_utils.po
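
Where cksum(1) is not at hand, the same check can be approximated with a short script. The following is a sketch of the POSIX cksum algorithm (CRC-32 with polynomial 0x04C11DB7, processed most significant bit first, with the input length appended); the function names are made up for this example.

```python
def crc_update(crc: int, data: bytes) -> int:
    """Feed bytes into the MSB-first CRC-32 used by POSIX cksum."""
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def posix_cksum(data: bytes) -> int:
    """Return the checksum that cksum(1) prints for the given file contents."""
    crc = crc_update(0, data)
    length = len(data)
    while length:
        # The length is appended least significant octet first.
        crc = crc_update(crc, bytes([length & 0xFF]))
        length >>= 8
    return crc ^ 0xFFFFFFFF


print(posix_cksum(b""))  # 4294967295
```

Reading the file in binary mode, for example `posix_cksum(open(path, "rb").read())`, gives the first number of the cksum(1) output; the second number is the file size in bytes.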

The patch also fixes a number of other unrelated issues:

 - add WWW to misc/schilytools/pkg-descr
 - use make loops and conditionals instead of shell loops
 - regenerate some patches to make them apply cleanly
 - add an NLS option to devel/schilybase (toggles dummy message catalogue generation)

Changes: https://sourceforge.net/projects/schilytools/files/AN-2021-09-01
Reported by: portscout

Tested with Poudriere on arm64 FreeBSD 13.
Comment 13 commit-hook freebsd_committer freebsd_triage 2021-09-02 20:42:47 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=82ebe485f60306d21adec8fa5765522b3490df22

commit 82ebe485f60306d21adec8fa5765522b3490df22
Author:     Robert Clausecker <fuz@fuz.su>
AuthorDate: 2021-09-02 16:23:15 +0000
Commit:     Juraj Lutter <otis@FreeBSD.org>
CommitDate: 2021-09-02 20:42:20 +0000

    devel/schilybase: Update to 2021-09-01

    Update to 2021-09-01 and while here, also:
     - add WWW to misc/schilytools/pkg-descr
     - use make loops and conditionals instead of shell loops
     - regenerate some patches to make them apply cleanly
     - add an NLS option to devel/schilybase (toggles dummy message catalogue
       generation)

    PR:             257905
    Reported by:    bugzeo <kiboto6933@eyeremind.com>
    Differential Revision: https://reviews.freebsd.org/D31808

 devel/schilybase/Makefile                     | 23 ++++++++++++--
 devel/schilybase/Makefile.master              | 30 ++++++++----------
 devel/schilybase/distinfo                     |  6 ++--
 devel/schilybase/files/SCHILY_utils.po (new)  | 14 +++++++++
 devel/schilybase/files/patch-compare_Makefile |  4 +--
 devel/schilybase/files/patch-mt_Makefile      |  8 ++---
 devel/schilybase/pkg-plist                    | 45 +++++++++++++++++++++++++++
 misc/schilytools/pkg-descr                    |  2 ++
 8 files changed, 104 insertions(+), 28 deletions(-)
Comment 14 bugzeo 2022-01-07 00:25:15 UTC
The issue is still present. Jörg has died, so it's up to you to fix all his packages/ports.

root@freebsd:~ # pkg install smake
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
The following 1 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
	smake: 2021.09.18

Number of packages to be installed: 1

56 KiB to be downloaded.

Proceed with this action? [y/N]: y
[1/1] Fetching smake-2021.09.18.pkg: 100%   56 KiB  57.4kB/s    00:01    
Checking integrity... done (0 conflicting)
[1/1] Installing smake-2021.09.18...
[1/1] Extracting smake-2021.09.18: 100%
root@freebsd:~ # smake --version
Smake release 1.6 2021/08/12 (amd64-unknown-freebsd13.0) Copyright (C) 1985, 87, 88, 91, 1995-2021 J�rg Schilling
Comment 15 Robert Clausecker freebsd_committer freebsd_triage 2022-01-07 01:37:03 UTC
(In reply to bugzeo from comment #14)

Hi bugzeo,

Actually the issue should have been fixed by now by adding a set of dummy message catalogues.  Can you please let me know what (a) your locale (output of locale(1)) and (b) the character encoding configured in your terminal emulator are?  Also, have you built the package with option NLS enabled?
Comment 16 bugzeo 2022-02-13 22:09:07 UTC
I don't know if I set any NLS option, how do I check that?
freebsd% locale
LANG=C.UTF-8
LC_CTYPE="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_TIME="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=

I just ran pkg update && pkg upgrade; the rest is up to you. Just fix it if you like.
Comment 17 bugzeo 2022-02-13 22:10:52 UTC
Found the package schilytools-utf8, which I presume should work better. What about switching to it?
https://sourceforge.net/projects/schilytools-utf8/
Comment 18 Robert Clausecker freebsd_committer freebsd_triage 2022-02-13 22:24:33 UTC
(In reply to bugzeo from comment #17)

Hi bugzeo,

I will not switch to that fork.

Please show the output of "pkg info schilybase" so I can see how you have configured the package.  Have you perhaps disabled NLS support?
Comment 19 Robert Clausecker freebsd_committer freebsd_triage 2022-02-13 23:25:24 UTC
Created attachment 231804 [details]
devel/schilybase: fix umlauts in locale C.UTF-8

Okay, I think I found the issue.  The attached patch should fix it.

Please MFH if possible.
Comment 20 commit-hook freebsd_committer freebsd_triage 2022-02-14 07:47:44 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=965979bfb06f5fb3c8f3a10ef52656f7c37f76fa

commit 965979bfb06f5fb3c8f3a10ef52656f7c37f76fa
Author:     Robert Clausecker <fuz@fuz.su>
AuthorDate: 2022-02-14 07:40:06 +0000
Commit:     Juraj Lutter <otis@FreeBSD.org>
CommitDate: 2022-02-14 07:46:55 +0000

    devel/schilybase: Fix umlauts in locale C.UTF-8

    PR:             257905
    Reported by:    bugzeo <kiboto6933@eyeremind.com>
    MFH:            2022Q1

 devel/schilybase/Makefile        | 6 +++---
 devel/schilybase/Makefile.master | 2 +-
 devel/schilybase/pkg-plist       | 1 +
 3 files changed, 5 insertions(+), 4 deletions(-)
Comment 21 commit-hook freebsd_committer freebsd_triage 2022-02-14 07:48:46 UTC
A commit in branch 2022Q1 references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=acd895cd1115ff46b441e66298b5164e9e280c29

commit acd895cd1115ff46b441e66298b5164e9e280c29
Author:     Robert Clausecker <fuz@fuz.su>
AuthorDate: 2022-02-14 07:40:06 +0000
Commit:     Juraj Lutter <otis@FreeBSD.org>
CommitDate: 2022-02-14 07:47:44 +0000

    devel/schilybase: Fix umlauts in locale C.UTF-8

    PR:             257905
    Reported by:    bugzeo <kiboto6933@eyeremind.com>
    MFH:            2022Q1

    (cherry picked from commit 965979bfb06f5fb3c8f3a10ef52656f7c37f76fa)

 devel/schilybase/Makefile        | 6 +++---
 devel/schilybase/Makefile.master | 2 +-
 devel/schilybase/pkg-plist       | 1 +
 3 files changed, 5 insertions(+), 4 deletions(-)
Comment 22 Juraj Lutter freebsd_committer freebsd_triage 2022-02-14 07:51:11 UTC
Committed, thanks.
Comment 23 bugzeo 2022-02-18 22:47:46 UTC
Now I confirm it's fixed. Thanks a lot.
Did you check that C.UTF-8 is fixed in all the other FreeBSD packages?
Comment 24 Robert Clausecker freebsd_committer freebsd_triage 2022-02-18 23:08:52 UTC
(In reply to bugzeo from comment #23)

No, but all packages from the schilytools set make use of the same message catalogue, so there's no reason why it shouldn't work for all of them.
Comment 25 bugzeo 2022-02-20 16:56:02 UTC
(In reply to Robert Clausecker from comment #24)
I mean for _ALL_ FreeBSD packages and ports, not only *schily*
Comment 26 Robert Clausecker freebsd_committer freebsd_triage 2022-02-20 19:19:04 UTC
(In reply to bugzeo from comment #25)

I am not sure what you expect me to do.  The problem boiled down to a missing message catalogue for the schilytools.  If other packages were broken before, they will continue to be broken.  If you find such an issue elsewhere, make a new bug report about the software in question as that would be out of scope for this issue.
Comment 27 bugzeo 2022-08-04 21:47:18 UTC
Thanks, issue closed.
Comment 28 bugzeo 2023-11-27 19:44:51 UTC
It's happening again with star 1.7.0 (2023/01/11).
Comment 29 bugzeo 2023-11-27 19:45:46 UTC
FreeBSD 14.0-RELEASE
Comment 30 Robert Clausecker freebsd_committer freebsd_triage 2023-11-27 20:19:09 UTC
Thank you for the renewed report.  This looks like a regression in gettext-tools as we didn't change anything about this translation file.

Let me investigate in detail please.
Comment 31 Robert Clausecker freebsd_committer freebsd_triage 2023-11-27 20:28:34 UTC
I have filed a bug report with the gettext project and will push a workaround shortly.  I am sorry for the inconvenience.
Comment 32 commit-hook freebsd_committer freebsd_triage 2023-12-03 10:19:51 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=e8c2558f623d87d044e0dec460f606a40a46d359

commit e8c2558f623d87d044e0dec460f606a40a46d359
Author:     Robert Clausecker <fuz@FreeBSD.org>
AuthorDate: 2023-11-27 20:31:58 +0000
Commit:     Robert Clausecker <fuz@FreeBSD.org>
CommitDate: 2023-12-03 10:18:35 +0000

    devel/schilybase: work around bug in gettext-0.22

    gettext 0.22 started to transcode PO files to UTF-8 before processing
    them, converting all msgids and messages to Unicode in the process.
    This breaks schilytools which assumes msgids are in ISO-8859-1 encoding.
    Work around the breakage using the new --no-convert option.  A bug
    report was filed with upstream in the hope that they may fix the bug.

    Reported by:    bugzeo <kiboto6933@eyeremind.com>
    See also:       https://savannah.gnu.org/news/?id=10378
    PR:             257905
    MFH:            2023Q4

 devel/schilybase/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
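
The breakage described in the commit message above can be modelled in a few lines. This is a simplified sketch, not gettext code: the catalogue is represented as a dictionary keyed by msgid bytes, and `transcode_catalogue` is a hypothetical helper that re-encodes every msgid and msgstr the way the commit message says gettext 0.22 does.

```python
def transcode_catalogue(catalogue: dict, source: str, target: str) -> dict:
    """Re-encode every msgid and msgstr from the source to the target charset."""
    return {
        msgid.decode(source).encode(target): msgstr.decode(source).encode(target)
        for msgid, msgstr in catalogue.items()
    }


original = {b"J\xf6rg": b"J\xf6rg"}  # msgids stored as ISO-8859-1
converted = transcode_catalogue(original, "iso-8859-1", "utf-8")

program_msgid = b"J\xf6rg"  # the byte string compiled into the binary

print(program_msgid in original)   # True
print(program_msgid in converted)  # False
```

Once the msgids are transcoded, the ISO-8859-1 byte string used by the program no longer matches any key, so the lookup fails and the raw bytes are printed again.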
Comment 33 commit-hook freebsd_committer freebsd_triage 2023-12-03 10:20:54 UTC
A commit in branch 2023Q4 references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=2b8cc87a75d6684adaa5cc8a57c8834db42fc812

commit 2b8cc87a75d6684adaa5cc8a57c8834db42fc812
Author:     Robert Clausecker <fuz@FreeBSD.org>
AuthorDate: 2023-11-27 20:31:58 +0000
Commit:     Robert Clausecker <fuz@FreeBSD.org>
CommitDate: 2023-12-03 10:19:52 +0000

    devel/schilybase: work around bug in gettext-0.22

    gettext 0.22 started to transcode PO files to UTF-8 before processing
    them, converting all msgids and messages to Unicode in the process.
    This breaks schilytools which assumes msgids are in ISO-8859-1 encoding.
    Work around the breakage using the new --no-convert option.  A bug
    report was filed with upstream in the hope that they may fix the bug.

    Reported by:    bugzeo <kiboto6933@eyeremind.com>
    See also:       https://savannah.gnu.org/news/?id=10378
    PR:             257905
    MFH:            2023Q4

    (cherry picked from commit e8c2558f623d87d044e0dec460f606a40a46d359)

 devel/schilybase/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Comment 34 Robert Clausecker freebsd_committer freebsd_triage 2023-12-03 10:21:27 UTC
Thank you for being so vigilant about this issue.
It should be addressed now.