Bug 257905 - misc/schilytools Character � displayed instead of ö
Status: Closed Not Accepted
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Many People
Assignee: Juraj Lutter
URL: https://cgit.freebsd.org/ports/tree/m...
Keywords:
Depends on:
Blocks:
 
Reported: 2021-08-17 05:43 UTC by bugzeo
Modified: 2021-09-02 20:42 UTC

See Also:
danfe: maintainer-feedback+


Attachments
ö letter totally omitted (5.49 KB, image/png)
2021-08-17 10:08 UTC, bugzeo
devel/schilybase: update to 2021-09-01 and fix PR #257905 (9.87 KB, patch)
2021-09-02 15:47 UTC, Robert Clausecker
fuz: maintainer-approval+

Description bugzeo 2021-08-17 05:43:42 UTC
All schilytools contain copyright information, with the following output:

freebsd% smake --version
Smake release 1.5 2021/05/14 (amd64-unknown-freebsd13.0) Copyright (C) 1985, 87, 88, 91, 1995-2021 J�rg Schilling

He means "Jörg" instead of "J�rg".
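
The symptom can be reproduced outside the tools themselves. A minimal Python sketch, assuming the binary emits the name as ISO-8859-1 bytes and the terminal decodes its input as UTF-8 (the variable names are illustrative only):

```python
# The name as ISO-8859-1 bytes: "ö" is the single byte 0xF6.
raw = "Jörg".encode("iso-8859-1")

# A UTF-8 decoder sees 0xF6 as an invalid sequence and substitutes U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were written in recovers the name.
as_latin1 = raw.decode("iso-8859-1")

print(repr(raw), as_utf8, as_latin1)
```

The middle value is the "J�rg" seen in the report: the byte itself is intact, it is only being read under the wrong encoding.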
Comment 1 Alexey Dokuchaev freebsd_committer 2021-08-17 07:51:18 UTC
$ find /tmp/usr/ports/devel/schilybase/work/schily-2021-07-29/ -type f -name \*.c | xargs file | grep -c ISO-8859
117

That's quite a lot of files, and that's just the C source code, without READMEs et al.  Have you tried to convince upstream to switch to UTF-8?
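
A rough cross-check of that count that does not depend on file(1)'s heuristics could look like the sketch below. It is not equivalent to the command above: it counts files whose bytes are not valid UTF-8, whatever their actual encoding, and the function name `count_non_utf8` is made up for illustration.

```python
import os

def count_non_utf8(root, suffix=".c"):
    """Count files under root ending in suffix whose bytes are not valid UTF-8."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            with open(os.path.join(dirpath, name), "rb") as fh:
                data = fh.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError:
                # At least one byte sequence is invalid in UTF-8.
                count += 1
    return count
```

Passing `suffix=""` would include READMEs and other non-C files in the count as well.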

I mean, while we could certainly fix the copyright string, we should not carry non-FreeBSD-specific, cosmetic-only patches in the ports without upstreaming them first.
Comment 2 Robert Clausecker 2021-08-17 08:29:41 UTC
I have reported this issue upstream.  As it is purely cosmetic (cf. comment #1), that is the only thing I'm going to do.  The author has some custom locale handling and I'm not sure if changing character sets would fix anything.  Most likely, it would just make the situation worse as the code base expects string constants to be encoded in ISO-8859-1 (IIRC).
Comment 3 Jörg Schilling 2021-08-17 08:52:27 UTC
If you set up a correct locale or use better software, this problem does not exist.

The files use the central Europe standard encoding ISO8859-1, which is identical to the low 8 bits of UNICODE.
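
The distinction between code point numbers and encoded bytes matters for this claim. As a small sketch (the helper `encodings_agree` is hypothetical), the following checks both: ISO-8859-1 numbers its characters the same way Unicode numbers code points 0..255, but the UTF-8 byte sequences match the ISO-8859-1 bytes only for the ASCII half.

```python
def encodings_agree(code_point):
    """True if ISO-8859-1 and UTF-8 encode this code point to the same bytes."""
    ch = chr(code_point)
    return ch.encode("iso-8859-1") == ch.encode("utf-8")

# Code points 0..255 carry the same numbers in ISO-8859-1 and Unicode ...
same_numbering = all(
    chr(cp).encode("iso-8859-1") == bytes([cp]) for cp in range(256)
)

# ... but the encoded bytes only coincide for code points below 128.
agreeing = [cp for cp in range(256) if encodings_agree(cp)]

print(same_numbering, len(agreeing))
```

For "ö" (code point 0xF6), ISO-8859-1 produces the one byte 0xF6 while UTF-8 produces the two bytes 0xC3 0xB6.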

The problem is UTF-8 software with insufficient behavior.

If you, e.g., use better pager software (like the pager "p" from the schilytools), it will correctly display my name. If your pager does not do that correctly, did you consider making a bug report against that pager?

BTW: Similar support could be added to the rendering machines in e.g. xterm.

Another way to deal with your locale rendering problem would be to install *.mo files. If you would help with creating translation files, I would be willing to add gettext() support to the tools that do not yet have it.
Comment 4 Robert Clausecker 2021-08-17 09:45:30 UTC
(In reply to Jörg Schilling from comment #3)

Jörg,

The error is reproducible in any standard UTF-8 locale.  I am not sure what your pager does, but this sounds a lot like it takes input and interprets it in a different locale than it is supposed to be.

Do you really consider not implementing UTF-8 in an intentionally defective manner to be “insufficient behaviour”?  Also note that Germany is not the navel of the world and people do use 8-bit character sets other than ISO 8859-1.  So for any country other than Germany, even the hack you have implemented in p would yield defective results in the general case and be plain useless.

For example, consider a Japanese user whose native 8-bit encoding is Shift-JIS.  What do you think this user thinks if your p turns his Shift-JIS documents into umlauts?
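
To illustrate the concern, here is a minimal sketch using Python's `shift_jis` codec. The sample character is an arbitrary choice: a half-width katakana, which Shift-JIS stores as a single high byte.

```python
# Half-width katakana "ｱ" (U+FF71) is a single byte in Shift-JIS.
original = "ｱ"
sjis_bytes = original.encode("shift_jis")

# Reading that byte as ISO-8859-1 never fails, because every byte value is
# a valid ISO-8859-1 character, but the result is a different character.
misread = sjis_bytes.decode("iso-8859-1")

print(repr(sjis_bytes), misread)
```

Because an ISO-8859-1 decode cannot fail, a tool that falls back to it has no way to notice that the input was in some other 8-bit encoding.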

There's really no excuse for you not adapting the copyright string to the correct locale, or at least transliterating with oe if the encoding is not ISO-8859-1.
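
One way to read this suggestion, sketched in Python rather than in the tools' own C: encode the name for whatever character set the output uses, and fall back to the "oe" spelling when that set has no "ö". The function name `copyright_name` is hypothetical and nothing here reflects how schilytools is actually structured.

```python
def copyright_name(target_encoding):
    """Return the author's name as bytes in target_encoding, transliterating if needed."""
    try:
        return "Jörg Schilling".encode(target_encoding)
    except UnicodeEncodeError:
        # The target character set has no "ö"; fall back to the "oe" spelling.
        return "Joerg Schilling".encode(target_encoding)
```

A caller would typically obtain `target_encoding` from the active locale, for example via `locale.getpreferredencoding()`.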

As for translation files, GNU gettext supports automatically generating missing strings from strings of other character sets for the same language.  So just internationalising the software without any .mo files might already solve the problem I think.
Comment 5 bugzeo 2021-08-17 10:07:33 UTC
I have everything set properly, it's the standard FreeBSD Virtualbox image and I didn't touch anything:
https://download.freebsd.org/ftp/releases/VM-IMAGES/13.0-RELEASE/amd64/Latest/

Attached is the screenshot of default install. I did not change any locale.
Comment 6 bugzeo 2021-08-17 10:08:11 UTC
Created attachment 227276
ö letter totally omitted
Comment 7 Jörg Schilling 2021-08-17 13:56:15 UTC
Re: comment #6:

It seems that someone did set up UTF-8 on hardware that only supports 7-Bit ASCII.

I guess you could report a related bug to the creator of the filesystem image.
Comment 8 Robert Clausecker 2021-08-17 13:59:44 UTC
(In reply to Jörg Schilling from comment #7)

Jörg, I'm not sure how you get this idea.  The FreeBSD console uses software rendering and is thus perfectly capable of displaying Unicode.  And in fact, it does so just fine.  However, as your code produces an ISO-8859-1 encoded ö instead of the expected UTF-8 encoded ö, it is not displayed.  This is a defect in your code, not in vt(4) or sc(4).
Comment 9 Jörg Schilling 2021-08-17 18:59:25 UTC
GNU gettext definitely does not do iconv() conversion for the from string. It could not do that, since it cannot know the related locale for the from string.

iconv() is only called in case there was a language translation from an .mo file before and if that .mo file uses GNU mime enhancements to mark the encoding of the .mo file.

See: http://web.archive.org/web/20030608111824/http://www.li18nux.org/docs/pdf/LI18NUX-2000-amd4.pdf for the Sun/GNU agreement from 2000, which applies as long as the upcoming POSIX standard text for gettext() is not yet ready.

The problem here is that we have standards that have been defined by US people who (because they are from the US) do not suffer from the pitfalls of the insufficient UTF-8 standard.

Shift JIS is a really bad encoding for the command line, since it is stateful. Stateful encodings are broken by design.

If, however, you are in a UTF-8 based locale, then, contrary to your claims, you need to assume that the person is using UNICODE encoding, and ISO8859-1 is identical to the range 0..255 of UNICODE.

What p(1) does is just to implement a more intelligent handling for bytes that result in an EILSEQ error. There is no "official" way to handle such a case, and what p(1) does simply produces what people expect.
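
The behaviour described here can be approximated as follows. This is only a sketch of the general idea (decode as UTF-8, and treat each byte that fails as the ISO-8859-1 character with the same value); it is not taken from the p(1) source, and the handler name `latin1_fallback` is invented for the example.

```python
import codecs

def latin1_fallback(err):
    """Error handler: decode the offending bytes as ISO-8859-1 and continue."""
    if isinstance(err, UnicodeDecodeError):
        bad = err.object[err.start:err.end]
        return bad.decode("iso-8859-1"), err.end
    raise err

codecs.register_error("latin1_fallback", latin1_fallback)

def decode_lenient(data):
    """Decode bytes as UTF-8, mapping invalid bytes to their ISO-8859-1 meaning."""
    return data.decode("utf-8", errors="latin1_fallback")
```

With this handler, both the ISO-8859-1 bytes and the UTF-8 bytes for the name decode to the same string, which is the Python analogue of recovering from EILSEQ instead of printing a replacement character.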

This intelligent handling could be made part of the rendering machines that are e.g. used by xterm...
Comment 10 Alexey Dokuchaev freebsd_committer 2021-08-18 06:18:04 UTC
(In reply to Jörg Schilling from comment #7)
> It seems that someone did set up UTF-8 on hardware that only supports 7-Bit
> ASCII.
Ha, that'd be funny.  I mean, seeing 7-bit-only capable hardware in common use in the XXI century.

> The problem is UTF-8 software with insufficient behavior.
No, the problem is that you're still using the central Europe standard encoding ISO8859-1 while the rest of the world has moved to UTF-8.
Comment 11 Jörg Schilling 2021-08-18 08:02:34 UTC
Internationalization is based on support for various different encodings.

It is a mistake to assume only a single "globally" used encoding, regardless of which encoding this is. So assuming UTF-8 is also a mistake.

Maybe my conclusion is not yet understood:

UNICODE is an extension to the range of ISO-8859-1 and software that supports UNICODE should be written in a way to support that. My pager p(1) does this. 

The only other method that currently could help is to install gettext() based translations to the English language, even when the source language is English as well. This is currently the only other widely supported method to permit a hidden iconv() call on the output.
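
The mechanism can be modelled roughly as below. This is a simulation in plain Python, not the gettext API: the catalogue is a hypothetical in-memory dictionary that maps each message to an identical "translation" and declares the charset its strings are stored in, so that a lookup has enough information to convert the result to the output charset.

```python
# Hypothetical stand-in for a message catalogue that declares its charset.
CATALOGUE_CHARSET = "iso-8859-1"
CATALOGUE = {
    # msgid -> identical msgstr, both stored as ISO-8859-1 bytes
    b"J\xf6rg Schilling": b"J\xf6rg Schilling",
}

def lookup(msgid, output_charset):
    """Return the catalogue entry re-encoded for output, or msgid untouched if absent."""
    msgstr = CATALOGUE.get(msgid)
    if msgstr is None:
        # No entry: the source bytes pass through without any conversion.
        return msgid
    return msgstr.decode(CATALOGUE_CHARSET).encode(output_charset)
```

Only strings that have a catalogue entry are converted, which is why an English-to-English catalogue is needed even though nothing is being translated.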

BTW: other authors seem to have given up on internationalization and simply do not write their names correctly, to avoid problems in similar cases.
Comment 12 Robert Clausecker 2021-09-02 15:47:11 UTC
Created attachment 227609
devel/schilybase: update to 2021-09-01 and fix PR #257905

This patch updates the schilytools to 2021-09-01 and fixes this bug by installing dummy message catalogues.

This patch contains an ISO-8859-1 encoded file.  When applying it, make sure that cksum(1) reports for devel/schilybase/files/SCHILY_utils.po:

    4279375820 364 devel/schilybase/files/SCHILY_utils.po

The patch also fixes a number of other unrelated issues:

 - add WWW to misc/schilytools/pkg-descr
 - use make loops and conditionals instead of shell loops
 - regenerate some patches to make them apply cleanly
 - add an NLS option to devel/schilybase (toggles dummy message catalogue generation)

Changes: https://sourceforge.net/projects/schilytools/files/AN-2021-09-01
Reported by: portscout

Tested with Poudriere on arm64 FreeBSD 13.
Comment 13 commit-hook freebsd_committer 2021-09-02 20:42:47 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=82ebe485f60306d21adec8fa5765522b3490df22

commit 82ebe485f60306d21adec8fa5765522b3490df22
Author:     Robert Clausecker <fuz@fuz.su>
AuthorDate: 2021-09-02 16:23:15 +0000
Commit:     Juraj Lutter <otis@FreeBSD.org>
CommitDate: 2021-09-02 20:42:20 +0000

    devel/schilybase: Update to 2021-09-01

    Update to 2021-09-01 and while here, also:
     - add WWW to misc/schilytools/pkg-descr
     - use make loops and conditionals instead of shell loops
     - regenerate some patches to make them apply cleanly
     - add an NLS option to devel/schilybase (toggles dummy message catalogue
       generation)

    PR:             257905
    Reported by:    bugzeo <kiboto6933@eyeremind.com>
    Differential Revision: https://reviews.freebsd.org/D31808

 devel/schilybase/Makefile                     | 23 ++++++++++++--
 devel/schilybase/Makefile.master              | 30 ++++++++----------
 devel/schilybase/distinfo                     |  6 ++--
 devel/schilybase/files/SCHILY_utils.po (new)  | 14 +++++++++
 devel/schilybase/files/patch-compare_Makefile |  4 +--
 devel/schilybase/files/patch-mt_Makefile      |  8 ++---
 devel/schilybase/pkg-plist                    | 45 +++++++++++++++++++++++++++
 misc/schilytools/pkg-descr                    |  2 ++
 8 files changed, 104 insertions(+), 28 deletions(-)