Bug 233252 - converters/iconv: different output on Linux
Summary: converters/iconv: different output on Linux
Status: Open
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Many People
Assignee: freebsd-ports-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-16 12:54 UTC by Pascal Christen
Modified: 2023-10-02 19:37 UTC
CC List: 6 users

See Also:
bugzilla: maintainer-feedback? (bland)


Description Pascal Christen 2018-11-16 12:54:23 UTC
I asked the same question on the mailing list without any response, so let's try it here:

I have a problem with iconv (libc), or maybe it's by design on BSD:

FreeBSD 11.2:
[test@server:~] % echo 'Ørnæréjö' | iconv -f utf-8 -t ascii//TRANSLIT
Ornaer'ej"o

macOS Mojave:
Pascals-MBP:~ pascalchristen$ echo 'Ørnæréjö' | iconv -f utf-8 -t ascii//TRANSLIT
Ornaer'ej"o

Ubuntu 18.04:
test@DE-NUE01 ~ # echo 'Ørnæréjö' | iconv -f utf-8 -t ascii//TRANSLIT
Ornaerejo


As you can see, on BSD some characters come out with a ' or " in front of them, while on Linux the output is just what I'd expect from TRANSLIT. Any ideas?



https://lists.freebsd.org/pipermail/freebsd-bugs/2018-November/084192.html
Comment 1 Lorenzo Salvadore freebsd_committer freebsd_triage 2018-11-16 13:05:38 UTC
You can check if the iconv is from base system or from ports by running "which iconv". If the output is "/usr/bin/iconv" it is from base system.

Do you see the same behavior with iconv from base and from ports?
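A minimal sh sketch of that check, assuming a POSIX shell on FreeBSD; the function name iconv_origin is hypothetical. For anything other than /usr/bin/iconv it asks pkg which to name the installed package that owns the file:

```sh
# Hypothetical helper: report whether the first iconv on $PATH is FreeBSD's
# base-system binary or one installed from ports/packages.
iconv_origin() {
    # command -v prints the path the shell would run, or fails if there is none.
    iconv_bin=$(command -v iconv) || {
        echo "iconv: not found in PATH" >&2
        return 1
    }
    case "$iconv_bin" in
        /usr/bin/iconv)
            # FreeBSD convention: /usr/bin/iconv comes with the base system.
            echo "$iconv_bin: base system"
            ;;
        *)
            if command -v pkg >/dev/null 2>&1; then
                # pkg which names the installed package that owns the file.
                pkg which "$iconv_bin"
            else
                echo "$iconv_bin: not base (pkg unavailable)"
            fi
            ;;
    esac
}
```

If a ports iconv comes first in $PATH, iconv_origin should point at the owning package instead of reporting the base system. On non-FreeBSD systems the "base system" label is not meaningful.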
Comment 2 Pascal Christen 2018-11-16 13:14:42 UTC
(In reply to Lorenzo Salvadore from comment #1)
No, the base and port versions give different output:

[root@s12:~] # echo 'Ørnæréjö' | /usr/local/bin/iconv -f utf-8 -t ascii//TRANSLIT
Ornaer'ej"o
[root@s12:~] # echo 'Ørnæréjö' | /usr/bin/iconv -f utf-8 -t ascii//TRANSLIT
?rn?r?j?
iconv: warning: invalid characters: 4
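The same side-by-side comparison can be scripted for any sample string. This is a sketch only. The function name compare_translit is hypothetical, and the default paths /usr/bin/iconv (base) and /usr/local/bin/iconv (ports) are assumed from this thread. Warnings are captured with 2>&1, so a message like the one above stays next to its result:

```sh
# Hypothetical helper: compare_translit SAMPLE [ICONV_BINARY...]
# Runs SAMPLE through each binary with -f UTF-8 -t ASCII//TRANSLIT and prints
# "binary: result". Defaults to the base and ports iconv paths on FreeBSD.
compare_translit() {
    [ "$#" -ge 1 ] || {
        echo "usage: compare_translit SAMPLE [ICONV_BINARY...]" >&2
        return 2
    }
    sample=$1
    shift
    # No binaries given: fall back to the two FreeBSD locations.
    [ "$#" -gt 0 ] || set -- /usr/bin/iconv /usr/local/bin/iconv
    for bin in "$@"; do
        if [ ! -x "$bin" ]; then
            printf '%s: (not installed)\n' "$bin"
            continue
        fi
        # 2>&1 keeps warnings such as "invalid characters: 4" next to the result.
        result=$(printf '%s\n' "$sample" | "$bin" -f UTF-8 -t ASCII//TRANSLIT 2>&1)
        printf '%s: %s\n' "$bin" "$result"
    done
}
```

For example, compare_translit 'Ørnæréjö' prints one "binary: result" line per candidate and marks missing ones as (not installed).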
Comment 3 Lorenzo Salvadore freebsd_committer freebsd_triage 2018-11-16 15:23:59 UTC
I found that /usr/local/bin/iconv is installed by converters/libiconv, not by converters/iconv.

converters/iconv installs /usr/local/bin/biconv instead.

I can confirm Pascal's tests: I get the same output. As for biconv, it fails with "biconv: unable to open specified converter: no such file or directory". The biconv manpage says that an ICONV_PATH environment variable affects the behavior of biconv, but I have not found any further documentation about it.

I am CCing gnome@FreeBSD.org, the maintainers of converters/libiconv. Maybe the summary of this bug should be changed to cover all iconv implementations.
Comment 4 Jan Beich freebsd_committer freebsd_triage 2018-11-17 01:52:44 UTC
(In reply to Pascal Christen from comment #0)
> As you can see, on BSD some characters are with ' or " and on Linux it's
> just as expected when using TRANSLIT. Any ideas?

iconv() from GNU libc (unlike GNU libiconv) transliterates based on locale data. To get the same result on FreeBSD you may need ICU, e.g.:

$ pkg install icu
$ echo "Ørnæréjö" | uconv -f utf-8 -x ascii
Ornaerejo
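If scripts need to work whether or not ICU is installed, a small wrapper can prefer uconv with the same -x ascii transliterator shown above and fall back to iconv //TRANSLIT otherwise. This is a sketch with the hypothetical function name to_ascii. The fallback output still depends on which iconv implementation is found:

```sh
# Hypothetical helper: transliterate UTF-8 on stdin to ASCII on stdout.
# Prefers ICU's uconv (pkg install icu); otherwise falls back to iconv,
# whose //TRANSLIT behaviour depends on which implementation is installed.
to_ascii() {
    if command -v uconv >/dev/null 2>&1; then
        uconv -f utf-8 -x ascii
    else
        iconv -f utf-8 -t ascii//TRANSLIT
    fi
}
```

Usage: echo 'Ørnæréjö' | to_ascii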

(In reply to Pascal Christen from comment #2)
> $ echo 'Ørnæréjö' | /usr/bin/iconv -f utf-8 -t ascii//TRANSLIT
> ?rn?r?j?
> iconv: warning: invalid characters: 4

//TRANSLIT is not supported by iconv() from base. Ports that depend on it should define USES=iconv:translit and adjust CONFIGURE_ARGS (or similar) to use GNU libiconv symbols instead. How to switch to ICU for better transliteration support doesn't appear to be documented.

https://www.freebsd.org/doc/en/books/porters-handbook/using-iconv.html
https://www.freebsd.org/doc/en/books/porters-handbook/uses-iconv.html
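Base iconv accepts the //TRANSLIT suffix but then substitutes ? with a warning, so its exit status alone may not distinguish it from a real transliterator. The following runtime probe is a hedged sketch. It assumes that a working transliterator turns é into non-empty output without a ?. The function name supports_translit is hypothetical, and on glibc the result also depends on the current locale, as noted above:

```sh
# Hypothetical probe: supports_translit [ICONV_BINARY]
# Returns 0 if the binary turns "é" (U+00E9) into non-empty output without
# "?", and 1 if it fails or substitutes "?" as the base iconv did above.
supports_translit() {
    bin=${1:-iconv}
    # \303\251 is the UTF-8 encoding of é, written in octal for portability.
    out=$(printf '\303\251' | "$bin" -f UTF-8 -t ASCII//TRANSLIT 2>/dev/null) || return 1
    case "$out" in
        '' | *'?'*) return 1 ;;   # nothing produced, or replaced by "?"
        *) return 0 ;;            # e.g. "e" or the GNU libiconv style "'e"
    esac
}
```

Usage: supports_translit /usr/bin/iconv && echo translit ok || echo no translit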

(In reply to Pascal Christen from comment #0)
> I asked the same question on the mailing-list without any response

The freebsd-bugs@ mailing list is filled with Bugzilla notifications, so regular questions are often lost in the noise. Try freebsd-questions@ instead, or pick a more specific list, though activity may vary.
Comment 5 Pascal Christen 2019-03-15 13:33:56 UTC
(In reply to Jan Beich from comment #4)

Thanks for that explanation. 

> How to switch to ICU for better transliteration support 
> doesn't appear to be documented.

Any idea how we can achieve that? 

Greetings Pascal
Comment 6 bruno 2023-07-02 17:53:11 UTC
Regarding the iconv implementation from "ports", that is, GNU libiconv, this behaviour is intended:
1) Transliterations don't depend on the locale in GNU libiconv (as opposed to GNU libc).
2) Latin letters with diacritics or accents are approximated. 'e stands for e with acute accent, "o stands for o with diaeresis, and so on. It loses less information than if the diacritics and accents were dropped.
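If output closer to glibc's is wanted from GNU libiconv, one lossy workaround is to post-process its result and drop the accent marks it places before letters. This is a heuristic, not a libiconv feature. The function name strip_translit_marks and the set of marker characters (' " ^ ` ~) are assumptions. The filter also deletes genuine apostrophes or quotes that directly precede a letter:

```sh
# Hypothetical filter: strip_translit_marks
# Reads GNU libiconv //TRANSLIT output on stdin and removes a marker character
# (' " ^ ` ~, assumed set) directly before an ASCII letter: 'e -> e, "o -> o.
# Lossy: genuine apostrophes before a letter go too ("don't" -> "dont").
strip_translit_marks() {
    sed "s/[\"'^\`~]\([A-Za-z]\)/\1/g"
}
```

Usage: echo 'Ørnæréjö' | iconv -f utf-8 -t ascii//TRANSLIT | strip_translit_marks. This would turn the Ornaer'ej"o output reported earlier in this bug into Ornaerejo.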

For more info, please write to bug-gnu-libiconv at gnu.org.
Comment 7 Rene Ladan freebsd_committer freebsd_triage 2023-10-02 19:37:30 UTC
Maintainer reset.