Bug 272334 - Misleading 'iconv -l' output
Summary: Misleading 'iconv -l' output
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 13.2-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-standards (Nobody)
Reported: 2023-07-02 17:45 UTC by bruno
Modified: 2023-07-16 13:13 UTC

Attachments
Program that extracts the conversion table of an encoding (4.19 KB, text/plain), 2023-07-04 20:38 UTC, bruno
Conversion table of ISO-8859-1 (3.00 KB, text/plain), 2023-07-04 20:39 UTC, bruno
Conversion table of ISO-8859-10 (3.00 KB, text/plain), 2023-07-04 20:40 UTC, bruno
Conversion table of ISO-8859-11 (2.91 KB, text/plain), 2023-07-04 20:40 UTC, bruno
Conversion table of ISO-8859-13 (3.00 KB, text/plain), 2023-07-04 20:41 UTC, bruno
Conversion table of ISO-8859-14 (3.00 KB, text/plain), 2023-07-04 20:42 UTC, bruno
Conversion table of ISO-8859-15 (3.00 KB, text/plain), 2023-07-04 20:43 UTC, bruno
Conversion table of ISO-8859-16 (3.00 KB, text/plain), 2023-07-04 20:43 UTC, bruno
Conversion table of ARMSCII-8 (3.00 KB, text/plain), 2023-07-04 20:44 UTC, bruno
Conversion table of ARMSCII-8A (2.48 KB, text/plain), 2023-07-04 20:44 UTC, bruno
Conversion table of BIG5-E (270.51 KB, text/plain), 2023-07-04 20:45 UTC, bruno
Conversion table of BIG-5 (270.51 KB, text/plain), 2023-07-04 20:45 UTC, bruno
Conversion table of CP942 (127.29 KB, text/plain), 2023-07-04 20:51 UTC, bruno
Conversion table of CP942C (127.29 KB, text/plain), 2023-07-04 20:51 UTC, bruno
Conversion table of CP943 (133.54 KB, text/plain), 2023-07-04 20:51 UTC, bruno
Conversion table of CP943C (133.54 KB, text/plain), 2023-07-04 20:52 UTC, bruno
Conversion table of ISO646-CA (1.50 KB, text/plain), 2023-07-04 20:52 UTC, bruno
Conversion table of ISO646-CA2 (1.50 KB, text/plain), 2023-07-04 20:53 UTC, bruno
Conversion table of ISO646-ES (1.50 KB, text/plain), 2023-07-04 20:53 UTC, bruno
Conversion table of ISO646-ES2 (1.50 KB, text/plain), 2023-07-04 20:54 UTC, bruno
Conversion table of ISO646-FR (1.50 KB, text/plain), 2023-07-04 20:54 UTC, bruno
Conversion table of ISO646-FR1 (1.50 KB, text/plain), 2023-07-04 20:55 UTC, bruno
Conversion table of ISO646-NO (1.50 KB, text/plain), 2023-07-04 20:55 UTC, bruno
Conversion table of ISO646-NO2 (1.50 KB, text/plain), 2023-07-04 20:56 UTC, bruno
Conversion table of ISO646-PT (1.50 KB, text/plain), 2023-07-04 20:56 UTC, bruno
Conversion table of ISO646-PT2 (1.50 KB, text/plain), 2023-07-04 20:56 UTC, bruno
Conversion table of ISO646-SE2 (1.50 KB, text/plain), 2023-07-04 20:58 UTC, bruno
Conversion table of KOI8-R (3.00 KB, text/plain), 2023-07-04 20:58 UTC, bruno
Conversion table of KOI8-RU (3.00 KB, text/plain), 2023-07-04 20:59 UTC, bruno
Conversion table of MACROMAN (2.99 KB, text/plain), 2023-07-04 20:59 UTC, bruno
Conversion table of MACROMANIA (3.00 KB, text/plain), 2023-07-04 21:00 UTC, bruno
Conversion table of UTF-16BE (compressed) (480.77 KB, application/x-xz), 2023-07-04 21:03 UTC, bruno
Conversion table of UTF-16LE (compressed) (283.14 KB, application/x-xz), 2023-07-04 21:04 UTC, bruno
Conversion table of UTF-32BE (compressed) (16.75 KB, application/x-xz), 2023-07-04 21:04 UTC, bruno
Conversion table of UTF-32LE (compressed) (7.27 KB, application/x-xz), 2023-07-04 21:05 UTC, bruno
Conversion table of CP10029 (3.00 KB, text/plain), 2023-07-04 21:06 UTC, bruno
Conversion table of MACCENTEURO (3.00 KB, text/plain), 2023-07-04 21:07 UTC, bruno
Conversion table of WINDOWS-874 (2.73 KB, text/plain), 2023-07-04 21:08 UTC, bruno
Conversion table of CP1162 (2.91 KB, text/plain), 2023-07-04 21:08 UTC, bruno
Conversion table of CP874 (2.73 KB, text/plain), 2023-07-04 21:09 UTC, bruno

Description bruno 2023-07-02 17:45:43 UTC
'iconv -l' prints all the supported encoding names and alias names.

For most encodings, it prints one line containing the primary name followed by all the aliases of that encoding. So I assumed that this holds throughout the output. But it does not!

The output contains one huge line, 605 characters long, starting with "ISO-8859-1":

========================================================================
ISO-8859-1 CP819 CSISOLATIN1 IBM819 ISO-IR-100 ISO8859-1 ISO_8859-1 ISO_8859-1:1987 L1 LATIN1 CSISOLATIN6 ISO-8859-10 ISO-IR-157 ISO8859-10 ISO_8859-10 ISO_8859-10:1992 L6 LATIN6 ISO-8859-11 ISO-IR-166 ISO8859-11 ISO_8859-11 TIS-620 TIS.2533-1 TIS620 TIS620-0 TIS620.2529-1 TIS620.2533-0 ISO-8859-13 ISO-IR-179 ISO8859-13 ISO_8859-13 ISO_8859-13:1998 L7 LATIN7 ISO-8859-14 ISO-CELTIC ISO-IR-199 ISO8859-14 ISO_8859-14 ISO_8859-14:1998 L8 LATIN8 CP923 IBM923 ISO-8859-15 ISO-IR-203 ISO8859-15 ISO_8859-15 ISO_8859-15:1998 L9 LATIN9 ISO-8859-16 ISO-IR-226 ISO8859-16 ISO_8859-16 ISO_8859-16:2001 L10 LATIN10
========================================================================

So it looks as if ISO-8859-10, ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15 and ISO-8859-16 were all aliases of ISO-8859-1! But they are not. To make the output correct, this line should be replaced with the following seven lines:

========================================================================
ISO-8859-1 CP819 CSISOLATIN1 IBM819 ISO-IR-100 ISO8859-1 ISO_8859-1 ISO_8859-1:1987 L1 LATIN1
ISO-8859-10 CSISOLATIN6 ISO-IR-157 ISO8859-10 ISO_8859-10 ISO_8859-10:1992 L6 LATIN6
ISO-8859-11 ISO-IR-166 ISO8859-11 ISO_8859-11 TIS-620 TIS.2533-1 TIS620 TIS620-0 TIS620.2529-1 TIS620.2533-0
ISO-8859-13 ISO-IR-179 ISO8859-13 ISO_8859-13 ISO_8859-13:1998 L7 LATIN7
ISO-8859-14 ISO-CELTIC ISO-IR-199 ISO8859-14 ISO_8859-14 ISO_8859-14:1998 L8 LATIN8
ISO-8859-15 CP923 IBM923 ISO-IR-203 ISO8859-15 ISO_8859-15 ISO_8859-15:1998 L9 LATIN9
ISO-8859-16 ISO-IR-226 ISO8859-16 ISO_8859-16 ISO_8859-16:2001 L10 LATIN10
========================================================================
Comment 1 bruno 2023-07-04 20:34:35 UTC
The description covers only the first of 20 issues with the 'iconv -l' output.
Here are the remaining ones:

2) The line
=====================================================================
ARMSCII-8 AST166-8 AST_34.002 ARMSCII-8A AST166-A AST_34.002_A
=====================================================================
should be split into two lines, because ARMSCII-8 and ARMSCII-8A are
different encodings:
=====================================================================
ARMSCII-8 AST166-8 AST_34.002
ARMSCII-8A AST166-A AST_34.002_A
=====================================================================

3) The line
=====================================================================
BIG5-E BIG5E BIG-5 BIG-FIVE BIG5 BIG5-ETEN BIG5ETEN BIGFIVE CN-BIG5 CSBIG5
=====================================================================
should be split into two lines, because BIG5-E and BIG-5 are
different encodings:
=====================================================================
BIG5-E BIG5E
BIG-5 BIG-FIVE BIG5 BIG5-ETEN BIG5ETEN BIGFIVE CN-BIG5 CSBIG5
=====================================================================

4) The line
=====================================================================
CP942 942 IBM942 942C CP942C IBM942C
=====================================================================
should be split into two lines, because CP942 and CP942C are
different encodings:
=====================================================================
CP942 942 IBM942
CP942C 942C IBM942C
=====================================================================

5) The line
=====================================================================
CP943 943 IBM943 943C CP943C IBM943C
=====================================================================
should be split into two lines, because CP943 and CP943C are
different encodings:
=====================================================================
CP943 943 IBM943
CP943C 943C IBM943C
=====================================================================

6) The line
=====================================================================
ISO646-CA CA CSA7-1 CSA_Z243.4-1985-1 ISO-IR-121 CSA7-2 CSA_Z243.4-1985-2 ISO-IR-122 ISO646-CA2
=====================================================================
should be split into two lines, because ISO646-CA and ISO646-CA2 are
different encodings:
=====================================================================
ISO646-CA CA CSA7-1 CSA_Z243.4-1985-1 ISO-IR-121
ISO646-CA2 CSA7-2 CSA_Z243.4-1985-2 ISO-IR-122
=====================================================================

7) The line
=====================================================================
ISO646-ES ES ISO-IR-17 ES2 ISO-IR-85 ISO646-ES2
=====================================================================
should be split into two lines, because ISO646-ES and ISO646-ES2 are
different encodings:
=====================================================================
ISO646-ES ES ISO-IR-17
ISO646-ES2 ES2 ISO-IR-85
=====================================================================

8) The line
=====================================================================
ISO646-FR FR ISO-IR-69 NF_Z_62-010 ISO-IR-25 ISO646-FR1 NF_Z_62-010_(1973)
=====================================================================
should be split into two lines, because ISO646-FR and ISO646-FR1 are
different encodings:
=====================================================================
ISO646-FR FR ISO-IR-69 NF_Z_62-010
ISO646-FR1 ISO-IR-25 NF_Z_62-010_(1973)
=====================================================================

9) The line
=====================================================================
ISO646-NO ISO-IR-60 NO NS_4551-1 ISO-IR-61 ISO646-NO2 NO2 NS_4551-2
=====================================================================
should be split into two lines, because ISO646-NO and ISO646-NO2 are
different encodings:
=====================================================================
ISO646-NO ISO-IR-60 NO NS_4551-1
ISO646-NO2 ISO-IR-61 NO2 NS_4551-2
=====================================================================

10) The line
=====================================================================
ISO646-PT ISO-IR-16 PT ISO-IR-84 ISO646-PT2 PT2
=====================================================================
should be split into two lines, because ISO646-PT and ISO646-PT2 are
different encodings:
=====================================================================
ISO646-PT ISO-IR-16 PT
ISO646-PT2 ISO-IR-84 PT2
=====================================================================

11) The line
=====================================================================
ISO646-SE FI ISO-IR-10 ISO646-FI SE SEN_850200_B ISO-IR-11 ISO646-SE2 SE2 SEN_850200_C
=====================================================================
should be split into two lines, because ISO646-SE and ISO646-SE2 are
different encodings:
=====================================================================
ISO646-SE FI ISO-IR-10 ISO646-FI SE SEN_850200_B
ISO646-SE2 ISO-IR-11 SE2 SEN_850200_C
=====================================================================

12) The line
=====================================================================
KOI8-R KOI8-RU
=====================================================================
should be split into two lines, because KOI8-R and KOI8-RU are
different encodings:
=====================================================================
KOI8-R
KOI8-RU
=====================================================================

13) The line
=====================================================================
MACROMAN CSMACINTOSH MAC MACINTOSH MACROMANIA MACROMANIAN
=====================================================================
should be split into two lines, because MACROMAN and MACROMANIA are
different encodings:
=====================================================================
MACROMAN CSMACINTOSH MAC MACINTOSH
MACROMANIA MACROMANIAN
=====================================================================

14) The line
=====================================================================
UTF-16 UNICODE UTF16 CSUNICODE CSUNICODE11 ISO-10646-UCS-2 UCS-2 UCS-2BE UNICODE-1-1 UNICODEBIG UTF-16BE UTF16BE UCS-2LE UNICODELITTLE UTF-16LE UTF16LE
=====================================================================
should be split into two lines, because UTF-16BE and UTF-16LE are
different encodings:
=====================================================================
UTF-16 UNICODE UTF16 CSUNICODE CSUNICODE11 ISO-10646-UCS-2 UCS-2 UCS-2BE UNICODE-1-1 UNICODEBIG UTF-16BE UTF16BE
UCS-2LE UNICODELITTLE UTF-16LE UTF16LE
=====================================================================

15) The line
=====================================================================
UTF-32 CSUCS4 ISO-10646-UCS-4 UCS-4 UCS-4BE UTF-32BE UTF32BE UCS-4LE UTF-32LE UTF32LE
=====================================================================
should be split into two lines, because UTF-32BE and UTF-32LE are
different encodings:
=====================================================================
UTF-32 CSUCS4 ISO-10646-UCS-4 UCS-4 UCS-4BE UTF-32BE UTF32BE
UCS-4LE UTF-32LE UTF32LE
=====================================================================

16) The lines
=====================================================================
CP10029 10029 CP10029_MACLATIN2
MACCENTEURO MACCENTRALEUROPE
=====================================================================
should be joined into a single line, because these encodings are identical:
=====================================================================
CP10029 10029 CP10029_MACLATIN2 MACCENTEURO MACCENTRALEUROPE
=====================================================================

17) The entry ISO646-BASIC@1983 should be removed, since iconv_open returns EINVAL for it.
Then, of the lines
=====================================================================
ISO646-BASIC:1983 ISO_646.BASIC:1983 REF REF
ISO646-BASIC:1983
=====================================================================
the second one should be removed, since it is part of the first line:
=====================================================================
ISO646-BASIC:1983 ISO_646.BASIC:1983 REF REF
=====================================================================

18) The entry ISO646-IRV@1983 should be removed, since iconv_open returns EINVAL for it.
Then, of the lines
=====================================================================
ISO646-IRV:1983 IRV ISO-IR-2
ISO646-IRV:1983
=====================================================================
the second one should be removed, since it is part of the first line:
=====================================================================
ISO646-IRV:1983 IRV ISO-IR-2
=====================================================================

19) The entry JISX0208@1990 should be removed, since iconv_open returns EINVAL for it.
Then, of the lines
=====================================================================
JISX0208:1990 CSISO87JISX0208 ISO-IR-87 JIS0208 JISX0208-1990 JIS_C6226-1983 JIS_X0208 JIS_X0208-1983 JIS_X0208-1990 JIS_X0208:1990 X0208
JISX0208:1990
=====================================================================
the second one should be removed, since it is part of the first line:
=====================================================================
JISX0208:1990 CSISO87JISX0208 ISO-IR-87 JIS0208 JISX0208-1990 JIS_C6226-1983 JIS_X0208 JIS_X0208-1983 JIS_X0208-1990 JIS_X0208:1990 X0208
=====================================================================

20) The entry WINDOWS-874 occurs in two different lines:
=====================================================================
CP1162 1162 CSIBM1162 IBM-1162 IBM1162 MSCP874 WINDOWS-874
CP874 874 IBM874 WINDOWS-874
=====================================================================
It should be removed from the first line, since the WINDOWS-874 encoding is identical to CP874 and different from CP1162:
=====================================================================
CP1162 1162 CSIBM1162 IBM-1162 IBM1162 MSCP874
CP874 874 IBM874 WINDOWS-874
=====================================================================
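Items 17 to 19 rely on iconv_open() failing with EINVAL for the '@' spellings. A check of every name on one line of 'iconv -l' output could be sketched in C as follows. probe_encoding and count_rejected are hypothetical helpers. Testing only the conversion from the name to UTF-8 is a simplifying assumption; a stricter check would also try the conversion from UTF-8 to the name.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() and strtok_r() */
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Return 0 if iconv_open() accepts `name` as a source encoding
   (converting to UTF-8), otherwise the errno value it reported.
   EINVAL means the conversion is not supported. */
static int probe_encoding(const char *name)
{
    iconv_t cd = iconv_open("UTF-8", name);
    if (cd == (iconv_t)-1)
        return errno;
    iconv_close(cd);
    return 0;
}

/* Count the names in one whitespace-separated line of 'iconv -l'
   output that iconv_open() rejects.  Returns -1 if out of memory. */
static int count_rejected(const char *line)
{
    char *copy = strdup(line);        /* strtok_r modifies its input */
    if (copy == NULL)
        return -1;
    int rejected = 0;
    char *save = NULL;
    for (char *name = strtok_r(copy, " \t\n", &save); name != NULL;
         name = strtok_r(NULL, " \t\n", &save))
        if (probe_encoding(name) != 0)
            rejected++;
    free(copy);
    return rejected;
}
```

If, as item 18 reports, only the '@' spelling is rejected, count_rejected("ISO646-IRV:1983 IRV ISO-IR-2 ISO646-IRV@1983") would be expected to return 1 on the affected system.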

As evidence, I am attaching the conversion tables, which I obtained by running commands such as:
./test-from WINDOWS-874 > WINDOWS-874.TXT
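The reporter's extraction program (attachment 243206) is not reproduced in this report, and its output format is not shown here. For single-byte encodings, a tool of that kind could be sketched in C as below. dump_single_byte_table is a hypothetical name. The one-line-per-byte "0xBB<TAB>0xUUUU" format is an assumption and is not necessarily the format of the attached tables.

```c
#include <iconv.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 'test-from'-style dumper for single-byte encodings.
   For each byte value BB of `enc` that iconv decodes to exactly one
   Unicode character U+UUUU, write the line "0xBB\t0xUUUU" to `out`.
   Bytes that iconv rejects are omitted.  Returns the number of lines
   written, or -1 if iconv_open() does not accept `enc`. */
static int dump_single_byte_table(const char *enc, FILE *out)
{
    iconv_t cd = iconv_open("UCS-4BE", enc);
    if (cd == (iconv_t)-1)
        return -1;
    int written = 0;
    for (int b = 0; b < 256; b++) {
        unsigned char in = (unsigned char)b;
        unsigned char buf[16];
        char *inp = (char *)&in, *outp = (char *)buf;
        size_t inleft = 1, outleft = sizeof buf;
        iconv(cd, NULL, NULL, NULL, NULL);   /* reset to the initial state */
        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1
            || sizeof buf - outleft != 4)
            continue;                        /* unmapped: skip this byte */
        uint32_t uc = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
                    | ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
        fprintf(out, "0x%02X\t0x%04X\n", (unsigned)b, (unsigned)uc);
        written++;
    }
    iconv_close(cd);
    return written;
}
```

Calling dump_single_byte_table("WINDOWS-874", stdout) would print the table for that name. Multi-byte encodings such as BIG-5 or CP942 need a different enumeration strategy (iterating over valid byte sequences), which this sketch does not attempt.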
Comment 2 bruno 2023-07-04 20:38:46 UTC
Created attachment 243206
Program that extracts the conversion table of an encoding
Comment 3 bruno 2023-07-04 20:39:49 UTC
Created attachment 243207
Conversion table of ISO-8859-1
Comment 4 bruno 2023-07-04 20:40:25 UTC
Created attachment 243208
Conversion table of ISO-8859-10
Comment 5 bruno 2023-07-04 20:40:52 UTC
Created attachment 243209
Conversion table of ISO-8859-11
Comment 6 bruno 2023-07-04 20:41:33 UTC
Created attachment 243210
Conversion table of ISO-8859-13
Comment 7 bruno 2023-07-04 20:42:20 UTC
Created attachment 243211
Conversion table of ISO-8859-14
Comment 8 bruno 2023-07-04 20:43:03 UTC
Created attachment 243212
Conversion table of ISO-8859-15
Comment 9 bruno 2023-07-04 20:43:46 UTC
Created attachment 243213
Conversion table of ISO-8859-16
Comment 10 bruno 2023-07-04 20:44:18 UTC
Created attachment 243214
Conversion table of ARMSCII-8
Comment 11 bruno 2023-07-04 20:44:50 UTC
Created attachment 243215
Conversion table of ARMSCII-8A
Comment 12 bruno 2023-07-04 20:45:25 UTC
Created attachment 243216
Conversion table of BIG5-E
Comment 13 bruno 2023-07-04 20:45:55 UTC
Created attachment 243217
Conversion table of BIG-5
Comment 14 bruno 2023-07-04 20:51:01 UTC
Created attachment 243218
Conversion table of CP942
Comment 15 bruno 2023-07-04 20:51:30 UTC
Created attachment 243219
Conversion table of CP942C
Comment 16 bruno 2023-07-04 20:51:58 UTC
Created attachment 243220
Conversion table of CP943
Comment 17 bruno 2023-07-04 20:52:22 UTC
Created attachment 243221
Conversion table of CP943C
Comment 18 bruno 2023-07-04 20:52:53 UTC
Created attachment 243222
Conversion table of ISO646-CA
Comment 19 bruno 2023-07-04 20:53:21 UTC
Created attachment 243223
Conversion table of ISO646-CA2
Comment 20 bruno 2023-07-04 20:53:53 UTC
Created attachment 243225
Conversion table of ISO646-ES
Comment 21 bruno 2023-07-04 20:54:19 UTC
Created attachment 243226
Conversion table of ISO646-ES2
Comment 22 bruno 2023-07-04 20:54:45 UTC
Created attachment 243227
Conversion table of ISO646-FR
Comment 23 bruno 2023-07-04 20:55:10 UTC
Created attachment 243228
Conversion table of ISO646-FR1
Comment 24 bruno 2023-07-04 20:55:40 UTC
Created attachment 243229
Conversion table of ISO646-NO
Comment 25 bruno 2023-07-04 20:56:06 UTC
Created attachment 243230
Conversion table of ISO646-NO2
Comment 26 bruno 2023-07-04 20:56:33 UTC
Created attachment 243231
Conversion table of ISO646-PT
Comment 27 bruno 2023-07-04 20:56:56 UTC
Created attachment 243232
Conversion table of ISO646-PT2
Comment 28 bruno 2023-07-04 20:58:15 UTC
Created attachment 243233
Conversion table of ISO646-SE2
Comment 29 bruno 2023-07-04 20:58:51 UTC
Created attachment 243234
Conversion table of KOI8-R
Comment 30 bruno 2023-07-04 20:59:16 UTC
Created attachment 243235
Conversion table of KOI8-RU
Comment 31 bruno 2023-07-04 20:59:48 UTC
Created attachment 243237
Conversion table of MACROMAN
Comment 32 bruno 2023-07-04 21:00:38 UTC
Created attachment 243238
Conversion table of MACROMANIA
Comment 33 bruno 2023-07-04 21:03:41 UTC
Created attachment 243242
Conversion table of UTF-16BE (compressed)
Comment 34 bruno 2023-07-04 21:04:14 UTC
Created attachment 243243
Conversion table of UTF-16LE (compressed)
Comment 35 bruno 2023-07-04 21:04:42 UTC
Created attachment 243244
Conversion table of UTF-32BE (compressed)
Comment 36 bruno 2023-07-04 21:05:14 UTC
Created attachment 243245
Conversion table of UTF-32LE (compressed)
Comment 37 bruno 2023-07-04 21:06:50 UTC
Created attachment 243246
Conversion table of CP10029
Comment 38 bruno 2023-07-04 21:07:37 UTC
Created attachment 243247
Conversion table of MACCENTEURO
Comment 39 bruno 2023-07-04 21:08:05 UTC
Created attachment 243248
Conversion table of WINDOWS-874
Comment 40 bruno 2023-07-04 21:08:34 UTC
Created attachment 243249
Conversion table of CP1162
Comment 41 bruno 2023-07-04 21:09:03 UTC
Created attachment 243250
Conversion table of CP874