Bug 272384 - The iconv converter from GB18030 to UTF-8 is broken
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 13.2-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-07-05 14:27 UTC by bruno
Modified: 2023-07-16 13:12 UTC (History)
CC List: 0 users

See Also:


Attachments
mapping table extractor (4.19 KB, text/plain)
2023-07-05 14:28 UTC, bruno
actual GB18030.TXT (compressed) (64.59 KB, application/x-xz)
2023-07-05 14:29 UTC, bruno
expected GB18030.TXT for 2005 version (compressed) (867.04 KB, application/x-xz)
2023-07-05 14:29 UTC, bruno
expected GB18030.TXT for 2022 version (compressed) (821.20 KB, application/x-xz)
2023-07-05 14:30 UTC, bruno

Description bruno 2023-07-05 14:27:34 UTC
The iconv converter from GB18030 to UTF-8 is broken: it maps only 63486 characters, whereas it should map 1112064. All valid Unicode code points (U+0000..U+D7FF, U+E000..U+10FFFF) are representable in GB18030. See https://en.wikipedia.org/wiki/GB_18030#Mapping for details.
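
The expected figure follows directly from the two code point ranges quoted above. A minimal sketch of the arithmetic, where the function names `unicode_scalar_count` and `missing_mappings` are hypothetical and chosen only for illustration:

```c
/* Number of code points in the inclusive range [first, last]. */
static long range_size(long first, long last)
{
    return last - first + 1;
}

/* Valid Unicode code points: U+0000..U+D7FF plus U+E000..U+10FFFF,
   i.e. everything except the surrogate range U+D800..U+DFFF. */
long unicode_scalar_count(void)
{
    return range_size(0x0000, 0xD7FF) + range_size(0xE000, 0x10FFFF);
}

/* How many mappings are absent, given the number a converter provides. */
long missing_mappings(long mapped)
{
    return unicode_scalar_count() - mapped;
}
```

Here `unicode_scalar_count()` evaluates to 55296 + 1056768 = 1112064, and `missing_mappings(63486)` gives 1048578 for the converter described in this report.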

How to reproduce:
$ cc -Wall -o table-from table-from.c
$ ./table-from GB18030 > GB18030.TXT

Actual output: see actual-GB18030.TXT

Expected output: one of expected-GB18030-2005.TXT (for a GB18030:2005 compliant converter) or expected-GB18030-2022.TXT (for a GB18030:2022 compliant converter).
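
For readers without the attachment at hand, the core step of such a table extractor can be sketched as follows. This is not the attached table-from.c, whose contents are not reproduced in this report; it is an assumed outline in which the helper names `utf8_to_ucs` and `map_sequence` are hypothetical. It feeds one candidate byte sequence to iconv() and reports the resulting code point, or -1 if the converter rejects the input or produces anything other than exactly one character.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Decode a UTF-8 buffer that must hold exactly one character.
   Returns the code point, or -1 if `len` does not match the lead byte. */
static long utf8_to_ucs(const unsigned char *s, size_t len)
{
    if (len == 1 && s[0] < 0x80)
        return s[0];
    if (len == 2 && (s[0] & 0xE0) == 0xC0)
        return ((long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    if (len == 3 && (s[0] & 0xF0) == 0xE0)
        return ((long)(s[0] & 0x0F) << 12) | ((long)(s[1] & 0x3F) << 6)
             | (s[2] & 0x3F);
    if (len == 4 && (s[0] & 0xF8) == 0xF0)
        return ((long)(s[0] & 0x07) << 18) | ((long)(s[1] & 0x3F) << 12)
             | ((long)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return -1;
}

/* Convert `len` input bytes (1..4) through `cd`, which is assumed to have
   been opened with UTF-8 as the target encoding.
   Returns the mapped code point, or -1 if there is no single-character
   mapping for this byte sequence. */
long map_sequence(iconv_t cd, const unsigned char *bytes, size_t len)
{
    char in[4];
    unsigned char out[8];

    if (len == 0 || len > sizeof in)
        return -1;
    memcpy(in, bytes, len);

    /* Reset the descriptor in case a previous call ended in an error. */
    iconv(cd, NULL, NULL, NULL, NULL);

    char *inptr = in;
    size_t inleft = len;
    char *outptr = (char *)out;
    size_t outleft = sizeof out;

    size_t r = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (r == (size_t)-1 || inleft != 0)
        return -1;  /* invalid, incomplete, or unmappable input */

    return utf8_to_ucs(out, sizeof out - outleft);
}
```

Under these assumptions, an extractor would open the descriptor once with `iconv_open("UTF-8", "GB18030")`, call `map_sequence` for every candidate one-byte, two-byte (lead 0x81..0xFE, trail 0x40..0x7E or 0x80..0xFE), and four-byte (0x81..0xFE, 0x30..0x39, 0x81..0xFE, 0x30..0x39) sequence, and print one line per successful mapping. Counting those lines is what distinguishes a converter that maps 63486 characters from one that maps 1112064.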
Comment 1 bruno 2023-07-05 14:28:21 UTC
Created attachment 243269
mapping table extractor
Comment 2 bruno 2023-07-05 14:29:11 UTC
Created attachment 243270
actual GB18030.TXT (compressed)
Comment 3 bruno 2023-07-05 14:29:51 UTC
Created attachment 243271
expected GB18030.TXT for 2005 version (compressed)
Comment 4 bruno 2023-07-05 14:30:16 UTC
Created attachment 243272
expected GB18030.TXT for 2022 version (compressed)