Bug 122524 - www/links1 uses 7-bit us-ascii codepage only when using "-dump"
Status: Closed Feedback Timeout
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Dmitry Sivachenko
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-04-07 11:40 UTC by Alexander Zagrebin
Modified: 2015-12-24 22:07 UTC

Attachments
file.diff (1.97 KB, patch)
2008-04-07 11:40 UTC, Alexander Zagrebin

Description Alexander Zagrebin 2008-04-07 11:40:01 UTC
In interactive mode, links 0.98 (www/links1) works fine.
But when it is used to dump an HTML page to stdout (links -dump ...), it always assumes us-ascii (7-bit) encoding for the output.
This causes problems when the HTML page uses a non-us-ascii encoding.
For example:
1. Some programs (mail/mutt, misc/mc, etc.) can use "links -dump ..." as an HTML-to-text converter. When the HTML uses a non-us-ascii encoding, the output is unreadable in most cases.
2. The FreeBSD Documentation Project uses links to convert the HTML documentation to a plain text version. As a result, the plain text documentation for ru_RU.KOI8-R, for example, is unreadable.
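
To illustrate why forcing 7-bit output hurts in both cases, here is a minimal Python sketch. It uses Python's built-in codecs as a stand-in and does not reproduce links' own character tables, so the exact substitute characters links emits may differ. The sample string and both helper names are made up for the illustration.

```python
# Illustration only: Python's codecs stand in for links' conversion tables.
SAMPLE = "Привет, мир"  # hypothetical page text ("Hello, world" in Russian)


def to_seven_bit(text: str) -> bytes:
    """Force text into us-ascii, replacing every character that does not fit."""
    return text.encode("ascii", errors="replace")


def to_koi8r(text: str) -> bytes:
    """Encode text as koi8-r, which has a single byte for each Cyrillic letter."""
    return text.encode("koi8_r")


if __name__ == "__main__":
    # The Cyrillic letters are lost in the 7-bit version.
    print(to_seven_bit(SAMPLE))
    # The koi8-r bytes decode back to the original text.
    print(to_koi8r(SAMPLE).decode("koi8_r"))
```

Whatever links substitutes for the characters it cannot represent, the point is the same: once the output is restricted to us-ascii, the original Cyrillic text cannot be recovered.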

Fix: I have added a -dump-codepage <codepage> command line parameter (see the patch).
It sets the output codepage when links runs in "dump" mode.
I use the koi8-r encoding and, with this patch applied, I can run links like this:
"links -dump -dump-codepage koi8-r source.html"
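
For programs that call links as an HTML-to-text converter, the invocation could be wrapped roughly as follows. This is a sketch, not part of the patch: the helper names `build_dump_command` and `html_to_text` are hypothetical, and the -dump-codepage option is only available in a links1 binary built with the patch from this report.

```python
import subprocess
from typing import List, Optional


def build_dump_command(html_path: str, codepage: Optional[str] = None) -> List[str]:
    """Build the argument list for a links dump run.

    -dump-codepage is only understood by links1 with the patch applied.
    """
    argv = ["links", "-dump"]
    if codepage is not None:
        argv += ["-dump-codepage", codepage]
    argv.append(html_path)
    return argv


def html_to_text(html_path: str, codepage: str = "koi8-r") -> bytes:
    """Run links and return the raw dump bytes (needs a patched links1 in PATH)."""
    result = subprocess.run(
        build_dump_command(html_path, codepage),
        capture_output=True,
        check=True,
    )
    return result.stdout
```

The dump is returned as bytes rather than text, so the caller decides how to decode it; with the default above that would be `html_to_text("source.html").decode("koi8_r")`.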



Patch attached with submission follows:
How-To-Repeat: Try to convert an HTML source containing 8-bit (or utf-8) characters with
"links -dump source.html", and compare the result with "links source.html".
The output from -dump will contain us-ascii characters only.
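
A test page for this comparison can be generated with a few lines of Python. This is only a sketch: the page content, the template, and the helper names are invented for the example, and `is_seven_bit` is one simple way to check whether captured dump output contains only us-ascii bytes.

```python
from pathlib import Path

# Hypothetical minimal page that declares koi8-r in a meta tag.
HTML_TEMPLATE = (
    "<html><head>"
    '<meta http-equiv="Content-Type" content="text/html; charset=koi8-r">'
    "</head><body><p>{body}</p></body></html>\n"
)


def build_koi8r_page(body: str = "Привет, мир") -> bytes:
    """Return the bytes of a small HTML page encoded in koi8-r."""
    return HTML_TEMPLATE.format(body=body).encode("koi8_r")


def is_seven_bit(data: bytes) -> bool:
    """True if every byte is below 0x80, i.e. the data is pure us-ascii."""
    return all(byte < 0x80 for byte in data)


if __name__ == "__main__":
    # Write the page, then run "links -dump source.html" on it by hand.
    Path("source.html").write_bytes(build_koi8r_page())
```

Feeding the captured output of "links -dump source.html" to `is_seven_bit` should return True for the unpatched links1 described here, even though the page itself contains 8-bit characters.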
Comment 1 Edwin Groothuis freebsd_committer freebsd_triage 2008-04-07 11:40:11 UTC
Responsible Changed
From-To: freebsd-ports-bugs->demon

Over to maintainer (via the GNATS Auto Assign Tool)
Comment 2 Dmitry Sivachenko freebsd_committer freebsd_triage 2008-05-20 09:44:29 UTC
Why can't you use the www/links port for that purpose?

It seems that links version 2 already contains the required functionality.

Thanks.
Comment 3 Alexander Zagrebin 2008-05-20 10:18:50 UTC
There is no particular reason.
But the FreeBSD Documentation Project uses links1 to build the text-only
documentation, so the Russian (koi8-r) docs come out as almost unreadable
7-bit text.
This problem has existed for some years.

-- 
Alexander Zagrebin

Comment 4 Carlo Strub freebsd_committer freebsd_triage 2014-09-07 17:47:31 UTC
Is this PR still relevant?
Comment 5 Jason Unovitch freebsd_committer freebsd_triage 2015-12-24 22:07:49 UTC
Closing based on feedback timeout (7 years).