Bug 50211

Summary: [patch] doc.docbook.mk: fix textfile creation
Product: Documentation
Reporter: Jeroen Ruigrok van der Werven <asmodai>
Component: Books & Articles
Assignee: freebsd-doc (Nobody) <doc>
Status: Closed Overcome By Events
Severity: Affects Only Me
CC: carlavilla
Priority: Normal
Version: Latest
Hardware: Any
OS: Any
Attachments:
file.diff (flags: none)
file.diff (flags: none)

Description Jeroen Ruigrok van der Werven 2003-03-23 13:50:11 UTC
A person might want to have the latest links installed. However, this breaks
textfile creation from HTML files, because the latest links version no longer
supports -dump.

Fix: Ditch links in favour of elinks; see the attached patches.

How-To-Repeat: 
Install docproj, then install ports/www/links.
Comment 1 Ceri Davies freebsd_committer freebsd_triage 2003-03-23 14:07:39 UTC
On Sun, Mar 23, 2003 at 02:41:18PM +0100, Jeroen Ruigrok van der Werven wrote:

> A person might want to have the latest links installed, however, this messes
> up textfile creation from html files since the latest links version does not
> support -dump anymore.

We discussed this on -doc a month or so ago, and were generally thinking of
going back to www/lynx, because this also gets localized text builds working.

Would you happen to know if elinks has this advantage too?

Thanks,

Ceri
Comment 2 Jeroen Ruigrok/Asmodai 2003-03-23 17:03:54 UTC
-On [20030323 15:07], Ceri Davies (ceri@FreeBSD.org) wrote:
>We discussed this on -doc a month or so ago, and were generally thinking of
>going back to www/lynx, because this also gets localized text builds working.

The problem I had with lynx was that I could not make it parse
book.html-tex as text/html.
w3m has a -T flag for this; elinks just looks at the file itself, or
perhaps simply assumes it is HTML.
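To make the per-tool difference concrete, here is a minimal POSIX sh sketch. The helper html2txt_cmd is hypothetical (it is not part of doc.docbook.mk) and only prints the command line that would dump book.html-tex to plain text, using the flags mentioned in this thread: w3m's -T to force the text/html type, and a plain -dump for elinks, which works out the type on its own.

```shell
#!/bin/sh
# Hypothetical helper, not part of doc.docbook.mk: print the command line
# that would dump an HTML file to plain text with the chosen converter.
# Only flags mentioned in this bug report are used.
html2txt_cmd() {
    tool=$1
    file=$2
    case $tool in
    w3m)
        # -T forces the content type, so the .html-tex suffix does not matter
        echo "w3m -dump -T text/html $file" ;;
    elinks)
        # elinks inspects the file (or assumes HTML) without extra flags
        echo "elinks -dump $file" ;;
    *)
        # e.g. links: per this report, newer releases no longer support -dump
        echo "html2txt_cmd: unsupported converter: $tool" >&2
        return 1 ;;
    esac
}
```

For example, `html2txt_cmd w3m book.html-tex` prints `w3m -dump -T text/html book.html-tex`, which can then be run with its output redirected to book.txt. Printing instead of executing keeps the sketch independent of which converters happen to be installed.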

>Would you happen to know if elinks has this advantage too ?

It does, but I don't know for certain which languages it works for:

elinks -dump -dump-charset iso-8859-15 http://www.paris.fr/

gives me correct acute accents, circumflex accents, etc.

I would be interested in hearing about non-Latin-based examples and how
they work out.

-- 
Jeroen Ruigrok van der Werven <asmodai(at)wxs.nl> / asmodai / a capoeirista
PGP fingerprint: 2D92 980E 45FE 2C28 9DB7  9D88 97E6 839B 2EAC 625B
http://www.tendra.org/   | http://www.in-nomine.org/~asmodai/diary/
Comment 3 Jeroen Ruigrok van der Werven 2007-05-13 15:59:23 UTC
A long-overdue update, I guess.

Neither links nor elinks will help for the multibyte environments of Chinese,
Japanese, Korean and the like. They simply do not understand encodings such
as EucJP, SJIS, GB18030, GB2312, EucKR, or UTF-8.

Using www/w3m-m17n I can at least view Japanese pages.
Running 'w3m -dump http://website > dump.txt' against a page encoded in EucJP
produces a UTF-8 encoded plain text file.

The same works for (X-)SJIS (Japanese), GB2312 (Chinese/PRC), EucKR
(Korean), UTF-8, TIS-620 (Thai), Big5 (Traditional Chinese/Taiwan), VISCII
(Vietnamese), and KOI8-U (Ukrainian/Russian Cyrillic).

I also tried some ISO-8859 dumps (for example 8859-6 and 8859-7), and they
all work fine.

So my suggestion is to change HTML2TXT to use the w3m binary from www/w3m-m17n.
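A sketch of what such a change might look like, in BSD make syntax. HTML2TXT is the variable named in this comment; HTML2TXTOPTS, the LOCALBASE default, the installed binary name w3m, and the book.txt rule are assumptions for illustration, not the actual contents of doc.docbook.mk. The options reuse the -dump and -T text/html flags discussed earlier in this thread.

```make
# Sketch only: HTML2TXTOPTS and the rule below are assumed, not taken
# from doc.docbook.mk. Assumes www/w3m-m17n installs its binary as w3m.
LOCALBASE?=	/usr/local
HTML2TXT?=	${LOCALBASE}/bin/w3m
# -T text/html makes w3m treat book.html-tex as HTML despite its suffix
HTML2TXTOPTS?=	-dump -T text/html

book.txt: book.html-tex
	${HTML2TXT} ${HTML2TXTOPTS} book.html-tex > ${.TARGET}
```

Using ?= keeps both values overridable from the command line or make.conf, so someone who prefers elinks could still set HTML2TXT without editing the makefile.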

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/
Comment 4 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 08:00:24 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 5 Sergio Carlavilla Delgado freebsd_committer freebsd_triage 2021-04-01 20:45:20 UTC
Right now the website uses Hugo/AsciiDoc.