A person might want to have the latest links installed. However, this breaks
text file creation from HTML files, since the latest links version no longer
supports -dump.
Fix: Ditch links in favour of elinks, see attached patches.
How to repeat: Install docproj, then install ports/www/links.
On Sun, Mar 23, 2003 at 02:41:18PM +0100, Jeroen Ruigrok van der Werven wrote:
> A person might want to have the latest links installed, however, this messes
> up textfile creation from html files since the latest links version does not
> support -dump anymore.
We discussed this on -doc a month or so ago, and were generally thinking of
going back to www/lynx, because this also gets localized text builds working.
Would you happen to know if elinks has this advantage too?
On [20030323 15:07], Ceri Davies (ceri@FreeBSD.org) wrote:
>We discussed this on -doc a month or so ago, and were generally thinking of
>going back to www/lynx, because this also gets localized text builds working.
The problem I had with lynx was that I was unable to make it parse
book.html-tex as text/html.
w3m has a -T flag for this; elinks just looks at the file itself, or
perhaps just assumes it is HTML.
>Would you happen to know if elinks has this advantage too ?
It does, but I don't know for certain which languages it works for:
  elinks -dump -dump-charset iso-8859-15 http://www.paris.fr/
gives me acute accents, circumflexes, etc.
I would be interested in hearing about non-Latin-based examples and how
they work out.
Jeroen Ruigrok van der Werven <asmodai(at)wxs.nl> / asmodai / a capoeirista
PGP fingerprint: 2D92 980E 45FE 2C28 9DB7 9D88 97E6 839B 2EAC 625B
http://www.tendra.org/ | http://www.in-nomine.org/~asmodai/diary/
A kiss is a lovely trick designed by nature to stop speech when words
A long overdue update I guess.
Neither links nor elinks will help for the multibyte environments of Chinese,
Japanese, Korean and the like. They simply do not understand encodings such
as EucJP, SJIS, GB18030, GB2312, EucKR, or UTF-8.
Using www/w3m-m17n I can at least view Japanese pages.
Running 'w3m -dump http://website > dump.txt' on an EucJP-encoded page
produces a UTF-8 encoded plain text file.
The same also works for (X-)SJIS (Japanese), GB2312 (Chinese/PRC), EucKR
(Korean), UTF-8, TIS-620 (Thai), Big5 (Taiwanese), and VISCII (Vietnamese).
I also tried some ISO-8859 dumps (8859-6 and 8859-7, for example) and it
all works fine.
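
For anyone who wants to repeat this check, here is a rough sketch of how the
"result is UTF-8" claim could be verified. The is_utf8 helper is a
hypothetical name of mine, and the use of iconv for validation is my
assumption, not something the doc build does. The w3m line is the same
command as above, left commented out because it needs network access.

```shell
#!/bin/sh
# is_utf8 FILE: succeed only if FILE decodes cleanly as UTF-8.
# (Hypothetical helper; iconv fails on invalid input sequences.)
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example use, assuming w3m from w3m-m17n is installed:
# w3m -dump http://website > dump.txt
# is_utf8 dump.txt && echo "dump.txt is valid UTF-8"
```

Round-tripping through iconv only confirms that the bytes are well-formed
UTF-8. It does not tell you whether the characters came out right, so a
visual check of the dump is still needed.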
So my suggestion is to change HTML2TXT to use w3m from w3m-m17n.
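
To make the suggestion concrete, here is a minimal sketch of what the
conversion step might look like as a shell wrapper. HTML2TXT is the name
used above; HTML2TXTOPTS, the default values, and both function names are
assumptions of mine for illustration, not the actual contents of the doc
makefiles. The -T text/html flag is there because the input file does not
end in .html.

```shell
#!/bin/sh
# Assumed defaults; override from the environment if needed.
HTML2TXT="${HTML2TXT:-w3m}"
HTML2TXTOPTS="${HTML2TXTOPTS:--dump -T text/html}"

# Print the command line that would convert file $1 (hypothetical helper).
html2txt_command() {
    printf '%s %s %s\n' "$HTML2TXT" "$HTML2TXTOPTS" "$1"
}

# Convert HTML file $1 to plain text file $2.
# HTML2TXTOPTS is deliberately unquoted so it splits into separate flags.
html2txt() {
    $HTML2TXT $HTML2TXTOPTS "$1" > "$2"
}

# Example use, assuming w3m from w3m-m17n is installed:
# html2txt book.html-tex book.txt
```

Keeping the program and its options in separate variables means a
different dumper can be swapped in later without touching the rest.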
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/
Reality is an illusion, grimmer. The dreamlands are like masks within
masks, and Time has no dominion beyond the Shroud...
For bugs matching the following criteria:
Status: In Progress Changed: (is less than) 2014-06-01
Reset to default assignee and clear in-progress tags.
Mail being skipped