A person might want to have the latest links installed. However, this breaks text file creation from HTML files, since the latest links version no longer supports -dump.

Fix: Ditch links in favour of elinks; see attached patches.

How-To-Repeat: Install docproj, then install ports/www/links.
On Sun, Mar 23, 2003 at 02:41:18PM +0100, Jeroen Ruigrok van der Werven wrote:
> A person might want to have the latest links installed, however, this messes
> up textfile creation from html files since the latest links version does not
> support -dump anymore.

We discussed this on -doc a month or so ago, and were generally thinking of going back to www/lynx, because this also gets localized text builds working.

Would you happen to know if elinks has this advantage too?

Thanks,
Ceri
On [20030323 15:07], Ceri Davies (ceri@FreeBSD.org) wrote:
>We discussed this on -doc a month or so ago, and were generally thinking of
>going back to www/lynx, because this also gets localized text builds working.

Problem I had with lynx was that I was unable to make it parse book.html-tex as text/html. w3m has a -T flag for this, elinks just looks at the file itself, or perhaps just assumes it is HTML.

>Would you happen to know if elinks has this advantage too ?

It does, but I don't know for certain for which languages it all works:

    elinks -dump -dump-charset iso-8859-15 http://www.paris.fr/

gives me accent aigus, accent circumflexes, etc.

I would be interested in hearing about non-Latin-based examples and how they work out.

-- 
Jeroen Ruigrok van der Werven <asmodai(at)wxs.nl> / asmodai / a capoeirista
PGP fingerprint: 2D92 980E 45FE 2C28 9DB7 9D88 97E6 839B 2EAC 625B
http://www.tendra.org/ | http://www.in-nomine.org/~asmodai/diary/
A kiss is a lovely trick designed by nature to stop speech when words become superfluous...
A long overdue update, I guess.

Neither links nor elinks will help for the multibyte environments of Chinese, Japanese, Korean and the like. They simply do not understand encodings such as EucJP, SJIS, GB18030, GB2312, EucKR, or UTF-8.

Using www/w3m-m17n I can at least view Japanese pages. Running 'w3m -dump http://website > dump.txt' on an EucJP-encoded page produces a UTF-8 encoded plain text file. The same also works for (X-)SJIS (Japanese), GB2312 (Chinese/PRC), EucKR (Korean), UTF-8, TIS-620 (Thai), Big5 (Taiwanese), VISCII (Vietnamese), and KOI8-U (Russian). I tried some ISO-8859 dumps as well (8859-6 and 8859-7, for example) and it all works fine.

So my suggestion is to change HTML2TXT to use w3m from w3m-m17n.

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/
Reality is an illusion, grimmer. The dreamlands are like masks within masks, and Time has no dominion beyond the Shroud...
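
To make the comparison in this thread concrete, here is a minimal sketch that only builds (and does not run) the command line for each candidate dumper. The function name html2txt_command is hypothetical, and the w3m -O output-charset option is an assumption about the w3m-m17n build rather than something taken from the attached patches; -dump, -T and -dump-charset are the flags mentioned in the messages above.

```python
import shlex


def html2txt_command(source, tool="w3m", output_charset=None):
    """Return an argv list that dumps an HTML document as plain text.

    Hypothetical helper for illustration only; it does not reflect the
    actual HTML2TXT definition used by the documentation build.
    """
    if tool == "w3m":
        # -T forces the content type, which matters for files whose
        # names do not end in .html.
        argv = ["w3m", "-dump", "-T", "text/html"]
        if output_charset is not None:
            # Assumption: -O selects the output charset in w3m-m17n.
            argv += ["-O", output_charset]
    elif tool == "elinks":
        argv = ["elinks", "-dump"]
        if output_charset is not None:
            argv += ["-dump-charset", output_charset]
    else:
        raise ValueError(f"unsupported tool: {tool!r}")
    argv.append(source)
    return argv


if __name__ == "__main__":
    # Print a shell-quoted command line instead of executing it.
    print(shlex.join(html2txt_command("book.html", output_charset="UTF-8")))
```

Building the argument list separately from running it keeps the choice of dumper in one place, so switching tools does not require touching the rest of a build script.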
For bugs matching the following criteria:

Status: In Progress
Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Right now the website uses Hugo/AsciiDoc.