Created attachment 149493
patch

This is a library that guesses which encoding a piece of text is in. Despite the gradual demise of non-UTF-* encodings, it is still very useful for programs that need to guess the encoding of legacy text data (for example, subtitles in video players such as mplayer). This codebase was used for a long time within the Firefox browser (their "guess encoding" feature).
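To make "guessing the encoding" concrete, here is a minimal sketch of the simplest possible approach: try to decode the bytes with each encoding from a candidate list and report the first one that succeeds. This is not how uchardet works (it is described as Mozilla's universal charset detection library and is not reproduced here); the function name `guess_encoding` and the candidate list are hypothetical, chosen only for illustration.

```python
from typing import Optional, Sequence

# Hypothetical candidate list, ordered from strictest to most permissive.
# Strict encodings go first because permissive single-byte encodings
# accept almost any byte sequence.
CANDIDATES = ("ascii", "utf-8", "koi8-r")


def guess_encoding(data: bytes,
                   candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate encoding that decodes `data` without error.

    Returns None if no candidate can decode the data.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # Not valid in this encoding; try the next one.
        return name
    return None


if __name__ == "__main__":
    print(guess_encoding(b"plain text"))
    print(guess_encoding("\u043f\u0440\u0438\u0432\u0435\u0442".encode("utf-8")))
    print(guess_encoding("\u043f\u0440\u0438\u0432\u0435\u0442".encode("koi8-r")))
```

The weakness of this sketch is that a single-byte encoding such as KOI8-R accepts every byte value, so text that is actually in another single-byte Cyrillic encoding would also be reported as `koi8-r`. Validity checks alone cannot separate such encodings, which is why a dedicated detection library is useful for legacy data.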
Created attachment 149494
poudriere log
A functionally related project is converters/enca, which detects mostly European charsets. converters/ might be another category to consider, but textproc/uchardet doesn't convert anything; it only outputs the detected charset.
A commit references this bug:

Author: pawel
Date: Sat Dec 6 14:47:04 UTC 2014
New revision: 374113
URL: https://svnweb.freebsd.org/changeset/ports/374113

Log:
  uchardet is a C language binding of the original C++ implementation of
  the universal charset detection library by Mozilla.

  WWW: https://code.google.com/p/uchardet/

  PR: 195083
  Submitted by: Yuri Victorovich <yuri@rawbw.com>

Changes:
  head/textproc/Makefile
  head/textproc/uchardet/
  head/textproc/uchardet/Makefile
  head/textproc/uchardet/distinfo
  head/textproc/uchardet/files/
  head/textproc/uchardet/files/patch-CMakeLists.txt
  head/textproc/uchardet/pkg-descr
  head/textproc/uchardet/pkg-plist
A commit references this bug:

Author: pawel
Date: Sat Dec 6 14:51:11 UTC 2014
New revision: 46066
URL: https://svnweb.freebsd.org/changeset/doc/46066

Log:
  For textproc/uchardet

  PR: 195083

Changes:
  head/en_US.ISO8859-1/articles/contributors/contrib.additional.xml