uname -a
FreeBSD xxxx 10.2-PRERELEASE FreeBSD 10.2-PRERELEASE #8 r286147:

    export LANG=cs_CZ.ISO8859-2
    echo "á " >/tmp/x    # a acute (0xE1) and space
    /usr/bin/vi /tmp/x
    Conversion error on line 1;

    hd /tmp/x
    00000000  e1 20 0a

It happens only when "a acute" is followed by a space.
Looking at /usr/src/contrib/nvi/common/exf.c:

    file_encinit(SCR *sp)
    ...
            if (looks_utf8(buf, blen) > 1)
                    o_set(sp, O_FILEENCODING, OS_STRDUP, "utf-8", 0);
            else if (!O_ISSET(sp, O_FILEENCODING) ||
                !strncasecmp(O_STR(sp, O_FILEENCODING), "utf-8", 5))
                    o_set(sp, O_FILEENCODING, OS_STRDUP, codeset(), 0);
            conv_enc(sp, O_FILEENCODING, 0);
    }

1. There is no way to disable auto-detection of the encoding. If looks_utf8() returns 2, you are lost! You can set fileencoding in your .exrc, but it will be ignored!

2. But why does looks_utf8() detect 0xe1 0x20 as valid UTF-8? IT IS NOT VALID!

Looking at /usr/src/contrib/nvi/common/encoding.c:

    looks_utf8(const char *ibuf, size_t nbytes)
    ...
            for (n = 0; n < following; n++) {
                    i++;
                    if (i >= nbytes)
                            goto done;
                    if (buf[i] & 0x40)      /* 10xxxxxx */
                            return -1;
            }

That's completely wrong: it doesn't test whether bit 7 is set in the following bytes! It should be:

            for (n = 0; n < following; n++) {
                    i++;
                    if (i >= nbytes)
                            goto done;
                    if ((buf[i] & 0xc0) != 0x10)    /* 10xxxxxx */
                            return -1;
            }

This change was tested and works.

Please fix at least the broken auto-detection before 10.2-RELEASE! An option to disable auto-detection, or to honor the user's setting in .exrc, is also required.
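To make the difference between the two checks concrete, here is a minimal sketch that isolates just the continuation-byte test. The function names cont_ok_old() and cont_ok_new() are made up for illustration and do not exist in nvi; the new check uses 0x80, the value from the corrected patch attached later in this PR. 0xE1 has the form 1110xxxx, so it announces two continuation bytes, and in the file e1 20 0a both 0x20 and 0x0a have bit 6 clear, so the old test lets them through.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not nvi code: the test as quoted from
 * encoding.c. A byte is rejected only when bit 6 is set, so any
 * byte of the form x0xxxxxx is accepted, including plain ASCII
 * such as 0x20 (space) and 0x0a (newline).
 */
static int
cont_ok_old(unsigned char b)
{
	return (b & 0x40) == 0;
}

/*
 * Hypothetical helper, not nvi code: the test from the corrected
 * patch. The top two bits must be exactly 10 (10xxxxxx).
 */
static int
cont_ok_new(unsigned char b)
{
	return (b & 0xc0) == 0x80;
}

/*
 * Hypothetical helper: returns 1 if every byte in buf[0..len-1]
 * passes the given continuation-byte test, 0 otherwise.
 */
static int
all_cont_ok(const unsigned char *buf, size_t len,
    int (*ok)(unsigned char))
{
	size_t i;

	for (i = 0; i < len; i++)
		if (!ok(buf[i]))
			return 0;
	return 1;
}
```

With the two bytes that follow 0xE1 in /tmp/x (0x20, 0x0a), all_cont_ok() succeeds with cont_ok_old and fails with cont_ok_new. A letter such as 'a' (0x61) has bit 6 set, which is consistent with the problem showing up only when the accented character is followed by a space.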
@lampa, can you attach your proposed change in unified diff format against HEAD please?
Created attachment 165486 [details] patch
Created attachment 165487 [details] correct patch
There is a typo in the description (0x10 should be 0x80), and I copied it into the first patch. This patch is the proper one.
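A short sketch of why the constant matters: masking with 0xc0 can only yield 0x00, 0x40, 0x80 or 0xc0, so a comparison against 0x10 can never match and every continuation byte would be rejected. The helper count_matching() below is hypothetical and only counts byte values; it is not part of nvi.

```c
/*
 * Hypothetical helper, not nvi code: counts how many of the 256
 * possible byte values satisfy (b & 0xc0) == want.
 */
static int
count_matching(unsigned int want)
{
	unsigned int b;
	int n = 0;

	for (b = 0; b < 256; b++)
		if ((b & 0xc0) == want)
			n++;
	return n;
}
```

count_matching(0x10) is 0, while count_matching(0x80) is 64, which is exactly the range 0x80..0xbf of valid UTF-8 continuation bytes.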
*** Bug 203040 has been marked as a duplicate of this bug. ***
A commit references this bug:

Author: yuripv
Date: Mon Nov 26 15:33:56 UTC 2018
New revision: 340976
URL: https://svnweb.freebsd.org/changeset/base/340976

Log:
  vi: fix UTF-8 detection.

  PR: 202290
  Submitted by: lampa@fit.vutbr.cz
  Reviewed by: bapt
  Approved by: kib (mentor, implicit)
  MFC after: 3 days
  Differential Revision: https://reviews.freebsd.org/D17950

Changes:
  head/contrib/nvi/common/encoding.c
A commit references this bug:

Author: yuripv
Date: Thu Nov 29 15:05:47 UTC 2018
New revision: 341234
URL: https://svnweb.freebsd.org/changeset/base/341234

Log:
  MFC r340976:
  vi: fix UTF-8 detection.

  PR: 202290
  Submitted by: lampa@fit.vutbr.cz
  Reviewed by: bapt
  Differential Revision: https://reviews.freebsd.org/D17950

Changes:
_U  stable/12/
  stable/12/contrib/nvi/common/encoding.c
A commit references this bug:

Author: yuripv
Date: Thu Nov 29 15:07:59 UTC 2018
New revision: 341235
URL: https://svnweb.freebsd.org/changeset/base/341235

Log:
  MFC r340976:
  vi: fix UTF-8 detection.

  PR: 202290
  Submitted by: lampa@fit.vutbr.cz
  Reviewed by: bapt
  Differential Revision: https://reviews.freebsd.org/D17950

Changes:
_U  stable/11/
  stable/11/contrib/nvi/common/encoding.c
Created attachment 214569 [details]
Perhaps this is a test case for the same (already fixed?) bug?