Summary: | /usr/bin/vi conversion error on valid character
---|---
Product: | Base System
Component: | bin
Status: | Closed FIXED
Severity: | Affects Many People
Priority: | ---
Version: | CURRENT
Hardware: | Any
OS: | Any
Reporter: | lampa
Assignee: | Yuri Pankov <yuripv>
CC: | bjornr, rfg-freebsd, yuripv
Keywords: | easy, needs-qa, patch
Flags: | koobs: mfc-stable10?, koobs: mfc-stable9?
Description
lampa
2015-08-13 12:43:30 UTC
Looking at /usr/src/contrib/nvi/common/exf.c:

    file_encinit(SCR *sp)
    ...
        if (looks_utf8(buf, blen) > 1)
            o_set(sp, O_FILEENCODING, OS_STRDUP, "utf-8", 0);
        else if (!O_ISSET(sp, O_FILEENCODING) ||
            !strncasecmp(O_STR(sp, O_FILEENCODING), "utf-8", 5))
            o_set(sp, O_FILEENCODING, OS_STRDUP, codeset(), 0);
        conv_enc(sp, O_FILEENCODING, 0);
    }

1. There is no way to disable auto-detection of the encoding. If looks_utf8() returns 2, you are lost! You can set up your .exrc, but it will be ignored!

2. But why does looks_utf8() detect 0xe1 0x20 as valid UTF-8? IT IS NOT VALID!

Looking at /usr/src/contrib/nvi/common/encoding.c:

    looks_utf8(const char *ibuf, size_t nbytes)
    ...
        for (n = 0; n < following; n++) {
            i++;
            if (i >= nbytes)
                goto done;
            if (buf[i] & 0x40)    /* 10xxxxxx */
                return -1;
        }

That's completely wrong: it doesn't test whether bit 7 is set in the succeeding bytes! It should be:

        for (n = 0; n < following; n++) {
            i++;
            if (i >= nbytes)
                goto done;
            if ((buf[i] & 0xc0) != 0x10)    /* 10xxxxxx */
                return -1;
        }

This change was tested and works. Please fix at least the broken "auto detection" before 10.2-RELEASE! An option to disable auto-detection, or to honor the user's setting in .exrc, is also required.

@lampa, can you attach your proposed change in unified diff format against HEAD please?

Created attachment 165486 [details]
patch
Created attachment 165487 [details]
correct patch
There is a typo in the description (0x10 should be 0x80), and I copied it into the first patch; this patch is the proper one.
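For readers following along, the difference between the two checks can be sketched in isolation. This is a minimal illustration, not the code from the attached patch: the function names is_continuation_old(), is_continuation_fixed() and check_examples() are hypothetical and do not exist in nvi. A UTF-8 continuation byte has the form 10xxxxxx, so masking with 0xc0 must yield 0x80. Note also that (b & 0xc0) can only be 0x00, 0x40, 0x80 or 0xc0, so a comparison against 0x10 can never match.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the check quoted in the report:
 * a byte is accepted as a continuation byte whenever bit 6 is clear.
 * Bit 7 is never examined, so plain ASCII such as 0x20 passes.
 */
bool
is_continuation_old(unsigned char b)
{
	return ((b & 0x40) == 0);
}

/*
 * Hypothetical helper using the corrected mask: the top two bits
 * must be exactly 10, i.e. the byte has the form 10xxxxxx.
 */
bool
is_continuation_fixed(unsigned char b)
{
	return ((b & 0xc0) == 0x80);
}

/* Exercises both helpers on the bytes discussed in this report. */
void
check_examples(void)
{
	/* 0x20 follows 0xe1 in the reported input; it is not 10xxxxxx. */
	assert(is_continuation_old(0x20));
	assert(!is_continuation_fixed(0x20));

	/* Genuine continuation bytes are accepted by both checks. */
	assert(is_continuation_old(0x80));
	assert(is_continuation_fixed(0xbf));

	/* A lead byte (11xxxxxx) is rejected by both checks. */
	assert(!is_continuation_old(0xc0));
	assert(!is_continuation_fixed(0xc0));
}
```

Calling check_examples() from a small test program should complete without tripping an assertion, which shows that the old check lets the sequence 0xe1 0x20 through while the 0x80 comparison rejects it.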
*** Bug 203040 has been marked as a duplicate of this bug. ***

A commit references this bug:

Author: yuripv
Date: Mon Nov 26 15:33:56 UTC 2018
New revision: 340976
URL: https://svnweb.freebsd.org/changeset/base/340976

Log:
  vi: fix UTF-8 detection.

  PR: 202290
  Submitted by: lampa@fit.vutbr.cz
  Reviewed by: bapt
  Approved by: kib (mentor, implicit)
  MFC after: 3 days
  Differential Revision: https://reviews.freebsd.org/D17950

Changes:
  head/contrib/nvi/common/encoding.c

A commit references this bug:

Author: yuripv
Date: Thu Nov 29 15:05:47 UTC 2018
New revision: 341234
URL: https://svnweb.freebsd.org/changeset/base/341234

Log:
  MFC r340976:
  vi: fix UTF-8 detection.

  PR: 202290
  Submitted by: lampa@fit.vutbr.cz
  Reviewed by: bapt
  Differential Revision: https://reviews.freebsd.org/D17950

Changes:
  _U stable/12/
  stable/12/contrib/nvi/common/encoding.c

A commit references this bug:

Author: yuripv
Date: Thu Nov 29 15:07:59 UTC 2018
New revision: 341235
URL: https://svnweb.freebsd.org/changeset/base/341235

Log:
  MFC r340976:
  vi: fix UTF-8 detection.

  PR: 202290
  Submitted by: lampa@fit.vutbr.cz
  Reviewed by: bapt
  Differential Revision: https://reviews.freebsd.org/D17950

Changes:
  _U stable/11/
  stable/11/contrib/nvi/common/encoding.c

Created attachment 214569 [details]
Perhaps this is a test case for the same (already fixed?) bug?