Bug 202290

Summary: /usr/bin/vi conversion error on valid character
Product: Base System
Reporter: lampa
Component: bin
Assignee: Yuri Pankov <yuripv>
Status: Closed FIXED
Severity: Affects Many People
CC: bjornr, rfg-freebsd, yuripv
Priority: ---
Keywords: easy, needs-qa, patch
Version: CURRENT
Flags: koobs: mfc-stable10?, koobs: mfc-stable9?
Hardware: Any
OS: Any
Attachments (description, flags):
  patch (flags: none)
  correct patch (flags: none)
  Perhaps this is a test case for the same (already fixed?) bug? (flags: none)

Description lampa 2015-08-13 12:43:30 UTC
uname -a
FreeBSD xxxx 10.2-PRERELEASE FreeBSD 10.2-PRERELEASE #8 r286147:

export LANG=cs_CZ.ISO8859-2
echo "á " >/tmp/x               # a acute (0xE1) and space
/usr/bin/vi /tmp/x
Conversion error on line 1;

hd /tmp/x
00000000  e1 20 0a

It happens only when "a acute" is followed by a space.
Comment 1 lampa 2015-08-13 19:57:48 UTC
Looking at /usr/src/contrib/nvi/common/exf.c

file_encinit(SCR *sp)
...
        if (looks_utf8(buf, blen) > 1)
                o_set(sp, O_FILEENCODING, OS_STRDUP, "utf-8", 0);
        else if (!O_ISSET(sp, O_FILEENCODING) ||
            !strncasecmp(O_STR(sp, O_FILEENCODING), "utf-8", 5))
                o_set(sp, O_FILEENCODING, OS_STRDUP, codeset(), 0);
        conv_enc(sp, O_FILEENCODING, 0);
}

1. There is no way to disable auto-detection of the encoding. If looks_utf8()
returns 2, you are stuck: you can set up your .exrc, but it
will be ignored!

2. But why does looks_utf8() detect 0xe1 0x20 as valid UTF-8? IT IS NOT VALID!
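
For reference, a minimal sketch of the UTF-8 structure rule behind point 2. The helper names utf8_following() and is_continuation() are made up for illustration and are not nvi functions; the sketch only looks at the bit patterns and does not check for overlong forms or out-of-range code points.

```c
/*
 * Hypothetical helper (not part of nvi): number of continuation bytes
 * that a lead byte announces, or -1 if the byte cannot start a sequence.
 */
int
utf8_following(unsigned char lead)
{
	if ((lead & 0x80) == 0x00)
		return 0;		/* 0xxxxxxx: plain ASCII */
	if ((lead & 0xe0) == 0xc0)
		return 1;		/* 110xxxxx */
	if ((lead & 0xf0) == 0xe0)
		return 2;		/* 1110xxxx */
	if ((lead & 0xf8) == 0xf0)
		return 3;		/* 11110xxx */
	return -1;			/* 10xxxxxx or otherwise not a lead byte */
}

/*
 * Hypothetical helper (not part of nvi): a continuation byte must have
 * bit 7 set and bit 6 clear, i.e. the form 10xxxxxx.
 */
int
is_continuation(unsigned char b)
{
	return (b & 0xc0) == 0x80;
}
```

For the file in the description (e1 20 0a), utf8_following(0xe1) is 2, so two bytes of the form 10xxxxxx must follow, but is_continuation(0x20) is 0 because 0x20 is 00100000. The sequence is therefore not well-formed UTF-8.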

Looking at  /usr/src/contrib/nvi/common/encoding.c

looks_utf8(const char *ibuf, size_t nbytes)
...

                        for (n = 0; n < following; n++) {
                                i++;
                                if (i >= nbytes)
                                        goto done;

                                if (buf[i] & 0x40)      /* 10xxxxxx */
                                        return -1;
                        }

That's completely wrong, it doesn't test if bit 7 is set in succeeding bytes! It should be:

                        for (n = 0; n < following; n++) {
                                i++;
                                if (i >= nbytes)
                                        goto done;

                                if ((buf[i] & 0xc0) != 0x10)      /* 10xxxxxx */
                                        return -1;
                        }

This change was tested and works.
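
A small sketch comparing the three variants of the continuation-byte test that appear in this report: the original one, the one with the constant as typed above (0x10), and the one with the constant given in comment 4 (0x80). The function names are hypothetical and exist only for this comparison; each returns nonzero when the byte would be rejected as a continuation byte.

```c
/* Original test: rejects only when bit 6 is set, never looks at bit 7. */
int
old_check_rejects(unsigned char b)
{
	return (b & 0x40) != 0;
}

/*
 * Test as typed in this comment: (b & 0xc0) can only be 0x00, 0x40,
 * 0x80 or 0xc0, so comparing against 0x10 rejects every byte.
 */
int
typo_check_rejects(unsigned char b)
{
	return (b & 0xc0) != 0x10;
}

/* Test with the constant from comment 4: accepts exactly 10xxxxxx. */
int
fixed_check_rejects(unsigned char b)
{
	return (b & 0xc0) != 0x80;
}
```

For the space that follows 0xe1 in the test file, old_check_rejects(0x20) is 0 (the byte is wrongly accepted), while fixed_check_rejects(0x20) is 1. For a genuine continuation byte such as 0xa1, fixed_check_rejects() is 0 but typo_check_rejects() is 1, which is why the constant has to be 0x80.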

Please at least fix the broken auto-detection before 10.2-RELEASE! An option
to disable auto-detection, or to honor the user's setting in .exrc, is also needed.
Comment 2 Kubilay Kocak freebsd_committer freebsd_triage 2016-01-13 07:49:22 UTC
@lampa, can you attach your proposed change in unified diff format against HEAD please?
Comment 3 lampa 2016-01-13 09:44:42 UTC
Created attachment 165486
patch
Comment 4 lampa 2016-01-13 09:51:50 UTC
Created attachment 165487
correct patch

There is a typo in the description (0x10 should be 0x80), and I copied it into the first patch; this patch is the correct one.
Comment 5 Bjorn Robertsson 2016-01-15 13:50:08 UTC
*** Bug 203040 has been marked as a duplicate of this bug. ***
Comment 6 commit-hook freebsd_committer freebsd_triage 2018-11-26 15:34:52 UTC
A commit references this bug:

Author: yuripv
Date: Mon Nov 26 15:33:56 UTC 2018
New revision: 340976
URL: https://svnweb.freebsd.org/changeset/base/340976

Log:
  vi: fix UTF-8 detection.

  PR:		202290
  Submitted by:	lampa@fit.vutbr.cz
  Reviewed by:	bapt
  Approved by:	kib (mentor, implicit)
  MFC after:	3 days
  Differential Revision:	https://reviews.freebsd.org/D17950

Changes:
  head/contrib/nvi/common/encoding.c
Comment 7 commit-hook freebsd_committer freebsd_triage 2018-11-29 15:06:34 UTC
A commit references this bug:

Author: yuripv
Date: Thu Nov 29 15:05:47 UTC 2018
New revision: 341234
URL: https://svnweb.freebsd.org/changeset/base/341234

Log:
  MFC r340976:
  vi: fix UTF-8 detection.

  PR:		202290
  Submitted by:	lampa@fit.vutbr.cz
  Reviewed by:	bapt
  Differential Revision:	https://reviews.freebsd.org/D17950

Changes:
_U  stable/12/
  stable/12/contrib/nvi/common/encoding.c
Comment 8 commit-hook freebsd_committer freebsd_triage 2018-11-29 15:08:39 UTC
A commit references this bug:

Author: yuripv
Date: Thu Nov 29 15:07:59 UTC 2018
New revision: 341235
URL: https://svnweb.freebsd.org/changeset/base/341235

Log:
  MFC r340976:
  vi: fix UTF-8 detection.

  PR:		202290
  Submitted by:	lampa@fit.vutbr.cz
  Reviewed by:	bapt
  Differential Revision:	https://reviews.freebsd.org/D17950

Changes:
_U  stable/11/
  stable/11/contrib/nvi/common/encoding.c
Comment 9 Ronald F. Guilmette 2020-05-16 23:23:34 UTC
Created attachment 214569
Perhaps this is a test case for the same (already fixed?) bug?