Bug 196447 - vi(1) misbehavior when encountered invalid Unicode character
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Baptiste Daroussin
Reported: 2015-01-02 20:30 UTC by Xin LI
Modified: 2022-06-30 00:05 UTC
Description Xin LI freebsd_committer freebsd_triage 2015-01-02 20:30:18 UTC
When a file contains an invalid UTF-8 character, for instance sys/dev/ata/atapi-cd.c (from stable/9), vi(1) reports:

sys/dev/ata/atapi-cd.c: unmodified: line 1; Conversion error on line 2

Searching for "geomf" in the file then gives no results, although it should.
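
To check a file for this kind of problem before opening it, the offending lines can be located by trying to decode each line as UTF-8. This is only a minimal sketch: the function name and the sample bytes are hypothetical, and it uses Python's decoder rather than the conversion code inside vi(1).

```python
def invalid_utf8_lines(data: bytes) -> list:
    """Return the 1-based numbers of lines in data that are not valid UTF-8."""
    bad = []
    # Splitting on b"\n" is safe: 0x0A never occurs inside a multi-byte
    # UTF-8 sequence.
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


# Hypothetical sample (not the real atapi-cd.c): line 2 holds the byte 0xF6,
# which is o-umlaut in ISO-8859-1 but can never appear in valid UTF-8.
sample = b"/* header */\n * J\xf6rg\nstatic int geomf;\n"
print(invalid_utf8_lines(sample))
```

For a real file, the same function can be applied to the result of `open(path, "rb").read()`.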
Comment 1 lichray 2015-01-05 20:35:47 UTC
(In reply to Xin LI from comment #0)

We have known about these types of issues for quite a long time, but I don't have a desired behavior in mind.

For now, the workaround would be to set the correct locale:

  env LC_CTYPE=en_US.ISO8859-1 nvi /usr/src/sys/dev/ata/atapi-cd.c

or to make use of the 8-bit mode:

  env LC_CTYPE=C nvi /usr/src/sys/dev/ata/atapi-cd.c

A specific encoding can be set after the file is loaded, with ":se fe=iso-8859-1", but 8-bit mode cannot (unfortunately, due to a display-related bug which I cannot solve).
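
The reason the ISO8859-1 workaround cannot hit a conversion error can be illustrated as follows: in that encoding every byte value maps to a character, so any byte sequence decodes and re-encodes to itself. The sketch below uses Python's codecs as a stand-in (an assumption; nvi performs its own conversion), and the sample bytes are hypothetical.

```python
# Every possible byte value decodes under ISO-8859-1 and round-trips unchanged.
all_bytes = bytes(range(256))
text = all_bytes.decode("iso-8859-1")
round_trips = text.encode("iso-8859-1") == all_bytes

# The same is not true for UTF-8: a lone 0xF6 byte is rejected.
try:
    b"J\xf6rg".decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(round_trips, utf8_ok)
```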
Comment 2 lichray 2015-01-05 20:52:40 UTC
(In reply to Xin LI from comment #0)

Forgot to answer your original question: why the search does not go past the first defective line.  The whole story is much worse than this: if you write the file, nothing is written after line 2 (https://github.com/lichray/nvi2/issues/12).  Right now I have left both behaviors "consistently" awful.
Comment 3 Xin LI freebsd_committer freebsd_triage 2015-01-05 21:09:46 UTC
(In reply to lichray from comment #2)

Well, data corruption is much more serious than merely not having search working.

I think vi should probably ask the user whether they want to reload the file in the 'C' locale when it encounters an error, and quit if the user chooses not to.
Comment 4 lichray 2015-01-05 21:51:43 UTC
(In reply to Xin LI from comment #3)

It's not quite "data corruption", since an error is shown and your data is not immediately lost: just switch to the correct encoding (at runtime, after the error is shown) with ":se fe=iso-8859-1" and write again, and your data is back.

Reload the file, as if with ":e"?  Sounds interesting.  Added to the GitHub issue.  But I need to implement the raw-write first; otherwise the data really is lost.  The change itself does not solve the problem.  For example, when you open a slightly larger file and the conversion error is close to the end of the file, no error is shown during editing (a line is only converted when it is needed); the error only appears when writing.
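
The interaction between on-demand conversion and writing can be sketched with a toy model. Everything here is hypothetical (class and function names, sample lines) and it is not nvi2's actual code; the "raw fallback" writer is just one possible reading of the raw-write idea mentioned above.

```python
class LazyBuffer:
    """Toy line buffer that converts a raw line only when it is requested."""

    def __init__(self, raw_lines, encoding="utf-8"):
        self.raw_lines = list(raw_lines)
        self.encoding = encoding

    def line(self, index):
        # Conversion happens here, on demand; raises UnicodeDecodeError
        # for a line that is not valid in the buffer's encoding.
        return self.raw_lines[index].decode(self.encoding)


def write_truncating(buf):
    """Stop at the first conversion error, dropping every later line."""
    out = []
    for i in range(len(buf.raw_lines)):
        try:
            out.append(buf.line(i).encode(buf.encoding))
        except UnicodeDecodeError:
            break
    return out


def write_with_raw_fallback(buf):
    """Write lines that cannot be converted back as their original bytes."""
    out = []
    for i, raw in enumerate(buf.raw_lines):
        try:
            out.append(buf.line(i).encode(buf.encoding))
        except UnicodeDecodeError:
            out.append(raw)
    return out


raw = [b"first", b"second", b"J\xf6rg", b"last"]
buf = LazyBuffer(raw)
print(buf.line(0))  # viewing an early line triggers no error
print(len(write_truncating(buf)), len(write_with_raw_fallback(buf)))
```

In this model the bad line is never touched while only the first lines are viewed, so the error surfaces only in the write loop, and only the fallback writer preserves the lines after it.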
Comment 5 lichray 2015-04-03 20:18:04 UTC
Although it does not resolve this issue, FYI, the patch to prevent file truncation upon writing has been merged:

  https://github.com/lichray/nvi2/commit/310d1e86c0b3db7f7e025e3092afc78e3d906fa2
Comment 6 Xin LI freebsd_committer freebsd_triage 2015-04-10 18:40:16 UTC
This should have been addressed in 281373.  Over to bapt@.
Comment 7 lichray 2015-04-10 18:47:16 UTC
(In reply to Xin LI from comment #6)

But don't close this bug.  The bug reported here is not resolved.  I plan to work on that later.
Comment 8 John Hein 2016-08-03 18:37:13 UTC
(In reply to lichray from comment #4)

If you don't notice the "conversion error" message or aren't aware of the ramifications, and you continue editing and write the file, then it does become "data corruption".

This is made worse because the later text is still visible in the editing buffer.  Only after you save and quit is the missing data evident.

I just hit this with the nvi in FreeBSD 10 (ver 2.1.2).  There was a Makefile where someone had entered their name with an o+umlaut encoded in ISO-8859-1 in the header block on the first line.  I have LC_CTYPE=en_US.UTF-8 and edited the file without noticing the "conversion error" message.  The resulting file after save/quit was empty.

No such problem on FreeBSD 9 (older nvi).

Good to hear a fix is available upstream (untested by me).  We should import the update into FreeBSD.
Comment 9 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:45:27 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass, but if you are getting this email it might be worthwhile to double-check whether this bug ought to be closed.