Bug 205697

Summary: vi gets confused and corrupts file being edited
Product: Base System
Component: bin
Status: New
Severity: Affects Some People
Priority: ---
Version: 10.2-RELEASE
Hardware: Any
OS: Any
Reporter: heikki
Assignee: freebsd-bugs (Nobody) <bugs>
CC: bjornr, dexter, xelalex_maker
Keywords: patch

Description heikki 2015-12-29 15:35:49 UTC
I was editing a file with vi, and got

Error: ?!: Illegal byte sequence; ?!: WARNING: FILE TRUNCATED.

After this, it refused to save the file.  In the middle of the file there was a ~ on one line.  However, any attempt to edit that line caused the error

Error: unable to retrieve line 7

The line could not be removed or edited. 

This is nasty as it destroys the file being edited.

I recovered the file from backup, and I get 

paypal: unmodified: line 1; Conversion error on line 7


I might have missed that error when starting to edit.


This is a plain text file.  If vi has some magic for UTF-8 or whatever, it should
never get confused; it should simply switch the locale to C with an appropriate warning message.
Comment 1 Bjorn Robertsson 2015-12-29 17:00:28 UTC
Hi, I created https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203040 a while back. This looks very similar.

I've downgraded some machines to nvi 1.79 from the FreeBSD 9.3 source tree, which doesn't have the iconv dependency.

(In reply to heikki from comment #0)
Comment 2 Alexander Klein 2016-08-17 09:20:13 UTC
I see something similar in 10.3-RELEASE-p7 after playing around with Greek Unicode characters in zsh. I have the following line in my histfile, which contains a few Greek characters at the beginning, then the 0xb1, which I don't remember how to type, and then only ASCII-characters.

% sed -n -e 837p histfile | od -c
0000000    ρ  **   θ  **   θ  **   σ  ** 261               g   h   f   g
0000020   \n                                                            
0000021

% sed -n -e 837p histfile | hexdump -C
00000000  cf 81 ce b8 ce b8 cf 83  b1 20 20 20 67 68 66 67  |.........   ghfg|
00000010  0a                                                |.|
00000011

When writing the file, vi truncates it right at this point.
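
To make the byte dump above easier to reason about, here is a minimal C sketch (not nvi code) that walks a buffer and reports the offset of the first byte that breaks UTF-8 structure. The function name first_invalid_utf8 and the array histfile_line are hypothetical names introduced for this illustration; the array holds the 17 bytes shown by hexdump. The check is deliberately simplified: it validates lead bytes and continuation bytes only, and does not reject every overlong or surrogate encoding.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the offset of the first byte that breaks UTF-8 structure,
 * or -1 if the whole buffer is structurally valid.
 * Simplified: checks lead bytes and 10xxxxxx continuation bytes only.
 */
long first_invalid_utf8(const unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);

    size_t i = 0;
    while (i < n) {
        unsigned char b = buf[i];
        size_t extra;               /* continuation bytes expected */

        if (b < 0x80)
            extra = 0;              /* ASCII */
        else if (b >= 0xc2 && b <= 0xdf)
            extra = 1;              /* 2-byte sequence */
        else if (b >= 0xe0 && b <= 0xef)
            extra = 2;              /* 3-byte sequence */
        else if (b >= 0xf0 && b <= 0xf4)
            extra = 3;              /* 4-byte sequence */
        else
            return (long)i;         /* stray continuation or invalid lead */

        for (size_t k = 1; k <= extra; k++) {
            /* Each continuation byte must look like 10xxxxxx. */
            if (i + k >= n || (buf[i + k] & 0xc0) != 0x80)
                return (long)i;
        }
        i += extra + 1;
    }
    return -1;
}

/* The 17 bytes of histfile line 837, copied from the hexdump above. */
const unsigned char histfile_line[] = {
    0xcf, 0x81, 0xce, 0xb8, 0xce, 0xb8, 0xcf, 0x83,
    0xb1, 0x20, 0x20, 0x20, 0x67, 0x68, 0x66, 0x67,
    0x0a
};
```

Calling first_invalid_utf8(histfile_line, sizeof histfile_line) returns 8: the four Greek characters occupy offsets 0 through 7 as valid two-byte sequences, and 0xb1 at offset 8 is a continuation byte with no lead byte in front of it.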
Comment 3 Michael Dexter 2017-03-22 16:45:48 UTC
Also seen in 11.0-RELEASE-p8, resulting in data loss.
Comment 4 Bjorn Robertsson 2017-03-23 10:09:59 UTC
(In reply to Michael Dexter from comment #3)

I have used this patch for 11.0 (from https://lists.freebsd.org/pipermail/freebsd-bugs/2015-August/063464.html), but note a couple more matches in the FreeBSD bug list:
New         |    202740 | vi/ex string substitution problem when there is m 
New         |    202290 | /usr/bin/vi conversion error on valid character   



Index: contrib/nvi/common/encoding.c
===================================================================
--- contrib/nvi/common/encoding.c       (revision 292832)
+++ contrib/nvi/common/encoding.c       (working copy)
@@ -96,7 +96,7 @@
                                if (i >= nbytes)
                                        goto done;

-                               if (buf[i] & 0x40)      /* 10xxxxxx */
+                               if ((buf[i] & 0xc0) != 0x80)    /* 10xxxxxx */
                                        return -1;
                        }
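
The one-line change in the patch above is easier to see when the two bit tests are isolated. The sketch below is an illustration only, not code from nvi: old_check_rejects and new_check_rejects are hypothetical helper names that wrap the removed and the added condition. A UTF-8 continuation byte has the form 10xxxxxx. Testing only bit 0x40 rejects 01xxxxxx and 11xxxxxx but lets 00xxxxxx bytes (for example an ASCII space, 0x20) pass as if they were continuation bytes; masking with 0xc0 and comparing against 0x80 checks both top bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Removed condition: rejects a byte only when bit 6 is set. */
bool old_check_rejects(unsigned char b)
{
    return (b & 0x40) != 0;
}

/* Added condition: rejects every byte that is not of the form 10xxxxxx. */
bool new_check_rejects(unsigned char b)
{
    return (b & 0xc0) != 0x80;
}

/* Count how many of the 256 byte values the two conditions disagree on. */
int count_disagreements(void)
{
    int n = 0;
    for (int v = 0; v < 256; v++) {
        unsigned char b = (unsigned char)v;
        assert(v >= 0 && v <= 255);
        if (old_check_rejects(b) != new_check_rejects(b))
            n++;
    }
    return n;
}
```

The two conditions agree on real continuation bytes (0x80 to 0xbf, both accept) and on bytes with bit 6 set (both reject). They disagree on exactly the 64 values 0x00 to 0x3f, which the old condition accepts as continuation bytes and the new one rejects.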
Comment 5 Alexander Klein 2017-03-24 08:28:00 UTC
(In reply to Bjorn Robertsson from comment #4)

There are even more bugs which are probably related to the same problem:

Bug 196447 - vi(1) misbehavior when encountered invalid Unicode character
Bug 203040 - Nvi truncates files with non-ASCII characters

It seems to be possible to get the data back in a number of cases, even after having written a corrupted file to disk:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196447#c4

The issue as a whole seems to be quite involved, however:

https://github.com/lichray/nvi2/issues/12
http://lists.suckless.org/dev/1312/18786.html