Our grep(1) is a bit broken with multibyte characters.
If a byte sequence matches the search pattern, grep(1) outputs the line
containing that sequence. This is fine for single-byte
characters, but can be wrong for multibyte characters. If the matched
sequence is the second byte of one character followed by the first byte of the
next character, that is not a real match and grep(1) should not output the
line.
Since our grep(1) has support for multibyte characters (and locales),
it does not always behave as described above, but sometimes it does.
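To illustrate the false match described above, here is a minimal sketch in C. It is not the grep code and not the attached patch. It assumes a hypothetical toy encoding (EUC-like: any byte >= 0x80 starts a two-byte character), and the names toy_charlen(), at_char_boundary() and find_char_aligned() are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy encoding: a byte >= 0x80 starts a two-byte
 * character, any other byte is a single-byte character. */
static size_t toy_charlen(const unsigned char *p, size_t avail)
{
    return (*p >= 0x80 && avail >= 2) ? 2 : 1;
}

/* Return 1 if byte offset `off` is the start of a character in buf
 * (or the end of the buffer), 0 if it falls inside a character. */
static int at_char_boundary(const unsigned char *buf, size_t len, size_t off)
{
    size_t i = 0;

    assert(off <= len);
    /* Walk from the start of the buffer, one whole character at a time. */
    while (i < off)
        i += toy_charlen(buf + i, len - i);
    return i == off;
}

/* Byte-wise search that rejects hits starting in the middle of a
 * character. Returns the byte offset of the match, or -1. */
static long find_char_aligned(const unsigned char *buf, size_t len,
                              const unsigned char *pat, size_t patlen)
{
    if (patlen == 0 || patlen > len)
        return -1;
    for (size_t off = 0; off + patlen <= len; off++) {
        if (memcmp(buf + off, pat, patlen) == 0 &&
            at_char_boundary(buf, len, off))
            return (long)off;
    }
    return -1;
}
```

With the two-character text 0xA4 0xA2 0xA4 0xA4, a plain byte search for 0xA2 0xA4 hits at offset 1, which straddles both characters. The boundary check rejects that hit, so the line would not be reported.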
Fix: Apply attached patch.
mbstate_t should be initialized whenever mbrlen() returns -2, I think.
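As a rough sketch of the idea above (again, not the attached patch): the hypothetical helper count_chars() below walks a buffer with mbrlen() and resets the conversion state whenever mbrlen() reports an incomplete sequence, (size_t)-2, or an invalid one, (size_t)-1. The offending byte is then treated as a single character. It assumes the caller has already selected a locale with setlocale(3).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in buf[0..len) according
 * to the current locale. */
static size_t count_chars(const char *buf, size_t len)
{
    mbstate_t st;
    size_t i = 0, n = 0;

    assert(buf != NULL || len == 0);
    memset(&st, 0, sizeof(st));        /* start in the initial state */
    while (i < len) {
        size_t r = mbrlen(buf + i, len - i, &st);

        if (r == (size_t)-1 || r == (size_t)-2) {
            /* Invalid or incomplete sequence: the state may no longer
             * be initial, so reset it and step over a single byte. */
            memset(&st, 0, sizeof(st));
            r = 1;
        } else if (r == 0) {
            r = 1;                     /* embedded NUL counts as one byte */
        }
        i += r;
        n++;
    }
    return n;
}
```

Without the reset, bytes consumed by a (size_t)-2 return stay in the state and can skew the length reported for the following bytes.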
For bugs matching the following criteria:
Status: In Progress Changed: (is less than) 2014-06-01
Reset to default assignee and clear in-progress tags.
Mail being skipped
For the record:
FreeBSD 6.2 shipped with GNU grep 2.5.1, which is basically the same version we are using as of FreeBSD 12.0:
% grep --version
grep (GNU grep) 2.5.1-FreeBSD
Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The patch applies almost cleanly:
% svn patch --dry-run gnu-grep.diff
> applied hunk @@ -400,9 +400,12 @@ with offset 1
> applied hunk @@ -462,9 +465,12 @@ with offset 1
> applied hunk @@ -925,15 +931,21 @@ with offset 1
> applied hunk @@ -1051,5 +1063,6 @@ with offset 1 and fuzz
I have compared to upstream and this code is actually different in our fork of the last GPLv2 release. Newer releases even broke the file in parts. It looks safe as the code paths generally involve a "no good" comment.
I am running the patch on my system to check that nothing goes wild, and I will commit it afterwards.
Sorry it took so long, some of us don't regularly use multibyte strings. :-/
Committed as r342910.