Our grep(1) is somewhat broken with multibyte characters. If a byte sequence matches the search pattern, grep(1) outputs the line containing that sequence. This is fine for single-byte characters, but can be wrong for multibyte characters: if the matched sequence consists of the second byte of one character and the first byte of the next character, it is not a real match and grep(1) should not output the line. Since our grep(1) has support for multibyte characters (and locales), it does not always behave as described above, but sometimes it does. Fix: Apply the attached patch. mbstate_t should be initialized whenever mbrlen() returns -2, I think.
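To illustrate the false match described above, here is a minimal sketch in C. It is not the attached patch and not grep's actual code. It assumes a simplified EUC-JP model in which every byte >= 0x80 starts a two-byte character, and the helpers euc_charlen(), on_char_boundary(), find_bytes() and find_chars() are hypothetical names made up for this example. Real code would ask the locale for character lengths, e.g. with mbrlen(3), instead of hard-coding an encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: length in bytes of the character starting at p.
 * Simplified EUC-JP model (an assumption for this sketch): ASCII is one
 * byte, anything else is two bytes. A locale-aware program would use
 * mbrlen(3) here.
 */
static size_t
euc_charlen(const unsigned char *p, size_t remaining)
{
	if (*p < 0x80 || remaining < 2)
		return 1;
	return 2;
}

/* Return 1 if byte offset off (off <= len) starts a character in line. */
static int
on_char_boundary(const char *line, size_t len, size_t off)
{
	const unsigned char *u = (const unsigned char *)line;
	size_t i = 0;

	while (i < off)
		i += euc_charlen(u + i, len - i);
	return i == off;
}

/* Naive byte search from offset start: first hit, or -1 if none. */
static long
find_bytes(const char *line, size_t len, const char *needle, size_t nlen,
    size_t start)
{
	size_t i;

	assert(nlen > 0);
	for (i = start; i + nlen <= len; i++)
		if (memcmp(line + i, needle, nlen) == 0)
			return (long)i;
	return -1;
}

/* Byte search that rejects hits starting in the middle of a character. */
static long
find_chars(const char *line, size_t len, const char *needle, size_t nlen)
{
	size_t start = 0;
	long hit;

	while ((hit = find_bytes(line, len, needle, nlen, start)) >= 0) {
		if (on_char_boundary(line, len, (size_t)hit))
			return hit;
		start = (size_t)hit + 1;	/* false match, keep looking */
	}
	return -1;
}
```

With the four-byte line A4 A2 A4 A4 (two hiragana characters in EUC-JP) and the pattern A2 A4 (a different two-byte character), find_bytes() reports a hit at offset 1, which straddles the two characters, while find_chars() reports no match. Searching for A4 A4 matches at offset 2 with both functions.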
For bugs matching the following criteria: Status: In Progress; Changed: (is less than) 2014-06-01. Reset to default assignee and clear in-progress tags. Mail being skipped.
For the record: FreeBSD 6.2 shipped with GNU grep 2.5.1, which is basically the same version we are using as of FreeBSD 12.0:
% grep --version
grep (GNU grep) 2.5.1-FreeBSD
Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The patch applies almost cleanly:
% svn patch --dry-run gnu-grep.diff
U gnu/usr.bin/grep/search.c
> applied hunk @@ -400,9 +400,12 @@ with offset 1
> applied hunk @@ -462,9 +465,12 @@ with offset 1
> applied hunk @@ -925,15 +931,21 @@ with offset 1
> applied hunk @@ -1051,5 +1063,6 @@ with offset 1 and fuzz
I have compared against upstream, and this code is actually different in our fork of the last GPLv2 release. Newer releases even split the file into parts. The patch looks safe, as the affected code paths generally involve a "no good" comment. I am running the patch on my system to see that nothing goes wild, and I will commit it afterwards. Sorry it took so long, some of us don't regularly use multibyte strings. :-/
Committed as r342910.
A commit references this bug:
Author: pfg
Date: Sun Feb 10 23:45:15 UTC 2019
New revision: 343988
URL: https://svnweb.freebsd.org/changeset/base/343988
Log: MFC r342910: grep(1) outputs NOT-matched lines with multi-byte characters
PR: 113343
Changes: _U stable/12/ stable/12/gnu/usr.bin/grep/search.c
A commit references this bug:
Author: pfg
Date: Sun Feb 10 23:47:38 UTC 2019
New revision: 343989
URL: https://svnweb.freebsd.org/changeset/base/343989
Log: MFC r342910: grep(1) outputs NOT-matched lines with multi-byte characters
PR: 113343
Changes: _U stable/11/ stable/11/gnu/usr.bin/grep/search.c
Committed .. thanks!