Bug 113343

Summary: [patch] grep(1) outputs NOT-matched lines (with multi-byte characters)
Product: Base System Reporter: Kazuaki ODA <kazuaki>
Component: gnu Assignee: Pedro F. Giffuni <pfg>
Status: Closed FIXED    
Severity: Affects Only Me CC: kevans, pfg
Priority: Normal Keywords: patch
Version: 6.2-RELEASE   
Hardware: Any   
OS: Any   
Attachments: search.c.diff (flags: none)

Description Kazuaki ODA 2007-06-04 19:30:05 UTC
	Our grep(1) is a bit broken with multibyte characters.
	If a byte sequence matches the searched pattern, grep(1) outputs the
	line containing that sequence.  This is fine for single-byte
	characters, but can be wrong for multibyte characters: if the matched
	sequence consists of the second byte of one character and the first
	byte of the next, it is not a real match and grep(1) should not
	output the line.
	Since our grep(1) does support multibyte characters (and locales), it
	does not always behave as described above, but it sometimes does.
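One way to express the check the reporter describes: walk the line character by character from its start with mbrlen(), and accept a byte-level match only if it both begins and ends on a character boundary. This is a minimal sketch, not the attached patch; use_utf8_locale(), on_char_boundary() and match_on_boundaries() are hypothetical names, not functions from grep's search.c. The demo uses UTF-8 only because that locale is commonly installed. UTF-8 is self-synchronizing, so the false match itself needs an encoding such as EUC-JP, where both bytes of a two-byte character fall in 0xA1-0xFE and the second byte of one character plus the first byte of the next can look like a valid character.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: switch LC_CTYPE to a UTF-8 locale if one is
 * installed. Returns false if neither locale name is available. */
static bool use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL ||
           setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Hypothetical helper (not from search.c): true if byte offset `off`
 * in line[0..len) falls on a character boundary in the current
 * LC_CTYPE locale. Characters are walked from the start of the line,
 * because in encodings such as EUC-JP a single byte does not tell
 * you whether it begins a character. */
static bool on_char_boundary(const char *line, size_t len, size_t off)
{
    mbstate_t st;
    size_t i = 0;

    assert(line != NULL);
    memset(&st, 0, sizeof st);
    while (i < off && i < len) {
        size_t n = mbrlen(line + i, len - i, &st);

        if (n == (size_t)-1 || n == (size_t)-2 || n == 0) {
            /* Invalid, truncated, or NUL: step over one byte and
             * restart from the initial conversion state. */
            n = 1;
            memset(&st, 0, sizeof st);
        }
        i += n;
    }
    /* If a character straddles `off`, i has jumped past it. */
    return i == off;
}

/* A byte-level match line[start..start+mlen) is only a real match
 * if both of its ends are character boundaries. */
static bool match_on_boundaries(const char *line, size_t len,
                                size_t start, size_t mlen)
{
    return start + mlen <= len &&
           on_char_boundary(line, len, start) &&
           on_char_boundary(line, len, start + mlen);
}
```

For example, in "h\xc3\xa9llo" ("héllo" in UTF-8) offset 2 falls inside "é", so on_char_boundary(line, 6, 2) is false and a match starting there would be rejected, while the two-byte match at offset 1 (the whole "é") is accepted.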

Fix: Apply the attached patch.
	mbstate_t should be reinitialized whenever mbrlen() returns (size_t)-2, I think.
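A minimal sketch of why the state matters, assuming standard mbrlen() semantics: after (size_t)-2 the mbstate_t still holds the bytes of the incomplete character, so the next call decodes its input as the continuation of that character. char_len(), second_call() and use_utf8_locale() are hypothetical names; this is not the code from the attached patch, and it also resets after (size_t)-1, since the C standard leaves the conversion state unspecified after an encoding error.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: switch LC_CTYPE to a UTF-8 locale if one is
 * installed. Returns false if neither locale name is available. */
static bool use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL ||
           setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Sketch: length of the next character in s[0..n), treating invalid,
 * truncated, or NUL input as a single byte. With `reset`, the state
 * is reinitialized so a partial character is not carried over. */
static size_t char_len(const char *s, size_t n, mbstate_t *st, bool reset)
{
    size_t len;

    assert(s != NULL && st != NULL);
    len = mbrlen(s, n, st);
    if (len == (size_t)-1 || len == (size_t)-2 || len == 0) {
        if (reset)
            memset(st, 0, sizeof *st);
        len = 1;  /* step over one byte */
    }
    return len;
}

/* Hypothetical demo: feed a truncated UTF-8 lead byte (0xC3), then a
 * lone continuation byte (0xA9), through one mbstate_t and return
 * what mbrlen() reports for the second byte. */
static size_t second_call(bool reset)
{
    mbstate_t st;

    memset(&st, 0, sizeof st);
    (void)char_len("\xc3", 1, &st, reset);  /* mbrlen() -> (size_t)-2 */
    return mbrlen("\xa9", 1, &st);
}
```

In a UTF-8 locale, second_call(false) returns 1 because the stray 0xA9 byte silently "completes" é from the stale state, while second_call(true) returns (size_t)-1, so the stray byte is correctly seen as invalid.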
Comment 1 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:59:19 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 2 Pedro F. Giffuni freebsd_committer freebsd_triage 2019-01-08 19:08:20 UTC
For the record:
FreeBSD 6.2 shipped with GNU grep 2.5.1, which is basically the same version we are using as of FreeBSD 12.0:

% grep --version
grep (GNU grep) 2.5.1-FreeBSD

Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The patch applies almost cleanly:
 % svn patch --dry-run gnu-grep.diff
U         gnu/usr.bin/grep/search.c
>         applied hunk @@ -400,9 +400,12 @@ with offset 1
>         applied hunk @@ -462,9 +465,12 @@ with offset 1
>         applied hunk @@ -925,15 +931,21 @@ with offset 1
>         applied hunk @@ -1051,5 +1063,6 @@ with offset 1 and fuzz
Comment 3 Pedro F. Giffuni freebsd_committer freebsd_triage 2019-01-09 21:54:41 UTC
I have compared this with upstream, and the code in our fork of the last GPLv2 release is actually different. Newer releases even split the file into several parts. The patch looks safe, as the code paths it touches generally involve a "no good" comment.

I am running the patch on my system to make sure nothing goes wild, and I will commit it afterwards.

Sorry it took so long; some of us don't regularly use multibyte strings. :-/
Comment 4 Pedro F. Giffuni freebsd_committer freebsd_triage 2019-01-10 03:02:53 UTC
Committed as r342910.
Comment 5 commit-hook freebsd_committer freebsd_triage 2019-02-10 23:45:58 UTC
A commit references this bug:

Author: pfg
Date: Sun Feb 10 23:45:15 UTC 2019
New revision: 343988
URL: https://svnweb.freebsd.org/changeset/base/343988

Log:
  MFC r342910:
  grep(1) outputs NOT-matched lines with multi-byte characters

  PR:	113343

Changes:
_U  stable/12/
  stable/12/gnu/usr.bin/grep/search.c
Comment 6 commit-hook freebsd_committer freebsd_triage 2019-02-10 23:48:02 UTC
A commit references this bug:

Author: pfg
Date: Sun Feb 10 23:47:38 UTC 2019
New revision: 343989
URL: https://svnweb.freebsd.org/changeset/base/343989

Log:
  MFC r342910:
  grep(1) outputs NOT-matched lines with multi-byte characters

  PR:	113343

Changes:
_U  stable/11/
  stable/11/gnu/usr.bin/grep/search.c
Comment 7 Pedro F. Giffuni freebsd_committer freebsd_triage 2019-02-10 23:49:18 UTC
Committed .. thanks!