Bug 113343 - [patch] grep(1) outputs NOT-matched lines (with multi-bytes characters)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: gnu
Version: 6.2-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Pedro F. Giffuni
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2007-06-04 19:30 UTC by Kazuaki ODA
Modified: 2019-02-10 23:49 UTC

See Also:


Attachments
search.c.diff (1.41 KB, patch)
2007-06-04 19:30 UTC, Kazuaki ODA

Description Kazuaki ODA 2007-06-04 19:30:05 UTC
	Our grep(1) is somewhat broken with multibyte characters.
	If a byte sequence matches the search pattern, grep(1) outputs the
	line containing that sequence.  This is fine for single-byte
	characters, but can be wrong for multibyte characters.  If the
	matched sequence consists of the second byte of one character and the
	first byte of the next character, it is not a real match and grep(1)
	should not output the line.
	Because our grep(1) has some support for multibyte characters (and
	locales), it does not always misbehave this way, but sometimes it does.
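
To make the failure mode concrete, here is a minimal sketch of a purely byte-oriented search. The helper name `byte_match_offset` and the sample bytes are illustrative assumptions, not taken from search.c; the bytes are chosen to look like two 2-byte characters in an encoding such as EUC-JP.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the byte offset of the first occurrence
 * of pat in line, or (size_t)-1 if there is none.  It compares raw
 * bytes only and knows nothing about character boundaries.
 */
size_t
byte_match_offset(const char *line, const char *pat)
{
	const char *hit = strstr(line, pat);

	return hit == NULL ? (size_t)-1 : (size_t)(hit - line);
}
```

With the line "\xa4\xa2\xa4\xa4" (two 2-byte characters) and the pattern "\xa2\xa4" (one 2-byte character), the helper reports a hit at byte offset 1. That offset is in the middle of the first character, so a grep that trusts it would print a line that does not contain the searched character.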

Fix: Apply attached patch.
	I think mbstate_t should be reinitialized whenever mbrlen() returns
	(size_t)-2.
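
The idea can be sketched roughly as follows. This is not the attached patch or the code in search.c; `starts_on_char_boundary` is a hypothetical helper that walks a buffer with mbrlen() and resets the conversion state whenever mbrlen() reports an invalid ((size_t)-1) or incomplete ((size_t)-2) sequence, so that leftover state cannot affect the next character.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: return 1 if byte offset off in buf[0..len)
 * falls on a character boundary in the current locale, 0 otherwise.
 */
int
starts_on_char_boundary(const char *buf, size_t len, size_t off)
{
	mbstate_t mbs;
	size_t pos = 0;

	if (off > len)
		return 0;
	memset(&mbs, 0, sizeof(mbs));
	while (pos < off) {
		size_t mlen = mbrlen(buf + pos, len - pos, &mbs);

		if (mlen == (size_t)-1 || mlen == (size_t)-2) {
			/*
			 * Invalid or truncated sequence: the state is
			 * unspecified or holds a partial character, so
			 * reset it and treat this byte as one character.
			 */
			memset(&mbs, 0, sizeof(mbs));
			mlen = 1;
		} else if (mlen == 0) {
			/* An embedded NUL still occupies one byte. */
			mlen = 1;
		}
		pos += mlen;
	}
	return pos == off;
}
```

A caller would first select a multibyte locale with setlocale(3) and then accept a byte-level match only if its start offset passes this check. In the default "C" locale every byte is its own character, so every offset of an ASCII buffer counts as a boundary.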
Comment 1 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:59:19 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 2 Pedro F. Giffuni freebsd_committer 2019-01-08 19:08:20 UTC
For the record:
FreeBSD 6.2 shipped with GNU grep 2.5.1, which is basically the same version we are using as of FreeBSD 12.0:

% grep --version
grep (GNU grep) 2.5.1-FreeBSD

Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The patch applies almost cleanly:
 % svn patch --dry-run gnu-grep.diff
U         gnu/usr.bin/grep/search.c
>         applied hunk @@ -400,9 +400,12 @@ with offset 1
>         applied hunk @@ -462,9 +465,12 @@ with offset 1
>         applied hunk @@ -925,15 +931,21 @@ with offset 1
>         applied hunk @@ -1051,5 +1063,6 @@ with offset 1 and fuzz
Comment 3 Pedro F. Giffuni freebsd_committer 2019-01-09 21:54:41 UTC
I have compared to upstream and this code is actually different in our fork of the last GPLv2 release. Newer releases even broke the file in parts. It looks safe as the code paths generally involve a "no good" comment.

I am running the patch on my system to check that nothing goes wild, and I will commit it afterwards.

Sorry it took so long; some of us don't regularly use multibyte strings. :-/
Comment 4 Pedro F. Giffuni freebsd_committer 2019-01-10 03:02:53 UTC
Committed as r342910.
Comment 5 commit-hook freebsd_committer 2019-02-10 23:45:58 UTC
A commit references this bug:

Author: pfg
Date: Sun Feb 10 23:45:15 UTC 2019
New revision: 343988
URL: https://svnweb.freebsd.org/changeset/base/343988

Log:
  MFC r342910:
  grep(1) outputs NOT-matched lines with multi-byte characters

  PR:	113343

Changes:
_U  stable/12/
  stable/12/gnu/usr.bin/grep/search.c
Comment 6 commit-hook freebsd_committer 2019-02-10 23:48:02 UTC
A commit references this bug:

Author: pfg
Date: Sun Feb 10 23:47:38 UTC 2019
New revision: 343989
URL: https://svnweb.freebsd.org/changeset/base/343989

Log:
  MFC r342910:
  grep(1) outputs NOT-matched lines with multi-byte characters

  PR:	113343

Changes:
_U  stable/11/
  stable/11/gnu/usr.bin/grep/search.c
Comment 7 Pedro F. Giffuni freebsd_committer 2019-02-10 23:49:18 UTC
Committed .. thanks!