Bug 166842 - grep(1) inconsistently handles ^ in non-anchoring positions
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs mailing list
Reported: 2012-04-11 14:50 UTC by Jim Pryor
Modified: 2018-05-20 23:50 UTC
CC List: 2 users

Description Jim Pryor 2012-04-11 14:50:12 UTC
version line:
/*      $FreeBSD: src/usr.bin/grep/grep.c,v 1.11.2.3 2011/10/20 16:08:11 gabor Exp $

According to the POSIX-2008 standard, "^" and "$" are ordinary characters in BREs (basic regular expressions) when they are not in anchoring positions, in contrast to EREs (extended regular expressions), where they are always anchors. Hence:

$ printf 'a^b$c' | grep -o 'a^b'

should match. It does with GNU grep (on Linux) and with BusyBox grep (also on Linux, built against uClibc), but it does not with the FreeBSD grep version given above. Curiously, though:

$ printf 'a^b$c' | grep -o '[a]^b'

will match, and so will 'b$c'.

One can't portably rely on '\^' to specify a literal '^' here, because POSIX-2008 says that '^' in non-anchoring positions is not special in BREs, and that a '\' followed by a non-special character is undefined. Nor can one use '[^]', since a '^' at the start of a bracket expression negates the set rather than matching a literal '^'.

How-To-Repeat: See above.
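To make the BRE rule concrete, here is a minimal Python sketch that translates a small subset of POSIX BRE syntax into a Python regular expression, treating '^' as an anchor only as the first character of the pattern and '$' only as the last, and as ordinary characters everywhere else. The helper bre_to_python is hypothetical and deliberately incomplete: it rejects backslash sequences (which also sidesteps the implementation-defined cases after '\(' and before '\)'), and it copies bracket expressions verbatim, so POSIX classes such as [:alpha:] are not translated.

```python
import re


def bre_to_python(bre: str) -> str:
    """Translate a small BRE subset into a Python regex (hypothetical helper).

    '^' is an anchor only as the first character and '$' only as the last;
    elsewhere both are ordinary characters, as POSIX specifies for BREs.
    """
    out: list[str] = []
    i, n = 0, len(bre)
    while i < n:
        c = bre[i]
        if c == "\\":
            raise ValueError("backslash sequences are outside this sketch")
        if c == "^":
            out.append("^" if i == 0 else re.escape(c))
        elif c == "$":
            out.append("$" if i == n - 1 else re.escape(c))
        elif c == "*" and (i == 0 or (i == 1 and bre[0] == "^")):
            # A leading '*' (optionally after the anchor) is literal in a BRE.
            out.append(re.escape(c))
        elif c in ".*":
            out.append(c)
        elif c == "[":
            # Copy the bracket expression verbatim; a ']' right after '[' or
            # '[^' is a member of the set, not its terminator.
            j = i + 1
            if j < n and bre[j] == "^":
                j += 1
            if j < n and bre[j] == "]":
                j += 1
            while j < n and bre[j] != "]":
                j += 1
            if j >= n:
                raise ValueError("unterminated bracket expression")
            out.append(bre[i : j + 1])
            i = j
        else:
            # '+', '?', '|', '(', '{' and the like are ordinary in a BRE.
            out.append(re.escape(c))
        i += 1
    return "".join(out)


subject = "a^b$c"
for bre in ["a^b", "[a]^b", "b$c", "^a", "^b"]:
    m = re.search(bre_to_python(bre), subject)
    print(bre, "->", m.group() if m else None)
```

With this translation, 'a^b', '[a]^b' and 'b$c' all match in 'a^b$c', '^a' matches the leading 'a', and '^b' does not match, which is the behavior the report expects from a conforming BRE implementation.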
Comment 1 Jim Pryor 2012-04-11 20:21:01 UTC
I've noticed some more issues with the same version of grep. I don't
know whether they're related, but I'll append them here for now.

$ printf abc | grep -o '^[a-c]'

should just print 'a', but instead gives three hits, one for each letter
of the input. The same issue occurs when handling multiline
buffers:

$ printf 'abc\ndef' | grep -o --null '^[a-f]'

incorrectly matches 6 times.

$ printf 'abc\ndef' | grep -o --null '[a-f]$'

correctly matches only 'c' and 'f'.
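One model that reproduces these observations (an assumption; it has not been checked against the grep source) is an -o loop that, after each hit, searches the rest of the line as though it were a fresh string, so '^' can match again at every restart. The Python sketch below contrasts that hypothetical loop with one that resumes at an offset in the same string. Python's Pattern.search(string, pos) does not let '^' match at pos unless pos is the real start of the string (or, in MULTILINE mode, follows a newline), which is roughly the role the REG_NOTBOL flag plays for regexec() in the POSIX C API. The function names are illustrative only.

```python
import re


def grep_o_restart(pattern: str, line: str) -> list[str]:
    """Hypothetical faulty -o loop: after each hit, search the remainder
    as a brand-new string, so '^' can match at every restart."""
    regex = re.compile(pattern)
    hits: list[str] = []
    rest = line
    while rest:
        m = regex.search(rest)
        if m is None:
            break
        if m.group():  # grep -o prints only non-empty matches
            hits.append(m.group())
        # Slicing drops the left context; always advance at least one char.
        rest = rest[max(m.end(), 1):]
    return hits


def grep_o_resume(pattern: str, line: str) -> list[str]:
    """Corrected loop: resume in the same string at an offset. Without
    re.MULTILINE, '^' cannot match at pos > 0, much as with REG_NOTBOL."""
    regex = re.compile(pattern)
    hits: list[str] = []
    pos = 0
    while pos <= len(line):
        m = regex.search(line, pos)
        if m is None:
            break
        if m.end() > m.start():
            hits.append(m.group())
            pos = m.end()
        else:
            pos = m.end() + 1  # step past an empty match
    return hits


def per_line(loop, pattern: str, text: str) -> list[str]:
    """grep matches line by line, so run the given loop on each line."""
    return [hit for line in text.split("\n") for hit in loop(pattern, line)]


print(per_line(grep_o_restart, "^[a-c]", "abc"))       # ['a', 'b', 'c']
print(per_line(grep_o_resume, "^[a-c]", "abc"))        # ['a']
print(per_line(grep_o_restart, "^[a-f]", "abc\ndef"))  # ['a', 'b', 'c', 'd', 'e', 'f']
print(per_line(grep_o_resume, "^[a-f]", "abc\ndef"))   # ['a', 'd']
print(per_line(grep_o_restart, "[a-f]$", "abc\ndef"))  # ['c', 'f']
```

The restart model over-matches '^[a-f]' but still yields only 'c' and 'f' for '[a-f]$', mirroring the asymmetry reported above: slicing discards the text to the left of the restart point, which '^' depends on, while the right end of the line stays intact.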


$ printf 'abc\ndef' | grep -o --null '\`[a-f]'

has the same issue as ^, whereas:

$ printf 'abc\ndef' | grep -o --null '[a-f]\'\'

matches 'c' and 'f'. For \` to behave consistently with \', it should
match only 'a' and 'd'. In fact, though, each of these should match
only a single character: 'a' for \` and 'f' for \'. That is the
specified behavior of these GNU extensions, and it is how they behave
in the GNU grep and BusyBox grep implementations I'm testing against.
If that behavior isn't going to be provided, wouldn't it be better for
these extensions not to pretend to be present at all, and simply match
a literal ` or '?
-- 
dubiousjim@gmail.com
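The distinction drawn here, between anchors that match at every line and anchors that match only at the start or end of the whole buffer, can be illustrated in Python, whose \A and \Z match only at the very start and very end of the string. Using them as stand-ins for GNU's \` and \' is an assumption of this sketch, as is treating the whole input as a single buffer.

```python
import re

text = "abc\ndef"

# Buffer anchors: \A and \Z (stand-ins for GNU \` and \') match only at
# the very start and very end of the whole string.
buffer_start = re.findall(r"\A[a-f]", text)
buffer_end = re.findall(r"[a-f]\Z", text)

# Line anchors: with re.MULTILINE, ^ and $ also match around each newline.
line_start = re.findall(r"^[a-f]", text, re.MULTILINE)
line_end = re.findall(r"[a-f]$", text, re.MULTILINE)

print(buffer_start, buffer_end)  # ['a'] ['f']
print(line_start, line_end)      # ['a', 'd'] ['c', 'f']
```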
Comment 2 Jim Pryor 2012-04-12 05:00:46 UTC
On Wed, Apr 11, 2012, at 03:21 PM, Jim Pryor wrote:
> I've noticed some more issues with the same version of grep. I don't
> know whether they're related, but I'll append them here for now.
> 
> $ printf abc | grep -o '^[a-c]'

Some more observations that seem related:

$ printf 'abc def' | grep -o '^[a-z]'

will match against each of the letters in 'abc', but not against any of
the letters in 'def'.

On the other hand:

$ printf 'abc def' | grep -o '\b[a-z]'
$ printf 'abc def' | grep -o '\<[a-z]'

will each match against all six of the letters.

Matching against the patterns:
  '[a-z]\b'
  '[a-z]\>'
  '[a-z]$'
gives correct results.
-- 
dubiousjim@gmail.com
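The same hypothetical restart model (searching each remainder as a fresh string; an assumption, not a confirmed description of grep's code) also accounts for these observations. The Python sketch below applies it to the patterns from this comment and compares the result with re.findall, which scans the line once and keeps the left context. GNU's \< and \> have no direct Python spelling, so the lookarounds (?<!\w) and (?!\w) stand in for them here; restart_model is an illustrative name.

```python
import re


def restart_model(pattern: str, line: str) -> list[str]:
    """Hypothetical faulty -o loop: re-search the sliced remainder,
    so any assertion about the text before a match sees nothing there."""
    regex = re.compile(pattern)
    hits: list[str] = []
    rest = line
    while rest:
        m = regex.search(rest)
        if m is None:
            break
        if m.group():
            hits.append(m.group())
        rest = rest[max(m.end(), 1):]
    return hits


line = "abc def"
# GNU grep pattern -> Python equivalent used in this sketch.
patterns = {
    "^[a-z]": r"^[a-z]",
    r"\b[a-z]": r"\b[a-z]",
    r"\<[a-z]": r"(?<!\w)[a-z]",  # stand-in for GNU \<
    "[a-z]$": r"[a-z]$",
    r"[a-z]\b": r"[a-z]\b",
    r"[a-z]\>": r"[a-z](?!\w)",   # stand-in for GNU \>
}

for gnu, py in patterns.items():
    print(f"{gnu:10} restart={restart_model(py, line)} "
          f"findall={re.findall(py, line)}")
```

Under this model only the patterns whose assertion looks to the left of the match over-match: '^[a-z]' yields 'a', 'b' and 'c' (the restart at ' def' fails because it begins with a space), and the leading \b and \< forms yield all six letters, while '[a-z]$', '[a-z]\b' and '[a-z]\>' give the same results as re.findall. That is the pattern of failures described in this comment.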
Comment 3 Kyle Evans freebsd_committer 2017-01-21 00:35:44 UTC
emaste@ - I think this one can just be closed. If I run all of these on a stock 11.0 machine, all of the examples in the above three posts yield the expected results rather than the reported ones.

I do not have access to anything on 10.x or stable/10 to test it on and haven't built up the motivation to sort through commits and figure out why it seems to be working now, although I suppose it doesn't matter if we can see that it works on 10.x.
Comment 4 Ed Maste freebsd_committer 2017-01-21 00:59:37 UTC
It looks like at least some of these issues are reproducible with the GNU grep in FreeBSD 10 - for example:

% grep --version
grep (GNU grep) 2.5.1-FreeBSD

Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

% printf abc | grep -o '^[a-c]'
a
b
c

I was not able to reproduce any of the failures with bsdgrep in FreeBSD 10. I have updated the title to refer to non-BSD grep.
Comment 5 Kyle Evans freebsd_committer 2017-04-20 17:50:26 UTC
This is good to know. =) This one may be closed when bsdgrep becomes /usr/bin/grep.
Comment 6 Eitan Adler freebsd_committer freebsd_triage 2018-05-20 23:50:01 UTC
For bugs matching the following conditions:
- Status == In Progress
- Assignee == "bugs@FreeBSD.org"
- Last Modified Year <= 2017

Do
- Set Status to "Open"