Bug 201650

Summary: (e)grep handling regexp wrong
Product: Base System
Reporter: Franz Bettag <franz>
Component: bin
Assignee: freebsd-bugs (Nobody) <bugs>
Status: New ---
Severity: Affects Many People
CC: ben.rubson, chris, emaste, feld, kevans, me, steve
Priority: ---
Keywords: patch
Version: CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description: Patch for fixing anchor treatment in BSD grep.
Flags: none

Description Franz Bettag 2015-07-17 14:51:13 UTC
Today, thanks to a tweet from @moo_pronto (https://twitter.com/moo_pronto/status/622035032463527936), I noticed that FreeBSD's egrep handles this regexp incorrectly.

echo "abc" | egrep -o '^[a-z]' should never also print "b" and "c"; it should match only the first character after the beginning of the line (^).

It would be okay if it was ^[a-z]+ but it's not. :)

I am very certain that this is an error in regexp handling.

(Sorry if I ruined anyone's weekend)

- Franz
Comment 1 Mark Felder freebsd_committer 2015-07-17 14:57:18 UTC
% echo "abc" | bsdgrep -e "^[a-z]" -o
a


% echo "abc" | grep -e "^[a-z]" -o
a
b
c


bsdgrep gets it right, but grep does not. Our GPL'd grep must be old and have this bug?
Comment 2 Mark Felder freebsd_committer 2015-07-17 15:33:57 UTC
RHEL 5

-bash-3.2$ echo "abc" | grep -e "^[a-z]" -o
a
b
c
-bash-3.2$ grep -V
grep (GNU grep) 2.5.1

Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.



RHEL 6

-bash-4.1$ echo "abc" | grep -e "^[a-z]" -o
a
-bash-4.1$ grep -V
GNU grep 2.6.3

Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.


grep went GPLv3 at 2.5.3, so we might not be able to backport the fix. Note that -o was a brand new feature in 2.5.1.


Perhaps we should just flip the switch and move to bsdgrep for 11-RELEASE?
Comment 3 Daniël de Kok 2016-06-26 16:38:25 UTC
Matching any four characters at the beginning of a line also fails:

% echo "1234 1234 1234" | egrep -o '^....'
1234
 123
4 12

Interestingly enough, this also fails with bsdgrep:

% echo "1234 1234 1234" | bsdgrep -o '^....'
1234
 123
4 12
Comment 4 Daniël de Kok 2016-07-12 17:31:12 UTC
Created attachment 172418 [details]
Patch for fixing anchor treatment in BSD grep.

The attached patch solves the bsdgrep bug that I reported in the previous comment. bsdgrep tries to find multiple matches by updating the starting offset after finding a match. However, the REG_NOTBOL flag should be set for offsets beyond 0, to ensure that regexec treats anchors correctly.
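
For readers who want to see the effect without reading the patch, here is a minimal Python sketch of the two loop strategies. It is not the bsdgrep code: the function names are made up for illustration, and Python's re module stands in for regexec. Searching a fresh slice of the line behaves like calling regexec at a new offset without REG_NOTBOL, while passing a start position to search() keeps '^' tied to the real beginning of the line, which is roughly what REG_NOTBOL achieves.

```python
import re


def only_matching_buggy(line, pattern):
    """Emulate a faulty -o loop (hypothetical helper, not bsdgrep code).

    Every search runs on a fresh slice, so '^' matches again at each
    new offset, as if every offset were the beginning of the line.
    """
    regex = re.compile(pattern)
    matches = []
    offset = 0
    while offset <= len(line):
        m = regex.search(line[offset:])
        if m is None or m.end() == m.start():
            break  # no match, or an empty match that would loop forever
        matches.append(m.group())
        offset += m.end()  # m.end() is relative to the slice
    return matches


def only_matching_fixed(line, pattern):
    """Emulate the corrected loop (hypothetical helper).

    The whole line is searched from a start position, so '^' can only
    match at the real beginning of the line.
    """
    regex = re.compile(pattern)
    matches = []
    offset = 0
    while offset <= len(line):
        m = regex.search(line, offset)
        if m is None or m.end() == m.start():
            break
        matches.append(m.group())
        offset = m.end()  # m.end() is relative to the whole line
    return matches


print(only_matching_buggy("1234 1234 1234", r"^...."))
print(only_matching_fixed("1234 1234 1234", r"^...."))
```

The buggy variant reproduces the three matches shown in comment 3, and the fixed variant returns only the first one.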
Comment 5 Ed Maste freebsd_committer 2016-08-24 20:39:18 UTC
Patch for the same issue in OpenBSD's grep: https://marc.info/?l=openbsd-tech&m=147206598501753&w=2
Comment 6 Ben RUBSON 2016-09-20 16:50:19 UTC
grep also fails with case:

# echo AAA > test
# grep a test
# grep "[a-z]" test
AAA
# echo BBB | grep "[a-z]"
BBB

# grep -V
grep (GNU grep) 2.5.1-FreeBSD
# uname -r
11.0-RC3

This is quite a dangerous issue; could we think about having it corrected for the FreeBSD 11 release?

Many thanks,

Ben
Comment 7 Daniël de Kok 2016-09-20 17:17:01 UTC
Note that B can be in the range a-z depending on collation settings:

http://unix.stackexchange.com/questions/15980/does-should-lc-collate-affect-character-ranges
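
To illustrate the point, here is a small Python sketch that treats a range as "everything that sorts between the two endpoints". The two orderings below are assumptions made for illustration only (plain code point order, and a dictionary-style order that interleaves cases as a A b B ... z Z); they are not taken from any particular FreeBSD locale definition, and in_range is a hypothetical helper.

```python
import string

# Code point order: 'A'..'Z' sort before 'a'..'z' (as in the C/POSIX locale).
CODEPOINT_ORDER = sorted(string.ascii_letters)

# Assumed dictionary-style order for illustration: a A b B ... z Z.
INTERLEAVED_ORDER = [
    ch
    for pair in zip(string.ascii_lowercase, string.ascii_uppercase)
    for ch in pair
]


def in_range(ch, lo, hi, order):
    """Return True if ch sorts between lo and hi (inclusive) in `order`."""
    return order.index(lo) <= order.index(ch) <= order.index(hi)


for ch in "ABZ":
    print(ch,
          in_range(ch, "a", "z", CODEPOINT_ORDER),
          in_range(ch, "a", "z", INTERLEAVED_ORDER))
```

Under the interleaved order, A and B fall inside a-z while Z does not. That would be consistent with the output in comment 6, where "grep a" does not match AAA but "[a-z]" does.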
Comment 8 Ben RUBSON 2016-09-20 19:01:37 UTC
You're right Daniël, I must face this.
What is strange is that on 2 different systems with the same LC_COLLATE, results are not the same.
Really disturbing.
Thank you for the link, and sorry for the noise...
Comment 9 Kyle Evans freebsd_committer 2017-04-20 17:48:46 UTC
The issues described in this thread exist with gnugrep, but none of them exist with bsdgrep at this point in time. This PR can be closed once bsdgrep is installed as /usr/bin/grep.
Comment 10 commit-hook freebsd_committer 2020-12-08 14:05:54 UTC
A commit references this bug:

Author: kevans
Date: Tue Dec  8 14:05:26 UTC 2020
New revision: 368439
URL: https://svnweb.freebsd.org/changeset/base/368439

Log:
  src.opts.mk: switch to bsdgrep as /usr/bin/grep

  This has been years in the making, and we all knew it was bound to happen
  some day. Switch to the BSDL grep implementation now that it's been a
  little more thoroughly tested and theoretically supports all of the
  extensions that gnugrep in base had with our libregex(3).

  Folks shouldn't really notice much from this update; bsdgrep is slower than
  gnugrep, but this is currently the price to pay for fewer bugs. Those
  dissatisfied with the speed of grep and in need of a faster implementation
  should check out what textproc/ripgrep and textproc/the_silver_searcher
  can do for them.

  I have some WIP to make bsdgrep faster, but do not consider it a blocker
  when compared to the pros of switching now (aforementioned bugs, licensing).

  PR:		228798 (exp-run)
  PR:		128645, 156704, 166842, 166862, 180937, 193835, 201650
  PR:		232565, 242308, 246000, 251081, 191086, 194397
  Relnotes:	yes, please

Changes:
  head/share/mk/src.opts.mk