grep version line: /* $FreeBSD: src/usr.bin/grep/grep.c,v 1.11.2.3 2011/10/20 16:08:11 gabor Exp $ sed version line: $FreeBSD: src/usr.bin/sed/main.c,v 1.45.2.1 2011/09/23 00:51:37 kensmith Exp $ (1) FreeBSD grep without -E will reject unmatched '\{' as an invalid pattern, but treat unmatched '\}' as a literal '}'. So far, so good. This is also how Gnu grep and BusyBox grep handle these; POSIX-2008 doesn't specify what to do here. BusyBox's egrep sticks to the same pattern. But FreeBSD's egrep diverges: it treats unmatched { and unmatched } both as literals. These are perverse patterns and no one should be relying on this behavior; however, FreeBSD's change of behavior here seems unmotivated. Admittedly, Gnu egrep does the same as FreeBSD. (2) FreeBSD grep without -E follows the other greps in rejecting 'a\{1,2,3\}b' as an invalid pattern. The other egreps likewise reject 'a{1,2,3}b'. However, FreeBSD grep accepts 'a{1,2,3}b', and moreover will match it against the text "a{1,2,3}b"; however, the match is zero-length. Again, a perverse pattern whose interpretation no one should be relying on. However, FreeBSD's handling of it seems strange. (3) The pattern among other sed implementations is: without -r: reject unmatched \{ as error, accept unmatched \} as literal reject \{\}, \{2,1\}, and \{1,2,3\} with -r: reject unmatched { as error, accept unmatched } as literal reject {}, {2,1}, and {1,2,3} However, FreeBSD sed without -r diverges from the pattern in rejecting unmatched \} as error. (4) Also, FreeBSD sed with -r diverges from the pattern in accepting {} as those two literal characters. How-To-Repeat: See above.
(5) FreeBSD sed without -r, as well as the other sed implementations, reject unmatched \( and unmatched \) both as invalid patterns. So too do all these grep implementations without -E. Similarly, gnu sed with -r rejects unmatched ( and unmatched ) both as invalid patterns. And so too do all these egrep implementations. Like everyone else, FreeBSD sed with -r rejects unmatched ( as invalid. But it diverges in treating unmatched ) as literal. -- dubiousjim@gmail.com
(In reply to dubiousjim from comment #1) Summary of work needed at the bottom, feel free to skip ahead and only look back for intermediate results/notes. Some relevant notes: As of GNU grep 2.27, GNU SED 4.3 on Debian, and BSD grep @ r316566-ish: (1) and (2) behavior between the two seem to match (3) FreeBSD: $ echo "a{1,2,3}b" | sed -r "s/{/_/" a_1,2,3}b $ echo "a{1,2,3}b" | sed -r "s/}/_/" a{1,2,3_b Debian: $ echo "a{1,2,3}b" | sed -r "s/{/_/" # Error, invalid preceding expression # Whoops $ echo "a{1,2,3}b" | sed -r "s/a{/_/" # Error, unmatched \{ $ echo "a{1,2,3}b" | sed -r "s/}/_/" a{1,2,3_b We do have a test case for this at lib/libc/regex/grot/tests:205 where { is explicitly meant to be a literal match in both BREs and EREs. We have no case expression } being a literal match. FreeBSD: $ echo "a{1,2,3}b" | sed "s/\}/_/" # Error, parentheses not balanced Debian: $ echo "a{1,2,3}b" | sed "s/\}/_/" a{1,2,3_b # Ah, also prefer GNU behavior This one, it's worth noting, has no test either. It does have the obvious test for the other side, \{ alone, but no \}. (4) FreeBSD: $ echo "a{1,2,3}b" | sed -r "s/{}/_/" a{1,2,3}b Debian: $ echo "a{1,2,3}b" | sed -r "s/{}/_/" # Error, invalid preceding expression # Whoops $ echo "a{1,2,3}b" | sed -r "s/a{}/_/" # Error, invalid content # Reasonable This one is .... technically correct behavior. Technically, according to re_format(7), the following "}" is *not* a digit, and therefore this is not a bounds statement. I think this is really not correct, though. Letting {} take a literal interpretation leaves us too much room for error getting in if a digit was expected by the pattern-creator, and I would prefer the GNU approach on this matter. We'll probably want to update re_format(7) to be more explicit in this matter, as well as add a corresponding test case. (5) FreeBSD: $ echo "a{1,2,3}b" | sed -r "s/)/_/" a{1,2,3}b $ echo "a{1,2,3}b" | sed "s/\)/_/" # Error, parentheses not balanced This is clearly covered in tests:54 (silenced, though) and with slight anger expressed in the context around it. I lean towards taking the GNU/sane approach on this one and making this work as one probably expects nowadays. ===== Summary of work needed (3) Problem: { in ERE uses literal interpretation Needed: { throw error Needed: Fix test case at tests:205 to separate out BRE and ERE cases and adjust ERE case to meet expectations Problem: \} in BRE throws an error Needed: \} match literal (4) Problem: {} in ERE uses literal interpretation Needed: {} throw error Needed: Consider re_format(7) update to explicitly note {} as illegal Needed: Test case (5) Problem: ) in ERE should throw error Needed: ) throw error Needed: Adjust test cases (tests:54) I think that sums it up -- I'll take a look at these things in the next week or so.
I can't actually find any evidence that grot/ bits are still used/functional, so ignore those parts of last comment. Will look at the netbsd-tests instead.
A commit references this bug: Author: kevans Date: Tue Aug 8 04:10:47 UTC 2017 New revision: 322211 URL: https://svnweb.freebsd.org/changeset/base/322211 Log: regex(3): Handle invalid {} constructs consistently and adjust tests Currently, regex(3) exhibits the following wrong behavior as demonstrated with sed: - echo "a{1,2,3}b" | sed -r "s/{/_/" (1) - echo "a{1,2,3}b" | sed "s/\}/_/" (2) - echo "a{1,2,3}b" | sed -r "s/{}/_/" (3) Cases (1) and (3) should throw errors but they actually succeed, and (2) throws an error when it should match the literal '}'. The correct behavior was decided by comparing to the behavior with the equivalent BRE (1)(3) or ERE (2) and consulting POSIX, along with some reasonable evaluation. Tests were also adjusted/added accordingly. PR: 166861 Reviewed by: emaste, ngie, pfg Approved by: emaste (mentor) MFC after: never Differential Revision: https://reviews.freebsd.org/D10315 Changes: head/contrib/netbsd-tests/lib/libc/regex/data/repet_bounded.in head/contrib/netbsd-tests/lib/libc/regex/data/repet_multi.in head/lib/libc/regex/regcomp.c
For bugs matching the following conditions: - Status == In Progress - Assignee == "bugs@FreeBSD.org" - Last Modified Year <= 2017 Do - Set Status to "Open"
There was a commit referencing this bug, but it's still not closed and has been inactive for some time. Closing as fixed. Please re-open it if the issue hasn't been completely resolved.