bsdgrep(1) behaves weirdly when, apparently, an empty (sub)string is matched. === a session on FreeBSD (10.1-RELEASE) begins === $ echo '01:1:01' | grep -Eo '(^|:)0*' 0 $ # hey, what gives? $ echo '01:1:01' | bsdgrep -Eo '(^|:)0*' 0 : :0 $ # ah, old bugz :} $ echo '1:1:01' | grep -Eo '(^|:)0*' $ # surely, this is the same bug $ echo '1:1:01' | bsdgrep -Eo '(^|:)0*' $ # wtf? === a session on FreeBSD (10.1-RELEASE) ends === OK, let's see Linux. === a session on a GNU/Linux distrubution begins === $ echo '01:1:01' | grep -Eo '(^|:)0*' 0 : :0 $ # OK, just like BSD-grep $ echo '1:1:01' | grep -Eo '(^|:)0*' : :0 $ # correct! === a session on a GNU/Linux distrubution ends === More testing: === a session on FreeBSD (10.1-RELEASE) begins === $ echo 'bla bla' | bsdgrep -Eo '[[:alnum:]]*' bla $ # ??? $ printf 'bla\nbla\n' | bsdgrep -Eo '' bla bla $ # ... === a session on FreeBSD (10.1-RELEASE) ends === === a session on a GNU/Linux distrubution ends === $ echo 'bla bla' | bsdgrep -Eo '[[:alnum:]]*' bla bla $ # most sensible $ printf 'bla\nbla\n' | bsdgrep -Eo '' $ # also sensible === a session on a GNU/Linux distrubution ends ===
Created attachment 179126 [details] Proposed patch to address zero length match issues with bsdgrep Hi, Attached is a patch that seems to fix the behavior of zero-length matches. procline() was previously using something along the lines of 'next start (st) == match start' to indicate failure, but this isn't right when zero-length matches are possible. Therefore, we make a couple of critical changes: 1.) Keep track of the number of total matches we came across 2.) Don't add zero-length matches to our output 3.) If we had any matches, keep going. If st didn't advance beyond our first match, we only had zero-length matches To fix the empty expression case, I changed the argv pattern matching bits in main() to actually skip invalid expressions. Your test expressions now yield the following results: root@ghost:/usr/obj/usr/src/usr.bin/grep# echo '01:1:01' | ./bsdgrep -Eo '(^|:)0*' 0 : :0 # Good root@ghost:/usr/obj/usr/src/usr.bin/grep# echo '1:1:01' | ./bsdgrep -Eo '(^|:)0*' : :0 # Better root@ghost:/usr/obj/usr/src/usr.bin/grep# echo 'bla bla' | ./bsdgrep -Eo '[[:alnum:]]*' bla bla # Good root@ghost:/usr/obj/usr/src/usr.bin/grep# printf 'bla\nbla\n' | ./bsdgrep -Eo '' # Good I think this approach is mostly right, except for perhaps the bits in grep.c:main() to make sure we don't recognize invalid patterns as input patterns.
(In reply to Kyle Evans from comment #1) The 'ignore invalid patterns' change my previous patch makes seems definitively wrong -- I'll have to revise this part and I'm just going to leave this here: $ echo 'test' | grep '' test $ echo 'test' | grep -o '' $ echo 'test' | bsdgrep '' test $ echo 'test' | bsdgrep -o '' The regex bits declare this as a 4-character match, not sure off-hand how to treat this but I don't believe ignoring the pattern is the right thing to do.
Created attachment 179171 [details] Proposed patch to address line matching issues emaste@ - I believe this to be my final iteration of this patch. Because this and bug #209116 are both somewhat major issues with procline(), this patch addresses both PRs and obsoletes both previous patches. I've added to printline() a bail-out if -o and matchall (empty pattern) are set -- this coincides with GNU grep's behavior. Assuming I ran the tests correctly (make && make install && kyua test -k /usr/tests/usr.bin/grep/Kyuafile), I haven't yet introduced any regressions there. One of my next steps might be to review the unit test coverage and make sure it's sufficient, though, since it was passing before despite having all kinds of problems.
Created attachment 179173 [details] Proposed patch to address line matching issues The failure in bug #181263 case 1 also ended up being fairly straightforward to address -- pmatch gets clobbered by a non-matching case and ended up with bogus values, so now we look at the last *actual match* (if st isn't advancing) and see if that ends up being a zero-length match. This catches corner cases like the one found where .* can infinitely match but the 'a' pattern comes second. It shouldn't affect any legitimate cases, since those will cause an advancement in nst prior to this.
Created attachment 179175 [details] Proposed patch to address line matching issues Ugh, sorry -- this really should be the final version. Shortly after the last update, I caught a mailing list entry that mentioned broken beginning of line matching behavior and figured it would be wise to catch this as well. REG_NOTBOL needs to be manually set if we're past the beginning of the string because we also use REG_STARTEND to limit the scope of our matching, which makes the regex functions think we're beginning of line all the time. We do not need to do the same with EOL since we limit the scope by adjusting rm_so, resetting rm_eo to the end of the line. It might not be a bad idea to later set it anyways, just in case we change our mind, but for now that's out of scope.
Created attachment 179227 [details] Proposed patch to address line matching issues Address one last nit that I found in some more extensive testing -- if we're going to replace the last match made (due to overlap), it should be an earlier or strictly longer match than the current one. Test case: echo "abcdef" | grep -o -e "ab" -e "bc" Swap the order of -e arguments, "ab" and "bc" would previously yield different results when they should both yield "ab".
A commit references this bug: Author: emaste Date: Mon Apr 3 23:16:51 UTC 2017 New revision: 316477 URL: https://svnweb.freebsd.org/changeset/base/316477 Log: bsdgrep: fix matching behaviour - Set REG_NOTBOL if we've already matched beginning of line and we're examining later parts - For each pattern we examine, apply it to the remaining bits of the line rather than (potentially) smaller subsets - Check for REG_NOSUB after we've looked at all patterns initially matching the line - Keep track of the last match we made to later determine if we're simply not matching any longer or if we need to proceed another byte because we hit a zero-length match - Match the earliest and longest bit of each line before moving the beginning of what we match to further in the line, past the end of the longest match; this generally matches how gnugrep(1) seems to behave, and seems like pretty good behavior to me - Finally, bail out of printing any matches if we were set to print all (empty pattern) but -o (output matches) was set PR: 195763, 180990, 197555, 197531, 181263, 209116 Submitted by: "Kyle Evans" <kevans91@ksu.edu> Reviewed by: cem MFC after: 1 month Relnotes: Yes Differential Revision: https://reviews.freebsd.org/D10104 Changes: head/usr.bin/grep/util.c
A commit references this bug: Author: emaste Date: Wed Apr 5 18:41:46 UTC 2017 New revision: 316536 URL: https://svnweb.freebsd.org/changeset/base/316536 Log: bsdgrep: create additional tests for coverage on recent fixes Create additional tests to cover regressions that were discovered by PRs linked to reviews D10098, D10102, and D10104. It is worth noting that neither bsdgrep(1) nor gnugrep(1) in the base system currently pass all of these tests, and gnugrep(1) not quite being up to snuff was also noted in at least one of the PRs. PR: 175314 202022 195763 180990 197555 197531 181263 209116 Submitted by: Kyle Evans <kevans91@ksu.edu> Reviewed by: cem, ngie, emaste MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D10112 Changes: head/contrib/netbsd-tests/usr.bin/grep/d_color_a.in head/contrib/netbsd-tests/usr.bin/grep/d_color_a.out head/contrib/netbsd-tests/usr.bin/grep/d_color_b.in head/contrib/netbsd-tests/usr.bin/grep/d_color_b.out head/contrib/netbsd-tests/usr.bin/grep/d_color_c.out head/contrib/netbsd-tests/usr.bin/grep/d_escmap.in head/contrib/netbsd-tests/usr.bin/grep/d_f_file_empty.in head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_a.in head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_a.out head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_b.in head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_b.out head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_c.in head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_c.out head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_d.in head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_e.in head/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_e.out head/contrib/netbsd-tests/usr.bin/grep/t_grep.sh head/usr.bin/grep/tests/Makefile
A commit references this bug: Author: kevans Date: Wed Aug 16 00:12:25 UTC 2017 New revision: 322555 URL: https://svnweb.freebsd.org/changeset/base/322555 Log: bsdgrep: Fix matching behavior and add regression tests MFC r316477: bsdgrep: fix matching behaviour - Set REG_NOTBOL if we've already matched beginning of line and we're examining later parts - For each pattern we examine, apply it to the remaining bits of the line rather than (potentially) smaller subsets - Check for REG_NOSUB after we've looked at all patterns initially matching the line - Keep track of the last match we made to later determine if we're simply not matching any longer or if we need to proceed another byte because we hit a zero-length match - Match the earliest and longest bit of each line before moving the beginning of what we match to further in the line, past the end of the longest match; this generally matches how gnugrep(1) seems to behave, and seems like pretty good behavior to me - Finally, bail out of printing any matches if we were set to print all (empty pattern) but -o (output matches) was set MFC r316489: bsdgrep: Initialize vars to avoid a false positive GCC warning MFC r316491: bsdgrep: revert color changes from r316477 r316477 changed the color output to match exactly the in-tree GNU grep, but introduces unnecessary escape sequences. MFC r316536: bsdgrep: create additional tests for coverage on recent fixes Create additional tests to cover regressions that were discovered by PRs linked to reviews D10098, D10102, and D10104. It is worth noting that neither bsdgrep(1) nor gnugrep(1) in the base system currently pass all of these tests, and gnugrep(1) not quite being up to snuff was also noted in at least one of the PRs. MFC r317052: bsdgrep: fix zero-length matches without the -o flag r316477 broke zero-length matches when not using the -o flag, by skipping over them entirely. Add a regression test so that it doesn't break again in the future. PR: 175314, 180990, 181263, 195763, 197531, 197555, 202022, 209116 Approved by: emaste (mentor, blanket MFC) Relnotes: yes Changes: _U stable/11/ stable/11/contrib/netbsd-tests/usr.bin/grep/d_color_a.in stable/11/contrib/netbsd-tests/usr.bin/grep/d_color_a.out stable/11/contrib/netbsd-tests/usr.bin/grep/d_color_b.in stable/11/contrib/netbsd-tests/usr.bin/grep/d_color_b.out stable/11/contrib/netbsd-tests/usr.bin/grep/d_color_c.out stable/11/contrib/netbsd-tests/usr.bin/grep/d_escmap.in stable/11/contrib/netbsd-tests/usr.bin/grep/d_f_file_empty.in stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_a.in stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_a.out stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_b.in stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_b.out stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_c.in stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_c.out stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_d.in stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_e.in stable/11/contrib/netbsd-tests/usr.bin/grep/d_oflag_zerolen_e.out stable/11/contrib/netbsd-tests/usr.bin/grep/t_grep.sh stable/11/usr.bin/grep/tests/Makefile stable/11/usr.bin/grep/util.c
MFCed Wed Aug 16 00:12:25 UTC 2017.