Bug 223553

Summary:

bsdgrep in -current is 10 times slower than before

Product:

Base System

Reporter:

Wolfram Schneider <wosch>

Component:

bin

Assignee:

freebsd-bugs (Nobody) <bugs>

Status:

Open ---

Severity:

Affects Many People

CC:

chris, crest, emaste, freebsd, grahamperrin, kevans, olivierw1+bugzilla-freebsd, se

Priority:

---

Keywords:

performance, regression

Version:

CURRENT

Hardware:

Any

OS:

Any

See Also:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254763
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271906
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255525

Bug Depends on:

223532

Bug Blocks:

230332

Attachments:

Description	Flags
Speed-up fgrep -i	none

Description Wolfram Schneider freebsd_committer

2017-11-09 09:18:05 UTC

While working on bug #223532, I noticed that bsdgrep on a recent FreeBSD12-current is much slower than on FreeBSD11-stable.

Both machines run the same bsdgrep version, 2.6.0:
$ /usr/bin/bsdgrep -V
bsdgrep (BSD grep, GNU compatible) 2.6.0-FreeBSD

How to repeat:

First, we create a 100MB text file:
for i in $(seq 1 20);do man tcsh;done > /tmp/tcsh20;
for i in $(seq 1 20); do cat /tmp/tcsh20;done > /tmp/tcsh400

$ du -hs /tmp/tcsh400
 99M    /tmp/tcsh400

# FreeBSD11-stable
LANG=en_CA.UTF-8 time /usr/bin/bsdgrep  -ic  foobar /tmp/tcsh400
0
        2.06 real         2.00 user         0.05 sys


# FreeBSD12-current 
LANG=en_CA.UTF-8 time /usr/bin/bsdgrep  -ic  foobar /tmp/tcsh400
0
       19.27 real        19.17 user         0.05 sys

Comment 1 Wolfram Schneider freebsd_committer

2017-11-09 09:39:13 UTC

It is also slow for LANG=C. Search time goes from 0.5 seconds to 4.3 seconds.

# FreeBSD 11-stable
LANG=C time /usr/bin/bsdgrep  -ic  foobar /tmp/tcsh400
0
        0.53 real         0.50 user         0.03 sys

# FreeBSD 12-current
LANG=C time /usr/bin/bsdgrep  -ic  foobar /tmp/tcsh400
0
        4.33 real         4.26 user         0.06 sys

Comment 2 Kyle Evans freebsd_committer

2017-11-09 13:16:59 UTC

Now that's odd: the difference between these two is that -HEAD defaults to WITHOUT_BSD_GREP_FASTMATCH (TRE). In all of my testing on three or four amd64 boxes, performance between the two never differed by more than ~4%.

What kind of performance do you get on the -HEAD box if you flip BSD_GREP_FASTMATCH back on? With that kind of discrepancy, it might be worth fixing up TRE after all.

Comment 3 Kyle Evans freebsd_committer

2017-11-14 16:59:49 UTC

In testing, I've found that flipping GNU_GREP_COMPAT off by default in -HEAD is what caused the slowdown. Comparable performance can be achieved by turning it back on, which leads me to the conclusion that our regex(3) is the source of the slowdown in this instance.

I think this leads me to two conclusions:

1.) Our regex(3) implementation needs to be optimized or replaced
2.) We need to improve our usage of regex(3)

These are both important, but #2 is something we've been needing to do anyways and is probably the lower-hanging fruit. I think this would be accomplished by calling regex(3) less often, on larger chunks of text, rather than breaking up text line-by-line. This would allow us to take advantage of the optimizations to be had with larger subject strings and reduce the overhead (in most cases) by no longer having to search for all newlines.

Comment 4 Eitan Adler freebsd_committer

2018-05-20 23:53:01 UTC

For bugs matching the following conditions:
- Status == In Progress
- Assignee == "bugs@FreeBSD.org"
- Last Modified Year <= 2017

Do
- Set Status to "Open"

Comment 5 Stefan Eßer freebsd_committer

2021-06-02 20:24:53 UTC

*** Bug 223532 has been marked as a duplicate of this bug. ***

Comment 6 Stefan Eßer freebsd_committer

2021-06-02 20:33:07 UTC

Created attachment 225513 [details]
Speed-up fgrep -i

This patch was originally attached to PR 223532, but is more relevant to this PR.

The patch improves performance of "fgrep -i" in some tests by a factor of 30 to 40.

I have run "kyua test" on fgrep built with and without this patch and got the same 4 failed test cases for either version.

The patch does not resolve the performance issue observed with grep/egrep -i (which is still slower by about a factor of 100 compared to the same command executed without -i).
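
For anyone who wants to repeat the comparisons in this PR (different locales, with and without -i), the steps can be scripted roughly as below. This is only a sketch and not part of the attached patch. The helper names make_corpus, run_case and time_case are made up for illustration, and the GREP variable is an assumption used to select the binary under test. Timing uses whole seconds from date(1), which is coarse but enough to show a 2-second versus 19-second difference.

```sh
#!/bin/sh
# Hypothetical helpers for repeating the measurements from this PR.

# make_corpus SRC OUT REPS
# Build a large test file by concatenating SRC to OUT, REPS times.
make_corpus() {
    src=$1 out=$2 reps=$3
    : > "$out"
    i=0
    while [ "$i" -lt "$reps" ]; do
        cat "$src" >> "$out"
        i=$((i + 1))
    done
}

# run_case LOCALE FILE PATTERN [extra grep options...]
# Print the number of matching lines. The binary is taken from $GREP
# (an assumption of this sketch), defaulting to plain "grep".
# grep exits 1 when nothing matches; that is not treated as an error.
run_case() {
    loc=$1 file=$2 pat=$3
    shift 3
    LANG=$loc "${GREP:-grep}" "$@" -c -- "$pat" "$file" || [ $? -eq 1 ]
}

# time_case LOCALE FILE PATTERN [extra grep options...]
# Same as run_case, but also report elapsed wall-clock seconds.
time_case() {
    start=$(date +%s)
    count=$(run_case "$@")
    end=$(date +%s)
    echo "count=$count seconds=$((end - start))"
}
```

Possible usage, assuming /tmp/tcsh400 was created as in the description (note that LC_ALL, if set in the environment, overrides LANG):

    GREP=/usr/bin/bsdgrep time_case en_CA.UTF-8 /tmp/tcsh400 foobar -i
    GREP=/usr/bin/bsdgrep time_case C /tmp/tcsh400 foobar -i
    GREP=/usr/bin/bsdgrep time_case C /tmp/tcsh400 foobar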