Bug 254763

Summary: grep very slow with 13.0-RC4
Product: Base System Reporter: Christos Chatzaras <chris>
Component: binAssignee: freebsd-bugs (Nobody) <bugs>
Status: Open ---    
Severity: Affects Some People CC: Mark.Martinec, ak, dchagin, grahamperrin, kevans, marklmi26-fbsd, matteo, olivierw1+bugzilla-freebsd, parv.0zero9+freebsd, rihad, stefanrink, tatsuki_makino
Priority: --- Keywords: performance, regression
Version: 13.0-STABLE   
Hardware: amd64   
OS: Any   
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223553
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271906
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230332
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255525
Attachments:
Description Flags
access_log.txt and exclude_ips.txt files none

Description Christos Chatzaras 2021-04-04 14:08:31 UTC
Created attachment 223801 [details]
access_log.txt and exclude_ips.txt files

With 13.0-RC4 I notice that I get very slow speed using grep with the follow parameters. With 12.2-RELEASE-p4 it is fast. Files used for the test are included in the attachment.

-------------

FreeBSD 13.0-RC4:

/usr/bin/time -l cat access_log.txt | grep -F -v -f exclude_ips.txt > output.log
      358.92 real         0.00 user         0.00 sys
      2200  maximum resident set size
         8  average shared memory size
         4  average unshared data size
       128  average unshared stack size
       148  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
      1104  messages sent
         0  messages received
         0  signals received
       138  voluntary context switches
         0  involuntary context switches

-------------

FreeBSD 12.2-RELEASE-p4:

/usr/bin/time -l cat access_log.txt | grep -F -v -f exclude_ips.txt > output.log
        0.18 real         0.00 user         0.00 sys
         0  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
       137  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         0  signals received
        70  voluntary context switches
         0  involuntary context switches
Comment 1 parv 2021-05-10 09:32:26 UTC
Another case of too slow a search is with -i option. Should I file a separate PR?


On FreeBSD 13-STABLE, grep 2.6.0 search of /usr/ports/INDEX-13 with -i option is very slow compared to without ...

# \time egrep -i --color '^yubi.?oath' INDEX-13
...
47.53 real 44.65 user 0.41 sys

# \time egrep --color '^yubi.?oath' INDEX-13
...
0.14 real 0.01 user 0.12 sys


... option --color does not affect the time as there was only 1 result.

Perl 5.32.1 (from Ports) does not color but the time output is similar to the second case of grep search above, but with case insensitive search ... 

# \time perl -n -E 'm{^yubi.?oath}i and say' INDEX-13
Comment 2 Mark Millard 2021-05-10 11:29:26 UTC
(In reply to parv from comment #1)

Are you being careful to avoid the first grep from
loading the file contents into OS buffers so the
next grep does not have to read the media (as much)?
For example, rebooting between tests? Or using files
that are much larger than the RAM?

One option is to reboot after the sequence and
redo the sequence in a different order, such as
after what you wrote doing the order:

# \time egrep --color '^yubi.?oath' INDEX-13
# \time egrep -i --color '^yubi.?oath' INDEX-13

Variations in the result might give clues about
the various contributions to the time taken.
Comment 3 Kyle Evans freebsd_committer freebsd_triage 2021-05-10 12:02:59 UTC
(In reply to parv from comment #1)

No, please. All of the slowness will have the same root cause for the most part.
Comment 4 parv 2021-05-10 13:05:03 UTC
(In reply to Mark Millard from comment #2)

(A tolerable search would finish within ~1-2 second.)

Re: caching of file resulting in faster search ...
I created a 12 GB file filled with 0; searched for single letter via fgrep; caused swap to increase to ~1 GB (swap size is 8 GB); lots of crunchy noise ensued.

When I had enough, I ran egrep sans -i option ...

  1.16 real  0.07 user  0.30 sys


... and result after loading RAM as mentioned above for "egrep -i" ...

  49.13 real  45.28 user  0.63 sys


Relative time difference between two type of searches is rather consistent.


There is ~4-5 second improvement (time result hover around 43-45 second) for "egrep -i" if it is run just after "egrep" or "egrep -i".


FWIW, configuration here ...

  - FreeBSD 13-STABLE as guest of Windows 10 host in VirtualBox 5.2.44;
  - /src-build/ports/INDEX-13 file is on separate pool of ZFS than the pool of /usr/bin/egrep;
  - file systems are file-backed, allocated as SATA disks;
  - real memory: 8663 MB, avail memory: 7809 MB (of 16 GB RAM);
  - 2 "CPU"s have been allocated of Intel i5 6300U (Thinkpad X260).
Comment 5 parv 2021-05-10 13:37:02 UTC
(In reply to parv from comment #4)

I should have also mentioned that above egrep were tested with 2 pairs of locale configuration (time result is same enough) ...

  # root user.
  LANG=C.UTF-8

  # Not root user.
  LANG=en_US.UTF-8
  LC_COLLATE=C
  LC_CTYPE=en_US.UTF-8
  LC_MESSAGES=en_US.UTF-8
  LC_MONETARY=en_US.UTF-8


User CPU time of "egrep -i" improves with C locale ("env LANG=C LC_ALL=C egrep -i ...") ...

  38.24 real  36.44 user  0.61 sys  # Just after loading RAM as mentioned in #4.
  44.32 real  38.10 user  0.59 sys  # Successive runs ...
  39.88 real  39.00 user  0.59 sys  # ... after few minutes.
Comment 6 Mark Millard 2021-05-10 16:08:43 UTC
(In reply to parv from comment #4)

Intersting.

Another comparison/contrast would be to try a fairly large
file but one that would fit in RAM buffers/caching, say 4
GB based on your description, again both orders, with a PC
reboot between.

Going in a different direction . . .

My vague memories lead to looking up:

VBoxManage storagectl "VM name" --name <controllername> --hostiocache off

Have you tried both hostiocache off and on? If I remember
right on was a default.

There might be more virtualbox settings that might contribute to
how the I/O is handled. I do not remember much about such things.
But such settings might be appropriate to explore for if they make
any notable differences.
Comment 7 Mark Millard 2021-05-10 18:43:57 UTC
(In reply to Mark Millard from comment #6)

For the first part of comment #6: I think I misinterpreted
your test notes last night. Ignore it. Sorry.

The "--hostiocache off" part of the note would still apply.
(Possibly other configuration settings as well.)
Comment 8 Stefan Rink 2021-05-11 00:12:58 UTC
If you compile grep with gnuregex (like FreeBSD 12) it's okay.

With gnuregex; (edit grep.h to include gnuregex.h instead of gnuregex.h)
[root@SuperBSD /tmp/test]# cc -o grep *.c -lgnuregex -L/usr/local/lib -I/usr/local/include
[root@SuperBSD /tmp/test]# time ./grep -i 'bla' /usr/ports/INDEX-13 | wc -l
    1040

real    0m0.388s
user    0m0.385s
sys     0m0.008s
[root@SuperBSD /tmp/test]# time grep -i 'bla' /usr/ports/INDEX-13 | wc -l
    1040

real    0m3.428s
user    0m3.430s
sys     0m0.008s
Comment 9 parv 2021-06-01 10:25:08 UTC
For not being noted earlier here, I will paraphrase Kyle Envans kaevans91 on Twitter ...

- root issue is known in regex(3) library (see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223553 also);
- grep(1) looks at each byte & for new lines;
- replacing regex(3) is a huge task, requires massive time & motivation.


... I would be happy to be corrected if I had misstated.


*In the mean time*, one could use instead ...

- ack, textproc/p5-ack (Perl), https://github.com/beyondgrep/ack3;

- ag, textproc/the_silver_searcher (C), https://github.com/ggreer/the_silver_searcher;

- GNU grep, textproc/gnugrep (C), https://www.gnu.org/software/grep/;

- ripgrep, textproc/ripgrep (Rust), https://github.com/BurntSushi/ripgrep.


Other workaround option is to build system grep with one of external regex libraries. For that matter, one could also install GNU grep at /usr/bin.
Comment 10 Rihad 2021-08-09 06:26:00 UTC
Same issue here.

grep -vE -f whitelist.re /path/to/nginx-access.log

Whitelist.re is a huge compiled exclusion list. In the form:
^(128\.177\.243\.|128\.177\.244\.100 | etc. etc.)

This was done in about 5 seconds with a pretty large access.log in FreeBSD 11.4
Still not finished after 30 minutes on FreeBSD 13.0.
Comment 11 Mark.Martinec 2022-01-18 23:07:15 UTC
I was about to open a new PR, but I realized the problem
has already been reported, so I'm commenting here.

My particular issue: fgrep -i is terribly slow on 13.0.

More in detail:

Searching for a string though a one day's worth of
mail log (100 MB, postfix) takes unreasonably long time
with BSD grep when option -i (case-insensitive search)
is specified:

$ time /usr/bin/fgrep -i fwzhkqwfoherbfqo mail.log
real 59.664s, user 58.907s, sys 0.222s

$ time /usr/bin/fgrep fwzhkqwfoherbfqo mail.log
real 0.276s, user 0.097s, sys 0.173s

With option -i it takes 200-times longer than without -i.
Compared to pcregrep or GNU grep, it takes 500-times longer.

fgrep and grep and egrep are the same in this respect.

An interesting observation: as the length of a search
string increases, so does the run time.

It would be worthwhile investing some time in finding
and fixing this particular hotspot in BSD grep.

(the machine above was a slightly older 13.0-RELEASE-p6 amd64,
with SSD disks and ZFS, the file is cached (cache warmed up))

----------------------

The rest below is some more benchmarking, this time
on a faster machine with nvme disks, ZFS, same log file.
Absolute times are shorter, but the ratio is about the same.

$ uname -a
FreeBSD xxx 13.0-RELEASE-p6 FreeBSD ... amd64/sys/GENERIC  amd64

$ ls -l mail.log
xxx  99999998 Jan 18 22:49 mail.log


BSD grep:
=========

$ time /usr/bin/fgrep -i fwzh mail.log
real 0m8.733s, user 0m8.664s, sys 0m0.061s

$ time /usr/bin/fgrep -i fwzhkqwf mail.log
real 0m12.759s, user 0m12.666s, sys 0m0.057s

$ time /usr/bin/fgrep -i fwzhkqwfoherbfqo mail.log
real 0m18.922s, user 0m18.813s, sys 0m0.064s

$ time /usr/bin/fgrep -i fwzhkqwfoherbfqojhkqnsazmzlwknhg mail.log
real 0m32.593s, user 0m32.438s, sys 0m0.056s


BSD grep without -i:

$ time /usr/bin/fgrep fwzhkqwfoherbfqojhkqnsazmzlwknhg mail.log
real 0m0.112s, user 0m0.033s, sys 0m0.079s


GNU grep (textutils/gnugrep):
=============================

$ time /usr/local/bin/fgrep -i fwzh mail.log
real 0m0.204s, user 0m0.073s, sys 0m0.022s

$ time /usr/local/bin/fgrep -i fwzhkqwf mail.log
real 0m0.203s, user 0m0.060s, sys 0m0.015s

$ time /usr/local/bin/fgrep -i fwzhkqwfoherbfqo mail.log
real 0m0.175s, user 0m0.036s, sys 0m0.017s

$ time /usr/local/bin/fgrep -i fwzhkqwfoherbfqojhkqnsazmzlwknhg mail.log
real 0m0.217s, user 0m0.083s, sys 0m0.006s


pcregrep:
=========

$ time pcregrep -i fwzh mail.log
real 0m0.249s, user 0m0.114s, sys 0m0.026s

$ time pcregrep -i fwzhkqwf mail.log
real 0m0.227s, user 0m0.094s, sys 0m0.013s

$ time pcregrep -i fwzhkqwfoherbfqo mail.log
real 0m0.128s, user 0m0.073s, sys 0m0.055s

$ time pcregrep -i fwzhkqwfoherbfqojhkqnsazmzlwknhg mail.log
real 0m0.126s, user 0m0.079s, sys 0m0.047s
Comment 12 Kyle Evans freebsd_committer freebsd_triage 2022-01-18 23:23:05 UTC
Improving the speed of grep(1) is largely rototilling it (probably a 20+ hour job, easily, once you've made sure you didn't break some weird corner cases); this isn't something I can do in my spare time, and I've not found anyone willing to fund it.
Comment 13 Mark Millard 2022-01-20 05:23:55 UTC
(In reply to Kyle Evans from comment #12)

So far as I know, the one area that I've see the behavior's
consequences somewhat systematically is for poudriere bulk
activity were it tries to go through the log to identify the
"reason" for stoppage --when the log happens to be huge, as it
can be for some port builds.

/usr/local/share/poudriere/processonelog.sh and its use of
bzgrep has a couple of examples of using -qi . Lots more
use of -qE and -q . One -v . One plain grep.

May be someday this issue will lead to some funding for
spending less time on the kind of activity.
Comment 14 Mark Millard 2022-01-20 05:28:15 UTC
(In reply to Mark Millard from comment #13)

Indicating a wording correction/intended-interpretation:

"plain grep": no "bz". Actually: grep -qE
Comment 15 Matteo Riondato freebsd_committer freebsd_triage 2022-03-17 11:51:40 UTC
Is this a duplicate of bug #223553 ?
Comment 16 Mark Millard 2022-06-27 00:40:46 UTC
(In reply to Mark Millard from comment #13 and #14)

poudriere-devel has gained:

# Grep build logs to determine a possible build failure reason.  This is
# only shown on the web interface.
# Default: yes
DETERMINE_BUILD_FAILURE_REASON=no

for use in /usr/local/etc/poudriere.conf as a way to avoid
the time involved. This is in:

# poudriere version
poudriere-git-3.3.99.20220617
Comment 17 Tatsuki Makino 2022-06-27 04:59:42 UTC
(In reply to Matteo Riondato from comment #15)

If this still occurs with /usr/bin/bsdgrep in 12.x, I think this is a duplication.

However, I continue to feel that in 12.3-STABLE the performance of concurrent storage usage by multiple processes has been poor...


> /usr/bin/time -l cat access_log.txt | grep -F -v -f exclude_ips.txt > output.log
                   ~~~~~~~~~~~~~~~~~~                                 ~~~~~~~~~~~~
                        |                                                \__ writing
                         \__ reading

Thus, this is reading and writing at the same time.
Doesn't it matter?
Comment 18 Mark Millard 2022-06-27 06:55:42 UTC
(In reply to Matteo Riondato from comment #15)

Well, bug #223553 was specifically about /usr/bin/bsdgrep
(for how things were at the time). The environment has
progressed and the issues are certainly not limited to
only the specific path /usr/bin/bsdgrep at this point.

But things certainly are related. Kyle Evans may well have
the best context to make a judgment about duplication
status --and, if a duplicate, which is better to keep as
the active one.
Comment 19 Mark Millard 2022-06-27 06:59:34 UTC
(In reply to Tatsuki Makino from comment #17)

/usr/bin/time -l cat access_log.txt > output.log

would also mix reading (access_log.txt) and writing (output.log).
Does it produce odd timings? Or is the grep essential to getting
odd timings?
Comment 20 Mark Millard 2022-06-27 07:09:58 UTC
(In reply to Mark Millard from comment #19)

Even better for comparison with grep use would be:

/usr/bin/time -l cat access_log.txt | cat > output.log

because it still has the pipe and a right hand side process
involved.
Comment 21 Mark Millard 2022-06-27 07:44:02 UTC
(In reply to Mark Millard from comment #20)

I time both the left hand side and right hand side processes
here:

# /usr/bin/time -l cat access_log.txt | time -l cat > output.log
        0.65 real         0.00 user         0.00 sys
         0  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
       154  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
      1104  messages sent
         0  messages received
         0  signals received
       233  voluntary context switches
         1  involuntary context switches
        0.65 real         0.00 user         0.00 sys
      2440  maximum resident set size
         8  average shared memory size
         8  average unshared data size
       128  average unshared stack size
       146  page reclaims
         0  page faults
         0  swaps
         0  block input operations
       549  block output operations
         0  messages sent
       549  messages received
         0  signals received
       357  voluntary context switches
         0  involuntary context switches

# /usr/bin/time -l cat access_log.txt | time -l grep -F -v -f exclude_ips.txt > output2.log
      422.70 real         0.00 user         0.00 sys
      2360  maximum resident set size
         8  average shared memory size
         8  average unshared data size
       128  average unshared stack size
       155  page reclaims
         0  page faults
         0  swaps
         0  block input operations
         0  block output operations
      1104  messages sent
         0  messages received
         0  signals received
       138  voluntary context switches
         0  involuntary context switches
      431.63 real       431.57 user         0.04 sys
    168164  maximum resident set size
        20  average shared memory size
         8  average unshared data size
       128  average unshared stack size
     41463  page reclaims
         0  page faults
         0  swaps
         0  block input operations
        32  block output operations
         0  messages sent
       138  messages received
         0  signals received
       168  voluntary context switches
      4969  involuntary context switches

So the "grep -F -v -f exclude_ips.txt" is compute bound (user), not I/O bound.
The "cat" right hand side, by contrast, was not compute bound (user) --and was
not made to be time consuming overall.
Comment 22 Tatsuki Makino 2022-06-27 08:26:04 UTC
Even in 12.3-STABLE, it was a bsdgrep issue :)

> /usr/bin/time -l cat access_log.txt | /usr/bin/time -l grep -F -v -f exclude_ips.txt > output.log
        0.45 real         0.00 user         0.00 sys
(omitted)
        0.46 real         0.39 user         0.06 sys
(omitted)
> /usr/bin/time -l cat access_log.txt | /usr/bin/time -l bsdgrep -F -v -f exclude_ips.txt > output.log
      377.78 real         0.00 user         0.00 sys
(omitted)
      385.78 real       385.47 user         0.11 sys
(omitted)
Comment 23 Alex Kozlov freebsd_committer freebsd_triage 2023-02-11 17:44:31 UTC
Just another data point:
Infamous dougb@ test which lead to revert of first gnu to bsdgrep conversion attempt in 2010:

uname:
FreeBSD freefall.freebsd.org 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n259006-a6346c02f64: Sun Nov 6 02:50:08 UTC 2022

urls:
https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019005.html
https://web.archive.org/web/20100820074922/http://people.freebsd.org/~dougb/grep-time-trial.sh.txt [*]
https://www.FreeBSD.org/ports/INDEX-9.bz2

[*]: changed paths, number tries increased to 1000

results:
GNU grep
Elapsed time: 13 seconds

BSD grep
Elapsed time: 17 seconds

As you see, bsdgrep is still 30% slower than gnurep in this most common and trivial case.
Comment 24 Dmitry Chagin freebsd_committer freebsd_triage 2023-04-04 13:31:57 UTC
gnu grep uses splice syscall to avoid data transition between userspace and the kernel, https://endler.dev/2018/fastcat