Summary: | grep very slow with 13.0-RC4 | ||||||
---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Christos Chatzaras <chris> | ||||
Component: | bin | Assignee: | freebsd-bugs (Nobody) <bugs> | ||||
Status: | Open --- | ||||||
Severity: | Affects Some People | CC: | Mark.Martinec, ak, dchagin, freebsd, freebsd, grahamperrin, kevans, marklmi26-fbsd, matteo, olivierw1+bugzilla-freebsd, rihad, stefanrink, tatsuki_makino, wosch | ||||
Priority: | --- | Keywords: | performance, regression | ||||
Version: | 13.0-STABLE | ||||||
Hardware: | amd64 | ||||||
OS: | Any | ||||||
See Also: |
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223553 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271906 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230332 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255525 |
||||||
Bug Depends on: | |||||||
Bug Blocks: | 230332 | ||||||
Attachments: |
|
Another case of too slow a search is with -i option. Should I file a separate PR? On FreeBSD 13-STABLE, grep 2.6.0 search of /usr/ports/INDEX-13 with -i option is very slow compared to without ... # \time egrep -i --color '^yubi.?oath' INDEX-13 ... 47.53 real 44.65 user 0.41 sys # \time egrep --color '^yubi.?oath' INDEX-13 ... 0.14 real 0.01 user 0.12 sys ... option --color does not affect the time as there was only 1 result. Perl 5.32.1 (from Ports) does not color but the time output is similar to the second case of grep search above, but with case insensitive search ... # \time perl -n -E 'm{^yubi.?oath}i and say' INDEX-13 (In reply to parv from comment #1) Are you being careful to avoid the first grep from loading the file contents into OS buffers so the next grep does not have to read the media (as much)? For example, rebooting between tests? Or using files that are much larger than the RAM? One option is to reboot after the sequence and redo the sequence in a different order, such as after what you wrote doing the order: # \time egrep --color '^yubi.?oath' INDEX-13 # \time egrep -i --color '^yubi.?oath' INDEX-13 Variations in the result might give clues about the various contributions to the time taken. (In reply to parv from comment #1) No, please. All of the slowness will have the same root cause for the most part. (In reply to Mark Millard from comment #2) (A tolerable search would finish within ~1-2 second.) Re: caching of file resulting in faster search ... I created a 12 GB file filled with 0; searched for single letter via fgrep; caused swap to increase to ~1 GB (swap size is 8 GB); lots of crunchy noise ensued. When I had enough, I ran egrep sans -i option ... 1.16 real 0.07 user 0.30 sys ... and result after loading RAM as mentioned above for "egrep -i" ... 49.13 real 45.28 user 0.63 sys Relative time difference between two type of searches is rather consistent. There is ~4-5 second improvement (time result hover around 43-45 second) for "egrep -i" if it is run just after "egrep" or "egrep -i". FWIW, configuration here ... - FreeBSD 13-STABLE as guest of Windows 10 host in VirtualBox 5.2.44; - /src-build/ports/INDEX-13 file is on separate pool of ZFS than the pool of /usr/bin/egrep; - file systems are file-backed, allocated as SATA disks; - real memory: 8663 MB, avail memory: 7809 MB (of 16 GB RAM); - 2 "CPU"s have been allocated of Intel i5 6300U (Thinkpad X260). (In reply to parv from comment #4) I should have also mentioned that above egrep were tested with 2 pairs of locale configuration (time result is same enough) ... # root user. LANG=C.UTF-8 # Not root user. LANG=en_US.UTF-8 LC_COLLATE=C LC_CTYPE=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 User CPU time of "egrep -i" improves with C locale ("env LANG=C LC_ALL=C egrep -i ...") ... 38.24 real 36.44 user 0.61 sys # Just after loading RAM as mentioned in #4. 44.32 real 38.10 user 0.59 sys # Successive runs ... 39.88 real 39.00 user 0.59 sys # ... after few minutes. (In reply to parv from comment #4) Intersting. Another comparison/contrast would be to try a fairly large file but one that would fit in RAM buffers/caching, say 4 GB based on your description, again both orders, with a PC reboot between. Going in a different direction . . . My vague memories lead to looking up: VBoxManage storagectl "VM name" --name <controllername> --hostiocache off Have you tried both hostiocache off and on? If I remember right on was a default. There might be more virtualbox settings that might contribute to how the I/O is handled. I do not remember much about such things. But such settings might be appropriate to explore for if they make any notable differences. (In reply to Mark Millard from comment #6) For the first part of comment #6: I think I misinterpreted your test notes last night. Ignore it. Sorry. The "--hostiocache off" part of the note would still apply. (Possibly other configuration settings as well.) If you compile grep with gnuregex (like FreeBSD 12) it's okay. With gnuregex; (edit grep.h to include gnuregex.h instead of gnuregex.h) [root@SuperBSD /tmp/test]# cc -o grep *.c -lgnuregex -L/usr/local/lib -I/usr/local/include [root@SuperBSD /tmp/test]# time ./grep -i 'bla' /usr/ports/INDEX-13 | wc -l 1040 real 0m0.388s user 0m0.385s sys 0m0.008s [root@SuperBSD /tmp/test]# time grep -i 'bla' /usr/ports/INDEX-13 | wc -l 1040 real 0m3.428s user 0m3.430s sys 0m0.008s For not being noted earlier here, I will paraphrase Kyle Envans kaevans91 on Twitter ... - root issue is known in regex(3) library (see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223553 also); - grep(1) looks at each byte & for new lines; - replacing regex(3) is a huge task, requires massive time & motivation. ... I would be happy to be corrected if I had misstated. *In the mean time*, one could use instead ... - ack, textproc/p5-ack (Perl), https://github.com/beyondgrep/ack3; - ag, textproc/the_silver_searcher (C), https://github.com/ggreer/the_silver_searcher; - GNU grep, textproc/gnugrep (C), https://www.gnu.org/software/grep/; - ripgrep, textproc/ripgrep (Rust), https://github.com/BurntSushi/ripgrep. Other workaround option is to build system grep with one of external regex libraries. For that matter, one could also install GNU grep at /usr/bin. Same issue here. grep -vE -f whitelist.re /path/to/nginx-access.log Whitelist.re is a huge compiled exclusion list. In the form: ^(128\.177\.243\.|128\.177\.244\.100 | etc. etc.) This was done in about 5 seconds with a pretty large access.log in FreeBSD 11.4 Still not finished after 30 minutes on FreeBSD 13.0. I was about to open a new PR, but I realized the problem has already been reported, so I'm commenting here. My particular issue: fgrep -i is terribly slow on 13.0. More in detail: Searching for a string though a one day's worth of mail log (100 MB, postfix) takes unreasonably long time with BSD grep when option -i (case-insensitive search) is specified: $ time /usr/bin/fgrep -i fwzhkqwfoherbfqo mail.log real 59.664s, user 58.907s, sys 0.222s $ time /usr/bin/fgrep fwzhkqwfoherbfqo mail.log real 0.276s, user 0.097s, sys 0.173s With option -i it takes 200-times longer than without -i. Compared to pcregrep or GNU grep, it takes 500-times longer. fgrep and grep and egrep are the same in this respect. An interesting observation: as the length of a search string increases, so does the run time. It would be worthwhile investing some time in finding and fixing this particular hotspot in BSD grep. (the machine above was a slightly older 13.0-RELEASE-p6 amd64, with SSD disks and ZFS, the file is cached (cache warmed up)) ---------------------- The rest below is some more benchmarking, this time on a faster machine with nvme disks, ZFS, same log file. Absolute times are shorter, but the ratio is about the same. $ uname -a FreeBSD xxx 13.0-RELEASE-p6 FreeBSD ... amd64/sys/GENERIC amd64 $ ls -l mail.log xxx 99999998 Jan 18 22:49 mail.log BSD grep: ========= $ time /usr/bin/fgrep -i fwzh mail.log real 0m8.733s, user 0m8.664s, sys 0m0.061s $ time /usr/bin/fgrep -i fwzhkqwf mail.log real 0m12.759s, user 0m12.666s, sys 0m0.057s $ time /usr/bin/fgrep -i fwzhkqwfoherbfqo mail.log real 0m18.922s, user 0m18.813s, sys 0m0.064s $ time /usr/bin/fgrep -i fwzhkqwfoherbfqojhkqnsazmzlwknhg mail.log real 0m32.593s, user 0m32.438s, sys 0m0.056s BSD grep without -i: $ time /usr/bin/fgrep fwzhkqwfoherbfqojhkqnsazmzlwknhg mail.log real 0m0.112s, user 0m0.033s, sys 0m0.079s GNU grep (textutils/gnugrep): ============================= $ time /usr/local/bin/fgrep -i fwzh mail.log real 0m0.204s, user 0m0.073s, sys 0m0.022s $ time /usr/local/bin/fgrep -i fwzhkqwf mail.log real 0m0.203s, user 0m0.060s, sys 0m0.015s $ time /usr/local/bin/fgrep -i fwzhkqwfoherbfqo mail.log real 0m0.175s, user 0m0.036s, sys 0m0.017s $ time /usr/local/bin/fgrep -i fwzhkqwfoherbfqojhkqnsazmzlwknhg mail.log real 0m0.217s, user 0m0.083s, sys 0m0.006s pcregrep: ========= $ time pcregrep -i fwzh mail.log real 0m0.249s, user 0m0.114s, sys 0m0.026s $ time pcregrep -i fwzhkqwf mail.log real 0m0.227s, user 0m0.094s, sys 0m0.013s $ time pcregrep -i fwzhkqwfoherbfqo mail.log real 0m0.128s, user 0m0.073s, sys 0m0.055s $ time pcregrep -i fwzhkqwfoherbfqojhkqnsazmzlwknhg mail.log real 0m0.126s, user 0m0.079s, sys 0m0.047s Improving the speed of grep(1) is largely rototilling it (probably a 20+ hour job, easily, once you've made sure you didn't break some weird corner cases); this isn't something I can do in my spare time, and I've not found anyone willing to fund it. (In reply to Kyle Evans from comment #12) So far as I know, the one area that I've see the behavior's consequences somewhat systematically is for poudriere bulk activity were it tries to go through the log to identify the "reason" for stoppage --when the log happens to be huge, as it can be for some port builds. /usr/local/share/poudriere/processonelog.sh and its use of bzgrep has a couple of examples of using -qi . Lots more use of -qE and -q . One -v . One plain grep. May be someday this issue will lead to some funding for spending less time on the kind of activity. (In reply to Mark Millard from comment #13) Indicating a wording correction/intended-interpretation: "plain grep": no "bz". Actually: grep -qE Is this a duplicate of bug #223553 ? (In reply to Mark Millard from comment #13 and #14) poudriere-devel has gained: # Grep build logs to determine a possible build failure reason. This is # only shown on the web interface. # Default: yes DETERMINE_BUILD_FAILURE_REASON=no for use in /usr/local/etc/poudriere.conf as a way to avoid the time involved. This is in: # poudriere version poudriere-git-3.3.99.20220617 (In reply to Matteo Riondato from comment #15) If this still occurs with /usr/bin/bsdgrep in 12.x, I think this is a duplication. However, I continue to feel that in 12.3-STABLE the performance of concurrent storage usage by multiple processes has been poor... > /usr/bin/time -l cat access_log.txt | grep -F -v -f exclude_ips.txt > output.log ~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~ | \__ writing \__ reading Thus, this is reading and writing at the same time. Doesn't it matter? (In reply to Matteo Riondato from comment #15) Well, bug #223553 was specifically about /usr/bin/bsdgrep (for how things were at the time). The environment has progressed and the issues are certainly not limited to only the specific path /usr/bin/bsdgrep at this point. But things certainly are related. Kyle Evans may well have the best context to make a judgment about duplication status --and, if a duplicate, which is better to keep as the active one. (In reply to Tatsuki Makino from comment #17) /usr/bin/time -l cat access_log.txt > output.log would also mix reading (access_log.txt) and writing (output.log). Does it produce odd timings? Or is the grep essential to getting odd timings? (In reply to Mark Millard from comment #19) Even better for comparison with grep use would be: /usr/bin/time -l cat access_log.txt | cat > output.log because it still has the pipe and a right hand side process involved. (In reply to Mark Millard from comment #20) I time both the left hand side and right hand side processes here: # /usr/bin/time -l cat access_log.txt | time -l cat > output.log 0.65 real 0.00 user 0.00 sys 0 maximum resident set size 0 average shared memory size 0 average unshared data size 0 average unshared stack size 154 page reclaims 0 page faults 0 swaps 0 block input operations 0 block output operations 1104 messages sent 0 messages received 0 signals received 233 voluntary context switches 1 involuntary context switches 0.65 real 0.00 user 0.00 sys 2440 maximum resident set size 8 average shared memory size 8 average unshared data size 128 average unshared stack size 146 page reclaims 0 page faults 0 swaps 0 block input operations 549 block output operations 0 messages sent 549 messages received 0 signals received 357 voluntary context switches 0 involuntary context switches # /usr/bin/time -l cat access_log.txt | time -l grep -F -v -f exclude_ips.txt > output2.log 422.70 real 0.00 user 0.00 sys 2360 maximum resident set size 8 average shared memory size 8 average unshared data size 128 average unshared stack size 155 page reclaims 0 page faults 0 swaps 0 block input operations 0 block output operations 1104 messages sent 0 messages received 0 signals received 138 voluntary context switches 0 involuntary context switches 431.63 real 431.57 user 0.04 sys 168164 maximum resident set size 20 average shared memory size 8 average unshared data size 128 average unshared stack size 41463 page reclaims 0 page faults 0 swaps 0 block input operations 32 block output operations 0 messages sent 138 messages received 0 signals received 168 voluntary context switches 4969 involuntary context switches So the "grep -F -v -f exclude_ips.txt" is compute bound (user), not I/O bound. The "cat" right hand side, by contrast, was not compute bound (user) --and was not made to be time consuming overall. Even in 12.3-STABLE, it was a bsdgrep issue :) > /usr/bin/time -l cat access_log.txt | /usr/bin/time -l grep -F -v -f exclude_ips.txt > output.log 0.45 real 0.00 user 0.00 sys (omitted) 0.46 real 0.39 user 0.06 sys (omitted) > /usr/bin/time -l cat access_log.txt | /usr/bin/time -l bsdgrep -F -v -f exclude_ips.txt > output.log 377.78 real 0.00 user 0.00 sys (omitted) 385.78 real 385.47 user 0.11 sys (omitted) Just another data point: Infamous dougb@ test which lead to revert of first gnu to bsdgrep conversion attempt in 2010: uname: FreeBSD freefall.freebsd.org 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n259006-a6346c02f64: Sun Nov 6 02:50:08 UTC 2022 urls: https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019005.html https://web.archive.org/web/20100820074922/http://people.freebsd.org/~dougb/grep-time-trial.sh.txt [*] https://www.FreeBSD.org/ports/INDEX-9.bz2 [*]: changed paths, number tries increased to 1000 results: GNU grep Elapsed time: 13 seconds BSD grep Elapsed time: 17 seconds As you see, bsdgrep is still 30% slower than gnurep in this most common and trivial case. gnu grep uses splice syscall to avoid data transition between userspace and the kernel, https://endler.dev/2018/fastcat *** Bug 223553 has been marked as a duplicate of this bug. *** |
Created attachment 223801 [details] access_log.txt and exclude_ips.txt files With 13.0-RC4 I notice that I get very slow speed using grep with the follow parameters. With 12.2-RELEASE-p4 it is fast. Files used for the test are included in the attachment. ------------- FreeBSD 13.0-RC4: /usr/bin/time -l cat access_log.txt | grep -F -v -f exclude_ips.txt > output.log 358.92 real 0.00 user 0.00 sys 2200 maximum resident set size 8 average shared memory size 4 average unshared data size 128 average unshared stack size 148 page reclaims 0 page faults 0 swaps 0 block input operations 0 block output operations 1104 messages sent 0 messages received 0 signals received 138 voluntary context switches 0 involuntary context switches ------------- FreeBSD 12.2-RELEASE-p4: /usr/bin/time -l cat access_log.txt | grep -F -v -f exclude_ips.txt > output.log 0.18 real 0.00 user 0.00 sys 0 maximum resident set size 0 average shared memory size 0 average unshared data size 0 average unshared stack size 137 page reclaims 0 page faults 0 swaps 0 block input operations 0 block output operations 0 messages sent 0 messages received 0 signals received 70 voluntary context switches 0 involuntary context switches