Bug 255525 - grep performance problem
Summary: grep performance problem
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2021-05-01 09:52 UTC by Edward Tomasz Napierala
Modified: 2023-06-10 05:34 UTC
CC List: 6 users

See Also:


Attachments
Replace gnugrep with pcregrep (1.51 KB, patch)
2022-04-14 14:19 UTC, Duane

Description Edward Tomasz Napierala freebsd_committer freebsd_triage 2021-05-01 09:52:05 UTC
The sysutils/debootstrap port seems to trigger a pathological corner case in BSD grep, where it's about two orders of magnitude slower than GNU grep.

To reproduce, download https://people.freebsd.org/~trasz/debian_dists_buster_main_binary-amd64_Packages and run:

$ grep -E '^$|^Architecture:|^Filename:|^MD5sum:|^Package:|^Priority:|^SHA256:|^Size:|^Version:|^Depends:|^Pre-Depends:' debian_dists_buster_main_binary-amd64_Packages
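
For a quick sanity check without the full download, the same pattern can be run against a small synthetic input. This is only a sketch: the stanza contents are made up, the GREP variable is a hypothetical knob for choosing which grep binary to run, and an input this small may not show the slowdown described above.

```shell
#!/bin/sh
# Sketch: run the pattern from this report against a synthetic
# Packages-style file. The stanza contents and the GREP variable are
# assumptions for illustration; the real reproducer is the linked file.
GREP="${GREP:-grep}"    # e.g. GREP=ggrep to try GNU grep, if it is installed
PATTERN='^$|^Architecture:|^Filename:|^MD5sum:|^Package:|^Priority:|^SHA256:|^Size:|^Version:|^Depends:|^Pre-Depends:'

sample=$(mktemp "${TMPDIR:-/tmp}/packages.XXXXXX") || exit 1
trap 'rm -f "$sample"' EXIT

# Write 1000 made-up stanzas. The "Section:" and "Description:" lines
# are not covered by the pattern; the other four lines per stanza are.
awk 'BEGIN {
    for (i = 0; i < 1000; i++) {
        printf "Package: pkg%d\n", i
        printf "Version: 1.0-%d\n", i
        printf "Architecture: amd64\n"
        printf "Section: misc\n"
        printf "Description: synthetic entry %d\n", i
        printf "\n"
    }
}' > "$sample"

# Count matching lines with the selected grep implementation.
matched=$("$GREP" -c -E "$PATTERN" "$sample")
echo "matched lines: $matched"
```

To compare implementations, run the script under time(1) once with the default grep and once with GREP pointing at another binary, and raise the stanza count if the difference is too small to measure.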
Comment 1 Duane 2022-04-14 13:54:30 UTC
I see this was addressed here:

https://cgit.freebsd.org/ports/commit/?id=ea62bacb8ac8978cd8f265cc385fd55cec051d1a

I was wondering if it would be possible to please solve this using pcregrep instead of gnugrep.

pcregrep uses the same engine as gnugrep and so has the same performance, but it ships as part of the pcre library, which is much more likely to be required by other utilities. It also doesn't replace any of the existing BSD commands.
Comment 2 Duane 2022-04-14 14:19:57 UTC
Created attachment 233215
Replace gnugrep with pcregrep
Comment 3 Duane 2022-04-14 14:23:23 UTC
Sorry, I've just realised that this bug relates to the performance of libc/regex in the base system, not the debootstrap port performance.  I've come into this backwards through a search online.

I'll resubmit this as a separate bug report; please ignore my comments here.
Comment 4 Duane 2022-04-15 04:57:33 UTC
Just out of interest, I had a look at the available regex libraries, and I think the way forward for base would probably be to update to the newer Henry Spencer regex library from the Tcl codebase.

It's one of the smaller regex libraries around but uses NFA/DFA for speed.

I would propose using https://github.com/garyhouston/hsrex as a starting point (it has been extracted from Tcl by patching over the Tcl dependencies).
Comment 5 John Hein 2023-02-24 17:51:23 UTC
(In reply to Duane from comment #4)
See also bug 269584, comment 23 and review D38754
Comment 6 John Hein 2023-02-24 17:53:51 UTC
(In reply to John Hein from comment #5)
That was supposed to be 'in reply to comment 0 and comment 1' (not comment 4 regarding possible changes to base regex library) - for the choice of pcregrep over ggrep for the port in question.