Bug 14342

Summary: [PATCH] Speed ups for regex!
Product: Base System Reporter: howardjp <howardjp>
Component: binAssignee: dcs <dcs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.0-CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
file.diff none

Description howardjp 1999-10-15 05:50:00 UTC
Regex is horribly, horribly slow.  

There is code in lib/libc/regex/engine.c to handle searching for
a known constant string which has to be present in any matches. 
This is meant to cut down the number of strings regexec() actually
has to search.  However, it uses a simple search to find it and 
it takes forever.

I replaced it with a Boyer-Moore search.  I stole the code from
4.4BSD's egrep.  I have trimed it down as much as I can and it 
roughly halves the speed of calls to regexec on mis-matches.

Apply the patch below to lib/libc/regex to get add the new code.

How-To-Repeat: 
N/A
Comment 1 cpiazza freebsd_committer freebsd_triage 2000-06-05 01:00:22 UTC
Responsible Changed
From-To: freebsd-bugs->dcs

dcs is working on this
Comment 2 dcs 2000-06-05 01:27:49 UTC
I can see a number of bugs with the patch. Cmap isn't properly
initialized, there is a reference to a "string" not mentioned elsewhere,
there is a reference to altlen[N], altpat[N] but only altlen[0] and
altpat[0] are actually used. The code compares must to cmap at regexec()
time, when they are both defined at regcomp() and not changed after
that! In other words... you trimmed too much. :-)

On the more problematic side, this imposes a limit on the size of both
text and pattern, something which the present code has not.

Finally, this is not, strictly speaking, the Boyer-Moore algorithm. It
only checks the last character of the must pattern, and, obviously
doesn't even have a provision for match jumps. And it must be pointed
out that the Boyer-Moore algorithm is much worse than the
straight-forward algorithm for patterns <= 3 characters in length.

-- 
Daniel C. Sobral			(8-DCS)

dcs@newsguy.com
dcs@freebsd.org
capo@yet.another.bsdconspiracy.org

	Hmmm - I have to go check this. My reality assumptions are shattered.
Comment 3 dcs freebsd_committer freebsd_triage 2000-06-29 05:42:24 UTC
State Changed
From-To: open->closed

A Boyer-Moore algorithm was added. Thanks.