There seems to be a regression in FreeBSD 13 affecting both basic and extended regular expression processing.

Illustration of the problem (grep):

$ uname -a
FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC
$ time grep -E '(wordorphrase|differentword)' 150MB-file >/dev/null
real 0m54.565s
user 0m54.372s
sys 0m0.173s

It should not take almost a minute to search a 150MB file! Even worse is:

$ time grep -i 'differentword' 150MB-file >/dev/null
real 0m28.060s
user 0m28.016s
sys 0m0.038s

That is almost 30 seconds for a case-insensitive search of a 150MB text file, compared to:

$ time grep 'differentword' 150MB-file >/dev/null
real 0m0.210s
user 0m0.178s
sys 0m0.032s

which runs at normal speed.

This was all fine on 12.3 and 12.4. For example:

$ uname -a
FreeBSD 12.3-RELEASE-p11 GENERIC
$ time grep -E '(wordorphrase|differentword)' 150MB-file >/dev/null
real 0m0.290s
user 0m0.219s
sys 0m0.071s
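For anyone who wants to reproduce the timings without the original 150MB file, here is a rough sketch of a helper script. The corpus contents, the file path, and the function names (make_corpus, elapsed, compare_grep) are all made up for illustration; the reporter's actual file is not known, and results on synthetic text may differ from those above.

```shell
#!/bin/sh
# Sketch only: corpus contents, path, and line count are assumptions,
# not the reporter's actual 150MB file.

# make_corpus FILE COUNT
# Write COUNT lines to FILE: COUNT-1 filler lines, then one line that
# contains "differentword" so each pattern below matches exactly once.
make_corpus() {
    awk -v n="$2" 'BEGIN {
        for (i = 1; i < n; i++)
            print "the quick brown fox jumps over the lazy dog " i
        print "a line containing differentword"
    }' > "$1"
}

# elapsed CMD [ARGS...]
# Run a command, discard its stdout, and print wall-clock time.
# date +%s reports whole seconds only, so sub-second runs show as 0s.
elapsed() {
    start=$(date +%s)
    "$@" > /dev/null
    end=$(date +%s)
    echo "$((end - start))s: $*"
}

# compare_grep FILE
# Time the three grep invocations from the report against FILE.
compare_grep() {
    elapsed grep 'differentword' "$1"
    elapsed grep -i 'differentword' "$1"
    elapsed grep -E '(wordorphrase|differentword)' "$1"
}
```

Running make_corpus /tmp/regex-corpus.txt 3000000 and then compare_grep /tmp/regex-corpus.txt should give a file of roughly 150MB and one timing line per invocation. The whole-second resolution is coarse, but it is enough to tell a sub-second run from one that takes tens of seconds.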
It's not a regression in the regex implementation per se, but rather the result of a switch from the (very outdated) GNU grep, which used the bundled libgnuregex, to bsdgrep, which uses the in-base regex implementation. See also bug 223553, bug 254763, bug 255525.
*** Bug 271904 has been marked as a duplicate of this bug. ***
*** Bug 271905 has been marked as a duplicate of this bug. ***
So the in-base regex implementation is badly broken. Additional problems (presumably also with the regex lib) are illustrated by the following, where 'þ' is the LATIN1 character THORN (0xfe):

$ env LC_CTYPE=is_IS.ISO8859-1 grep 'þ'
grep: trailing backslash (\)
$ env LC_CTYPE=is_IS.ISO8859-1 sed 's/þ/th/'
sed: 1: "s/þ/th/": RE error: trailing backslash (\)
$ env LC_CTYPE=is_IS.ISO8859-1 expr "abcþdef" : '...þ...'
expr: trailing backslash (\)

Any plans to fix this, or to revert the change in the in-base regex?
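Since how 'þ' reaches the command depends on the terminal's encoding, a reproduction that builds the 0xfe byte explicitly may be easier to share. This is only a sketch: the function names thorn_byte and repro_thorn are made up, and it assumes the locale passed in is installed on the system under test.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# thorn_byte
# Print the single byte 0xFE (octal 376), which is the letter thorn
# in ISO 8859-1, independent of the terminal's own encoding.
thorn_byte() {
    printf '\376'
}

# repro_thorn LOCALE
# Feed "abc<0xFE>def" to grep under the given LC_CTYPE and search for
# the 0xFE byte. grep -c prints the number of matching lines; any
# regex compilation error from grep goes to stderr.
repro_thorn() {
    t=$(thorn_byte)
    printf 'abc%sdef\n' "$t" | env LC_CTYPE="$1" grep -c "$t"
}
```

Calling repro_thorn is_IS.ISO8859-1 on an affected system would be expected to show the same "trailing backslash" error as the grep command above, but that has not been verified here.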
The trailing backslash error for ISO locales has been reported at bug #264275. The performance issue comes up periodically. For example, it is discussed at length in bug #254763.
(In reply to marius from comment #4) I think you misunderstood my reply. The libc (that is, in-base) regex implementation has always been that way (and has actually received quite a few fixes in recent years). There is nothing to revert; whatever issues there are need to be fixed.
Is deduplication appropriate for any of the four non-tracking bug reports?