Summary: | Apparent performance problem with basic and extended regular expressions | ||
---|---|---|---|
Product: | Base System | Reporter: | marius |
Component: | bin | Assignee: | freebsd-bugs (Nobody) <bugs> |
Status: | Open --- | ||
Severity: | Affects Many People | CC: | grahamperrin, tamelingdaniel |
Priority: | --- | Keywords: | performance |
Version: | 13.2-RELEASE | ||
Hardware: | amd64 | ||
OS: | Any | ||
See Also: | https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230332 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223553 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254763 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255525 |
Description
marius
2023-06-08 13:40:53 UTC
It's not a regression in the regex implementation per se, but rather the result of a switch from the (very outdated) GNU grep, which used the bundled libgnuregex, to bsdgrep, which uses the in-base regex implementation. See also bug 223553, bug 254763, bug 255525.

*** Bug 271904 has been marked as a duplicate of this bug. ***

*** Bug 271905 has been marked as a duplicate of this bug. ***

So the in-base regex implementation is badly broken. Additional problems (presumably also with the regex lib) are illustrated by the following, where 'þ' is the LATIN1 character THORN (0xfe):

    $ env LC_CTYPE=is_IS.ISO8859-1 grep 'þ'
    grep: trailing backslash (\)
    $ env LC_CTYPE=is_IS.ISO8859-1 sed 's/þ/th/'
    sed: 1: "s/þ/th/": RE error: trailing backslash (\)
    $ env LC_CTYPE=is_IS.ISO8859-1 expr "abcþdef" : '...þ...'
    expr: trailing backslash (\)

Any plans to fix this, or to revert the change in the in-base regex?

The trailing backslash error for ISO locales has been reported at bug #264275. The performance issue comes up periodically. For example, it is discussed at length in bug #254763.

(In reply to marius from comment #4)
I think you misunderstood my reply. The libc (that is, in-base) regex implementation was always that way (and has actually received quite a few fixes in recent years); there is nothing to revert. Whatever issues there are need to be fixed.

Is deduplication appropriate for any of the four non-tracking bug reports?