I'm the maintainer of the security/logcheck port. It detects log messages that should be reported by applying regular expressions to each message with egrep.

After grep was switched to bsdgrep on 13-CURRENT, I noticed that logcheck produces some errors caused by egrep. I investigated them and found that most of them happen because the regular expressions are invalid according to the definition in re_format(7). But I also found one case that is caused by a bug in bsdgrep.

In re_format(7), 'bound' is defined as follows.

"A bound is '{' followed by an unsigned decimal integer, possibly followed by ',' possibly followed by another unsigned decimal integer, always followed by '}'. The integers must lie between 0 and RE_DUP_MAX (255) inclusive, and if there are two of them, the first may not exceed the second."

And there is also the following clarification.

"A '{' followed by a character other than a digit is an ordinary character, not the beginning of a bound."

So '{100}' is regarded as a bound, and since there is nothing before it to repeat, it is invalid as a regular expression. But '{foo}' isn't regarded as a bound. Hence it is a valid regular expression and matches itself.

Gnugrep's egrep works fine with this regular expression.

----------------------------------------------------------------------
yasu@eastasia[1575]% uname -U
1202000
yasu@eastasia[1614]% type egrep
egrep is /usr/bin/egrep
yasu@eastasia[1615]% egrep --version
egrep (GNU grep) 2.5.1-FreeBSD
Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
yasu@eastasia[1616]% echo '{foo}' | egrep '{foo}'
{foo}
yasu@eastasia[1617]%
----------------------------------------------------------------------

But bsdgrep's egrep results in an error.
----------------------------------------------------------------------
yasu@rolling-vm-freebsd1[1135]% uname -U
1300131
yasu@rolling-vm-freebsd1[1142]% type egrep
egrep is /usr/bin/egrep
yasu@rolling-vm-freebsd1[1143]% egrep --version
egrep (BSD grep, GNU compatible) 2.6.0-FreeBSD
yasu@rolling-vm-freebsd1[1144]% echo '{foo}' | egrep '{foo}'
egrep: repetition-operator operand invalid
yasu@rolling-vm-freebsd1[1145]%
----------------------------------------------------------------------
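
For illustration only, the bound rule quoted from re_format(7) can be written down as a small Python sketch. This is not how bsdgrep or libc implements it; the function name classify_brace and its return values are hypothetical, and treating a '{' at the very end of the pattern as an ordinary character is an assumption. It only classifies the brace itself and does not check whether a bound has something before it to repeat.

```python
import re

RE_DUP_MAX = 255  # value quoted in re_format(7)

# '{', an integer, optionally ',' and optionally a second integer, then '}'
_BOUND = re.compile(r"\{([0-9]+)(?:,([0-9]*))?\}")


def classify_brace(pattern, pos):
    """Classify the '{' at pattern[pos] following the quoted re_format(7) text.

    Returns 'ordinary', 'bound' or 'invalid' (hypothetical labels).
    """
    if pattern[pos] != "{":
        raise ValueError("pattern[pos] is not '{'")
    nxt = pattern[pos + 1:pos + 2]
    # Assumption: a '{' with nothing after it is treated like a '{'
    # followed by a non-digit, i.e. as an ordinary character.
    if not nxt or nxt not in "0123456789":
        return "ordinary"
    m = _BOUND.match(pattern, pos)
    if m is None:
        return "invalid"  # looks like a bound but is malformed
    low = int(m.group(1))
    # '{n}' and '{n,}' have no upper integer to range-check
    high = int(m.group(2)) if m.group(2) else low
    if low > RE_DUP_MAX or high > RE_DUP_MAX or low > high:
        return "invalid"
    return "bound"
```

With this sketch, classify_brace("{foo}", 0) gives 'ordinary' and classify_brace("{100}", 0) gives 'bound', which is the distinction described above.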
It's not grep itself, it's our regex implementation. POSIX says it's UB:

*+?{
    The asterisk, plus-sign, question-mark, and left-brace shall be special except when used in a bracket expression. Any of the following uses produce undefined results:
    - If a left-brace is not part of a valid interval expression

re_format(7) is a free interpretation of the standard. FWIW, the best way is to fix your regular expressions.
(In reply to Yuri Pankov from comment #1) +1 for fixing (+ upstreaming) it to be POSIX compliant
(In reply to Yuri Pankov from comment #1)

Thanks for the explanation. Then please let me ask one more question. While investigating, I also found that the following two patterns are accepted by gnugrep but rejected by bsdgrep.

* '(foo|)'
* '.{0,256}'

According to re_format(7), both of them are invalid as regular expressions. Are they also invalid under the POSIX standard?
(In reply to Yasuhiro KIMURA from comment #3)

As a matter of fact, yes:

* '(foo|)':
----------------------------------------------------------------------
|
    The vertical-line is special except when used in a bracket expression. A vertical-line appearing first or last in an ERE, or immediately following a vertical-line or a left-parenthesis, or immediately preceding a right-parenthesis, produces undefined results.
----------------------------------------------------------------------

* '.{0,256}':
We have RE_DUP_MAX defined to 255 (see /usr/include/limits.h).

For reference: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
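
For anyone auditing a set of patterns, the vertical-line rule quoted above can be approximated with a short Python sketch. It is only a rough aid, not a full ERE parser: the function names are hypothetical, bracket expressions and backslash escapes are skipped in a simplified way, and it reports nothing about bounds or other constructs.

```python
def _tokenize(ere):
    """Split an ERE into ('|' | '(' | ')' | 'other', index) tokens."""
    tokens = []
    i, n = 0, len(ere)
    while i < n:
        c = ere[i]
        if c == "\\" and i + 1 < n:
            tokens.append(("other", i))  # escaped character is never special
            i += 2
        elif c == "[":
            start = i
            i += 1
            if i < n and ere[i] == "^":
                i += 1
            if i < n and ere[i] == "]":
                i += 1  # a leading ']' is a literal member
            while i < n and ere[i] != "]":
                if ere[i] == "[" and i + 1 < n and ere[i + 1] in ".:=":
                    # skip [:class:], [.coll.] and [=equiv=] as a unit
                    close = ere.find(ere[i + 1] + "]", i + 2)
                    i = close + 2 if close != -1 else n
                else:
                    i += 1
            i += 1  # step over the closing ']'
            tokens.append(("other", start))
        elif c in "|()":
            tokens.append((c, i))
            i += 1
        else:
            tokens.append(("other", i))
            i += 1
    return tokens


def undefined_vertical_lines(ere):
    """Return indexes of '|' whose placement POSIX leaves undefined."""
    toks = _tokenize(ere)
    hits = []
    for k, (kind, idx) in enumerate(toks):
        if kind != "|":
            continue
        before = toks[k - 1][0] if k > 0 else None
        after = toks[k + 1][0] if k + 1 < len(toks) else None
        # first or last, after '|' or '(', or before ')'
        if before in (None, "|", "(") or after in (None, ")"):
            hits.append(idx)
    return hits
```

For example, undefined_vertical_lines("(foo|)") flags the '|' at index 4, while "(foo|bar)" produces an empty list.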
(In reply to Yuri Pankov from comment #4) I see. Thank you for answering my question.