Bug 251775 - bsdgrep: egrep regards '{foo}' as invalid regular expression
Status: Closed Not A Bug
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-12-12 06:07 UTC by Yasuhiro Kimura
Modified: 2020-12-12 09:37 UTC

See Also:


Attachments

Description Yasuhiro Kimura freebsd_committer 2020-12-12 06:07:16 UTC
I'm the maintainer of the security/logcheck port. It detects log messages that should be reported by applying regular expressions to each message with egrep.

After grep was switched to bsdgrep on 13-CURRENT, I noticed that logcheck produces some errors caused by egrep. I investigated them and found that most of them happen because the regular expressions are invalid according to the definition in re_format(7). But I also found one case that is caused by a bug in bsdgrep.

In re_format(7), a 'bound' is defined as follows.

"A bound is '{' followed by an unsigned decimal integer, possibly followed by ',' possibly followed by another unsigned decimal integer, always followed by '}'.  The integers must lie between 0 and RE_DUP_MAX (255) inclusive, and if there are two of them, the first may not exceed the second."

There is also the following clarification.

" A '{' followed by a character other than a digit is an ordinary character, not the beginning of a bound. "

So '{100}' is regarded as a bound and, with nothing in front of it to repeat, is invalid as a regular expression. But '{foo}' isn't regarded as a bound. Hence it's a valid regular expression and matches itself.
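
The bound definition quoted above can be written down as a small check. This is only a sketch: is_bound is a hypothetical helper, not part of grep or logcheck, and it assumes the RE_DUP_MAX value of 255 given in the re_format(7) text.

```shell
# Hypothetical helper: succeeds if $1 is a bound as described in the
# re_format(7) text quoted above. 255 is the value quoted there (assumption).
RE_DUP_MAX_ASSUMED=255

is_bound() {
    # Shape: '{', digits, optionally ',' and optionally more digits, '}'.
    # The braces sit in bracket expressions so the check itself is portable.
    printf '%s\n' "$1" | grep -Eq '^[{][0-9]+(,[0-9]*)?[}]$' || return 1
    body=${1#?}; body=${body%?}      # strip the surrounding braces
    lo=${body%%,*}                   # first integer
    [ "$lo" -le "$RE_DUP_MAX_ASSUMED" ] || return 1
    case $body in
        *,?*)                        # a second integer is present
            hi=${body#*,}
            [ "$hi" -le "$RE_DUP_MAX_ASSUMED" ] && [ "$lo" -le "$hi" ] ;;
        *)
            return 0 ;;
    esac
}

for s in '{100}' '{foo}' '{0,256}'; do
    if is_bound "$s"; then echo "$s: bound"; else echo "$s: not a bound"; fi
done
```

By this reading '{foo}' fails the shape check at the first character after the brace, which is the case the clarification in re_format(7) describes.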

GNU grep's egrep works fine with '{foo}'.

----------------------------------------------------------------------
yasu@eastasia[1575]% uname -U
1202000
yasu@eastasia[1614]% type egrep
egrep is /usr/bin/egrep
yasu@eastasia[1615]% egrep --version
egrep (GNU grep) 2.5.1-FreeBSD

Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

yasu@eastasia[1616]% echo '{foo}' | egrep '{foo}'
{foo}
yasu@eastasia[1617]%
----------------------------------------------------------------------

But bsdgrep's egrep fails with an error.

----------------------------------------------------------------------
yasu@rolling-vm-freebsd1[1135]% uname -U
1300131
yasu@rolling-vm-freebsd1[1142]% type egrep
egrep is /usr/bin/egrep
yasu@rolling-vm-freebsd1[1143]% egrep --version
egrep (BSD grep, GNU compatible) 2.6.0-FreeBSD
yasu@rolling-vm-freebsd1[1144]% echo '{foo}' | egrep '{foo}'
egrep: repetition-operator operand invalid
yasu@rolling-vm-freebsd1[1145]%
----------------------------------------------------------------------
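
The comparison in the two sessions above can be scripted so that each logcheck rule is tried against whichever grep is installed. A minimal sketch, relying only on the POSIX grep exit-status convention (0 or 1 when the pattern compiled, greater than 1 on an error); check_ere is a hypothetical name.

```shell
# Hypothetical helper: report whether the local `grep -E` accepts pattern $1.
# Searching /dev/null means a valid pattern always ends with status 1
# (no lines selected), so any status above 1 points at the pattern.
check_ere() {
    grep -E -e "$1" /dev/null >/dev/null 2>&1
    status=$?
    if [ "$status" -gt 1 ]; then
        echo "rejected: $1"
    else
        echo "accepted: $1"
    fi
}

check_ere 'foo'      # an uncontroversial pattern
check_ere '{foo}'    # the result depends on the regex implementation
```

Running this over every rule file would list the patterns that need attention before switching grep implementations.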
Comment 1 Yuri Pankov freebsd_committer 2020-12-12 06:22:24 UTC
It's not grep itself, it's our regex implementation.

POSIX says it's UB:

*+?{
    The asterisk, plus-sign, question-mark, and left-brace shall be special except when used in a bracket expression. Any of the following uses produce undefined results:
    - If a left-brace is not part of a valid interval expression

re_format(7) is a free interpretation of the standard.

FWIW, the best way is to fix your regular expressions.
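
One way to do that for the '{foo}' case is to keep the left brace from ever being seen as an operator. The commands below are a sketch of three spellings that should not depend on how an implementation treats a stray '{'; they are illustrations, not the actual logcheck rules.

```shell
# A brace inside a bracket expression is always an ordinary character.
echo '{foo}' | grep -E '[{]foo[}]'

# A backslash-escaped left brace is a literal in an ERE; the right brace
# needs no escape because it is only special when it closes an interval.
echo '{foo}' | grep -E '\{foo}'

# If no regex features are needed at all, match a fixed string.
echo '{foo}' | grep -F '{foo}'
```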
Comment 2 Kyle Evans freebsd_committer 2020-12-12 06:26:30 UTC
(In reply to Yuri Pankov from comment #1)

+1 for fixing (+ upstreaming) it to be POSIX compliant
Comment 3 Yasuhiro Kimura freebsd_committer 2020-12-12 06:45:06 UTC
(In reply to Yuri Pankov from comment #1)

Thanks for the explanation. Then please let me ask one more question.

While investigating, I also found that the following two patterns are accepted by gnugrep but rejected by bsdgrep.

* '(foo|)'
* '.{0,256}'

According to re_format(7), both of them are invalid as regular expressions. Are they also invalid under the POSIX standard?
Comment 4 Yuri Pankov freebsd_committer 2020-12-12 07:14:08 UTC
(In reply to Yasuhiro KIMURA from comment #3)
As a matter of fact, yes:

* '(foo|)':
----------------------------------------------------------------------
|
    The vertical-line is special except when used in a bracket expression. A vertical-line appearing first or last in an ERE, or immediately following a vertical-line or a left-parenthesis, or immediately preceding a right-parenthesis, produces undefined results.
----------------------------------------------------------------------

* '.{0,256}':
We have RE_DUP_MAX defined to 255 (see /usr/include/limits.h), so a bound of 256 exceeds the limit.


For reference: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
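
Both patterns can be rewritten so that they stay within what the standard defines. The following is a sketch only: the 'bar' suffix and the sample input lines are made up for illustration, and splitting the bound assumes a limit of 255 as on the FreeBSD host above.

```shell
# Limit advertised by the running system.
getconf RE_DUP_MAX

# '(foo|)': make the group optional instead of using an empty alternative.
echo 'foobar' | grep -E '(foo)?bar'

# '.{0,256}': split the bound so each count stays at or below 255.
# Two consecutive runs of 0..128 characters cover 0..256 in total.
echo 'short line' | grep -E '^.{0,128}.{0,128}$'
```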
Comment 5 Yasuhiro Kimura freebsd_committer 2020-12-12 07:22:59 UTC
(In reply to Yuri Pankov from comment #4)

I see. Thank you for answering my question.