Bug 54410 - one-true-awk not POSIX compliant (no extended REs)
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: standards
Version: 5.1-CURRENT
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Warner Losh
Reported: 2003-07-12 11:40 UTC by Jens Schweikhardt
Modified: 2020-03-26 11:48 UTC
Description Jens Schweikhardt 2003-07-12 11:40:09 UTC
	Our /usr/bin/awk understands only basic REs, not the extended REs
	required by IEEE Std 1003.1-2001:

References:

<quote std="IEEE Std 1003.1-2001" section=awk>
...
Regular Expressions

The awk utility shall make use of the extended regular expression
notation (see the Base Definitions volume of IEEE Std 1003.1-2001,
Section 9.4, Extended Regular Expressions)
</quote>

<quote std="IEEE Std 1003.1-2001" section=ere>
EREs Matching Multiple Characters
...
5. When an ERE matching a single character or an ERE enclosed in
parentheses is followed by an interval expression of the format "{m}" ,
"{m,}" , or "{m,n}" , together with that interval expression it shall
match what repeated consecutive occurrences of the ERE would match. The
values of m and n are decimal integers in the range 0 <= m<= n<=
{RE_DUP_MAX}, where m specifies the exact or minimum number of
occurrences and n specifies the maximum number of occurrences. The
expression "{m}" matches exactly m occurrences of the preceding ERE,
"{m,}" matches at least m occurrences, and "{m,n}" matches any number of
occurrences between m and n, inclusive.
</quote>

Fix: 

It's probably a POLA violation to change the default RE style from
	BRE to ERE, but we should add a POSIX mode that uses ERE (e.g.
	gawk needs --posix to be compliant).
How-To-Repeat: 	echo e | /usr/bin/awk '/e{1}/'          # should print e, but prints nothing
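
For reference, the three interval forms quoted above can be checked against a matcher that is known to implement them. The sketch below assumes that grep -E on the test machine is a conforming ERE implementation; the helper name ere_match is hypothetical and not part of any tool.

```shell
#!/bin/sh
# ere_match STRING ERE
# Hypothetical helper: succeeds if ERE matches somewhere in STRING,
# using grep -E as the reference ERE implementation (an assumption).
ere_match() {
    printf '%s\n' "$1" | grep -E -q -- "$2"
}

# {m}: exactly m occurrences of the preceding ERE
ere_match "e"    '^e{1}$'   && echo "{1} matches e"
# {m,}: at least m occurrences
ere_match "eee"  '^e{2,}$'  && echo "{2,} matches eee"
# {m,n}: between m and n occurrences, inclusive
ere_match "eeee" '^e{2,3}$' || echo "{2,3} rejects eeee"
```

An awk with interval support should agree with these results when given the same patterns, for example: echo eee | awk '/^e{2,}$/'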
Comment 1 Jens Schweikhardt 2004-10-13 20:51:12 UTC
Upon further investigation, awk appears only to be missing the {} ERE
operator in its variations. Other ERE operators like + and | work as
expected. I've sent a bug report to bwk asking if he wants to fix it or
would be happy with a patch.
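
A small probe along these lines can show which ERE operators a given awk accepts. It is only a sketch: the AWK variable and the probe helper are hypothetical names introduced here, and the three patterns are illustrative rather than an exhaustive conformance test.

```shell
#!/bin/sh
# AWK selects the implementation under test (assumed name);
# it defaults to whatever awk is first on PATH.
AWK=${AWK:-awk}

# probe LABEL INPUT ERE
# Hypothetical helper: reports whether $AWK matches INPUT against ERE.
probe() {
    if printf '%s\n' "$2" |
        "$AWK" -v re="$3" '$0 ~ re { found = 1 } END { exit !found }'
    then
        echo "$1: match"
    else
        echo "$1: no match"
    fi
}

probe 'plus'        'eee' '^e+$'
probe 'alternation' 'b'   '^(a|b)$'
probe 'interval'    'e'   '^e{1}$'
```

On an awk without interval support, the last probe would be expected to report no match while the first two still match, which is the behaviour described in this comment.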
Comment 2 Jens Schweikhardt 2004-10-13 20:57:29 UTC
Upon further investigation, awk appears only to be missing the {} ERE
operator in its variations. Other ERE operators like + and | work as
expected. I've sent a bug report to bwk asking if he wants to fix it or
would be happy with a patch.

Jens
Comment 3 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 08:00:29 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 4 Martijn Dekker 2019-02-04 09:48:32 UTC
Apple's branch of onetrueawk has supported ERE bounds (a.k.a. interval/repetition expressions) since at least 2009, judging by the date stamp on their src/b.c in:
https://opensource.apple.com/tarballs/awk/awk-24.tar.gz

I reported this to NetBSD:
http://gnats.netbsd.org/53885

They swiftly integrated the ERE bounds code into their onetrueawk branch. The NetBSD diff can be seen at:
https://github.com/NetBSD/src/commit/f3e4c4ca1dfcdd939a2e33ebfe708f01e25b3bae

I hope FreeBSD will follow suit and add this required POSIX feature as well.
Comment 5 Warner Losh freebsd_committer 2019-02-04 17:30:51 UTC
I have a bunch of changes for awk in my queue, and I'll add this one as well.
Comment 6 Martijn Dekker 2019-03-23 12:58:15 UTC
FYI, the ERE bounds (a.k.a. interval/repetition expressions) code has now been merged upstream. See https://github.com/onetrueawk/awk
Comment 7 Marcin Cieślak 2020-03-26 11:48:45 UTC
Currently our manpage is wrong:

awk supports extended regular expressions (EREs).  
See re_format(7) for more information on regular expressions.  

Shall we fix the manpage or import a newer one-true-awk?