Bug 235887 - awk fails to replace "/ere/" with "$0 ~ /ere/" according to POSIX
Status: Closed Works As Intended
Alias: None
Product: Base System
Classification: Unclassified
Component: standards
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Many People
Assignee: Warner Losh
Reported: 2019-02-20 16:51 UTC by Tim Chase
Modified: 2021-07-11 14:40 UTC
Description Tim Chase 2019-02-20 16:51:16 UTC
I've hit a case in which /ere/ doesn't expand the same as
"$0 ~ /ere/" which it should do according to the POSIX spec[0].

The goal was to meet the criterion "one and only one of multiple
regex matches", so I used

  jot 20 | awk '/1/ + /5/ == 1'

(This can be extended to any number of expressions, e.g.
"/1/ + /5/ + /7/ == 1", but `jot 20` makes the problem easy to
demonstrate: the goal is lines containing 1 or 5 but not both, so 15
must be excluded.)

This gives a parse error:

  $ jot 20 | awk '/1/ + /5/ == 1'
  awk: syntax error at source line 1
   context is
          /1/ + >>>  / <<< 
  awk: bailing out at source line 1

Strangely, wrapping the expressions in parens works as expected:

  $ jot 20 | awk '(/1/) + (/5/) == 1'

However, manually performing the replacement that the POSIX spec
describes:

  $ jot 20 | awk '$0 ~ /1/ + $0 ~ /5/ == 1'

parses fine (instead of giving the syntax error), so awk isn't doing the
"/ere/ -> $0 ~ /ere/" replacement POSIXly.  This also doesn't give
results I'd consider correct (it returns "5" and "15").  Again,
wrapping those expansions in parens gives the expected/correct results:

  $ jot 20 | awk '($0 ~ /1/) + ($0 ~ /5/) == 1'

As a side note, gawk parses the original notation ('/1/ + /5/ == 1')
fine and it does the same as the parenthesized versions above.
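
For anyone who needs a form that sidesteps the parsing difference, here is a
minimal sketch built on the parenthesized variant above. The function name
exactly_one is made up for illustration, and the awk one-liner that prints
1 through 20 is only a stand-in for `jot 20`, which not every system has.

```shell
# exactly_one is a hypothetical helper name, not part of any awk distribution.
# It prints input lines that match exactly one of the regexes /1/ and /5/.
exactly_one() {
    awk '{
        # Each parenthesized match evaluates to 1 or 0, so the sum
        # counts how many of the regexes matched this line.
        hits = ($0 ~ /1/) + ($0 ~ /5/)
        if (hits == 1) print
    }'
}

# Stand-in for "jot 20": print the integers 1 through 20, one per line.
awk 'BEGIN { for (i = 1; i <= 20; i++) print i }' | exactly_one
```

Adding another term such as ($0 ~ /7/) to the sum extends the same test to
more expressions.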

-tkc

[0] """

When an ERE token appears as an expression in any context other than
as the right-hand of the '~' or "!~" operator or as one of the
built-in function arguments described below, the value of the
resulting expression shall be the equivalent of:

$0 ~ /ere/

"""
http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html
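
As a small illustration of the quoted rule in the one context where it is
uncontroversial (a bare pattern with no surrounding operators), the sketch
below compares the two spellings on the same input. The variable names bare
and explicit are chosen here for illustration only.

```shell
# Feed the same three lines to both spellings of the pattern.
bare=$(printf '%s\n' 4 5 15 | awk '/5/')
explicit=$(printf '%s\n' 4 5 15 | awk '$0 ~ /5/')

# Report whether the two outputs are identical.
if [ "$bare" = "$explicit" ]; then
    echo "same output"
else
    echo "outputs differ"
fi
```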
Comment 1 Warner Losh freebsd_committer freebsd_triage 2019-02-20 22:56:36 UTC
I'll look into this. It seems related to other awk bugs we already have.
Comment 2 Warner Losh freebsd_committer freebsd_triage 2019-06-03 19:25:44 UTC
(In reply to Warner Losh from comment #1)
Still a bug after the last import.
Comment 3 Warner Losh freebsd_committer freebsd_triage 2021-07-08 01:51:42 UTC
Also, upstream has indicated that this may not be fixed.
Comment 4 Warner Losh freebsd_committer freebsd_triage 2021-07-10 19:59:33 UTC
https://github.com/onetrueawk/awk/issues/122

is the issue I've filed upstream.
Comment 5 Warner Losh freebsd_committer freebsd_triage 2021-07-11 14:40:40 UTC
I've now reported this bug to upstream twice: https://github.com/onetrueawk/awk/issues/49 and https://github.com/onetrueawk/awk/issues/122. Both times they have rejected it. I'm closing this bug as it's not FreeBSD specific. If you are interested in pursuing it, I can only recommend that you find a fix and work with upstream to get it accepted. I'm sorry I don't have a better outcome here, though.