Bug 171725 - awk(1) does not support word-boundary metacharacters
Summary: awk(1) does not support word-boundary metacharacters
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 9.0-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Warner Losh
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-09-18 01:00 UTC by Devin Teske
Modified: 2023-09-17 14:12 UTC
CC List: 2 users

See Also:


Attachments

Description Devin Teske freebsd_committer freebsd_triage 2012-09-18 01:00:21 UTC
The awk(1) manual states (quote): "regular expressions are as in egrep; see grep(1)."

This leads one to believe that awk(1) supports both basic REs and extended REs.

There is a gap in this stated coverage.

one-true-awk (our awk(1)) does not support the word-boundary metacharacters (\< and \>) for matching beginning- and ending-of-word.

Fix: 

There are two proposed solutions.

1. Add a note to the awk(1) manual stating it does not support all RE metacharacters (note that '\<' and '\>' are valid both as BRE and ERE).

or

2. Enhance awk(1) to support these BRE/ERE metacharacters so that the awk(1) manual is accurate without a patch.
How-To-Repeat:

$ echo xxxa | awk '/xxx\>/{print}'
### This produces no output, as expected, since
### the word "xxxa" does not end in "xxx".

$ echo xxx | awk '/xxx\>/{print}'
### This also produces no output, indicating that
### \> is not recognized as the "end-of-word" metacharacter.
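
Until awk(1) understands \< and \>, a word boundary can be approximated with ordinary ERE constructs. The following is a minimal sketch, not a fix for the bug: it assumes that a "word character" is an ASCII letter, digit, or underscore, and the helper name match_word is hypothetical.

```shell
# Hypothetical helper: print input lines that contain "$1" as a whole word.
# Assumption: word characters are ASCII letters, digits, and underscore.
match_word() {
    awk -v w="$1" '
        # Emulate \<w\> with start/end anchors or a non-word neighbor.
        BEGIN { re = "(^|[^A-Za-z0-9_])" w "([^A-Za-z0-9_]|$)" }
        $0 ~ re { print }
    '
}

echo "xxxa"     | match_word xxx   # no output: "xxx" is followed by a word character
echo "xxx"      | match_word xxx   # prints the line
echo "a xxx, b" | match_word xxx   # prints the line
```

The argument is used as part of a regular expression, so any metacharacters in it would need escaping first. Unlike a true zero-width \< or \>, this pattern also consumes the neighboring non-word character, which matters when the match is used with sub() or gsub().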
Comment 1 devin.teske 2012-10-12 17:18:34 UTC
Swapping \< and \> (GNU syntax) for [[:<:]] and [[:>:]] (POSIX syntax), respectively, makes no difference, because one-true-awk uses its own regular expression code (read: it does not use libc, which already supports the [[:<:]] and [[:>:]] word boundaries).
--
Devin

_____________
The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you.
Comment 2 devin.teske 2012-10-12 17:24:00 UTC
If awk(1) is enhanced, it should gain support for "\<" and "\>" (not "[[:<:]]" and "[[:>:]]"). The former syntax is the one to support because the awk(1) manual says (rather sparsely) "regular expressions are as in egrep(1)".

To close the functionality gap between the awk(1) manual and awk(1) itself, we should either fix the awk(1) manual (for example, to say "regular expressions are as in egrep(1) except for \< and \>") or (preferred) add support for \< and \> so that the manual becomes accurate without modification.

That is to say, awk(1) should _not_ be enhanced to support [[:<:]] and [[:>:]], as this would only widen the functionality gap between what is documented and what is expected.
--
Devin

Comment 3 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:59:55 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 4 Warner Losh freebsd_committer freebsd_triage 2021-07-31 05:47:07 UTC
This is still a bug after the latest one true awk import.
Comment 5 Warner Losh freebsd_committer freebsd_triage 2023-09-17 14:12:59 UTC
Still a bug in 14.0
Upstream has indicated a reluctance to fix this.