Bug 265480 - std::regex constructor throwing an exception at backslash-underscore but not other invalid escapes
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: standards
Version: 13.0-RELEASE
Hardware: Any Any
Importance: --- Affects Many People
Assignee: freebsd-standards (Nobody)
 
Reported: 2022-07-28 17:10 UTC by Milo Cooper
Modified: 2022-07-28 17:17 UTC (History)
Description Milo Cooper 2022-07-28 17:10:15 UTC
Passing the std::regex constructor a pattern that includes an escaped underscore ("\_") throws an exception with the explanation "The expression contained an invalid escaped character, or a trailing escape."

This does not occur on non-FreeBSD systems, and it does not occur for several other escapes that are normally invalid by regex convention (e.g., \& \= \/).

Is this intentional?
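
A minimal sketch of how the report above could be reproduced. The helper name compile_error is hypothetical and not part of the original report; it assumes the default ECMAScript grammar and simply reports whether std::regex accepts a pattern. Which of the escapes are rejected depends on the platform's standard library, so nothing here should be read as the expected result on any particular system.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper (not from the report): try to compile `pattern` with
// std::regex using its default (ECMAScript) grammar.
// Returns std::nullopt if the pattern compiles, otherwise the text of the
// std::regex_error that the constructor threw.
std::optional<std::string> compile_error(const std::string& pattern) {
    try {
        std::regex re(pattern);  // the constructor is where the exception is raised
        (void)re;                // silence unused-variable warnings
        return std::nullopt;
    } catch (const std::regex_error& e) {
        return std::string(e.what());
    }
}
```

Calling compile_error("\\_"), compile_error("\\&"), compile_error("\\="), and compile_error("\\/") and printing the results would show which of the escapes mentioned above a given system accepts.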
Comment 1 Kyle Evans freebsd_committer freebsd_triage 2022-07-28 17:17:27 UTC
(In reply to Milo Cooper from comment #0)

If std::regex is just a shim for libc regex(3), then yes, this is expected. As of 13.0-ish, we've started rejecting many escapes of ordinary characters (this is UB according to POSIX, and shouldn't be considered portable).

Some of them are imbued with special meaning by libregex to match GNU expectations (e.g., \w/\W, \b/\B, \s/\S, \<, \>), so we're trying to avoid some confusion by not having a "working" expression in both libc and libregex -- it either compiles and you get the GNU behavior, or it doesn't compile and you're made aware that you need libregex if you expected GNU behavior.
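
To check libc's regex(3) on its own, independently of std::regex, something like the following sketch could be used. The helper name posix_compile_error is hypothetical, and the choice of REG_EXTENDED is an assumption made for illustration; the comment above does not say which flags are involved. Only the standard POSIX calls regcomp, regerror, and regfree are used.

```cpp
#include <regex.h>   // POSIX regcomp/regerror/regfree from libc

#include <optional>
#include <string>

// Hypothetical helper: compile `pattern` as a POSIX extended regular
// expression with libc's regcomp(3).
// Returns std::nullopt on success, otherwise the regerror(3) message.
std::optional<std::string> posix_compile_error(const char* pattern) {
    regex_t re;
    const int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc == 0) {
        regfree(&re);  // release the compiled pattern; only valid after success
        return std::nullopt;
    }
    char buf[256];
    regerror(rc, &re, buf, sizeof buf);  // translate the error code to text
    return std::string(buf);
}
```

Comparing posix_compile_error("\\_") with the std::regex result for the same pattern would indicate whether the two implementations actually behave alike on a given system.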