> locale
LANG=C
LC_CTYPE=de_DE.ISO8859-1
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=
> sed s/ä/ae/
sed: 1: "s/ä/ae/": RE error: trailing backslash (\)
> echo -n ä | od -x
0000000 00e4
0000001

Also affects characters like ßçáàîÆ
Does not happen with LC_CTYPE=de_DE.UTF-8
The error comes from trying to compile the umlaut as a regex. I managed to create a small reproducer that just calls regcomp.

The error seems to come from this snippet in the p_simp_re function in lib/libc/regex/regcomp.c:

    if ((c & BACKSL) == 0 || may_escape(p, wc))
        ordinary(p, wc);
    else
        SETERROR(REG_EESCAPE);

Both checks in the if statement are false, so we end up with the trailing backslash error.

In may_escape, this is the return statement that gets taken:

    if (isalpha(ch) || ch == '\'' || ch == '`')
        return (false);

ch is the wint_t representation of the umlaut, which is 0xe4. In de_DE.ISO8859-1, the isalpha call returns true. (With a UTF-8 ä in a UTF-8 locale, ch is also 0xe4, but the isalpha call returns false, so the trailing backslash error is not triggered.)
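For anyone who wants to check their own system, here is a minimal sketch of how such a regcomp call could be wrapped. It is not the attached reproducer; try_regcomp is a made-up helper name, and it assumes the locale you pass in is installed (it returns -1 if setlocale rejects it).

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stdio.h>

/*
 * Compile `pattern` as a basic regular expression with LC_CTYPE set to
 * `locale_name`. Returns the regcomp() result (0 on success), or -1 if
 * the locale could not be selected. Hypothetical helper, for illustration.
 */
static int
try_regcomp(const char *locale_name, const char *pattern)
{
	regex_t re;
	char errbuf[128];
	int rc;

	if (setlocale(LC_CTYPE, locale_name) == NULL)
		return (-1);

	rc = regcomp(&re, pattern, 0);
	if (rc != 0) {
		/* Report the library's own description of the failure. */
		regerror(rc, &re, errbuf, sizeof(errbuf));
		fprintf(stderr, "%s: regcomp: %s\n", locale_name, errbuf);
	} else {
		regfree(&re);
	}
	return (rc);
}
```

Calling try_regcomp("de_DE.ISO8859-1", "\xe4") from main() should, going by the report above, fail with the trailing backslash error on an affected libc and return 0 on a fixed one. The byte 0xe4 is the ISO8859-1 encoding of ä.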
Created attachment 237678: small reproducer that calls regcomp with an umlaut.
Try the patch from PR 274032: https://bugs.freebsd.org/bugzilla/attachment.cgi?id=245157
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3fb80f1476c7776f04ba7ef6d08397cef6abcfb0

commit 3fb80f1476c7776f04ba7ef6d08397cef6abcfb0
Author:     Christos Zoulas <christos@NetBSD.org>
AuthorDate: 2023-08-30 20:37:24 +0000
Commit:     Yuri Pankov <yuripv@FreeBSD.org>
CommitDate: 2023-09-25 22:49:14 +0000

    regcomp: use unsigned char when testing for escapes

    - cast GETNEXT to unsigned where it is being promoted to int to prevent
      sign-extension (really it would have been better for PEEK*() and
      GETNEXT() to return unsigned char; this would have removed a ton of
      (uch) casts, but it is too intrusive for now).
    - fix an isalpha that should have been iswalpha

    PR:             264275, 274032
    Reviewed by:    kevans, eugen (previous version)
    Obtained from:  NetBSD
    Differential Revision:  https://reviews.freebsd.org/D41947

 lib/libc/regex/regcomp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
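To illustrate the sign-extension point in the commit message: when a byte such as 0xe4 is held in a signed char and promoted to int, it becomes negative unless it goes through unsigned char first. The sketch below is not code from regcomp.c; the helper names are made up, and signed char is used explicitly so the behaviour does not depend on whether plain char is signed on a given platform.

```c
#include <assert.h>
#include <stdio.h>

/* Implicit promotion of a signed byte: values >= 0x80 come out negative. */
static int
promote_plain(signed char c)
{
	return (c);
}

/* Promotion through unsigned char keeps the byte in the range 0..255. */
static int
promote_uch(signed char c)
{
	return ((unsigned char)c);
}

/* Print both promotions for 0xe4, the ISO8859-1 byte for 'ä'. */
static void
show_promotions(void)
{
	signed char c = (signed char)0xe4;

	printf("plain: %d, via unsigned char: %d\n",
	    promote_plain(c), promote_uch(c));
	/* Only the unsigned route preserves the original byte value. */
	assert(promote_uch(c) == 0xe4);
}
```

A negative value other than EOF is outside what the isalpha family is defined to accept, which is why the cast matters before such tests. The second item in the commit is a separate issue: the character being tested is a wint_t, so the wide-character classifier iswalpha is the matching call.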
*** Bug 274032 has been marked as a duplicate of this bug. ***
Taking this as I've agreed to handle the MFC and whatnot -- I tentatively plan to take this one all the way back to 12 and, given that it's reasonably severe for non-C locales, EN it to 13.2 and 12.4. MFC will likely be in ~3-5 days.
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=25307d6c927934dd44628e06cbc7047415fb6931

commit 25307d6c927934dd44628e06cbc7047415fb6931
Author:     Christos Zoulas <christos@NetBSD.org>
AuthorDate: 2023-08-30 20:37:24 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2023-09-30 01:41:57 +0000

    regcomp: use unsigned char when testing for escapes

    - cast GETNEXT to unsigned where it is being promoted to int to prevent
      sign-extension (really it would have been better for PEEK*() and
      GETNEXT() to return unsigned char; this would have removed a ton of
      (uch) casts, but it is too intrusive for now).
    - fix an isalpha that should have been iswalpha

    PR:             264275, 274032
    Reviewed by:    kevans, eugen (previous version)
    Obtained from:  NetBSD

    (cherry picked from commit 3fb80f1476c7776f04ba7ef6d08397cef6abcfb0)

 lib/libc/regex/regcomp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ac695744e2cfb461a64018276fb94999fb0cad9c

commit ac695744e2cfb461a64018276fb94999fb0cad9c
Author:     Christos Zoulas <christos@NetBSD.org>
AuthorDate: 2023-08-30 20:37:24 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2023-09-30 01:41:23 +0000

    regcomp: use unsigned char when testing for escapes

    - cast GETNEXT to unsigned where it is being promoted to int to prevent
      sign-extension (really it would have been better for PEEK*() and
      GETNEXT() to return unsigned char; this would have removed a ton of
      (uch) casts, but it is too intrusive for now).
    - fix an isalpha that should have been iswalpha

    PR:             264275, 274032
    Reviewed by:    kevans, eugen (previous version)
    Obtained from:  NetBSD

    (cherry picked from commit 3fb80f1476c7776f04ba7ef6d08397cef6abcfb0)

 lib/libc/regex/regcomp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=56b09feb23d98fcd0c4aed8d4f907a5a2f6b5ea9

commit 56b09feb23d98fcd0c4aed8d4f907a5a2f6b5ea9
Author:     Christos Zoulas <christos@NetBSD.org>
AuthorDate: 2023-08-30 20:37:24 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2023-09-30 01:40:59 +0000

    regcomp: use unsigned char when testing for escapes

    - cast GETNEXT to unsigned where it is being promoted to int to prevent
      sign-extension (really it would have been better for PEEK*() and
      GETNEXT() to return unsigned char; this would have removed a ton of
      (uch) casts, but it is too intrusive for now).
    - fix an isalpha that should have been iswalpha

    PR:             264275, 274032
    Reviewed by:    kevans, eugen (previous version)
    Obtained from:  NetBSD

    (cherry picked from commit 3fb80f1476c7776f04ba7ef6d08397cef6abcfb0)

 lib/libc/regex/regcomp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
I've submitted this for EN consideration.
A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=408daf2caa9273b1cbdc8223a3da6e179e922fc2

commit 408daf2caa9273b1cbdc8223a3da6e179e922fc2
Author:     Christos Zoulas <christos@NetBSD.org>
AuthorDate: 2023-08-30 20:37:24 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2023-10-01 04:46:02 +0000

    regcomp: use unsigned char when testing for escapes

    - cast GETNEXT to unsigned where it is being promoted to int to prevent
      sign-extension (really it would have been better for PEEK*() and
      GETNEXT() to return unsigned char; this would have removed a ton of
      (uch) casts, but it is too intrusive for now).
    - fix an isalpha that should have been iswalpha

    PR:             264275, 274032
    Reviewed by:    kevans, eugen (previous version)
    Approved by:    re (gjb)
    Obtained from:  NetBSD

    (cherry picked from commit 3fb80f1476c7776f04ba7ef6d08397cef6abcfb0)
    (cherry picked from commit 56b09feb23d98fcd0c4aed8d4f907a5a2f6b5ea9)

 lib/libc/regex/regcomp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
Going to go ahead and close this; this has been MFC'd to all supported branches and will appear in the next 14.0 beta.
A commit in branch releng/13.2 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=67264bfe499223cd9864b53975462e3eb57cde2c

commit 67264bfe499223cd9864b53975462e3eb57cde2c
Author:     Christos Zoulas <christos@NetBSD.org>
AuthorDate: 2023-08-30 20:37:24 +0000
Commit:     Ed Maste <emaste@FreeBSD.org>
CommitDate: 2023-11-08 00:59:51 +0000

    regcomp: use unsigned char when testing for escapes

    - cast GETNEXT to unsigned where it is being promoted to int to prevent
      sign-extension (really it would have been better for PEEK*() and
      GETNEXT() to return unsigned char; this would have removed a ton of
      (uch) casts, but it is too intrusive for now).
    - fix an isalpha that should have been iswalpha

    PR:             264275, 274032
    Reviewed by:    kevans, eugen (previous version)
    Obtained from:  NetBSD

    (cherry picked from commit 3fb80f1476c7776f04ba7ef6d08397cef6abcfb0)
    (cherry picked from commit ac695744e2cfb461a64018276fb94999fb0cad9c)

    Approved by:    so
    Security:       FreeBSD-EN-23:14

 lib/libc/regex/regcomp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
A commit in branch releng/12.4 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5e0387e2ec6ed9b14b3c6088c19079db15a52eae

commit 5e0387e2ec6ed9b14b3c6088c19079db15a52eae
Author:     Christos Zoulas <christos@NetBSD.org>
AuthorDate: 2023-08-30 20:37:24 +0000
Commit:     Ed Maste <emaste@FreeBSD.org>
CommitDate: 2023-11-08 01:02:08 +0000

    regcomp: use unsigned char when testing for escapes

    - cast GETNEXT to unsigned where it is being promoted to int to prevent
      sign-extension (really it would have been better for PEEK*() and
      GETNEXT() to return unsigned char; this would have removed a ton of
      (uch) casts, but it is too intrusive for now).
    - fix an isalpha that should have been iswalpha

    PR:             264275, 274032
    Reviewed by:    kevans, eugen (previous version)
    Obtained from:  NetBSD

    (cherry picked from commit 3fb80f1476c7776f04ba7ef6d08397cef6abcfb0)
    (cherry picked from commit 56b09feb23d98fcd0c4aed8d4f907a5a2f6b5ea9)

    Approved by:    so
    Security:       FreeBSD-EN-23:14

 lib/libc/regex/regcomp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)