Bug 252403

Summary:     unsafe pointer arithmetic in regcomp()
Product:     Base System
Component:   bin
Version:     Unspecified
Hardware:    Any
OS:          Any
Status:      Closed FIXED
Severity:    Affects Only Me
Priority:    ---
Reporter:    Miod Vallat <miod>
Assignee:    Kyle Evans <kevans>
CC:          bugs, emaste, kevans
Flags:       kevans: mfc-stable12+
             kevans: mfc-stable11-
Attachments:
  Description                         Flags
  suggested patch to fix the issue    none

Description Miod Vallat 2021-01-04 07:57:01 UTC
Created attachment 221266
suggested patch to fix the issue

regcomp.c uses the "start + count < end" idiom to check that "count" bytes are available in a char array that both "start" and "end" point into.

This is fine, unless "start + count" points past the end of the array, that is, beyond the position one past its last element. In that case the C standard (C11 6.5.6) makes the addition itself undefined, so comparing the result against "end" is undefined too, and optimizers from hell will happily remove as much code as possible because of this.

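An example of this occurs in regcomp.c's bothcases(), which defines bracket[3], sets "next" to "bracket" and "end" to "bracket + 2", and then invokes p_bracket(), which starts with "if (p->next + 5 < p->end)". There, "p->next + 5" is "bracket + 5", which lies beyond "bracket + 3", so the addition is already undefined.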

Because bothcases() and p_bracket() are both static functions in regcomp.c, there is a real risk of miscompilation if the compiler aggressively inlines p_bracket() into bothcases(), where the size of "bracket" is visible.

The attached patch rewrites the "start + count < end" constructs as "end - start > count". As long as "start" and "end" both point into the same array (such as "bracket[3]" above), "end - start" is well-defined and can be compared without trouble.

As a bonus, since MORE2() implies MORE(), SEETWO() can be simplified a bit.
Comment 1 commit-hook freebsd_committer 2021-01-08 19:59:50 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d36b5dbe28d8ebab219fa29db533734d47f0c4a3

commit d36b5dbe28d8ebab219fa29db533734d47f0c4a3
Author:     Miod Vallat <miod@online.fr>
AuthorDate: 2021-01-08 18:59:00 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2021-01-08 19:58:35 +0000

    libc: regex: rework unsafe pointer arithmetic

    PR:             252403

 lib/libc/regex/regcomp.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)
Comment 2 Kyle Evans freebsd_committer 2021-01-08 20:17:01 UTC
Thanks for the submission! I set the authorship based on your Bugzilla information. I'll mfc this to stable/12 in ~5 days.
Comment 3 commit-hook freebsd_committer 2021-01-24 03:05:55 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=912086c27f9ab75253af8ae7914ae6001035a1b2

commit 912086c27f9ab75253af8ae7914ae6001035a1b2
Author:     Miod Vallat <miod@online.fr>
AuthorDate: 2021-01-08 18:59:00 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2021-01-24 03:04:58 +0000

    libc: regex: rework unsafe pointer arithmetic

    PR:             252403
    (cherry picked from commit d36b5dbe28d8ebab219fa29db533734d47f0c4a3)

 lib/libc/regex/regcomp.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)
Comment 4 Kyle Evans freebsd_committer 2021-01-24 03:06:56 UTC
Thanks!