I would like to request an exp-run to validate the SIMD libc enhancements ported from amd64 to aarch64 as part of Google Summer of Code 2024. My mentor fuz asked me to request this exp-run to validate the contents of D46459, D46452, D46417, D46399, D46398, D46396, D46292, D46272, D46251, D46243, D46170, D45943, D45839, D45623, and D45621 by building all ports twice: once without and once with the patch set applied. The results can then be compared to see whether the patch set caused any new failures. The repository https://github.com/soppelmann/freebsd-src has been prepared for this purpose. Tag <reference> points to the last commit prior to the SIMD enhancements, and tag <exprunSIMDports> points to the tip of the branch created from <reference> with the patches applied. Please run two exp-runs on aarch64 based on these two source trees with a current ports tree (the same tree for both) and indicate whether there were any changes in build failures between the two runs.
Please attach the full patch set as produced by git-format-patch to the PR so it's clear what the exp-run should be run on. I confirm that this has been discussed and requested by me.
We do not have the hardware to do an exp-run on aarch64.
Ok. I will perform the exp-run on my own hardware and report the result here.
The exp-run is fine. I will proceed with the commit following final approval from emaste.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=79e01e7e643c9337d8d6046b6db7df674475a099 commit 79e01e7e643c9337d8d6046b6db7df674475a099 Author: Getz Mikalsen <getz@FreeBSD.org> AuthorDate: 2024-08-28 13:13:45 +0000 Commit: Robert Clausecker <fuz@FreeBSD.org> CommitDate: 2025-01-10 15:02:40 +0000 lib/libc/aarch64/string: add bcopy & bzero wrapper This patch enabled usage of SIMD enhanced functions to implement bcopy and bzero. Tested by: fuz (exprun) Reviewed by: fuz, emaste Sponsored by: Google LLC (GSoC 2024) PR: 281175 Differential Revision: https://reviews.freebsd.org/D46459 lib/libc/aarch64/string/Makefile.inc | 4 +++- lib/libc/aarch64/string/bcopy.c (new) | 14 ++++++++++++++ lib/libc/aarch64/string/bzero.c (new) | 14 ++++++++++++++ 3 files changed, 31 insertions(+), 1 deletion(-)
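For readers following along, here is a minimal sketch of what such wrappers can look like. It assumes the wrappers simply forward to memmove and memset (the commit message only says they use the SIMD-enhanced functions); the sketch_ names are hypothetical and not the ones used in the tree.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: forward bcopy to memmove.
 * Note the argument order: bcopy takes (src, dst), memmove takes (dst, src).
 * memmove is used rather than memcpy because bcopy must handle overlap.
 */
static void
sketch_bcopy(const void *src, void *dst, size_t len)
{
	memmove(dst, src, len);
}

/* Hypothetical sketch: forward bzero to memset with a zero fill byte. */
static void
sketch_bzero(void *buf, size_t len)
{
	memset(buf, 0, len);
}
```

With wrappers of this shape, any speedup in the underlying memmove and memset carries over to the legacy interfaces without a separate assembly implementation.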
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=3dc5429158cf221374cdbd0bbb728962bff4fb76 commit 3dc5429158cf221374cdbd0bbb728962bff4fb76 Author: Getz Mikalsen <getz@FreeBSD.org> AuthorDate: 2024-08-26 18:15:34 +0000 Commit: Robert Clausecker <fuz@FreeBSD.org> CommitDate: 2025-01-10 15:02:40 +0000 lib/libc/aarch64/string: add strncat SIMD implementation This patch requires D46170 as it depends on strlcpy being labeled __memccpy. It's a direct copy from the amd64 string functions. Tested by: fuz (exprun) Reviewed by: fuz, emaste Sponsored by: Google LLC (GSoC 2024) PR: 281175 Differential Revision: https://reviews.freebsd.org/D46292 lib/libc/aarch64/string/Makefile.inc | 3 ++- lib/libc/aarch64/string/strncat.c (new) | 29 +++++++++++++++++++++++++++++ 2 files changed, 31 insertions(+), 1 deletion(-)
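As an illustration of how strncat can be built on top of memccpy, here is a rough sketch. It uses the public memccpy rather than the internal __memccpy label mentioned in the commit message, and the name sketch_strncat is hypothetical; the committed file may differ in detail.

```c
#define _XOPEN_SOURCE 700	/* assumption: needed on some systems to expose memccpy */
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: strncat expressed via strlen and memccpy. */
static char *
sketch_strncat(char *restrict dst, const char *restrict src, size_t n)
{
	size_t len = strlen(dst);
	/* Copy up to n bytes, stopping after the terminating NUL if one is found. */
	char *end = memccpy(dst + len, src, '\0', n);

	/* No NUL within the first n bytes of src: terminate the result ourselves. */
	if (end == NULL)
		dst[len + n] = '\0';
	return (dst);
}
```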
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=3863fec1ce2dc6033f094a085118605ea89db9e2 commit 3863fec1ce2dc6033f094a085118605ea89db9e2 Author: Getz Mikalsen <getz@FreeBSD.org> AuthorDate: 2024-08-26 19:54:32 +0000 Commit: Robert Clausecker <fuz@FreeBSD.org> CommitDate: 2025-01-10 15:02:40 +0000 lib/libc/aarch64/string: add strlen SIMD implementation Adds a SIMD enhanced strlen for Aarch64. It takes inspiration from the amd64 implementation but I struggled getting the performance I had hoped for on cores like the Graviton3 when compared to the existing implementation from Arm Optimized Routines. See the DR for benchmark results. Tested by: fuz (exprun) Reviewed by: fuz, emaste Sponsored by: Google LLC (GSoC 2024) PR: 281175 Differential Revision: https://reviews.freebsd.org/D45623 lib/libc/aarch64/string/Makefile.inc | 4 +-- lib/libc/aarch64/string/strlen.S (new) | 46 ++++++++++++++++++++++++++++++++++ 2 files changed, 48 insertions(+), 2 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=89b3872376cbb6e8ab53cb50fa8c4c6d14e2d405 commit 89b3872376cbb6e8ab53cb50fa8c4c6d14e2d405 Author: Getz Mikalsen <getz@FreeBSD.org> AuthorDate: 2024-08-26 18:14:08 +0000 Commit: Robert Clausecker <fuz@FreeBSD.org> CommitDate: 2025-01-10 15:02:39 +0000 lib/libc/aarch64/string: add optimized strpbrk & strsep implementations These are direct copies from the amd64 string functions using the optimized strcspn from D46398 Tested by: fuz (exprun) Reviewed by: fuz, emaste Sponsored by: Google LLC (GSoC 2024) PR: 281175 Differential Revision: https://reviews.freebsd.org/D46399 lib/libc/aarch64/string/Makefile.inc | 4 ++- lib/libc/aarch64/string/strpbrk.c (new) | 43 +++++++++++++++++++++++++ lib/libc/aarch64/string/strsep.c (new) | 57 +++++++++++++++++++++++++++++++++ 3 files changed, 103 insertions(+), 1 deletion(-)
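To show why an optimized strcspn is enough to speed up both functions, here is a small sketch that derives strpbrk and strsep from it. The sketch_ names are hypothetical and the standard strcspn stands in for the optimized one; the committed files may be structured differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: strpbrk is strcspn plus a check for end of string. */
static char *
sketch_strpbrk(const char *s, const char *charset)
{
	size_t n = strcspn(s, charset);

	return (s[n] == '\0' ? NULL : (char *)(s + n));
}

/* Hypothetical sketch: strsep cuts the token at the first delimiter found. */
static char *
sketch_strsep(char **stringp, const char *delim)
{
	char *tok = *stringp;
	size_t n;

	if (tok == NULL)
		return (NULL);
	n = strcspn(tok, delim);
	if (tok[n] == '\0') {
		/* Last token: signal exhaustion to the caller. */
		*stringp = NULL;
	} else {
		tok[n] = '\0';
		*stringp = tok + n + 1;
	}
	return (tok);
}
```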
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=b91003acffe7b50dd6506be15116c6b42fc512c6 commit b91003acffe7b50dd6506be15116c6b42fc512c6 Author: Getz Mikalsen <getz@FreeBSD.org> AuthorDate: 2024-08-26 18:13:54 +0000 Commit: Robert Clausecker <fuz@FreeBSD.org> CommitDate: 2025-01-10 15:02:39 +0000 lib/libc/aarch64/string: add strspn optimized implementation This is a port of the scalar optimized variant of strspn for amd64 to aarch64. It utilizes a LUT to speed up the function; a SIMD variant is still under development. See the DR for benchmark results. Tested by: fuz (exprun) Reviewed by: fuz, emaste Sponsored by: Google LLC (GSoC 2024) PR: 281175 Differential Revision: https://reviews.freebsd.org/D46396 lib/libc/aarch64/string/Makefile.inc | 4 +- lib/libc/aarch64/string/strspn.S (new) | 111 +++++++++++++++++++++++++++++++++ 2 files changed, 114 insertions(+), 1 deletion(-)
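For context, the lookup-table idea can be sketched in portable C as follows. This is only an illustration of the technique, not a transcription of the assembly; the 256-entry byte table and the name sketch_strspn are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: strspn using a 256-entry membership table. */
static size_t
sketch_strspn(const char *s, const char *set)
{
	bool table[256] = { false };
	const unsigned char *p;
	size_t i;

	/* Mark every byte that occurs in the set. */
	for (p = (const unsigned char *)set; *p != '\0'; p++)
		table[*p] = true;

	/* table[0] is never set, so the NUL terminator also ends the scan. */
	for (i = 0; table[(unsigned char)s[i]]; i++)
		;
	return (i);
}
```

Building the table costs one pass over the set, after which each input byte needs a single load instead of a scan over the whole set.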
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=ce6af7a49ec7949c70f144f1b461b587ca7efd32 commit ce6af7a49ec7949c70f144f1b461b587ca7efd32 Author: Getz Mikalsen <getz@FreeBSD.org> AuthorDate: 2024-08-28 13:13:55 +0000 Commit: Robert Clausecker <fuz@FreeBSD.org> CommitDate: 2025-01-10 15:02:40 +0000 share/man/man7/simd.7: document SIMD-enhanced aarch64 functions This documents all the newly ported SIMD-enhanced string functions for the aarch64 platform. Reviewed by: fuz, emaste Sponsored by: Google LLC (GSoC 2024) Relnotes: yes PR: 281175 Differential Revision: https://reviews.freebsd.org/D46452 share/man/man7/simd.7 | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=bea89d038ac54048bb7dcb149cabd99067e5a3a9 commit bea89d038ac54048bb7dcb149cabd99067e5a3a9 Author: Getz Mikalsen <getz@FreeBSD.org> AuthorDate: 2024-08-26 21:10:16 +0000 Commit: Robert Clausecker <fuz@FreeBSD.org> CommitDate: 2025-01-10 15:02:40 +0000 lib/libc/aarch64/string: add strlcat SIMD implementation This patch requires D46243 as it depends on strlcpy being labeled __strlcpy. It's a direct copy from the amd64 string functions using memchr and strlcpy to implement strlcat. Tested by: fuz (exprun) Reviewed by: fuz, emaste Sponsored by: Google LLC (GSoC 2024) PR: 281175 Differential Revision: https://reviews.freebsd.org/D46272 lib/libc/aarch64/string/Makefile.inc | 3 ++- lib/libc/aarch64/string/memchr.S (new) | 4 ++++ lib/libc/aarch64/string/strlcat.c (new) | 25 +++++++++++++++++++++++++ 3 files changed, 31 insertions(+), 1 deletion(-)
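The combination of memchr and strlcpy described here can be sketched roughly as below. To keep the example self-contained on systems without strlcpy, it carries its own simple sketch_strlcpy; both sketch_ names are hypothetical, and the real code calls the internal __strlcpy label instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strlcpy so the sketch builds everywhere. */
static size_t
sketch_strlcpy(char *dst, const char *src, size_t dstsize)
{
	size_t srclen = strlen(src);

	if (dstsize != 0) {
		size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return (srclen);
}

/* Hypothetical sketch: strlcat via memchr (find the end) and strlcpy (append). */
static size_t
sketch_strlcat(char *dst, const char *src, size_t dstsize)
{
	char *end = memchr(dst, '\0', dstsize);
	size_t dstlen;

	/* dst is not terminated within dstsize bytes: nothing can be appended. */
	if (end == NULL)
		return (dstsize + strlen(src));

	dstlen = (size_t)(end - dst);
	return (dstlen + sketch_strlcpy(end, src, dstsize - dstlen));
}
```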
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=25c485e147691f3929b0b5029bab58bf56d3606b commit 25c485e147691f3929b0b5029bab58bf56d3606b Author: Getz Mikalsen <getz@FreeBSD.org> AuthorDate: 2024-08-26 18:14:37 +0000 Commit: Robert Clausecker <fuz@FreeBSD.org> CommitDate: 2025-01-10 15:02:40 +0000 lib/libc/aarch64/string: add strncmp SIMD implementation This changeset includes a port of the SIMD implementation of strncmp for amd64 to Aarch64. It is based on D45839 with added handling for the limit. An extended unit test for strncmp is currently being written to make sure the bounds checks for page crossings work as expected. Performance is significantly better than the existing implementation from the Arm Optimized Routines repository. Benchmark results are generated by the strperf utility by fuz. See the DR for benchmark results. Tested by: fuz (exprun) Reviewed by: fuz, emaste Sponsored by: Google LLC (GSoC 2024) PR: 281175 Differential Revision: https://reviews.freebsd.org/D45943 lib/libc/aarch64/string/Makefile.inc | 4 +- lib/libc/aarch64/string/strncmp.S (new) | 569 ++++++++++++++++++++++++++++++++ 2 files changed, 571 insertions(+), 2 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=756b7fc80837567d114a3c93e9bb987e219a1b23 commit 756b7fc80837567d114a3c93e9bb987e219a1b23 Author: Getz Mikalsen <getz@FreeBSD.org> AuthorDate: 2024-08-26 18:14:31 +0000 Commit: Robert Clausecker <fuz@FreeBSD.org> CommitDate: 2025-01-10 15:02:40 +0000 lib/libc/aarch64/string: add strlcpy SIMD implementation This changeset includes a port of the SIMD implementation of strlcpy for amd64 to Aarch64. It is based on memccpy (D46170) with some minor differences. Performance is significantly better than the scalar implementation. Benchmark results are as usual generated by the strperf utility written by fuz. See the DR for benchmark results. Tested by: fuz (exprun) Reviewed by: fuz, emaste Sponsored by: Google LLC (GSoC 2024) PR: 281175 Differential Revision: https://reviews.freebsd.org/D46243 lib/libc/aarch64/string/Makefile.inc | 3 +- lib/libc/aarch64/string/strlcpy.S (new) | 316 ++++++++++++++++++++++++++++++++ 2 files changed, 318 insertions(+), 1 deletion(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=bad17991c06d684e9053938d00a07b962e2fd31c commit bad17991c06d684e9053938d00a07b962e2fd31c Author: Getz Mikalsen <getz@FreeBSD.org> AuthorDate: 2024-08-26 18:15:13 +0000 Commit: Robert Clausecker <fuz@FreeBSD.org> CommitDate: 2025-01-10 15:02:40 +0000 lib/libc/aarch64/string: add memccpy SIMD implementation This changeset includes a port of the SIMD implementation of memccpy for amd64 to Aarch64. Performance is significantly better than the scalar implementation except for short strings. Benchmark results are as usual generated by the strperf utility written by fuz. See the DR for benchmark results. Tested by: fuz (exprun) Reviewed by: fuz, emaste Sponsored by: Google LLC (GSoC 2024) PR: 281175 Differential Revision: https://reviews.freebsd.org/D46170 lib/libc/aarch64/string/Makefile.inc | 3 +- lib/libc/aarch64/string/memccpy.S (new) | 271 ++++++++++++++++++++++++++++++++ 2 files changed, 273 insertions(+), 1 deletion(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=5e7d93a604400ca3c9db3be1df82ce963527740c commit 5e7d93a604400ca3c9db3be1df82ce963527740c Author: Getz Mikalsen <getz@FreeBSD.org> AuthorDate: 2024-08-26 18:13:31 +0000 Commit: Robert Clausecker <fuz@FreeBSD.org> CommitDate: 2025-01-10 15:02:39 +0000 lib/libc/aarch64/string: add strcmp SIMD implementation This changeset includes a port of the SIMD implementation of strcmp for amd64 to Aarch64. Below is a description of its method as described in D41971. The basic idea is to process the bulk of the string in aligned blocks of 16 bytes such that one string runs ahead and the other runs behind. The string that runs ahead is checked for NUL bytes, the one that runs behind is compared with the corresponding chunk of the string that runs ahead. This trades an extra load per iteration for the very complicated block-reassembly needed in the other implementations (bionic, glibc). On the flip side, we need two code paths depending on the relative alignment of the two buffers. The initial part of the string is compared directly if it is known not to cross a page boundary. Otherwise, a complex slow path to avoid crossing into unmapped memory commences. Performance is better in most cases than the existing implementation from the Arm Optimized Routines repository. See the DR for benchmark results. Tested by: fuz (exprun) Reviewed by: fuz, emaste Sponsored by: Google LLC (GSoC 2024) PR: 281175 Differential Revision: https://reviews.freebsd.org/D45839 lib/libc/aarch64/string/Makefile.inc | 4 +- lib/libc/aarch64/string/strcmp.S (new) | 350 +++++++++++++++++++++++++++++++++ 2 files changed, 353 insertions(+), 1 deletion(-)
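The page-boundary condition mentioned in this commit message can be illustrated with a short sketch. It assumes a 4096-byte page and a 16-byte chunk; the macro and function names are hypothetical, and the actual assembly may express the check differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096u	/* assumption: 4 KiB pages */
#define SKETCH_CHUNK		16u	/* one 128-bit vector load */

/*
 * Hypothetical sketch: would an unaligned 16-byte load starting at p
 * touch the following page?  If not, the load cannot fault on memory
 * beyond the page that p already lives in, so the fast path is safe.
 */
static bool
sketch_chunk_crosses_page(const void *p)
{
	uintptr_t off = (uintptr_t)p & (SKETCH_PAGE_SIZE - 1);

	return (off > SKETCH_PAGE_SIZE - SKETCH_CHUNK);
}
```

A string head for which this returns false can be loaded and compared directly; otherwise the slow path has to avoid reading into a possibly unmapped page.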
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=5ebd4d0dd2f45040aa5e5b028a4b93163aea6899 commit 5ebd4d0dd2f45040aa5e5b028a4b93163aea6899 Author: Getz Mikalsen <getz@FreeBSD.org> AuthorDate: 2024-08-26 18:13:44 +0000 Commit: Robert Clausecker <fuz@FreeBSD.org> CommitDate: 2025-01-10 15:02:40 +0000 lib/libc/aarch64/string: add memcpy SIMD implementation I noticed that we have a SIMD optimized memcpy in the arm-optimized-routines in /contrib. This patch ensures we use the SIMD variant as opposed to the scalar optimized variant. Benchmarks are generated by fuz's strperf utility. See the DR for benchmark results. Tested by: fuz (exprun) Reviewed by: fuz, emaste Sponsored by: Google LLC (GSoC 2024) PR: 281175 Differential Revision: https://reviews.freebsd.org/D46251 lib/libc/aarch64/string/memcpy.S | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=79287d783c72f95eb47c26dbfdfca279086e16a9 commit 79287d783c72f95eb47c26dbfdfca279086e16a9 Author: Getz Mikalsen <getz@FreeBSD.org> AuthorDate: 2024-08-26 18:14:15 +0000 Commit: Robert Clausecker <fuz@FreeBSD.org> CommitDate: 2025-01-10 15:02:40 +0000 lib/libc/aarch64/string: strcat enable use of SIMD Call into SIMD strlen and stpcpy for an optimized strcat. Port of D42600 for amd64. Tested by: fuz (exprun) Reviewed by: fuz, emaste Sponsored by: Google LLC (GSoC 2024) PR: 281175 Differential Revision: https://reviews.freebsd.org/D46417 lib/libc/aarch64/string/Makefile.inc | 3 ++- lib/libc/aarch64/string/strcat.c (new) | 20 ++++++++++++++++++++ 2 files changed, 22 insertions(+), 1 deletion(-)
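A minimal sketch of the approach described in this commit message, using the public strlen and stpcpy; the name sketch_strcat is hypothetical and the committed strcat.c may call internal labels instead.

```c
#define _POSIX_C_SOURCE 200809L	/* assumption: needed on some systems to expose stpcpy */
#include <string.h>

/* Hypothetical sketch: find the end with strlen, then append with stpcpy. */
static char *
sketch_strcat(char *restrict dst, const char *restrict src)
{
	stpcpy(dst + strlen(dst), src);
	return (dst);
}
```

Since both building blocks are already SIMD-enhanced, the C wrapper gets the speedup without any new assembly.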
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=f2bd390a54f183f85dd7faab815740fb3bea9591 commit f2bd390a54f183f85dd7faab815740fb3bea9591 Author: Getz Mikalsen <getz@FreeBSD.org> AuthorDate: 2024-08-26 18:14:01 +0000 Commit: Robert Clausecker <fuz@FreeBSD.org> CommitDate: 2025-01-10 15:02:39 +0000 lib/libc/aarch64/string: add strcspn optimized implementation This is a port of the scalar optimized variant of strcspn for amd64 to aarch64. It utilizes a LUT to speed up the function; a SIMD variant is still under development. Performance benchmarks are as usual generated by strperf. See the DR for benchmark results. Tested by: fuz (exprun) Reviewed by: fuz, emaste Sponsored by: Google LLC (GSoC 2024) PR: 281175 Differential Revision: https://reviews.freebsd.org/D46398 lib/libc/aarch64/string/Makefile.inc | 3 +- lib/libc/aarch64/string/strcspn.S (new) | 109 ++++++++++++++++++++++++++++++++ 2 files changed, 111 insertions(+), 1 deletion(-)
Landed all changes except the memcmp ones, which are pending a rework of the patch for better performance.