Using optimized assembly language versions of string functions on amd64 (this is already done on other architectures) would probably be a good idea.

Functions adapted from NetBSD (which can be extracted into /usr/src for inclusion in a buildworld) are available at the following URL:

http://will.iki.fi/patches/libc-amd64-string.tar.gz

I mentioned the issue on -current, but this didn't result in any discussion.

The NetBSD functions may or may not all be desirable in their current form; some of the functions have extensive unrolling and may produce a performance penalty for short strings (but a huge advantage for longer ones).
Ville-Pertti Keinonen wrote this message on Mon, Oct 25, 2004 at 11:53 +0000:
> >Description:
> Using optimized assembly language versions of string functions on amd64 (this is already done on other architectures) would probably be a good idea.
>
> Functions adapted from NetBSD (which can be extracted into /usr/src for inclusion in a buildworld) are available at the following URL:
> http://will.iki.fi/patches/libc-amd64-string.tar.gz
>
> I mentioned the issue on -current, but this didn't result in any discussion.
>
> The NetBSD functions may or may not all be desirable in their current form; some of the functions have extensive unrolling and may produce a performance penalty for short strings (but a huge advantage for longer ones).

Have you run some benchmarks to validate that the assembly optimized code is faster than the C generated? If you need help analyzing the benchmark numbers, look at ministat in src/tools/tools/ministat...

--
John-Mark Gurney    Voice: +1 415 225 5579
"All that I will do, has been done, All that I have, has not."
Responsible Changed
From-To: freebsd-amd64->obrien

I've got some input from AMD on these.
John-Mark Gurney wrote:
>Have you run some benchmarks to validate that the assembly optimized
>code is faster than the C generated? If you need help analyzing the
>benchmark numbers, look at ministat in src/tools/tools/ministat...

Yes. There is little enough variation to make statistical analysis pretty much unnecessary.

All of the functions are faster on strings >=64 bytes in length (with aligned buffers, but that's fairly typical in real-world use). For large enough strings, all of the functions are at least 5 times as fast, and at best over 10 times as fast.

The C functions are pretty bad, especially the block copy/set functions (they're using, at best, int-sized (32-bit) operations).

A simplistic test I used (including results) can be found at
http://will.iki.fi/misc/strtest.tar.gz
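For readers who want to reproduce this kind of comparison, the following is a minimal sketch of a copy micro-benchmark. It is not the contents of strtest.tar.gz; the names copy_bytes, time_copy and BUF_BYTES are hypothetical, and the byte-at-a-time loop is only a stand-in for a plain C implementation. It times repeated copies of a given length through a function pointer, using 8-byte-aligned buffers, so the same harness can be pointed at memcpy or at a C baseline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define BUF_BYTES 4096  /* hypothetical upper bound on the tested length */

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Byte-at-a-time copy: a deliberately plain C baseline (hypothetical). */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n-- > 0)
        *d++ = *s++;
    return dst;
}

/* uint64_t backing storage keeps both buffers 8-byte aligned. */
static uint64_t src_store[BUF_BYTES / sizeof(uint64_t)];
static uint64_t dst_store[BUF_BYTES / sizeof(uint64_t)];

/*
 * Run `iters` copies of `len` bytes through `fn` and return the elapsed
 * CPU time in seconds, or -1.0 if `len` does not fit in the buffers.
 */
static double time_copy(copy_fn fn, size_t len, unsigned long iters)
{
    unsigned char *src = (unsigned char *)src_store;
    unsigned char *dst = (unsigned char *)dst_store;
    volatile unsigned char sink = 0;
    clock_t start, end;
    unsigned long i;

    if (len > BUF_BYTES)
        return -1.0;

    /* Fill the source with a simple pattern and clear the destination. */
    for (i = 0; i < len; i++)
        src[i] = (unsigned char)(i * 31u + 7u);
    memset(dst, 0, BUF_BYTES);

    start = clock();
    for (i = 0; i < iters; i++) {
        fn(dst, src, len);
        /* Read one byte back so the copy has an observable effect. */
        sink ^= dst[len ? len - 1 : 0];
    }
    end = clock();

    assert(memcmp(dst, src, len) == 0);  /* the copy must be correct */
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy(copy_bytes, len, iters) and time_copy(memcpy, len, iters) for a range of lengths (for example 16, 64 and 1024 bytes) gives one pair of timings per length. An optimizing compiler may transform the baseline loop, so the numbers are only indicative, and repeated runs can be compared with ministat as suggested earlier in the thread.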
State Changed
From-To: open->closed

Fixed and MFCed. See:
http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/amd64/string/memcmp.S
http://lists.freebsd.org/mailman/htdig/cvs-src/2005-April/044356.html