Created attachment 249266 [details] Example reproducing the fma() zero sign issue Hi, I exposed the libc fma() function in Python: * https://github.com/python/cpython/commit/8e3c953b3acd7c656f66696806c9ae917e816492 * https://github.com/python/cpython/pull/116667 * https://github.com/python/cpython/issues/73468 The problem is that the Python test suite fails on FreeBSD x86-64 on tests on the zero sign: test_fma_zero_result() https://github.com/python/cpython/blob/cd2ed917801b93fb46d1dcf19dd480e5146932d8/Lib/test/test_math.py#L2694-L2748 Example: --- #include <math.h> #include <string.h> #include <stdio.h> void set_double(double *x, const char *bytes) { memcpy(x, bytes, sizeof(*x)); } int main(int argc, char **argv) { double x = 0; //float.fromhex('0x1p-500') double y = 0; //float.fromhex('0x1p-550') double z = 0; //float.fromhex('0x1p-1000') #if 1 // condition always true, but trick the compiler, otherwise it will not // call fma() at runtime, but just compute the result at build time. if (argc) { set_double(&x, "\x00\x00\x00\x00\x00\x00\xb0\x20"); } else { x = 123; } #else set_double(&x, "\x00\x00\x00\x00\x00\x00\xb0\x20"); #endif set_double(&y, "\x00\x00\x00\x00\x00\x00\x90\x1d"); set_double(&z, "\x00\x00\x00\x00\x00\x00p\x01"); printf("x %g\n", x); printf("y %g\n", y); printf("z %g\n", z); double a, b, c, r; a = x-y; b = x+y; c = -z; r = fma(a, b, c); printf("fma(x-y, x+y, -z):\n"); printf("fma(%+g, %+g, %+g) = %+g\n", a, b, c, r); return 0; } --- Output on x86-64: --- $ clang x.c -Og -o x -lm && ./x x 3.05494e-151 y 2.71333e-166 z 9.33264e-302 fma(x-y, x+y, -z): fma(+3.05494e-151, +3.05494e-151, -9.33264e-302) = +0 --- fma() result is "+0" which is wrong. For example, if you replace "#if 1" with "#if 0", you get: --- (...) fma(+3.05494e-151, +3.05494e-151, -9.33264e-302) = -0 --- fma() result is now "-0" which is correct. This time, the result was computed at build time. Extract of the Python tests which fail on FreeBSD: --- # Corner case where rounding the multiplication would # give the wrong result. x = float.fromhex('0x1p-500') y = float.fromhex('0x1p-550') z = float.fromhex('0x1p-1000') self.assertIsNegativeZero(math.fma(x-y, x+y, -z)) self.assertIsPositiveZero(math.fma(y-x, x+y, z)) self.assertIsNegativeZero(math.fma(y-x, -(x+y), -z)) self.assertIsPositiveZero(math.fma(x-y, -(x+y), z)) --- For example, on Linux x86-64, the example works as expected with GCC and clang: --- vstinner@mona$ gcc x.c -o x -O2 -lm && ./x x 3.05494e-151 y 2.71333e-166 z 9.33264e-302 fma(x-y, x+y, -z): fma(+3.05494e-151, +3.05494e-151, -9.33264e-302) = -0 vstinner@mona$ clang x.c -o x -O2 -lm && ./x x 3.05494e-151 y 2.71333e-166 z 9.33264e-302 fma(x-y, x+y, -z): fma(+3.05494e-151, +3.05494e-151, -9.33264e-302) = -0 --- fma() returns -0 which is correct. On the bug, I selected "kern" component. I expect that the bug is more in the libc, but I cannot see "libc" choice.
Thank you for the report and C reproduction case. Do you know which other implementations are being tested? (macOS, musl, bionic?) Steve would you be able to take an initial look at this?
So, I don't loose a reduced version of the code: #include <math.h> #include <stdio.h> int main(void) { volatile double x = 0x1p-500, y = 0x1p-550, z = 0x1p-1000; double a, b, c, r; a = x-y; b = x+y; c = -z; r = fma(a, b, c); /* * Should print -0 instead of +0. */ printf("fma(%+a, %+a, %+a) = %+g\n", a, b, c, r); return 0; } % cc -o z -O3 a.c -lm && ./z fma(+0x1.ffffffffffff8p-501, +0x1.0000000000004p-500, -0x1p-1000) = +0 % cc -o z -O3 a.c -mfma -lm && ./z fma(+0x1.ffffffffffff8p-501, +0x1.0000000000004p-500, -0x1p-1000) = -0
If one looks in src/s_fma.c, one sees #ifdef USE_BUILTIN_FMA double fma(double x, double y, double z) { return (__builtin_fma(x, y, z)); } #else If libm is built with -DUSE_BUILTIN_FMA, then one get -0.
Ok, with the given input to fma, one arrives at line 271 return (xy.hi + vzs + ldexp(xy.lo, spread)); (gdb) p vzs $12 = -0.5 (gdb) p xy.hi $13 = 0.5 (gdb) p xy.lo $14 = -3.944304526105059e-31 (gdb) p spread $15 = -999 Turns out that ldexp(3.944304526105059e-31, -999) = -0x0p+0. So, you have 0.5 - 0.5 - 0. = 0. - 0. = +0 So, a pessimestic patch is Index: src/s_fma.c =================================================================== --- src/s_fma.c (revision 2834) +++ src/s_fma.c (working copy) @@ -267,7 +267,9 @@ */ fesetround(oround); volatile double vzs = zs; /* XXX gcc CSE bug workaround */ - return (xy.hi + vzs + ldexp(xy.lo, spread)); + xs = ldexp(xy.lo, spread); + xy.hi += vzs; + return (xy.hi == 0 ? xs : xy.hi + xs); } if (oround != FE_TONEAREST) {
Ed: > Thank you for the report and C reproduction case. Do you know which other implementations are being tested? (macOS, musl, bionic?) Python test_math checks fma() on Windows, Linux, macOS, WASI, FreeBSD, etc. The tests pass on all platforms but FreeBSD and WASI. Apparently, the WASI libc is based musl, but I failed to reproduce the issue using musl-gcc on Linux. bionic: Python does not officially support Android yet, see: https://peps.python.org/pep-0738/
(In reply to Steve Kargl from comment #4) A similar issue is present in src/s_fmal.c. The patch in comment #4 can be adapted for fmal.
Proposed patch for fma and fmal in https://reviews.freebsd.org/D44433 We can switch to BUILTIN_FMA in a later change
> Python does not officially support Android yet, see: > https://peps.python.org/pep-0738/ I'm working on adding Android support to Python, and I've found that this issue exists on Android x86_64, but ARM64 is OK.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=888796ade2842486d3167067e8034254c38aadd3 commit 888796ade2842486d3167067e8034254c38aadd3 Author: Ed Maste <emaste@FreeBSD.org> AuthorDate: 2024-03-19 14:31:39 +0000 Commit: Ed Maste <emaste@FreeBSD.org> CommitDate: 2024-06-08 15:55:36 +0000 libm: fma: correct zero sign with small inputs PR: 277783 Reported by: Victor Stinner Submitted by: kargl MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D44433 lib/msun/src/s_fma.c | 4 +++- lib/msun/src/s_fmal.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=e77ad954bb825983b4346b9cc646c9c910b1be24 commit e77ad954bb825983b4346b9cc646c9c910b1be24 Author: Ed Maste <emaste@FreeBSD.org> AuthorDate: 2024-06-12 01:34:02 +0000 Commit: Ed Maste <emaste@FreeBSD.org> CommitDate: 2024-06-12 01:36:12 +0000 Revert "libm: fma: correct zero sign with small inputs" This change introduced a test failure, so revert until that can be addressed. This reverts commit 888796ade2842486d3167067e8034254c38aadd3. PR: 277783 Reported by: rlibby Sponsored by: The FreeBSD Foundation lib/msun/src/s_fma.c | 4 +--- lib/msun/src/s_fmal.c | 4 +--- 2 files changed, 2 insertions(+), 6 deletions(-)
Someone smarter than me may need to look at this! With the original src/s_fma.c, I see dble (x,y,z): -0x1p+0 0x1p+0 0x1p+0 mpfr (x,y,z): -0x1p+0 0x1p+0 0x1p+0 mpfr libm RNDN: 0x0p+0 0x0p+0 RNDU: 0x0p+0 0x0p+0 RNDD: -0x0p+0 -0x0p+0 RNDZ: 0x0p+0 0x0p+0 dble (x,y,z): 0x1p+0 -0x1p+0 0x1p+0 mpfr (x,y,z): 0x1p+0 -0x1p+0 0x1p+0 mpfr libm RNDN: 0x0p+0 0x0p+0 RNDU: 0x0p+0 0x0p+0 RNDD: -0x0p+0 -0x0p+0 RNDZ: 0x0p+0 0x0p+0 dble (x,y,z): -0x1p+0 -0x1p+0 -0x1p+0 mpfr (x,y,z): -0x1p+0 -0x1p+0 -0x1p+0 mpfr libm RNDN: 0x0p+0 0x0p+0 RNDU: 0x0p+0 0x0p+0 RNDD: -0x0p+0 -0x0p+0 RNDZ: 0x0p+0 0x0p+0 dble (x,y,z): 0x1.ffffffffffff8p-501 0x1.0000000000004p-500 -0x1p-1000 mpfr (x,y,z): 0xf.fffffffffffcp-504 0x1.0000000000004p-500 -0x1p-1000 mpfr libm RNDN: -0x0p+0 0x0p+0 RNDU: -0x0p+0 0x0p+0 RNDD: -0x1p-1074 -0x1p-1074 RNDZ: -0x0p+0 0x0p+0 The first groups of three are fma(-1.,1.,1.), fma(1.,-1.,1.), and fma(-1.,-1.,-1.) with the four rounding modes. The group is Victor's example where the rounding is wrong. If one looks in src/s_fma.c, there is a special case for addends that sum to zero. This is the 'if (r.hi == 0.0) {}' block of code (lines 263-274). If I comment out this block, I get wrong results for fma(-1.,1.,1.), fma(1.,-1.,1.), and fma(-1.,-1.,-1.), but correct results for Victor's input. :( dble (x,y,z): -0x1p+0 0x1p+0 0x1p+0 mpfr (x,y,z): -0x1p+0 0x1p+0 0x1p+0 mpfr libm RNDN: 0x0p+0 0x0p+0 RNDU: 0x0p+0 0x0p+0 RNDD: -0x0p+0 0x0p+0 RNDZ: 0x0p+0 0x0p+0 dble (x,y,z): 0x1p+0 -0x1p+0 0x1p+0 mpfr (x,y,z): 0x1p+0 -0x1p+0 0x1p+0 mpfr libm RNDN: 0x0p+0 0x0p+0 RNDU: 0x0p+0 0x0p+0 RNDD: -0x0p+0 0x0p+0 RNDZ: 0x0p+0 0x0p+0 dble (x,y,z): -0x1p+0 -0x1p+0 -0x1p+0 mpfr (x,y,z): -0x1p+0 -0x1p+0 -0x1p+0 mpfr libm RNDN: 0x0p+0 0x0p+0 RNDU: 0x0p+0 0x0p+0 RNDD: -0x0p+0 0x0p+0 RNDZ: 0x0p+0 0x0p+0 dble (x,y,z): 0x1.ffffffffffff8p-501 0x1.0000000000004p-500 -0x1p-1000 mpfr (x,y,z): 0xf.fffffffffffcp-504 0x1.0000000000004p-500 -0x1p-1000 mpfr libm RNDN: -0x0p+0 -0x0p+0 RNDU: -0x0p+0 -0x0p+0 RNDD: -0x1p-1074 -0x1p-1074 RNDZ: -0x0p+0 -0x0p+0
Created attachment 251421 [details] Test code that shows results compared to MPFR This is the test code that includes the comparisons for the testsuite and Victor's input under the four rounding modes. It requires MPFR and GMP to compile and execute.
Created attachment 251488 [details] testcase for long double This contains a test that shows the issue for long double (on amd64).
Created attachment 251489 [details] New patch The code for fma() and fmal() contain a special case when the addends sum to zero, but it seems that it mishandles small low order bits especially if a subnormal condition occurs. One of the testcases in msun/tests (that I broke with the original patch) is fma(-1.,1.,1.) with the intermediate results (x,y,z): -0x1p+0 0x1p+0 0x1p+0 xy = {-0x1p-2, 0x0p+0} r = {0x0p+0, 0x0p+0} spread = 2 zs = 0x1p-2 fma returns (xy.hi + zs +ldexp(xy.lo,spread)), which is fine. For the case that Victor submitted one has (x,y,z): 0x1.ffffffffffff8p-501 0x1.0000000000004p-500 -0x1p-1000 xy = {5.000000e-01, -3.944305e-31} r = {0.000000e+00 0.000000e+00} spread = -999 zs = -5.000000e-01 For the original code, the special leads to ldexp(xy.lo,spread)=ldexp(-3.944305e-31,-999), is not good. The new patch skips the special case if xy.lo != 0. This then allows fma() to call its add_and_denormalize() function.
(In reply to Steve Kargl from comment #14) I think the comment may need an update too: /* * When the addends cancel to 0, ensure that the result has * the correct sign. */
Also for reference I have staged this in my working tree as: commit fa22569968175ac79486ee467e39f570fe4b09cd (HEAD -> wipbsd) Author: Steve Kargl <kargl@FreeBSD.org> AuthorDate: Sun Jun 16 19:41:38 2024 -0400 Commit: Ed Maste <emaste@FreeBSD.org> CommitDate: Mon Jun 17 09:29:51 2024 -0400 libm: fma: correct zero sign with small inputs This is a fixed version of 888796ade284. PR: 277783 Reported by: Victor Stinner Reviewed by: emaste MFC after: 1 week
Perhaps, the comments needs an update, but I'm not sure to what. I did look at https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libm/src/s_fma.c?annotate=1.7 which is based on das@ code. This code does not have the special, but if I read it correctly would fall through to the subnormal case in the lines 174-189 block of code with Victor's values. netbsd uses essentially freebsd's code with a few minor changes, so likely has the same issue as in this bug report.
(In reply to Steve Kargl from comment #17) > Perhaps, the comments needs an update, but I'm not sure to what. The comment has "When the addends cancel to 0", which exactly mirrored the if condition. We could extend it to something like "When the addends cancel to 0 and ..." but that seems excessive. Perhaps just "Ensure the result has the correct sign ." which leaves a clue as to why the special case exists.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=34f746cc7f8a8dd261027a8b392b76e70adc8438 commit 34f746cc7f8a8dd261027a8b392b76e70adc8438 Author: Steve Kargl <kargl@FreeBSD.org> AuthorDate: 2024-06-16 23:41:38 +0000 Commit: Ed Maste <emaste@FreeBSD.org> CommitDate: 2024-07-28 21:37:45 +0000 libm: fma: correct zero sign with small inputs This is a fixed version of 888796ade284. PR: 277783 Reported by: Victor Stinner Reviewed by: emaste MFC after: 1 week lib/msun/src/s_fma.c | 4 ++-- lib/msun/src/s_fmal.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=dc39004bc670fe33ae6759816380c93a37268dd6 commit dc39004bc670fe33ae6759816380c93a37268dd6 Author: Steve Kargl <kargl@FreeBSD.org> AuthorDate: 2024-06-16 23:41:38 +0000 Commit: Ed Maste <emaste@FreeBSD.org> CommitDate: 2024-08-06 21:25:10 +0000 libm: fma: correct zero sign with small inputs This is a fixed version of 888796ade284. PR: 277783 Reported by: Victor Stinner Reviewed by: emaste MFC after: 1 week (cherry picked from commit 888796ade2842486d3167067e8034254c38aadd3) (cherry picked from commit e77ad954bb825983b4346b9cc646c9c910b1be24) (cherry picked from commit 34f746cc7f8a8dd261027a8b392b76e70adc8438) lib/msun/src/s_fma.c | 4 ++-- lib/msun/src/s_fmal.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
^Triage: assign to committer for MFC review.
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=3d77026d8a48a8d056a145b56f288f838c159afc commit 3d77026d8a48a8d056a145b56f288f838c159afc Author: Steve Kargl <kargl@FreeBSD.org> AuthorDate: 2024-06-16 23:41:38 +0000 Commit: Ed Maste <emaste@FreeBSD.org> CommitDate: 2024-10-01 01:19:13 +0000 libm: fma: correct zero sign with small inputs This is a fixed version of 888796ade284. PR: 277783 Reported by: Victor Stinner Reviewed by: emaste MFC after: 1 week (cherry picked from commit 888796ade2842486d3167067e8034254c38aadd3) (cherry picked from commit e77ad954bb825983b4346b9cc646c9c910b1be24) (cherry picked from commit 34f746cc7f8a8dd261027a8b392b76e70adc8438) (cherry picked from commit dc39004bc670fe33ae6759816380c93a37268dd6) lib/msun/src/s_fma.c | 4 ++-- lib/msun/src/s_fmal.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)