| Summary: | libm's log1p not working as designed on Intel architectures. | ||
|---|---|---|---|
| Product: | Base System | Reporter: | neis <neis> |
| Component: | i386 | Assignee: | freebsd-bugs (Nobody) <bugs> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | Unspecified | ||
| Hardware: | Any | ||
| OS: | Any | ||
On Mon, 15 May 2000 neis@cdc.informatik.tu-darmstadt.de wrote: > While "porting" current libm to OS/2 (i.e. recompiling and running > various tests), I noticed that assembler code for log1p(x) is basically > as follows: Compute x+1, than call fyl2x. This is only an efficient bug under FreeBSD. log1p.S is too broken to use, so FreeBSD doesn't use it: RCS file: /home/ncvs/src/lib/msun/Makefile,v Working file: Makefile head: 1.23 ... ---------------------------- revision 1.14 date: 1997/02/15 05:21:16; author: bde; state: Exp; lines: +4 -2 Disabled the i387 version if log1p(). It just evaluates log(1 + x). This defeats the point of log1p(). ucbtest reports errors of +-5e+15 ULPs. A correct version would use the i387 fyl2xp1 instruction for small x and maybe scale to small x. The C version does the scaling reasonably efficiently, and fyl2px1 is slow (at least on P5s), so not much is lost by always using the C version (only 25% for small x even with the broken i387 version; 50% for large x). ---------------------------- You can find a correct version in glibc (version 2.1.1. at least). > On a related issue, the various wrapper functions around assembler > code cause an additional function call which really causes a > performance loss. > I have been able to speed up e.g. "acos" by more than then percent > by replacing the assembler routine __ieee754_acos with inline > assembler code. A non-inline version of (the i387 version of) __ieee754_acos() is only about 2% slower than the inline version. (Inlining acos doesn't help much because the inlined code is quite large and slow; the speedup for sqrt() is relatively much larger.) I've never worried much about even 10% speedups for inlining. Usuually you only get the 10% speedups for simplistic benchmarks where everything is cached. Bruce Hi, > This is only an efficient bug under FreeBSD. log1p.S is too broken to > use, so FreeBSD doesn't use it: > > You can find a correct version in glibc (version 2.1.1. at least). I see. > (Inlining acos doesn't help > much because the inlined code is quite large and slow; Interesting. I was under the impression that the inlined code is rather small. :-? double acos_inline(double x) { register double z; asm("fmul": "=t" (z) : "0" (1+x), "u" (1-x) ); asm("fsqrt": "=t" (z) : "0" (z) ); asm("fpatan": "=t" (z) : "0" (1.0), "u" (z) ); if(_LIB_VERSION == _IEEE_ || isnan(x)) return z; if(fabs(x)>1.0) { return __kernel_standard(x,x,1); /* acos(|x|>1) */ } else return z; } Anyway, thanks for taking the time to answer a not strictly BSD related question. Regards, Stefan State Changed From-To: open->feedback What was the conclusion here - is the current behaviour a bug or not? State Changed From-To: feedback->closed My interpretation of Bruce's reply below is that this should be closed :-) In message <20011204122432.S4382-100000@gamplex.bde.org>, Bruce Evans writes: |
While "porting" current libm to OS/2 (i.e. recompiling and running various tests), I noticed that assembler code for log1p(x) is basically as follows: Compute x+1, than call fyl2x. This plainly contradicts the man page: > log1p(x) returns a value equivalent to `log (1 + x)'. It > is computed in a way that is accurate even if the value of > x is near zero. IMHO, you really need to use fyl2xp1, if x is sufficiently close to 0 for that instruction to work (unfortunately is working only in a rather small interval). On a related issue, the various wrapper functions around assembler code cause an additional function call which really causes a performance loss. I have been able to speed up e.g. "acos" by more than then percent by replacing the assembler routine __ieee754_acos with inline assembler code.