The j0 bessel function supplied by libm is fairly inaccurate at arguments at and near a zero of the function. Here's the first 30 zeros computed by j0f, my implementation of j0f, a 4000-bit significand computed via MPFR (the assumed exact value), and the relative absolute error. x my j0f(x) libm j0f(x) MPFR j0 my err libm err 2.404825 5.6434434E-08 5.9634296E-08 5.6434400E-08 1.64 152824.59 5.520078 2.4476664E-08 2.4153294E-08 2.4476659E-08 0.31 18878.52 8.653728 1.0355323E-07 1.0359805E-07 1.0355306E-07 6.36 1694.47 11.791534 -2.4511966E-09 -3.5193941E-09 -3.5301714E-09 78243.14 781.53 14.930918 -6.6019510E-09 -6.3911618E-09 -6.4815052E-09 8962.94 6722.88 18.071064 5.1734359E-09 5.3149818E-09 5.1532318E-09 1362.83 10910.50 21.211637 -1.5023797E-07 -1.5002509E-07 -1.5023348E-07 1213.07 56347.01 24.352472 1.2524691E-07 1.2516310E-07 1.2524570E-07 41.65 2834.53 27.493479 5.4330716E-08 5.4263626E-08 5.4331104E-08 18.77 3261.75 30.634607 1.2205560E-07 1.2203689E-07 1.2205546E-07 4.85 645.39 33.775822 -2.0213101E-07 -2.0206903E-07 -2.0213095E-07 5.48 6263.95 36.917099 8.4751605E-08 8.4749573E-08 8.4751581E-08 0.99 82.59 40.058426 -1.7484851E-08 -1.7475532E-08 -1.7484840E-08 0.91 767.56 43.199791 -9.2091398E-08 -9.2135146E-08 -9.2091406E-08 2.47 13530.51 46.341187 2.1663259E-07 2.1664336E-07 2.1663259E-07 0.16 268.90 49.482609 -1.2502527E-07 -1.2504157E-07 -1.2502526E-07 2.69 23512.60 52.624050 1.8706569E-07 1.8707487E-07 1.8706569E-07 0.01 251.43 55.765511 -2.0935554E-08 -2.0932896E-08 -2.0935556E-08 0.19 227.03 58.906982 1.5637660E-07 1.5634730E-07 1.5637661E-07 0.26 892.22 62.048470 3.5779891E-08 3.5787338E-08 3.5779888E-08 0.14 403.17 Note, my j0f(x) currently uses double precision to accumulate intermediate values. Below x = 10, I use the ascending series to compute the value. Above x = 10, I'm using an asymptotic approximation. I haven't investigated whether additional terms in an asymptotic approximation would pull 'my err' for x = 11, 14, 18, and 21 closer to the exact value. Fix: I'll get to it eventually.

I forgot to mention that I believe this issue will be found in j1(3) as well. I haven't tried any tests with y0, y1, jn or yn. I'd be might surprised if these were robust. -- Steve http://troutmask.apl.washington.edu/~kargl/

Well, I just checked the bug report on the web, and the nicely formatted table looks like garbage due to the stupidity of crushing consecutive spaces down to a single space. -- Steve http://troutmask.apl.washington.edu/~kargl/

On Wed, 13 Jan 2010, Steven G. Kargl wrote: >> Description: > > The j0 bessel function supplied by libm is fairly inaccurate at > arguments at and near a zero of the function. Here's the first > 30 zeros computed by j0f, my implementation of j0f, a 4000-bit > significand computed via MPFR (the assumed exact value), and the > relative absolute error. This is a very hard and relatively unimportant problem. The corresponding problem for trigonometric functions is fairly hard and relatively very important. It is solved in fdlibm (and this in FreeBSD libm) essentially by using a table of all the relevant zeros, with the necessary several thousands of binary digits of accuracy for many of the zeros. Since trigonometric functions are periodic, it is not necessary to have a table of approximations (e.g., polynomials) for each of the zeros of interest. There are O(2**100) zeros of interest for 128-bit long doubles, so simple tables wouldn't work even for the zeros. The corresponding problem for lgamma() is relatively simple and relatively important. It is solved in places like the Intel ia64 libm by using a table of all the relevant zeros and a table of approximations for each of the zeros. There are only about 70 relevant zeros (since zeroes less than about -70 are so close to the (pseudo-)poles that they cannot be represented in double precision (unlike the poles which, since they are at the negative integers, can be represented down to about -10**53 in double precision)). For the corresponding problem for at least the simplest bessel function j0(), the zeros are distributed like the zeros of sin(), but they aren't exactly at multiples of Pi like those of sin(), so just finding all the relevant ones to the necessary thousands of binary digits of accuracy is a hard problem. At least j0() is bounded like sin() (and unlike lgamma()); thus we know that there are O(2**100) relevant zeros, so it is impossible to find them all in advance. Since bessel functions aren't periodic, it is also unclear how they could be calculated accurately near the zeros without using a different special approximation near every zero. In general, such calculations need to be done in extra precision even for the linear term, giving relatively minor extra complications. I once tried to caclulate lgamma(z) near just the first zero of lgamma(), using only extra precision applied to standard formalas (Stirling approximation and not reflection since extra precision was only easy to obtain than for the former), to see how far I could get using only extra precision. It took approximately doubled double precision (double precision with most intermediate calculations done in extra precision) to get near float precision for lgamma(). It might be worth making bessel functions accurate near just the first few zeros. > x my j0f(x) libm j0f(x) MPFR j0 my err libm err > 2.404825 5.6434434E-08 5.9634296E-08 5.6434400E-08 1.64 152824.59 > 5.520078 2.4476664E-08 2.4153294E-08 2.4476659E-08 0.31 18878.52 > 8.653728 1.0355323E-07 1.0359805E-07 1.0355306E-07 6.36 1694.47 > 11.791534 -2.4511966E-09 -3.5193941E-09 -3.5301714E-09 78243.14 781.53 Hmm. > ... > 62.048470 3.5779891E-08 3.5787338E-08 3.5779888E-08 0.14 403.17 The largest errors across all 2**32 float values for the 1-parameter float precision bessel functions in libm is about 3.7 Mulps for j0() and about 17 Gulps for y1(): %%% j0: max_er = 0x6f8686c5c9f26 3654467.3863, avg_er = 0.418, #>=1:0.5 = 671562746:1332944948 j1: max_er = 0x449026e9c286f 2246675.4566, avg_er = 0.425, #>=1:0.5 = 674510414:1347770568 lgamma:max_er = 0xe4cf242400000 7497618.0703, avg_er = 0.274, #>=1:0.5 = 70033452:508702222 y0: max_er = 0x93a2340c00000 4837658.0234, avg_er = 0.380, #>=1:0.5 = 594207097:1303826516 y1: max_er = 0x7ffa2068256f72ab 17176789825.1699, avg_er = 5.188, #>=1:0.5 = 459137173:1213136103 %%% > Note, my j0f(x) currently uses double precision to accumulate intermediate > values. Below x = 10, I use the ascending series to compute the value. > Above x = 10, I'm using an asymptotic approximation. I haven't investigated > whether additional terms in an asymptotic approximation would pull 'my err' > for x = 11, 14, 18, and 21 closer to the exact value. Stirling asymptotic is what barely worked (with significant extra precision) for the first zero of lgamma(). lgamma() is especially bad for higher zeros of lgamma(), since there is a pole very near the zero. I guess the behaviour for j0() is not so bad since there are no poles, but the asymptotic method also seems to work not so well (compared with a power series method) on lgamma(x) for x > 1 where there are no poles. Anyway, if you can get anywhere near < 10 ulp error near all zeros using only an asymptotic method, then that would be good. Then the asymptotic method would also be capable of locating the zeros very accurately. But I would be very surprised if this worked. I know of nothing similar for reducing mod Pi for trigonometric functions, which seems a simpler problem. I would expect it to at best involve thousands of binary digits in the tables for the asymptotic method, and corresponding thousands of digits of precision in the calculation (4000 as for mfpr enough for the 2**100th zero?). Bruce

Bruce Evans wrote: > On Wed, 13 Jan 2010, Steven G. Kargl wrote: > > >> Description: > > > > The j0 bessel function supplied by libm is fairly inaccurate at > > arguments at and near a zero of the function. Here's the first > > 30 zeros computed by j0f, my implementation of j0f, a 4000-bit > > significand computed via MPFR (the assumed exact value), and the > > relative absolute error. > > This is a very hard and relatively unimportant problem. Yes, it is very hard, but apparently you do not use bessel functions in your everyday life. :) I only discover this issue because I need bessel functions of complex arguments and I found my routines have issues in the vicinity of zeros. So, I decided to look at the libm routines. > > x my j0f(x) libm j0f(x) MPFR j0 my err libm err > > 2.404825 5.6434434E-08 5.9634296E-08 5.6434400E-08 1.64 152824.59 > > 5.520078 2.4476664E-08 2.4153294E-08 2.4476659E-08 0.31 18878.52 > > 8.653728 1.0355323E-07 1.0359805E-07 1.0355306E-07 6.36 1694.47 > > 11.791534 -2.4511966E-09 -3.5193941E-09 -3.5301714E-09 78243.14 781.53 > > Hmm. I forgot to mention that 'my err' and 'libm err' are in units of epsilon (ie, FLT_EPSILON for j0f). > > Note, my j0f(x) currently uses double precision to accumulate intermediate > > values. Below x = 10, I use the ascending series to compute the value. > > Above x = 10, I'm using an asymptotic approximation. I haven't investigated > > whether additional terms in an asymptotic approximation would pull 'my err' > > for x = 11, 14, 18, and 21 closer to the exact value. > (snip) > Anyway, if you can get anywhere near < 10 ulp error near all zeros using > only an asymptotic method, then that would be good. Then the asymptotic > method would also be capable of locating the zeros very accurately. But > I would be very surprised if this worked. I know of nothing similar for > reducing mod Pi for trigonometric functions, which seems a simpler problem. > I would expect it to at best involve thousands of binary digits in the > tables for the asymptotic method, and corresponding thousands of digits > of precision in the calculation (4000 as for mfpr enough for the 2**100th > zero?). The 4000-bit setting for mpfr was a hold over from testing mpfr_j0 against my ascending series implementation of j0 with mpfr primitives. As few as 128-bits is sufficient to achieve the following: 2.404825 5.6434398E-08 5.9634296E-08 5.6434400E-08 0.05 152824.59 5.520078 2.4476657E-08 2.4153294E-08 2.4476659E-08 0.10 18878.52 8.653728 1.0355303E-07 1.0359805E-07 1.0355306E-07 0.86 1694.47 11.791534 -3.5291243E-09 -3.5193941E-09 -3.5301714E-09 75.93 781.53 14.930918 -6.4815082E-09 -6.3911618E-09 -6.4815052E-09 0.23 6722.88 18.071064 5.1532352E-09 5.3149818E-09 5.1532318E-09 0.23 10910.50 21.211637 -1.5023349E-07 -1.5002509E-07 -1.5023348E-07 2.70 56347.01 24.352472 1.2524569E-07 1.2516310E-07 1.2524570E-07 0.28 2834.53 27.493479 5.4331110E-08 5.4263626E-08 5.4331104E-08 0.29 3261.75 30.634607 1.2205545E-07 1.2203689E-07 1.2205546E-07 0.09 645.39 33.775822 -2.0213095E-07 -2.0206903E-07 -2.0213095E-07 0.27 6263.95 36.917099 8.4751576E-08 8.4749573E-08 8.4751581E-08 0.18 82.59 40.058426 -1.7484838E-08 -1.7475532E-08 -1.7484840E-08 0.12 767.56 43.199791 -9.2091398E-08 -9.2135146E-08 -9.2091406E-08 2.47 13530.51 46.341187 2.1663259E-07 2.1664336E-07 2.1663259E-07 0.16 268.90 49.482609 -1.2502527E-07 -1.2504157E-07 -1.2502526E-07 2.69 23512.60 52.624050 1.8706569E-07 1.8707487E-07 1.8706569E-07 0.01 251.43 55.765511 -2.0935557E-08 -2.0932896E-08 -2.0935556E-08 0.10 227.04 58.906982 1.5637660E-07 1.5634730E-07 1.5637661E-07 0.28 892.23 62.048470 3.5779891E-08 3.5787338E-08 3.5779899E-08 0.42 402.61 As I suspected by adding additional terms to the asymptotic approximation and performing all computations with double precision, reduces 'my err' (5th column). The value at x=11.7... is the best I can get. The asymptotic approximations contain divergent series and additional terms do not help. -- Steve http://troutmask.apl.washington.edu/~kargl/

On Thu, 14 Jan 2010, Steven G. Kargl wrote: > Bruce Evans wrote: >> On Wed, 13 Jan 2010, Steven G. Kargl wrote: >> >>>> Description: >>> >>> The j0 bessel function supplied by libm is fairly inaccurate at >>> ... >> >> This is a very hard and relatively unimportant problem. > > Yes, it is very hard, but apparently you do not use bessel > functions in your everyday life. :) > > I only discover this issue because I need bessel functions > of complex arguments and I found my routines have issues > in the vicinity of zeros. So, I decided to look at the > libm routines. It is interesting that the often-poor accuracy of almost every system's libm matters in real life. Complex args are another interesting problem since even complex multiplication is hard to do accurately (it may have large cancelation errors). >> Anyway, if you can get anywhere near < 10 ulp error near all zeros using >> only an asymptotic method, then that would be good. Then the asymptotic >> method would also be capable of locating the zeros very accurately. But >> I would be very surprised if this worked. I know of nothing similar for >> reducing mod Pi for trigonometric functions, which seems a simpler problem. >> I would expect it to at best involve thousands of binary digits in the >> tables for the asymptotic method, and corresponding thousands of digits >> of precision in the calculation (4000 as for mfpr enough for the 2**100th >> zero?). > > The 4000-bit setting for mpfr was a hold over from testing mpfr_j0 > against my ascending series implementation of j0 with mpfr > primitives. As few as 128-bits is sufficient to achieve the > following: > >>> x my j0f(x) libm j0f(x) MPFR j0 my err libm err > 2.404825 5.6434398E-08 5.9634296E-08 5.6434400E-08 0.05 152824.59 > 5.520078 2.4476657E-08 2.4153294E-08 2.4476659E-08 0.10 18878.52 > 8.653728 1.0355303E-07 1.0359805E-07 1.0355306E-07 0.86 1694.47 > 11.791534 -3.5291243E-09 -3.5193941E-09 -3.5301714E-09 75.93 781.53 Wonder why this jumps. > 14.930918 -6.4815082E-09 -6.3911618E-09 -6.4815052E-09 0.23 6722.88 > 18.071064 5.1532352E-09 5.3149818E-09 5.1532318E-09 0.23 10910.50 > 21.211637 -1.5023349E-07 -1.5002509E-07 -1.5023348E-07 2.70 56347.01 > ... > > As I suspected by adding additional terms to the asymptotic > approximation and performing all computations with double > precision, reduces 'my err' (5th column). The value at > x=11.7... is the best I can get. The asymptotic approximations > contain divergent series and additional terms do not help. The extra precision is almost certainly necessary. Whether double precision is nearly enough is unclear, but the error near 11.7 suggests that it is nearly enough except there. The large error might be caused by that zero alone (among small zeros) being very close to a representable value. I forgot the mention that the error table in my previous mail is on amd64 for comparing float precision functions with double precision ones, assuming that the latter are correct, which they aren't, but they are hopefully correct enough for this comparision. The errors on i386 are much larger, due to i386 still using i387 hardware trigonometric which are extremely inaccurate near zeros, starting at the first zero. Here are both tables: amd64: %%% j0: max_er = 0x7fffffffffdf5e07 17179869183.9960, avg_er = 5.581, #>=1:0.5 = 1593961230:1839722934 j1: max_er = 0x7fffffffffbcd2a1 17179869183.9918, avg_er = 4.524, #>=1:0.5 = 1597678928:1856295142 lgamma:max_er = 0x135a0b77e00000 10145883.7461, avg_er = 0.252, #>=1:0.5 = 44084256:331444835 y0: max_er = 0x7fffffffffbcd2a0 17179869183.9918, avg_er = 2.379, #>=1:0.5 = 837057577:1437331064 y1: max_er = 0x7fffffffffdf5e07 17179869183.9960, avg_er = 3.761, #>=1:0.5 = 865063612:1460264955 %%% i386: %%% j0: max_er = 0x6f8686c5c9f26 3654467.3863, avg_er = 0.418, #>=1:0.5 = 671562746:1332944948 j1: max_er = 0x449026e9c286f 2246675.4566, avg_er = 0.425, #>=1:0.5 = 674510414:1347770568 lgamma:max_er = 0xe4cf242400000 7497618.0703, avg_er = 0.274, #>=1:0.5 = 70033452:508702222 y0: max_er = 0x93a2340c00000 4837658.0234, avg_er = 0.380, #>=1:0.5 = 594207097:1303826516 y1: max_er = 0x7ffa2068256f72ab 17176789825.1699, avg_er = 5.188, #>=1:0.5 = 459137173:1213136103 %%% Unfortunately, most of these i386 errors, and the amd64 error for y1() are misreported. The huge max_er of 0x7ff... (16 hex digits) is actually a sign error misreported. Sign errors are bad enough. They always result at a simple zero z0 (f'(z0) != 0) when the approximation is so inaccurate that it cannot tell on which side of infinite-precision z0 the parameter is. Methods involving a table of zeros will not have any of these provided the table is accurate to within 1 ulp, but other methods can easily have them, depending on the other method's ability to locate the zeros to within 1 ulp by solving f~(z) ~= 0 where f~ is the approximate f. Having no sign errors across the whole range seems too good to believe for the amd64 functions. All of the i387 hardware trig functions have sign errors. Bruce

Bruce Evans wrote: > On Thu, 14 Jan 2010, Steven G. Kargl wrote: > > Bruce Evans wrote: > > >> Anyway, if you can get anywhere near < 10 ulp error near all zeros using > >> only an asymptotic method, then that would be good. Then the asymptotic > >> method would also be capable of locating the zeros very accurately. But > >> I would be very surprised if this worked. I know of nothing similar for > >> reducing mod Pi for trigonometric functions, which seems a simpler problem. > >> I would expect it to at best involve thousands of binary digits in the > >> tables for the asymptotic method, and corresponding thousands of digits > >> of precision in the calculation (4000 as for mfpr enough for the 2**100th > >> zero?). > > > > The 4000-bit setting for mpfr was a hold over from testing mpfr_j0 > > against my ascending series implementation of j0 with mpfr > > primitives. As few as 128-bits is sufficient to achieve the > > following: > > > >>> x my j0f(x) libm j0f(x) MPFR j0 my err libm err > > 2.404825 5.6434398E-08 5.9634296E-08 5.6434400E-08 0.05 152824.59 > > 5.520078 2.4476657E-08 2.4153294E-08 2.4476659E-08 0.10 18878.52 > > 8.653728 1.0355303E-07 1.0359805E-07 1.0355306E-07 0.86 1694.47 > > 11.791534 -3.5291243E-09 -3.5193941E-09 -3.5301714E-09 75.93 781.53 > > Wonder why this jumps. Below x=10, I use the ascending series. Above x=0, I use an asymptotic approximation (AA). x = 11.79... is the first zero I hit with the AA. > > 14.930918 -6.4815082E-09 -6.3911618E-09 -6.4815052E-09 0.23 6722.88 > > 18.071064 5.1532352E-09 5.3149818E-09 5.1532318E-09 0.23 10910.50 > > 21.211637 -1.5023349E-07 -1.5002509E-07 -1.5023348E-07 2.70 56347.01 > > ... > > > > As I suspected by adding additional terms to the asymptotic > > approximation and performing all computations with double > > precision, reduces 'my err' (5th column). The value at > > x=11.7... is the best I can get. The asymptotic approximations > > contain divergent series and additional terms do not help. > > The extra precision is almost certainly necessary. Whether double > precision is nearly enough is unclear, but the error near 11.7 suggests > that it is nearly enough except there. The large error might be caused > by that zero alone (among small zeros) being very close to a representable > value. The AA is j0(x) = (P(x) * cos(x+pi/4) + Q(x) * sin(x+pi/4)) where P(x) and Q(x) are infinite, divergent sums. I use the first 11 terms in P(x) and Q(x) to achieve the 'my err' = 75.9. Additional terms cause an increase in 'my err' as does fewer terms. This is probably the limit of double precision. I haven't investigated the intervals around the values I listed. So, there may be larger errors that are yet to be found. BTW, the MPFR website has a document that describes all their algorithms. They claim that the AA can be used for |x| > p*log(2)/2 where p is the precision of x; however, in the mpfr code the criterion is |x| > p/2. http://www.mpfr.org/algo.html > I forgot the mention that the error table in my previous mail is on amd64 > for comparing float precision functions with double precision ones, assuming > that the latter are correct, which they aren't, but they are hopefully > correct enough for this comparision. The errors on i386 are much larger, > due to i386 still using i387 hardware trigonometric which are extremely > inaccurate near zeros, starting at the first zero. Here are both tables: My values are also computed on amd64. I seldomly use i386 for numerical work. A quick change to my test code to use the double precision j0() suggests that it has sufficient precision for the comparison you've done. -- Steve http://troutmask.apl.washington.edu/~kargl/

Bruce Evans wrote: > On Thu, 14 Jan 2010, Steven G. Kargl wrote: > > > Bruce Evans wrote: > >> On Wed, 13 Jan 2010, Steven G. Kargl wrote: > >> > >>>> Description: > >>> > >>> The j0 bessel function supplied by libm is fairly inaccurate at > >>> ... > >> > >> This is a very hard and relatively unimportant problem. > > > > Yes, it is very hard, but apparently you do not use bessel > > functions in your everyday life. :) > > > > I only discover this issue because I need bessel functions > > of complex arguments and I found my routines have issues > > in the vicinity of zeros. So, I decided to look at the > > libm routines. > > It is interesting that the often-poor accuracy of almost every system's > libm matters in real life. > > Complex args are another interesting problem since even complex > multiplication is hard to do accurately (it may have large cancelation > errors). > > >> Anyway, if you can get anywhere near < 10 ulp error near all zeros using > >> only an asymptotic method, then that would be good. Then the asymptotic > >> method would also be capable of locating the zeros very accurately. But > >> I would be very surprised if this worked. I know of nothing similar for > >> reducing mod Pi for trigonometric functions, which seems a simpler problem. > >> I would expect it to at best involve thousands of binary digits in the > >> tables for the asymptotic method, and corresponding thousands of digits > >> of precision in the calculation (4000 as for mfpr enough for the 2**100th > >> zero?). > > > > The 4000-bit setting for mpfr was a hold over from testing mpfr_j0 > > against my ascending series implementation of j0 with mpfr > > primitives. As few as 128-bits is sufficient to achieve the > > following: > > > >>> x my j0f(x) libm j0f(x) MPFR j0 my err libm err > > 2.404825 5.6434398E-08 5.9634296E-08 5.6434400E-08 0.05 152824.59 > > 5.520078 2.4476657E-08 2.4153294E-08 2.4476659E-08 0.10 18878.52 > > 8.653728 1.0355303E-07 1.0359805E-07 1.0355306E-07 0.86 1694.47 > > 11.791534 -3.5291243E-09 -3.5193941E-09 -3.5301714E-09 75.93 781.53 > > Wonder why this jumps. > > > 14.930918 -6.4815082E-09 -6.3911618E-09 -6.4815052E-09 0.23 6722.88 > > 18.071064 5.1532352E-09 5.3149818E-09 5.1532318E-09 0.23 10910.50 > > 21.211637 -1.5023349E-07 -1.5002509E-07 -1.5023348E-07 2.70 56347.01 > > ... > > > > As I suspected by adding additional terms to the asymptotic > > approximation and performing all computations with double > > precision, reduces 'my err' (5th column). The value at > > x=11.7... is the best I can get. The asymptotic approximations > > contain divergent series and additional terms do not help. > > The extra precision is almost certainly necessary. Whether double > precision is nearly enough is unclear, but the error near 11.7 suggests > that it is nearly enough except there. The large error might be caused > by that zero alone (among small zeros) being very close to a representable > value. > > I forgot the mention that the error table in my previous mail is on amd64 > for comparing float precision functions with double precision ones, assuming > that the latter are correct, which they aren't, but they are hopefully > correct enough for this comparision. The errors on i386 are much larger, > due to i386 still using i387 hardware trigonometric which are extremely > inaccurate near zeros, starting at the first zero. Here are both tables: > > amd64: > %%% > j0: max_er = 0x7fffffffffdf5e07 17179869183.9960, avg_er = 5.581, #>=1:0.5 = 1593961230:1839722934 > j1: max_er = 0x7fffffffffbcd2a1 17179869183.9918, avg_er = 4.524, #>=1:0.5 = 1597678928:1856295142 > lgamma:max_er = 0x135a0b77e00000 10145883.7461, avg_er = 0.252, #>=1:0.5 = 44084256:331444835 > y0: max_er = 0x7fffffffffbcd2a0 17179869183.9918, avg_er = 2.379, #>=1:0.5 = 837057577:1437331064 > y1: max_er = 0x7fffffffffdf5e07 17179869183.9960, avg_er = 3.761, #>=1:0.5 = 865063612:1460264955 > %%% > > i386: > %%% > j0: max_er = 0x6f8686c5c9f26 3654467.3863, avg_er = 0.418, #>=1:0.5 = 671562746:1332944948 > j1: max_er = 0x449026e9c286f 2246675.4566, avg_er = 0.425, #>=1:0.5 = 674510414:1347770568 > lgamma:max_er = 0xe4cf242400000 7497618.0703, avg_er = 0.274, #>=1:0.5 = 70033452:508702222 > y0: max_er = 0x93a2340c00000 4837658.0234, avg_er = 0.380, #>=1:0.5 = 594207097:1303826516 > y1: max_er = 0x7ffa2068256f72ab 17176789825.1699, avg_er = 5.188, #>=1:0.5 = 459137173:1213136103 > %%% > > Unfortunately, most of these i386 errors, and the amd64 error for y1() > are misreported. The huge max_er of 0x7ff... (16 hex digits) is > actually a sign error misreported. Sign errors are bad enough. They > always result at a simple zero z0 (f'(z0) != 0) when the approximation > is so inaccurate that it cannot tell on which side of infinite-precision > z0 the parameter is. Methods involving a table of zeros will not have > any of these provided the table is accurate to within 1 ulp, but other > methods can easily have them, depending on the other method's ability > to locate the zeros to within 1 ulp by solving f~(z) ~= 0 where f~ is > the approximate f. > > Having no sign errors across the whole range seems too good to believe > for the amd64 functions. All of the i387 hardware trig functions have > sign errors. > Bruce, David, Since the my real-life work requires additional accuracy for the Bessel function routines in libm, I spent sometime playing with e_j0f. I've come up with the patch that follows my signature. It only deals with the first 32 zeros of j0(x), others can be added to the table. I've yet to come up with a general method to improve the accuracy around all zeros. To help you gauge the improvement, one can look at http://troutmask.apl.washington.edu/~kargl/j0f.jpg http://troutmask.apl.washington.edu/~kargl/j0f_1.jpg which compares the unpatched j0f(x) to the patched j0f(x). The patch makes use of the addition theorem for J0(x+d) where x is a zero and d is some small deviation from the zero. Using only the first 4 terms is sufficent for j0f. J0(x+d) = - J1(x)*J1(d) + J2(x)*J2(d) - J3(x)*J3(d) + J4(x)*J4(d) where J1(d), J2(d), J3(d) and J4(d) can be approximated by truncating their series representations. The functions with x, ie the known zero, are evaluated with MPFR. Note, d is determined from d = z - x = z - high(x) - low(x) where high(x) is the 1st 24 bits of the zero, low(x) is the next 24 bits of zero, and z is some value in [x-0.065,x+0.065]. Bruce (and/or David), if you think that this merits inclusion in FreeBSD's libm, I'll work up a similar patch for e_j0.c and submit both; otherwise, I keep it as a local change. -- Steve http://troutmask.apl.washington.edu/~kargl/ Index: src/e_j0f.c =================================================================== --- src/e_j0f.c (revision 204219) +++ src/e_j0f.c (working copy) @@ -22,6 +22,8 @@ static float pzerof(float), qzerof(float); static const float +invpi = 0.31830988618379067154, +pi = 3.14159265358979323846, huge = 1e30, one = 1.0, invsqrtpi= 5.6418961287e-01, /* 0x3f106ebb */ @@ -38,17 +40,157 @@ static const float zero = 0.0; +/* J1(x) for x << 1. */ +static float +_j1(float x2) +{ + return (x2 * (1 - 0.5 * x2 * x2)); +} + +/* J2(x) for x << 1. */ +static float +_j2(float x2) +{ + x2 *= x2; + return (0.5 * x2 * (1 - x2 / 3)); +} + +/* J3(x) for x << 1. */ +static float +_j3(float x2) +{ + float y; + y = x2 * x2; + return (x2 * y * (1 - 0.25 * y) / 6); +} + +/* J4(x) for x << 1. */ +static float +_j4(float x2) +{ + x2 *= x2; + return (x2 * x2 / 24); +} + +/* + * Use the addition theorem for values near a zero the Bessel function. + * J0(z) = J0(x + d) where |d| << 1. + * = - 2*J1(x)*J1(d) + 2*J2(x)*J2(d) - ... + * Note, the factor of 2 is included in the table. + */ float +j0zerof(int n, float z) +{ + static const float x[32][6] = { + {0x1.33d152p+1, 0x1.d2e368p-24, 0x1.09cdb4p+0, 0x1.ba1deep-1, + 0x1.978d44p-2, 0x1.0933ccp-3}, + {0x1.6148f6p+2, -0x1.34f46ep-24, -0x1.5c6e6p-1, -0x1.f8f72ep-3, + 0x1.00f404p-1, 0x1.9588cep-1}, + {0x1.14eb56p+3, 0x1.999bdap-22, 0x1.15f798p-1, 0x1.00f7fcp-3, + -0x1.f08b9p-2, -0x1.d8c2a8p-2}, + {0x1.79544p+3, 0x1.04e56cp-26, -0x1.dc13e6p-2, -0x1.42ff0cp-4, + 0x1.c0af7ep-2, 0x1.350edcp-2}, + {0x1.ddca14p+3, -0x1.0d8e2ep-25, 0x1.a701dp-2, 0x1.c54b94p-5, + -0x1.97d3ccp-2, -0x1.b9186p-3}, + {0x1.212314p+4, -0x1.d7982ap-26, -0x1.8077f6p-2, -0x1.5467ecp-5, + 0x1.770cdp-2, 0x1.4e26dp-3}, + {0x1.5362dep+4, -0x1.d1810ep-21, 0x1.62d93ap-2, 0x1.0ba9cep-5, + -0x1.5c8a0ap-2, -0x1.08180cp-3}, + {0x1.85a3bap+4, -0x1.9fd524p-21, -0x1.4b2a2ep-2, -0x1.b3298p-6, + 0x1.46b28cp-2, 0x1.aec26ap-4}, + {0x1.b7e54ap+4, 0x1.7f57c4p-22, 0x1.37aac8p-2, 0x1.6ac0d2p-6, + -0x1.345e5cp-2, -0x1.67dfb2p-4}, + {0x1.ea275ap+4, -0x1.c68826p-21, -0x1.27407ep-2, -0x1.34695p-6, + 0x1.24bc2ep-2, 0x1.32708ap-4}, + {0x1.0e34e2p+5, -0x1.8b3204p-20, 0x1.192f24p-2, 0x1.0a6682p-6, + -0x1.17365ap-2, -0x1.08ffd2p-4}, + {0x1.275638p+5, -0x1.5a7986p-21, -0x1.0cf3eep-2, -0x1.d242aap-7, + 0x1.0b5fc4p-2, 0x1.d0352cp-5}, + {0x1.4077a8p+5, -0x1.29d6c6p-23, 0x1.0230bap-2, 0x1.9c8084p-7, + -0x1.00e734p-2, -0x1.9af5aap-5}, + {0x1.59992cp+5, 0x1.974364p-21, -0x1.f13fbp-3, -0x1.70558ep-7, + 0x1.ef1ep-3, 0x1.6f2666p-5}, + {0x1.72bacp+5, 0x1.f02102p-20, 0x1.e018dap-3, 0x1.4b858ap-7, + -0x1.de4fp-3, -0x1.4a986ap-5}, + {0x1.8bdc62p+5, 0x1.27e0cap-20, -0x1.d09b22p-3, -0x1.2c74f6p-7, + 0x1.cf1686p-3, 0x1.2bb87cp-5}, + {0x1.a4fe0ep+5, 0x1.c8899p-20, 0x1.c28612p-3, 0x1.11f526p-7, + -0x1.c138e4p-3, -0x1.115d32p-5}, + {0x1.be1fc4p+5, 0x1.a4c606p-23, -0x1.b5a622p-3, -0x1.f645fep-8, + 0x1.b485eap-3, 0x1.f54de8p-6}, + {0x1.d7418p+5, 0x1.93c83ep-20, 0x1.a9d184p-3, 0x1.cea254p-8, + -0x1.a8d632p-3, -0x1.cdd58ap-6}, + {0x1.f06344p+5, -0x1.7b4716p-22, -0x1.9ee5eep-3, -0x1.abf28ap-8, + 0x1.9e093ap-3, 0x1.ab47cep-6}, + {0x1.04c286p+6, 0x1.0f88f2p-21, 0x1.94c6f6p-3, 0x1.8d6372p-8, + -0x1.9403e4p-3, -0x1.8cd3dp-6}, + {0x1.11536cp+6, 0x1.645ae6p-19, -0x1.8b5ccap-3, -0x1.724d02p-8, + 0x1.8aaf6p-3, 0x1.71d33p-6}, + {0x1.1de456p+6, -0x1.6bc7a4p-19, 0x1.829356p-3, 0x1.5a280ep-8, + -0x1.81f85cp-3, -0x1.59bff8p-6}, + {0x1.2a754p+6, -0x1.5f7eep-20, -0x1.7a597ep-3, -0x1.4486cp-8, + 0x1.79ce5p-3, 0x1.442d38p-6}, + {0x1.37062cp+6, -0x1.ab28bap-20, 0x1.72a09ap-3, 0x1.310f06p-8, + -0x1.72230ep-3, -0x1.30c186p-6}, + {0x1.439718p+6, 0x1.c5c6f4p-19, -0x1.6b5c04p-3, -0x1.1f766p-8, + 0x1.6aea5p-3, 0x1.1f32e8p-6}, + {0x1.502808p+6, -0x1.2cbbcep-19, 0x1.6480c4p-3, 0x1.0f7eb8p-8, + -0x1.641964p-3, -0x1.0f43acp-6}, + {0x1.5cb8f8p+6, -0x1.f0c70ap-19, -0x1.5e0544p-3, -0x1.00f3f2p-8, + 0x1.5da6f4p-3, 0x1.00c004p-6}, + {0x1.6949e8p+6, -0x1.81383ep-20, 0x1.57e12p-3, 0x1.e75414p-9, + -0x1.578accp-3, -0x1.e6f852p-7}, + {0x1.75dadap+6, -0x1.cea80cp-19, -0x1.520ceep-3, -0x1.cef722p-9, + 0x1.51bdacp-3, 0x1.cea5bap-7}, + {0x1.826bccp+6, -0x1.46c93cp-19, 0x1.4c8222p-3, 0x1.b89114p-9, + -0x1.4c392ap-3, -0x1.b8489p-7}, + {0x1.8efcbep+6, 0x1.615096p-20, -0x1.473ae6p-3, -0x1.a3eaeap-9, + 0x1.46f78ap-3, 0x1.a3aa16p-7}, + }; + + float j, d, d2; + d = z - x[n][0] - x[n][1]; + d2 = d / 2; + j = - x[n][2] * _j1(d2) + x[n][3] * _j2(d2) - + x[n][4] * _j3(d2) + x[n][5] * _j4(d2); + return (j); +} + +/* + * The zeros of J0(x) fall near x = (n + 0.75) * pi. These are offsets + * that are used to localize a value of x into a small interval about a + * zero. + */ +static const float offset[32] = { + 4.86309e-02, 2.22910e-02, 1.43477e-02, 1.05619e-02, + 8.35263e-03, 6.90623e-03, 5.88708e-03, 5.12924e-03, + 4.54305e-03, 4.07894e-03, 3.70066e-03, 3.38531e-03, + 3.11957e-03, 2.89196e-03, 2.69488e-03, 2.52450e-03, + 2.37319e-03, 2.24095e-03, 2.12016e-03, 2.01463e-03, + 1.91673e-03, 1.82645e-03, 1.75144e-03, 1.67643e-03, + 1.60904e-03, 1.54166e-03, 1.48953e-03, 1.43740e-03, + 1.38528e-03, 1.34078e-03, 1.29628e-03, 1.25179e-03 +}; + + +float __ieee754_j0f(float x) { float z, s,c,ss,cc,r,u,v; - int32_t hx,ix; + int32_t hx,ix,k; GET_FLOAT_WORD(hx,x); ix = hx&0x7fffffff; if(ix>=0x7f800000) return one/(x*x); x = fabsf(x); if(ix >= 0x40000000) { /* |x| >= 2.0 */ + + if (x < 100.) { + k = (int)(x * invpi - 0.75); + if (fabsf(x - (k + 0.75) * pi - offset[k]) < 0.065) + return (j0zerof(k, x)); + } + s = sinf(x); c = cosf(x); ss = s-c;

On Mon, 22 Feb 2010, Steven G. Kargl wrote: > Since the my real-life work requires additional accuracy for > the Bessel function routines in libm, I spent sometime playing > with e_j0f. I've come up with the patch that follows my signature. > It only deals with the first 32 zeros of j0(x), others can be > added to the table. I've yet to come up with a general method > to improve the accuracy around all zeros. To help you gauge > the improvement, one can look at Interesting. I will reply more later... I just browsed Abromowitz (sp?) and Stegun and saw that it gives methods for finding all the zeros and says that there are complex zeros. It seems strong on formulae and weak on practical computational methods and algorithms (even for its time; of course the best methods now are different). > +float > __ieee754_j0f(float x) > { > float z, s,c,ss,cc,r,u,v; > - int32_t hx,ix; > + int32_t hx,ix,k; > > GET_FLOAT_WORD(hx,x); > ix = hx&0x7fffffff; > if(ix>=0x7f800000) return one/(x*x); > x = fabsf(x); > if(ix >= 0x40000000) { /* |x| >= 2.0 */ > + > + if (x < 100.) { > + k = (int)(x * invpi - 0.75); > + if (fabsf(x - (k + 0.75) * pi - offset[k]) < 0.065) These and many other constants should by float constants. Otherwise you get slowness and negatively useful extra accuracy (no useful extra accuracy, due to floats elsewhere, but more complicated error analysis). This reduction is similar to the one for the first few and/or first few million zeros of trig functions in e_rem_pio2*.c. > + return (j0zerof(k, x)); > + } > + > s = sinf(x); > c = cosf(x); > ss = s-c; > Bruce

Responsible Changed From-To: freebsd-standards->kargl Reassign this one to the submitter, since he has a commit bit now. Submitting too many high-quality PRs can backfire!

kargl returned his commit bit for safekeeping.

batch change: For bugs that match the following - Status Is In progress AND - Untouched since 2018-01-01. AND - Affects Base System OR Documentation DO: Reset to open status. Note: I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.

This should be closed. No one will ever address it.