Bug 142803 - j0 Bessel function inaccurate near zeros of the function
Summary: j0 Bessel function inaccurate near zeros of the function
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: standards
Version: 9.0-CURRENT
Hardware: Any
OS: Any
Importance: Normal Affects Only Me
Assignee: freebsd-standards (Nobody)
Reported: 2010-01-14 01:50 UTC by Steven G. Kargl
Modified: 2018-10-30 21:55 UTC

Description Steven G. Kargl 2010-01-14 01:50:01 UTC
The j0 Bessel function supplied by libm is fairly inaccurate at
arguments at and near zeros of the function.  Here are the first
20 zeros, the value of my implementation of j0f at each, the value
of the libm j0f, the value computed via MPFR with a 4000-bit
significand (the assumed exact value), and the relative absolute
error of each implementation ('my err' and 'libm err').

    x        my j0f(x)     libm j0f(x)    MPFR j0        my err  libm err
 2.404825  5.6434434E-08  5.9634296E-08  5.6434400E-08      1.64 152824.59
 5.520078  2.4476664E-08  2.4153294E-08  2.4476659E-08      0.31  18878.52
 8.653728  1.0355323E-07  1.0359805E-07  1.0355306E-07      6.36   1694.47
11.791534 -2.4511966E-09 -3.5193941E-09 -3.5301714E-09  78243.14    781.53
14.930918 -6.6019510E-09 -6.3911618E-09 -6.4815052E-09   8962.94   6722.88
18.071064  5.1734359E-09  5.3149818E-09  5.1532318E-09   1362.83  10910.50
21.211637 -1.5023797E-07 -1.5002509E-07 -1.5023348E-07   1213.07  56347.01
24.352472  1.2524691E-07  1.2516310E-07  1.2524570E-07     41.65   2834.53
27.493479  5.4330716E-08  5.4263626E-08  5.4331104E-08     18.77   3261.75
30.634607  1.2205560E-07  1.2203689E-07  1.2205546E-07      4.85    645.39
33.775822 -2.0213101E-07 -2.0206903E-07 -2.0213095E-07      5.48   6263.95
36.917099  8.4751605E-08  8.4749573E-08  8.4751581E-08      0.99     82.59
40.058426 -1.7484851E-08 -1.7475532E-08 -1.7484840E-08      0.91    767.56
43.199791 -9.2091398E-08 -9.2135146E-08 -9.2091406E-08      2.47  13530.51
46.341187  2.1663259E-07  2.1664336E-07  2.1663259E-07      0.16    268.90
49.482609 -1.2502527E-07 -1.2504157E-07 -1.2502526E-07      2.69  23512.60
52.624050  1.8706569E-07  1.8707487E-07  1.8706569E-07      0.01    251.43
55.765511 -2.0935554E-08 -2.0932896E-08 -2.0935556E-08      0.19    227.03
58.906982  1.5637660E-07  1.5634730E-07  1.5637661E-07      0.26    892.22
62.048470  3.5779891E-08  3.5787338E-08  3.5779888E-08      0.14    403.17

Note, my j0f(x) currently uses double precision to accumulate intermediate
values.  Below x = 10, I use the ascending series to compute the value;
above x = 10, I use an asymptotic approximation.  I haven't investigated
whether additional terms in the asymptotic approximation would pull 'my
err' at x = 11.79, 14.93, 18.07, and 21.21 closer to zero.
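
As a sketch (not my actual code; the fixed term count is illustrative),
the ascending-series half of this scheme looks like:

#include <math.h>

/*
 * Ascending series: J0(x) = sum_{k>=0} (-1)^k (x^2/4)^k / (k!)^2,
 * accumulated in double precision.  Usable for |x| below about 10;
 * real code would cut off based on the size of the last term.
 */
static float
j0f_series(float x)
{
	double q, sum, term;
	int k;

	q = (double)x * x / 4;
	term = 1;
	sum = 1;
	for (k = 1; k <= 30; k++) {
		term *= -q / ((double)k * k);	/* t_k = -t_{k-1}*q/k^2 */
		sum += term;
	}
	return ((float)sum);
}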

Fix: 

I'll get to it eventually.
Comment 1 Steven G. Kargl 2010-01-14 02:59:26 UTC
I forgot to mention that I believe this
issue will be found in j1(3) as well.
I haven't tried any tests with y0, y1,
jn, or yn.  I'd be mighty surprised if
these were robust.

-- 
Steve
http://troutmask.apl.washington.edu/~kargl/
Comment 2 Steven G. Kargl 2010-01-14 03:18:06 UTC
Well, I just checked the bug report on the
web, and the nicely formatted table looks 
like garbage due to the stupidity of crushing
consecutive spaces down to a single space.

-- 
Steve
http://troutmask.apl.washington.edu/~kargl/
Comment 3 Bruce Evans freebsd_committer 2010-01-14 10:26:49 UTC
On Wed, 13 Jan 2010, Steven G. Kargl wrote:

>> Description:
>
> The j0 bessel function supplied by libm is fairly inaccurate at
> arguments at and near a zero of the function.  Here's the first
> 30 zeros computed by j0f, my implementation of j0f, a 4000-bit
> significand computed via MPFR (the assumed exact value), and the
> relative absolute error.

This is a very hard and relatively unimportant problem.

The corresponding problem for trigonometric functions is fairly hard
and relatively very important.  It is solved in fdlibm (and thus in
FreeBSD libm) essentially by using a table of all the relevant zeros,
with the necessary several thousands of binary digits of accuracy for
many of the zeros.  Since trigonometric functions are periodic, it is
not necessary to have a table of approximations (e.g., polynomials)
for each of the zeros of interest.  There are O(2**100) zeros of
interest for 128-bit long doubles, so simple tables wouldn't work even
for the zeros.

The corresponding problem for lgamma() is relatively simple and
relatively important.  It is solved in places like the Intel ia64 libm
by using a table of all the relevant zeros and a table of approximations
for each of the zeros.  There are only about 70 relevant zeros (since
zeroes less than about -70 are so close to the (pseudo-)poles that
they cannot be represented in double precision (unlike the poles which,
since they are at the negative integers, can be represented down to
about -10**53 in double precision)).

For the corresponding problem for at least the simplest bessel function
j0(), the zeros are distributed like the zeros of sin(), but they
aren't exactly at multiples of Pi like those of sin(), so just finding
all the relevant ones to the necessary thousands of binary digits of
accuracy is a hard problem.  At least j0() is bounded like sin() (and
unlike lgamma()); thus we know that there are O(2**100) relevant zeros,
so it is impossible to find them all in advance.  Since bessel functions
aren't periodic, it is also unclear how they could be calculated
accurately near the zeros without using a different special approximation
near every zero.  In general, such calculations need to be done in extra
precision even for the linear term, giving relatively minor extra
complications.

I once tried to calculate lgamma(z) near just the first zero of lgamma(),
using only extra precision applied to standard formulas (the Stirling
approximation and not reflection, since extra precision was only easy to
obtain for the former), to see how far I could get using only
extra precision.  It took approximately doubled double precision (double
precision with most intermediate calculations done in extra precision)
to get near float precision for lgamma().
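
To be concrete, the basic building block of such "doubled double"
arithmetic is the standard exact two-sum (a sketch, not the code I used):

%%%
/*
 * Knuth's two-sum: hi + lo == a + b exactly, where hi = fl(a + b) and
 * lo is the rounding error.  Chaining these (plus the analogous exact
 * product) is what "doubled double precision" means above.
 */
static void
two_sum(double a, double b, double *hi, double *lo)
{
	double s = a + b;
	double t = s - a;

	*hi = s;
	*lo = (a - (s - t)) + (b - t);
}
%%%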

It might be worth making bessel functions accurate near just the first
few zeros.

>    x        my j0f(x)     libm j0f(x)    MPFR j0        my err  libm err
> 2.404825  5.6434434E-08  5.9634296E-08  5.6434400E-08      1.64 152824.59
> 5.520078  2.4476664E-08  2.4153294E-08  2.4476659E-08      0.31  18878.52
> 8.653728  1.0355323E-07  1.0359805E-07  1.0355306E-07      6.36   1694.47
> 11.791534 -2.4511966E-09 -3.5193941E-09 -3.5301714E-09  78243.14    781.53

Hmm.

> ...
> 62.048470  3.5779891E-08  3.5787338E-08  3.5779888E-08      0.14    403.17

The largest errors across all 2**32 float values for the 1-parameter
float precision bessel functions in libm are about 3.7 Mulps for j0()
and about 17 Gulps for y1():

%%%
j0:    max_er = 0x6f8686c5c9f26 3654467.3863, avg_er = 0.418, #>=1:0.5 = 671562746:1332944948
j1:    max_er = 0x449026e9c286f 2246675.4566, avg_er = 0.425, #>=1:0.5 = 674510414:1347770568
lgamma:max_er = 0xe4cf242400000 7497618.0703, avg_er = 0.274, #>=1:0.5 = 70033452:508702222
y0:    max_er = 0x93a2340c00000 4837658.0234, avg_er = 0.380, #>=1:0.5 = 594207097:1303826516
y1:    max_er = 0x7ffa2068256f72ab 17176789825.1699, avg_er = 5.188, #>=1:0.5 = 459137173:1213136103
%%%


> Note, my j0f(x) currently uses double precision to accumulate intermediate
> values.  Below x = 10, I use the ascending series to compute the value.
> Above x = 10, I'm using an asymptotic approximation.  I haven't investigated
> whether additional terms in an asymptotic approximation would pull 'my err'
> for x = 11, 14, 18, and 21 closer to the exact value.

Stirling asymptotic is what barely worked (with significant extra
precision) for the first zero of lgamma().  lgamma() is especially bad
near its higher zeros, since there is a pole very near each such zero.
I guess the behaviour for j0() is not so bad since there are no poles,
but the asymptotic method also seems to work less well (compared with
a power series method) on lgamma(x) for x > 1, where there are no poles.

Anyway, if you can get anywhere near < 10 ulp error near all zeros using
only an asymptotic method, then that would be good.  Then the asymptotic
method would also be capable of locating the zeros very accurately.  But
I would be very surprised if this worked.  I know of nothing similar for
reducing mod Pi for trigonometric functions, which seems a simpler problem.
I would expect it to at best involve thousands of binary digits in the
tables for the asymptotic method, and corresponding thousands of digits
of precision in the calculation (would 4000, as for mpfr, be enough for
the 2**100th zero?).

Bruce
Comment 4 Steven G. Kargl 2010-01-14 22:49:45 UTC
Bruce Evans wrote:
> On Wed, 13 Jan 2010, Steven G. Kargl wrote:
> 
> >> Description:
> >
> > The j0 bessel function supplied by libm is fairly inaccurate at
> > arguments at and near a zero of the function.  Here's the first
> > 30 zeros computed by j0f, my implementation of j0f, a 4000-bit
> > significand computed via MPFR (the assumed exact value), and the
> > relative absolute error.
> 
> This is a very hard and relatively unimportant problem.

Yes, it is very hard, but apparently you do not use Bessel
functions in your everyday life. :)

I only discovered this issue because I need Bessel functions
of complex arguments, and I found that my routines have issues
in the vicinity of zeros.  So, I decided to look at the
libm routines.

> >    x        my j0f(x)     libm j0f(x)    MPFR j0        my err  libm err
> > 2.404825  5.6434434E-08  5.9634296E-08  5.6434400E-08      1.64 152824.59
> > 5.520078  2.4476664E-08  2.4153294E-08  2.4476659E-08      0.31  18878.52
> > 8.653728  1.0355323E-07  1.0359805E-07  1.0355306E-07      6.36   1694.47
> > 11.791534 -2.4511966E-09 -3.5193941E-09 -3.5301714E-09  78243.14    781.53
> 
> Hmm.

I forgot to mention that 'my err' and 'libm err' are
in units of epsilon (i.e., FLT_EPSILON for j0f).
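
Something like the following sketch shows how such a comparison can be
made (the normalization here is an assumption; "units of epsilon" is all
the tables promise, and my actual harness differs):

#include <float.h>
#include <math.h>
#include <mpfr.h>

/*
 * Error of j0f(x), scaled by FLT_EPSILON, with mpfr_j0 at 128 bits
 * as the reference value.
 */
static double
err_in_eps(float x)
{
	mpfr_t xm, ref;
	double exact, err;

	mpfr_inits2(128, xm, ref, (mpfr_ptr)0);
	mpfr_set_d(xm, (double)x, MPFR_RNDN);	/* float -> double is exact */
	mpfr_j0(ref, xm, MPFR_RNDN);
	exact = mpfr_get_d(ref, MPFR_RNDN);
	err = fabs((double)j0f(x) - exact) / FLT_EPSILON;
	mpfr_clears(xm, ref, (mpfr_ptr)0);
	return (err);
}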


> > Note, my j0f(x) currently uses double precision to accumulate intermediate
> > values.  Below x = 10, I use the ascending series to compute the value.
> > Above x = 10, I'm using an asymptotic approximation.  I haven't investigated
> > whether additional terms in an asymptotic approximation would pull 'my err'
> > for x = 11, 14, 18, and 21 closer to the exact value.
> 

(snip)

> Anyway, if you can get anywhere near < 10 ulp error near all zeros using
> only an asymptotic method, then that would be good.  Then the asymptotic
> method would also be capable of locating the zeros very accurately.  But
> I would be very surprised if this worked.  I know of nothing similar for
> reducing mod Pi for trigonometric functions, which seems a simpler problem.
> I would expect it to at best involve thousands of binary digits in the
> tables for the asymptotic method, and corresponding thousands of digits
> of precision in the calculation (4000 as for mfpr enough for the 2**100th
> zero?).

The 4000-bit setting for mpfr was a holdover from testing mpfr_j0
against my ascending-series implementation of j0 with mpfr
primitives.  As few as 128 bits is sufficient to achieve the
following:

    2.404825  5.6434398E-08  5.9634296E-08  5.6434400E-08      0.05 152824.59
    5.520078  2.4476657E-08  2.4153294E-08  2.4476659E-08      0.10  18878.52
    8.653728  1.0355303E-07  1.0359805E-07  1.0355306E-07      0.86   1694.47
   11.791534 -3.5291243E-09 -3.5193941E-09 -3.5301714E-09     75.93    781.53
   14.930918 -6.4815082E-09 -6.3911618E-09 -6.4815052E-09      0.23   6722.88
   18.071064  5.1532352E-09  5.3149818E-09  5.1532318E-09      0.23  10910.50
   21.211637 -1.5023349E-07 -1.5002509E-07 -1.5023348E-07      2.70  56347.01
   24.352472  1.2524569E-07  1.2516310E-07  1.2524570E-07      0.28   2834.53
   27.493479  5.4331110E-08  5.4263626E-08  5.4331104E-08      0.29   3261.75
   30.634607  1.2205545E-07  1.2203689E-07  1.2205546E-07      0.09    645.39
   33.775822 -2.0213095E-07 -2.0206903E-07 -2.0213095E-07      0.27   6263.95
   36.917099  8.4751576E-08  8.4749573E-08  8.4751581E-08      0.18     82.59
   40.058426 -1.7484838E-08 -1.7475532E-08 -1.7484840E-08      0.12    767.56
   43.199791 -9.2091398E-08 -9.2135146E-08 -9.2091406E-08      2.47  13530.51
   46.341187  2.1663259E-07  2.1664336E-07  2.1663259E-07      0.16    268.90
   49.482609 -1.2502527E-07 -1.2504157E-07 -1.2502526E-07      2.69  23512.60
   52.624050  1.8706569E-07  1.8707487E-07  1.8706569E-07      0.01    251.43
   55.765511 -2.0935557E-08 -2.0932896E-08 -2.0935556E-08      0.10    227.04
   58.906982  1.5637660E-07  1.5634730E-07  1.5637661E-07      0.28    892.23
   62.048470  3.5779891E-08  3.5787338E-08  3.5779899E-08      0.42    402.61

As I suspected, adding additional terms to the asymptotic
approximation and performing all computations with double
precision reduces 'my err' (5th column).  The value at
x = 11.7... is the best I can get.  The asymptotic approximations
contain divergent series, and additional terms do not help.

-- 
Steve
http://troutmask.apl.washington.edu/~kargl/
Comment 5 Bruce Evans freebsd_committer 2010-01-15 00:38:39 UTC
On Thu, 14 Jan 2010, Steven G. Kargl wrote:

> Bruce Evans wrote:
>> On Wed, 13 Jan 2010, Steven G. Kargl wrote:
>>
>>>> Description:
>>>
>>> The j0 bessel function supplied by libm is fairly inaccurate at
>>> ...
>>
>> This is a very hard and relatively unimportant problem.
>
> Yes, it is very hard, but apparently you do not use bessel
> functions in your everyday life. :)
>
> I only discovered this issue because I need Bessel functions
> of complex arguments and I found my routines have issues
> in the vicinity of zeros.  So, I decided to look at the
> libm routines.

It is interesting that the often-poor accuracy of almost every system's
libm matters in real life.

Complex args are another interesting problem since even complex
multiplication is hard to do accurately (it may have large cancellation
errors).
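
One standard remedy, as a sketch (Kahan's fma-based evaluation of the
real part a*c - b*d, assuming a correctly rounded fma and no
over/underflow):

%%%
#include <math.h>

/*
 * Kahan's trick for a*c - b*d: e recovers the rounding error of b*d
 * exactly (the error of a product is representable), so the severe
 * cancellation between the two products no longer destroys accuracy.
 */
static double
mul_sub(double a, double c, double b, double d)
{
	double w = b * d;
	double e = fma(-b, d, w);	/* w - b*d, exact */
	double f = fma(a, c, -w);	/* a*c - w, correctly rounded */

	return (f + e);
}
%%%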

>> Anyway, if you can get anywhere near < 10 ulp error near all zeros using
>> only an asymptotic method, then that would be good.  Then the asymptotic
>> method would also be capable of locating the zeros very accurately.  But
>> I would be very surprised if this worked.  I know of nothing similar for
>> reducing mod Pi for trigonometric functions, which seems a simpler problem.
>> I would expect it to at best involve thousands of binary digits in the
>> tables for the asymptotic method, and corresponding thousands of digits
>> of precision in the calculation (4000 as for mfpr enough for the 2**100th
>> zero?).
>
> The 4000-bit setting for mpfr was a hold over from testing mpfr_j0
> against my ascending series implementation of j0 with mpfr
> primitives.  As few as 128-bits is sufficient to achieve the
> following:
>
>>>    x        my j0f(x)     libm j0f(x)    MPFR j0        my err  libm err
>    2.404825  5.6434398E-08  5.9634296E-08  5.6434400E-08      0.05 152824.59
>    5.520078  2.4476657E-08  2.4153294E-08  2.4476659E-08      0.10  18878.52
>    8.653728  1.0355303E-07  1.0359805E-07  1.0355306E-07      0.86   1694.47
>   11.791534 -3.5291243E-09 -3.5193941E-09 -3.5301714E-09     75.93    781.53

Wonder why this jumps.

>   14.930918 -6.4815082E-09 -6.3911618E-09 -6.4815052E-09      0.23   6722.88
>   18.071064  5.1532352E-09  5.3149818E-09  5.1532318E-09      0.23  10910.50
>   21.211637 -1.5023349E-07 -1.5002509E-07 -1.5023348E-07      2.70  56347.01
> ...
>
> As I suspected by adding additional terms to the asymptotic
> approximation and performing all computations with double
> precision, reduces 'my err' (5th column).  The value at
> x=11.7... is the best I can get.  The asymptotic approximations
> contain divergent series and additional terms do not help.

The extra precision is almost certainly necessary.  Whether double
precision is nearly enough is unclear, but the error near 11.7 suggests
that it is nearly enough except there.  The large error might be caused
by that zero alone (among small zeros) being very close to a representable
value.

I forgot to mention that the error table in my previous mail is from amd64,
comparing the float precision functions with the double precision ones,
assuming that the latter are correct, which they aren't, but they are
hopefully correct enough for this comparison.  The errors on i386 are much
larger, due to i386 still using the i387 hardware trigonometric functions,
which are extremely inaccurate near zeros, starting at the first zero.
Here are both tables:

amd64:
%%%
j0:    max_er = 0x7fffffffffdf5e07 17179869183.9960, avg_er = 5.581, #>=1:0.5 = 1593961230:1839722934
j1:    max_er = 0x7fffffffffbcd2a1 17179869183.9918, avg_er = 4.524, #>=1:0.5 = 1597678928:1856295142
lgamma:max_er = 0x135a0b77e00000 10145883.7461, avg_er = 0.252, #>=1:0.5 = 44084256:331444835
y0:    max_er = 0x7fffffffffbcd2a0 17179869183.9918, avg_er = 2.379, #>=1:0.5 = 837057577:1437331064
y1:    max_er = 0x7fffffffffdf5e07 17179869183.9960, avg_er = 3.761, #>=1:0.5 = 865063612:1460264955
%%%

i386:
%%%
j0:    max_er = 0x6f8686c5c9f26 3654467.3863, avg_er = 0.418, #>=1:0.5 = 671562746:1332944948
j1:    max_er = 0x449026e9c286f 2246675.4566, avg_er = 0.425, #>=1:0.5 = 674510414:1347770568
lgamma:max_er = 0xe4cf242400000 7497618.0703, avg_er = 0.274, #>=1:0.5 = 70033452:508702222
y0:    max_er = 0x93a2340c00000 4837658.0234, avg_er = 0.380, #>=1:0.5 = 594207097:1303826516
y1:    max_er = 0x7ffa2068256f72ab 17176789825.1699, avg_er = 5.188, #>=1:0.5 = 459137173:1213136103
%%%

Unfortunately, most of these i386 errors, and the amd64 error for y1(),
are misreported.  The huge max_er of 0x7ff... (16 hex digits) is
actually a misreported sign error.  Sign errors are bad enough.  They
always result at a simple zero z0 (f'(z0) != 0) when the approximation
is so inaccurate that it cannot tell on which side of the infinite-precision
z0 the argument is.  Methods involving a table of zeros will not have
any of these provided the table is accurate to within 1 ulp, but other
methods can easily have them, depending on the other method's ability
to locate the zeros to within 1 ulp by solving f~(z) ~= 0, where f~ is
the approximate f.

Having no sign errors across the whole range seems too good to believe
for the amd64 functions.  All of the i387 hardware trig functions have
sign errors.

Bruce
Comment 6 Steven G. Kargl 2010-01-15 01:53:17 UTC
Bruce Evans wrote:
> On Thu, 14 Jan 2010, Steven G. Kargl wrote:
> > Bruce Evans wrote:
> 
> >> Anyway, if you can get anywhere near < 10 ulp error near all zeros using
> >> only an asymptotic method, then that would be good.  Then the asymptotic
> >> method would also be capable of locating the zeros very accurately.  But
> >> I would be very surprised if this worked.  I know of nothing similar for
> >> reducing mod Pi for trigonometric functions, which seems a simpler problem.
> >> I would expect it to at best involve thousands of binary digits in the
> >> tables for the asymptotic method, and corresponding thousands of digits
> >> of precision in the calculation (4000 as for mfpr enough for the 2**100th
> >> zero?).
> >
> > The 4000-bit setting for mpfr was a hold over from testing mpfr_j0
> > against my ascending series implementation of j0 with mpfr
> > primitives.  As few as 128-bits is sufficient to achieve the
> > following:
> >
> >>>    x        my j0f(x)     libm j0f(x)    MPFR j0        my err  libm err
> >    2.404825  5.6434398E-08  5.9634296E-08  5.6434400E-08      0.05 152824.59
> >    5.520078  2.4476657E-08  2.4153294E-08  2.4476659E-08      0.10  18878.52
> >    8.653728  1.0355303E-07  1.0359805E-07  1.0355306E-07      0.86   1694.47
> >   11.791534 -3.5291243E-09 -3.5193941E-09 -3.5301714E-09     75.93    781.53
> 
> Wonder why this jumps.

Below x = 10, I use the ascending series.  Above x = 10, I use an asymptotic
approximation (AA).  x = 11.79... is the first zero I hit with the AA.

> >   14.930918 -6.4815082E-09 -6.3911618E-09 -6.4815052E-09      0.23   6722.88
> >   18.071064  5.1532352E-09  5.3149818E-09  5.1532318E-09      0.23  10910.50
> >   21.211637 -1.5023349E-07 -1.5002509E-07 -1.5023348E-07      2.70  56347.01
> > ...
> >
> > As I suspected by adding additional terms to the asymptotic
> > approximation and performing all computations with double
> > precision, reduces 'my err' (5th column).  The value at
> > x=11.7... is the best I can get.  The asymptotic approximations
> > contain divergent series and additional terms do not help.
> 
> The extra precision is almost certainly necessary.  Whether double
> precision is nearly enough is unclear, but the error near 11.7 suggests
> that it is nearly enough except there.  The large error might be caused
> by that zero alone (among small zeros) being very close to a representable
> value.

The AA is j0(x) = P(x) * cos(x + pi/4) + Q(x) * sin(x + pi/4), where
P(x) and Q(x) are infinite, divergent sums.  I use the first 11
terms of P(x) and Q(x) to achieve 'my err' = 75.9.  Additional
terms cause an increase in 'my err', as do fewer terms.  This is
probably the limit of double precision.
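
For reference, the textbook form of this expansion (A&S 9.2.5; it keeps
the amplitude sqrt(2/(pi*x)) and uses the phase x - pi/4, so the
conventions differ from the shorthand above), truncated to its leading
terms, is:

#include <math.h>

/*
 * Leading terms of the asymptotic expansion (A&S 9.2.5):
 *   J0(x) ~ sqrt(2/(pi*x)) * (P(x)*cos(x - pi/4) - Q(x)*sin(x - pi/4)),
 *   P(x) = 1 - 9/(128*x^2) + ...,  Q(x) = -1/(8*x) + 75/(1024*x^3) - ...
 * Both series diverge; past some term count the error grows again.
 */
static double
j0_asym(double x)
{
	double z, P, Q, chi;

	z = 1 / (8 * x);
	P = 1 - 4.5 * z * z;
	Q = -z + 37.5 * z * z * z;
	chi = x - M_PI / 4;
	return (sqrt(2 / (M_PI * x)) * (P * cos(chi) - Q * sin(chi)));
}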

I haven't investigated the intervals around the values I listed.  So,
there may be larger errors that are yet to be found.

BTW, the MPFR website has a document that describes all their algorithms.
They claim that the AA can be used for |x| > p*log(2)/2 where p is
the precision of x; however, in the mpfr code the criterion is |x| > p/2.

http://www.mpfr.org/algo.html

> I forgot the mention that the error table in my previous mail is on amd64
> for comparing float precision functions with double precision ones, assuming
> that the latter are correct, which they aren't, but they are hopefully
> correct enough for this comparision.  The errors on i386 are much larger,
> due to i386 still using i387 hardware trigonometric which are extremely
> inaccurate near zeros, starting at the first zero.  Here are both tables:

My values are also computed on amd64.  I seldom use i386 for numerical
work.  A quick change to my test code to use the double precision j0()
suggests that it has sufficient precision for the comparison you've
done.
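
That quick change amounted to something like this sketch (assuming the
double j0() is good enough to serve as the reference; my real test code
differs):

#include <float.h>
#include <math.h>
#include <stdio.h>

/* Scan for large errors of j0f against double j0, in units of FLT_EPSILON. */
int
main(void)
{
	float x;
	double e;

	for (x = 2.0F; x < 64.0F; x += 0.25F) {
		e = fabs((double)j0f(x) - j0((double)x)) / FLT_EPSILON;
		if (e > 100)
			printf("x = %.6f  err = %.2f\n", x, e);
	}
	return (0);
}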

-- 
Steve
http://troutmask.apl.washington.edu/~kargl/
Comment 7 Steven G. Kargl 2010-02-23 00:21:21 UTC
Bruce Evans wrote:
> On Thu, 14 Jan 2010, Steven G. Kargl wrote:
> 
> > Bruce Evans wrote:
> >> On Wed, 13 Jan 2010, Steven G. Kargl wrote:
> >>
> >>>> Description:
> >>>
> >>> The j0 bessel function supplied by libm is fairly inaccurate at
> >>> ...
> >>
> >> This is a very hard and relatively unimportant problem.
> >
> > Yes, it is very hard, but apparently you do not use bessel
> > functions in your everyday life. :)
> >
> > I only discovered this issue because I need Bessel functions
> > of complex arguments and I found my routines have issues
> > in the vicinity of zeros.  So, I decided to look at the
> > libm routines.
> 
> It is interesting that the often-poor accuracy of almost every system's
> libm matters in real life.
> 
> Complex args are another interesting problem since even complex
> multiplication is hard to do accurately (it may have large cancelation
> errors).
> 
> >> Anyway, if you can get anywhere near < 10 ulp error near all zeros using
> >> only an asymptotic method, then that would be good.  Then the asymptotic
> >> method would also be capable of locating the zeros very accurately.  But
> >> I would be very surprised if this worked.  I know of nothing similar for
> >> reducing mod Pi for trigonometric functions, which seems a simpler problem.
> >> I would expect it to at best involve thousands of binary digits in the
> >> tables for the asymptotic method, and corresponding thousands of digits
> >> of precision in the calculation (4000 as for mfpr enough for the 2**100th
> >> zero?).
> >
> > The 4000-bit setting for mpfr was a hold over from testing mpfr_j0
> > against my ascending series implementation of j0 with mpfr
> > primitives.  As few as 128-bits is sufficient to achieve the
> > following:
> >
> >>>    x        my j0f(x)     libm j0f(x)    MPFR j0        my err  libm err
> >    2.404825  5.6434398E-08  5.9634296E-08  5.6434400E-08      0.05 152824.59
> >    5.520078  2.4476657E-08  2.4153294E-08  2.4476659E-08      0.10  18878.52
> >    8.653728  1.0355303E-07  1.0359805E-07  1.0355306E-07      0.86   1694.47
> >   11.791534 -3.5291243E-09 -3.5193941E-09 -3.5301714E-09     75.93    781.53
> 
> Wonder why this jumps.
> 
> >   14.930918 -6.4815082E-09 -6.3911618E-09 -6.4815052E-09      0.23   6722.88
> >   18.071064  5.1532352E-09  5.3149818E-09  5.1532318E-09      0.23  10910.50
> >   21.211637 -1.5023349E-07 -1.5002509E-07 -1.5023348E-07      2.70  56347.01
> > ...
> >
> > As I suspected by adding additional terms to the asymptotic
> > approximation and performing all computations with double
> > precision, reduces 'my err' (5th column).  The value at
> > x=11.7... is the best I can get.  The asymptotic approximations
> > contain divergent series and additional terms do not help.
> 
> The extra precision is almost certainly necessary.  Whether double
> precision is nearly enough is unclear, but the error near 11.7 suggests
> that it is nearly enough except there.  The large error might be caused
> by that zero alone (among small zeros) being very close to a representable
> value.
> 
> I forgot the mention that the error table in my previous mail is on amd64
> for comparing float precision functions with double precision ones, assuming
> that the latter are correct, which they aren't, but they are hopefully
> correct enough for this comparision.  The errors on i386 are much larger,
> due to i386 still using i387 hardware trigonometric which are extremely
> inaccurate near zeros, starting at the first zero.  Here are both tables:
> 
> amd64:
> %%%
> j0:    max_er = 0x7fffffffffdf5e07 17179869183.9960, avg_er = 5.581, #>=1:0.5 = 1593961230:1839722934
> j1:    max_er = 0x7fffffffffbcd2a1 17179869183.9918, avg_er = 4.524, #>=1:0.5 = 1597678928:1856295142
> lgamma:max_er = 0x135a0b77e00000 10145883.7461, avg_er = 0.252, #>=1:0.5 = 44084256:331444835
> y0:    max_er = 0x7fffffffffbcd2a0 17179869183.9918, avg_er = 2.379, #>=1:0.5 = 837057577:1437331064
> y1:    max_er = 0x7fffffffffdf5e07 17179869183.9960, avg_er = 3.761, #>=1:0.5 = 865063612:1460264955
> %%%
> 
> i386:
> %%%
> j0:    max_er = 0x6f8686c5c9f26 3654467.3863, avg_er = 0.418, #>=1:0.5 = 671562746:1332944948
> j1:    max_er = 0x449026e9c286f 2246675.4566, avg_er = 0.425, #>=1:0.5 = 674510414:1347770568
> lgamma:max_er = 0xe4cf242400000 7497618.0703, avg_er = 0.274, #>=1:0.5 = 70033452:508702222
> y0:    max_er = 0x93a2340c00000 4837658.0234, avg_er = 0.380, #>=1:0.5 = 594207097:1303826516
> y1:    max_er = 0x7ffa2068256f72ab 17176789825.1699, avg_er = 5.188, #>=1:0.5 = 459137173:1213136103
> %%%
> 
> Unfortunately, most of these i386 errors, and the amd64 error for y1()
> are misreported.  The huge max_er of 0x7ff... (16 hex digits) is
> actually a sign error misreported.  Sign errors are bad enough.  They
> always result at a simple zero z0 (f'(z0) != 0) when the approximation
> is so inaccurate that it cannot tell on which side of infinite-precision
> z0 the parameter is.  Methods involving a table of zeros will not have
> any of these provided the table is accurate to within 1 ulp, but other
> methods can easily have them, depending on the other method's ability
> to locate the zeros to within 1 ulp by solving f~(z) ~= 0 where f~ is
> the approximate f.
> 
> Having no sign errors across the whole range seems too good to believe
> for the amd64 functions.  All of the i387 hardware trig functions have
> sign errors.
> 

Bruce, David,

Since my real-life work requires additional accuracy in
the Bessel function routines in libm, I spent some time playing
with e_j0f.  I've come up with the patch that follows my signature.
It only deals with the first 32 zeros of j0(x); others can be
added to the table.  I've yet to come up with a general method
to improve the accuracy around all zeros.  To help you gauge
the improvement, one can look at

http://troutmask.apl.washington.edu/~kargl/j0f.jpg
http://troutmask.apl.washington.edu/~kargl/j0f_1.jpg

which compares the unpatched j0f(x) to the patched j0f(x).

The patch makes use of the addition theorem for J0(x+d), where
x is a zero and d is some small deviation from the zero.  Using
only the first 4 terms is sufficient for j0f:

J0(x+d) = - J1(x)*J1(d) + J2(x)*J2(d) - J3(x)*J3(d) + J4(x)*J4(d)

where J1(d), J2(d), J3(d), and J4(d) can be approximated by
truncating their series representations.  The functions of x, i.e.,
the known zero, are evaluated with MPFR.  Note, d is determined
from d = z - x = z - high(x) - low(x), where high(x) is the first
24 bits of the zero, low(x) is the next 24 bits of the zero, and
z is some value in [x-0.065, x+0.065].

Bruce (and/or David), if you think that this merits inclusion
in FreeBSD's libm, I'll work up a similar patch for e_j0.c and
submit both; otherwise, I'll keep it as a local change.

-- 
Steve
http://troutmask.apl.washington.edu/~kargl/


Index: src/e_j0f.c
===================================================================
--- src/e_j0f.c	(revision 204219)
+++ src/e_j0f.c	(working copy)
@@ -22,6 +22,8 @@
 static float pzerof(float), qzerof(float);
 
 static const float
+invpi	= 0.31830988618379067154,
+pi	= 3.14159265358979323846,
 huge 	= 1e30,
 one	= 1.0,
 invsqrtpi=  5.6418961287e-01, /* 0x3f106ebb */
@@ -38,17 +40,157 @@
 
 static const float zero = 0.0;
 
+/* J1(x) for x << 1. */
+static float
+_j1(float x2)
+{
+	return (x2 * (1 - 0.5 * x2 * x2));
+}
+
+/* J2(x) for x << 1. */
+static float
+_j2(float x2)
+{
+	x2 *= x2;
+	return (0.5 * x2 * (1 - x2 / 3));
+}
+
+/* J3(x) for x << 1. */
+static float
+_j3(float x2)
+{
+	float y;
+	y =  x2 * x2;
+	return (x2 * y * (1 - 0.25 * y) / 6);
+}
+
+/* J4(x) for x << 1. */
+static float
+_j4(float x2)
+{
+	x2 *= x2;
+	return (x2 * x2 / 24);
+}
+
+/*
+ * Use the addition theorem for values near a zero of the Bessel function.
+ * J0(z) = J0(x + d) where |d| << 1.
+ *       = - 2*J1(x)*J1(d) + 2*J2(x)*J2(d) - ...
+ * Note, the factor of 2 is included in the table.
+ */
 float
+j0zerof(int n, float z)
+{
+	static const float x[32][6] = {
+	    {0x1.33d152p+1, 0x1.d2e368p-24, 0x1.09cdb4p+0, 0x1.ba1deep-1,
+		0x1.978d44p-2, 0x1.0933ccp-3},
+	    {0x1.6148f6p+2, -0x1.34f46ep-24, -0x1.5c6e6p-1, -0x1.f8f72ep-3,
+		0x1.00f404p-1, 0x1.9588cep-1},
+	    {0x1.14eb56p+3, 0x1.999bdap-22, 0x1.15f798p-1, 0x1.00f7fcp-3,
+		-0x1.f08b9p-2, -0x1.d8c2a8p-2},
+	    {0x1.79544p+3, 0x1.04e56cp-26, -0x1.dc13e6p-2, -0x1.42ff0cp-4,
+		0x1.c0af7ep-2, 0x1.350edcp-2},
+	    {0x1.ddca14p+3, -0x1.0d8e2ep-25, 0x1.a701dp-2, 0x1.c54b94p-5,
+		-0x1.97d3ccp-2, -0x1.b9186p-3},
+	    {0x1.212314p+4, -0x1.d7982ap-26, -0x1.8077f6p-2, -0x1.5467ecp-5,
+		0x1.770cdp-2, 0x1.4e26dp-3},
+	    {0x1.5362dep+4, -0x1.d1810ep-21, 0x1.62d93ap-2, 0x1.0ba9cep-5,
+		-0x1.5c8a0ap-2, -0x1.08180cp-3},
+	    {0x1.85a3bap+4, -0x1.9fd524p-21, -0x1.4b2a2ep-2, -0x1.b3298p-6,
+		0x1.46b28cp-2, 0x1.aec26ap-4},
+	    {0x1.b7e54ap+4, 0x1.7f57c4p-22, 0x1.37aac8p-2, 0x1.6ac0d2p-6,
+		-0x1.345e5cp-2, -0x1.67dfb2p-4},
+	    {0x1.ea275ap+4, -0x1.c68826p-21, -0x1.27407ep-2, -0x1.34695p-6,
+		0x1.24bc2ep-2, 0x1.32708ap-4},
+	    {0x1.0e34e2p+5, -0x1.8b3204p-20, 0x1.192f24p-2, 0x1.0a6682p-6,
+		-0x1.17365ap-2, -0x1.08ffd2p-4},
+	    {0x1.275638p+5, -0x1.5a7986p-21, -0x1.0cf3eep-2, -0x1.d242aap-7,
+		0x1.0b5fc4p-2, 0x1.d0352cp-5},
+	    {0x1.4077a8p+5, -0x1.29d6c6p-23, 0x1.0230bap-2, 0x1.9c8084p-7,
+		-0x1.00e734p-2, -0x1.9af5aap-5},
+	    {0x1.59992cp+5, 0x1.974364p-21, -0x1.f13fbp-3, -0x1.70558ep-7,
+		0x1.ef1ep-3, 0x1.6f2666p-5},
+	    {0x1.72bacp+5, 0x1.f02102p-20, 0x1.e018dap-3, 0x1.4b858ap-7,
+		-0x1.de4fp-3, -0x1.4a986ap-5},
+	    {0x1.8bdc62p+5, 0x1.27e0cap-20, -0x1.d09b22p-3, -0x1.2c74f6p-7,
+		0x1.cf1686p-3, 0x1.2bb87cp-5},
+	    {0x1.a4fe0ep+5, 0x1.c8899p-20, 0x1.c28612p-3, 0x1.11f526p-7,
+		-0x1.c138e4p-3, -0x1.115d32p-5},
+	    {0x1.be1fc4p+5, 0x1.a4c606p-23, -0x1.b5a622p-3, -0x1.f645fep-8,
+		0x1.b485eap-3, 0x1.f54de8p-6},
+	    {0x1.d7418p+5, 0x1.93c83ep-20, 0x1.a9d184p-3, 0x1.cea254p-8,
+		-0x1.a8d632p-3, -0x1.cdd58ap-6},
+	    {0x1.f06344p+5, -0x1.7b4716p-22, -0x1.9ee5eep-3, -0x1.abf28ap-8,
+		0x1.9e093ap-3, 0x1.ab47cep-6},
+	    {0x1.04c286p+6, 0x1.0f88f2p-21, 0x1.94c6f6p-3, 0x1.8d6372p-8,
+		-0x1.9403e4p-3, -0x1.8cd3dp-6},
+	    {0x1.11536cp+6, 0x1.645ae6p-19, -0x1.8b5ccap-3, -0x1.724d02p-8,
+		0x1.8aaf6p-3, 0x1.71d33p-6},
+	    {0x1.1de456p+6, -0x1.6bc7a4p-19, 0x1.829356p-3, 0x1.5a280ep-8,
+		-0x1.81f85cp-3, -0x1.59bff8p-6},
+	    {0x1.2a754p+6, -0x1.5f7eep-20, -0x1.7a597ep-3, -0x1.4486cp-8,
+		0x1.79ce5p-3, 0x1.442d38p-6},
+	    {0x1.37062cp+6, -0x1.ab28bap-20, 0x1.72a09ap-3, 0x1.310f06p-8,
+		-0x1.72230ep-3, -0x1.30c186p-6},
+	    {0x1.439718p+6, 0x1.c5c6f4p-19, -0x1.6b5c04p-3, -0x1.1f766p-8,
+		0x1.6aea5p-3, 0x1.1f32e8p-6},
+	    {0x1.502808p+6, -0x1.2cbbcep-19, 0x1.6480c4p-3, 0x1.0f7eb8p-8,
+		-0x1.641964p-3, -0x1.0f43acp-6},
+	    {0x1.5cb8f8p+6, -0x1.f0c70ap-19, -0x1.5e0544p-3, -0x1.00f3f2p-8,
+		0x1.5da6f4p-3, 0x1.00c004p-6},
+	    {0x1.6949e8p+6, -0x1.81383ep-20, 0x1.57e12p-3, 0x1.e75414p-9,
+		-0x1.578accp-3, -0x1.e6f852p-7},
+	    {0x1.75dadap+6, -0x1.cea80cp-19, -0x1.520ceep-3, -0x1.cef722p-9,
+		0x1.51bdacp-3, 0x1.cea5bap-7},
+	    {0x1.826bccp+6, -0x1.46c93cp-19, 0x1.4c8222p-3, 0x1.b89114p-9,
+		-0x1.4c392ap-3, -0x1.b8489p-7},
+	    {0x1.8efcbep+6, 0x1.615096p-20, -0x1.473ae6p-3, -0x1.a3eaeap-9,
+		0x1.46f78ap-3, 0x1.a3aa16p-7},
+	};
+
+	float j, d, d2;
+	d = z - x[n][0] - x[n][1];
+	d2 = d / 2;
+	j =  - x[n][2] * _j1(d2) + x[n][3] * _j2(d2) - 
+	       x[n][4] * _j3(d2) + x[n][5] * _j4(d2);
+	return (j);
+}
+
+/*
+ * The zeros of J0(x) fall near x = (n + 0.75) * pi.  These are offsets
+ * that are used to localize a value of x into a small interval about a 
+ * zero.
+ */
+static const float offset[32] = {
+    4.86309e-02, 2.22910e-02, 1.43477e-02, 1.05619e-02,
+    8.35263e-03, 6.90623e-03, 5.88708e-03, 5.12924e-03,
+    4.54305e-03, 4.07894e-03, 3.70066e-03, 3.38531e-03,
+    3.11957e-03, 2.89196e-03, 2.69488e-03, 2.52450e-03,
+    2.37319e-03, 2.24095e-03, 2.12016e-03, 2.01463e-03,
+    1.91673e-03, 1.82645e-03, 1.75144e-03, 1.67643e-03,
+    1.60904e-03, 1.54166e-03, 1.48953e-03, 1.43740e-03,
+    1.38528e-03, 1.34078e-03, 1.29628e-03, 1.25179e-03
+};
+
+
+float
 __ieee754_j0f(float x)
 {
 	float z, s,c,ss,cc,r,u,v;
-	int32_t hx,ix;
+	int32_t hx,ix,k;
 
 	GET_FLOAT_WORD(hx,x);
 	ix = hx&0x7fffffff;
 	if(ix>=0x7f800000) return one/(x*x);
 	x = fabsf(x);
 	if(ix >= 0x40000000) {	/* |x| >= 2.0 */
+
+		if (x < 100.) {
+			k = (int)(x * invpi - 0.75);
+			if (fabsf(x - (k + 0.75) * pi - offset[k]) < 0.065)
+				return (j0zerof(k, x));
+		}
+
 		s = sinf(x);
 		c = cosf(x);
 		ss = s-c;
Comment 8 Bruce Evans freebsd_committer 2010-02-25 11:09:09 UTC
On Mon, 22 Feb 2010, Steven G. Kargl wrote:

> Since the my real-life work requires additional accuracy for
> the Bessel function routines in libm, I spent sometime playing
> with e_j0f.  I've come up with the patch that follows my signature.
> It only deals with the first 32 zeros of j0(x), others can be
> added to the table.  I've yet to come up with a general method
> to improve the accuracy around all zeros.   To help you gauge
> the improvement, one can look at

Interesting.  I will reply more later...  I just browsed Abramowitz
and Stegun and saw that it gives methods for finding all the zeros
and says that there are complex zeros.  It seems strong on formulae and
weak on practical computational methods and algorithms (even for its
time; of course the best methods now are different).

> +float
> __ieee754_j0f(float x)
> {
> 	float z, s,c,ss,cc,r,u,v;
> -	int32_t hx,ix;
> +	int32_t hx,ix,k;
>
> 	GET_FLOAT_WORD(hx,x);
> 	ix = hx&0x7fffffff;
> 	if(ix>=0x7f800000) return one/(x*x);
> 	x = fabsf(x);
> 	if(ix >= 0x40000000) {	/* |x| >= 2.0 */
> +
> +		if (x < 100.) {
> +			k = (int)(x * invpi - 0.75);
> +			if (fabsf(x - (k + 0.75) * pi - offset[k]) < 0.065)

These and many other constants should be float constants.  Otherwise you
get slowness and negatively useful extra accuracy (no useful extra accuracy,
due to floats elsewhere, but more complicated error analysis).
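
For instance (a sketch only, not a committed form; the pi and invpi
initializers would likewise take F suffixes):

%%%
		if (x < 100.0F) {
			k = (int)(x * invpi - 0.75F);
			if (fabsf(x - (k + 0.75F) * pi - offset[k]) < 0.065F)
				return (j0zerof(k, x));
		}
%%%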

This reduction is similar to the one for the first few and/or first
few million zeros of trig functions in e_rem_pio2*.c.

> +				return (j0zerof(k, x));
> +		}
> +
> 		s = sinf(x);
> 		c = cosf(x);
> 		ss = s-c;
>

Bruce
Comment 9 David Schultz freebsd_committer 2011-03-06 08:58:09 UTC
Responsible Changed
From-To: freebsd-standards->kargl

Reassign this one to the submitter, since he has a commit bit now. 
Submitting too many high-quality PRs can backfire!
Comment 10 Mark Linimon freebsd_committer freebsd_triage 2016-08-08 07:56:06 UTC
kargl returned his commit bit for safekeeping.
Comment 11 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:49:31 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 12 sgk 2018-10-30 21:55:01 UTC
This should be closed.  No one will ever address it.