Bug 235413 - [LIBM] optimization for cexp and cexpf
Summary: [LIBM] optimization for cexp and cexpf
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs mailing list
URL:
Keywords: patch
Depends on:
Blocks: 216862
Reported: 2019-02-02 00:28 UTC by Steve Kargl
Modified: 2019-10-26 16:33 UTC

Attachments
patch (1.75 KB, patch)
2019-02-02 00:28 UTC, Steve Kargl
New patch with cexpl implementation included (21.83 KB, patch)
2019-02-27 01:56 UTC, Steve Kargl
Updated patch (24.30 KB, patch)
2019-10-26 16:32 UTC, Steve Kargl

Description Steve Kargl freebsd_committer 2019-02-02 00:28:44 UTC
Created attachment 201620 [details]
patch

The attached patch uses sincos[f] in the computation
of cexp[f].  For 20 million random z=x+Iy drawn from the
box defined by x,y in [0,MAX_EXP*LN2], this gives an
8.7% and 11.4% speed improvement over computing sin[f]
and cos[f] individually in cexp[f].
Comment 1 Steve Kargl freebsd_committer 2019-02-27 01:56:55 UTC
Created attachment 202398 [details]
New patch with cexpl implementation included

The new patch cexpl.diff supersedes the old patch.  It includes the changes from the old patch as well as implementations of the long double complex function cexpl().  This patch depends on https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236063

A suitable commit message is:

* lib/msun/src/math_private.h:
  . Add an EXTRACT_LDBL80_WORDS2() macro to get access to the high and
    low word of a 64-bit significand as well as the expsign.
  . Add prototype for __ldexp_expl().
  . Add prototype for __ldexp_cexpl().

* lib/msun/src/s_cexp.c:
  . Include float.h to get LDBL_MANT_DIG.
  . Add c and s to declaration, and sort.
  . Move x = 0 case to be the first case tested.  This is in preparation
    for fixing constant folding in GCC.
  . Use sincos() instead of a call to sin() and to cos().
  . Add a __weak_reference for LDBL_MANT_DIG == 53.

* lib/msun/src/s_cexpf.c:
  . Add c and s to declaration, and sort.
  . Move x = 0 case to be the first case tested.  This is in preparation
    for fixing constant folding in GCC.
  . Use sincosf() instead of a call to sinf() and to cosf().

* lib/msun/src/k_exp.c:
  . Add c and s to declaration, and sort.
  . Use sincos() instead of a call to sin() and to cos().
  
* lib/msun/src/k_expf.c:
  . Add c and s to declaration, and sort.
  . Use sincosf() instead of a call to sinf() and to cosf().

* lib/msun/ld128/k_cexpl.c:
  . Copy src/k_exp.c to here.  Wrap all code in #if 0 ... #endif and
    have the functions return NaN.  These functions are currently unused,
    and await someone who cares.
  . Issue a compile-time warning about the sloppiness.

* lib/msun/ld128/s_cexpl.c:
  . Copy src/s_cexp.c to here.
  . Convert "double complex" to "long double complex" without use of
    bit-twiddling.
  . Add compile-time warning about the sloppiness.
  . Add run-time warning about the sloppiness.

* lib/msun/ld80/k_cexpl.c:
  . Copy src/k_exp.c to here.
  . Convert "double complex" to "long double complex".  Use bit-twiddling.

* lib/msun/ld80/s_cexpl.c:
  . Copy src/s_cexp.c to here.
  . Convert "double complex" to "long double complex".  Use bit-twiddling
    where bits are grabbed with new EXTRACT_LDBL80_WORDS2() macro.

* lib/msun/man/cexp.3:
  . Document the addition of cexpl.

* include/complex.h:
  . Add prototype for cexpl().
Comment 2 Steve Kargl freebsd_committer 2019-02-27 18:23:03 UTC
Argh.  bde has pointed out some silly mistakes in the code.
He would also like to see the bit-twiddling reworked.  So,
the current patch should be considered informative at best.
Comment 3 Steve Kargl freebsd_committer 2019-10-26 16:32:28 UTC
Created attachment 208609 [details]
Updated patch

The updated patch includes the optimization for cexpf, cexp,
and new implementations for ld80/cexpl and ld128/cexpl.  The
ld80/cexpl implementation uses bit twiddling and has been
tested on i586 and amd64 systems.  The ld128/cexpl is a stub
implementation that has been neither compiled nor tested.
Comment 4 Steve Kargl freebsd_committer 2019-10-26 16:33:09 UTC
New commit log for cexpl_new.diff.

* include/complex.h:
  . Prototype for cexpl().

* lib/msun/Makefile:
  . Add s_cexpl.c to the build.
  . Add MLINK for cexpl.3 to cexp.3

* lib/msun/Symbol.map:
  . Add a new section for FreeBSD 13 symbols.
  . Add cexpl to the symbol table.

* lib/msun/src/math_private.h:
  . Change ENTERI() to toggle FPU precision for ld80 long double.
  . Introduce LEAVEI() to toggle FPU precision back to double.
  . Update RETURNI(x) to use __typeof(x) to set return type, use
    LEAVEI(), and use RETURNF(x) for the actual return.
  . Add prototype for __ldexp_expl().

* lib/msun/man/cexp.3:
  . Document cexpl().

* lib/msun/src/k_exp.c:
* lib/msun/src/k_expf.c:
  . Add declarations for c and s, and sort.
  . Use sincos[f]() instead of a call to sin[f]() and a call to cos[f]().

* lib/msun/src/s_cexp.c:
  . Include float.h to get access to LDBL_MANT_DIG.
  . Add declarations for c and s, and sort.
  . Use sincos() instead of a call to sin() and a call to cos().
  . Use __weak_reference(cexp, cexpl) on LDBL_MANT_DIG == 53.

* lib/msun/src/s_cexpf.c:
  . Add declarations for c and s, and sort.
  . Use sincosf() instead of a call to sinf() and a call to cosf().

* lib/msun/ld80/k_cexpl.c:
  . Implementations of __frexp_expl(), __ldexp_expl(), and __ldexp_cexpl(),
    which are computation kernels used by cexpl().

* lib/msun/ld80/k_expl.h:
  . Add 'L' suffix to a long double literal constant.
  . Fix typo in computation of scale2.
  . Use sincosl() instead of a call to sinl() and a call to cosl().

* lib/msun/ld80/s_cexpl.c:
  . Implementation of cexpl() for ld80 hardware.

* lib/msun/ld128/k_cexpl.c:
  . Stub for __frexp_expl(), __ldexp_expl(), and __ldexp_cexpl().

* lib/msun/ld128/k_expl.h:
  . Add 'L' suffix to a long double literal constant.
  . Fix typo in computation of scale2.
  . Use sincosl() instead of a call to sinl() and a call to cosl().

* lib/msun/ld128/s_cexpl.c:
  . Naive, but better than some might know, implementation of cexpl()
    for 113-bit long double hardware.