On Tue, Dec 15, 2009 at 10:15 AM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
On Tue, Dec 15, 2009 at 9:09 AM, Hauke Heibel
<hauke.heibel@xxxxxxxxxxxxxxx> wrote:
On Tue, Dec 15, 2009 at 5:25 AM, Benoit Jacob
<jacob.benoit.1@xxxxxxxxx> wrote:
There is one thing where I didn't follow Intel's code: they use an
RCPSS instruction to compute 1/det approximately, followed by a
Newton-Raphson iteration. This sacrifices up to 2 bits of precision in
the mantissa, which already is a bit nontrivial for us (4x4 matrix
inversion is a basic operation on which people will rely very
heavily).
Hi Benoit, I noticed that you wrote in your commit log "that elsewhere in Eigen we dont allow ourselves this approximation" (Newton-Raphson). I just recalled once stumbling over such an approximation hidden in Eigen. It is in the SSE computation of the inverse square root.
The approximation takes place over here, in Packet4f ei_psqrt(Packet4f _x), at the very bottom.
Indeed, a while ago I think we agreed that this optimized version should be enabled only if EIGEN_FAST_MATH==1 but it seems we never did that change. Let me also recall that -ffast-math => EIGEN_FAST_MATH.
So is it still OK to use the SSE intrinsic _mm_sqrt_p* by default and the optimized version only when EIGEN_FAST_MATH==1?
Sorry, what I said is wrong. Currently we have EIGEN_FAST_MATH==1 by default, and defining EIGEN_FAST_MATH==0 disables the vectorized versions of sin and cos. If we want to be safe, I suggest the following rules:
1- set EIGEN_FAST_MATH==0 by default
2- if "-ffast-math" is used and EIGEN_FAST_MATH is not defined, then we set EIGEN_FAST_MATH==1
3- EIGEN_FAST_MATH==0 => no vectorized version of sin and cos and the use of _mm_sqrt_p* for ei_psqrt
4- EIGEN_FAST_MATH==1 => vectorized version of sin and cos + the use of an optimized version of ei_psqrt
Also let me recall that the optimized version of ei_psqrt is about 3x faster than the IEEE compliant version for floats but it loses about 2 bits of precision.
gael