On Tue, Dec 15, 2009 at 9:09 AM, Hauke Heibel wrote:

Indeed, a while ago I think we agreed that this optimized version should be enabled only if EIGEN_FAST_MATH==1, but it seems we never made that change. Let me also recall that -ffast-math => EIGEN_FAST_MATH.

So is everyone still OK with using the SSE intrinsic _mm_sqrt_p* by default, and the optimized version only when EIGEN_FAST_MATH==1?

gael

There is one thing where I didn't follow Intel's code: they use an RCPSS instruction to compute 1/det approximately, followed by a Newton-Raphson iteration. This sacrifices up to 2 bits of precision in the mantissa, which is already a bit nontrivial for us (4x4 matrix inversion is a basic operation on which people will rely very heavily).
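To make the trade-off concrete, here is a minimal scalar sketch, assuming an x86 target with SSE, of the RCPSS estimate refined by one Newton-Raphson step, plus a helper that measures how far the result lands from the correctly rounded 1.0f / a. The names rcp_nr and ulp_distance are hypothetical; this is not Intel's code or Eigen's.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 only)
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: approximate 1/a with RCPSS (about 12 correct bits)
// followed by one Newton-Raphson step, as described for Intel's code.
inline float rcp_nr(float a) {
  const __m128 va = _mm_set_ss(a);
  const __m128 x0 = _mm_rcp_ss(va);  // RCPSS: rough estimate of 1/a
  // Newton-Raphson on f(x) = 1/x - a gives x1 = x0 * (2 - a * x0).
  const __m128 x1 =
      _mm_mul_ss(x0, _mm_sub_ss(_mm_set_ss(2.0f), _mm_mul_ss(va, x0)));
  return _mm_cvtss_f32(x1);
}

// Distance in units in the last place between two finite floats of the
// same sign; 0 means the two values are bit-identical.
inline std::int32_t ulp_distance(float x, float y) {
  assert(std::signbit(x) == std::signbit(y));
  std::int32_t ix, iy;
  std::memcpy(&ix, &x, sizeof ix);
  std::memcpy(&iy, &y, sizeof iy);
  return ix > iy ? ix - iy : iy - ix;
}
```

Comparing ulp_distance(rcp_nr(det), 1.0f / det) over many normal values of det should show a residual of a few ulps for some inputs, whereas a plain division (DIVSS) is correctly rounded by definition; that residual is the precision loss being discussed.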

Hi Benoit, I noticed that you wrote in your commit log "that elsewhere in Eigen we don't allow ourselves this approximation" (Newton-Raphson). I just remembered once stumbling over such an approximation hidden in Eigen: it is in the SSE computation of the inverse square root.

The approximation takes place over here, at the very bottom of Packet4f ei_psqrt(Packet4f _x).
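For reference, the following is a minimal sketch, assuming SSE on x86, of a square root built from RSQRTPS plus one Newton-Raphson step, next to the exact SQRTPS path, with the choice made at compile time along the lines of Gael's proposal. It is not Eigen's actual ei_psqrt; the names psqrt_exact, psqrt_fast, psqrt and max_rel_error are hypothetical, and only the EIGEN_FAST_MATH macro name comes from the thread.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 only)
#include <cassert>
#include <cmath>

// Exact path: SQRTPS returns the correctly rounded square root of each lane.
inline __m128 psqrt_exact(__m128 x) { return _mm_sqrt_ps(x); }

// Approximate path (hypothetical sketch): RSQRTPS gives y0 ~ 1/sqrt(x) to
// about 12 bits, one Newton-Raphson step y1 = y0 * (1.5 - 0.5 * x * y0 * y0)
// roughly doubles the correct bits, and sqrt(x) ~ x * y1.
// Assumes finite inputs: x = +inf yields NaN here.
inline __m128 psqrt_fast(__m128 x) {
  const __m128 half = _mm_set1_ps(0.5f);
  const __m128 three_halves = _mm_set1_ps(1.5f);
  __m128 y = _mm_rsqrt_ps(x);
  const __m128 hxyy = _mm_mul_ps(_mm_mul_ps(half, x), _mm_mul_ps(y, y));
  y = _mm_mul_ps(y, _mm_sub_ps(three_halves, hxyy));
  // For x == 0, RSQRTPS returns inf and 0 * inf is NaN, so force those
  // lanes (+0 and -0 alike) to +0. Negative lanes stay NaN, like SQRTPS.
  const __m128 nonzero = _mm_cmpneq_ps(x, _mm_setzero_ps());
  return _mm_and_ps(nonzero, _mm_mul_ps(x, y));
}

// Exact by default; the approximation only when EIGEN_FAST_MATH == 1.
inline __m128 psqrt(__m128 x) {
#if defined(EIGEN_FAST_MATH) && EIGEN_FAST_MATH == 1
  return psqrt_fast(x);
#else
  return psqrt_exact(x);
#endif
}

// Largest relative error |approx - exact| / |exact| over the four lanes.
// Lanes whose exact value is 0 must come back as exactly 0.
inline float max_rel_error(__m128 approx, __m128 exact) {
  float a[4], e[4];
  _mm_storeu_ps(a, approx);
  _mm_storeu_ps(e, exact);
  float worst = 0.0f;
  for (int i = 0; i < 4; ++i) {
    if (e[i] == 0.0f) {
      assert(a[i] == 0.0f);
      continue;
    }
    worst = std::fmax(worst, std::fabs(a[i] - e[i]) / std::fabs(e[i]));
  }
  return worst;
}
```

The zero-lane mask is there because a bare x * rsqrt(x) turns sqrt(0) into NaN; whether the real code handles that case is not shown in the thread.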

- Hauke

