>
>
> On Tue, Dec 15, 2009 at 10:15 AM, Gael Guennebaud
> <gael.guennebaud@xxxxxxxxxx> wrote:
>>
>>
>> On Tue, Dec 15, 2009 at 9:09 AM, Hauke Heibel
>> <hauke.heibel@xxxxxxxxxxxxxx> wrote:
>>>
>>> On Tue, Dec 15, 2009 at 5:25 AM, Benoit Jacob
>>> <jacob.benoit.1@xxxxxxxxx> wrote:
>>>>
>>>> There is one thing where I didn't follow Intel's code: they use a
>>>> RCPSS instruction to compute 1/det approximately, then followed by a
>>>> Newton-Raphson iteration. This sacrifices up to 2 bits of precision in
>>>> the mantissa, which already is a bit nontrivial for us (4x4 matrix
>>>> inversion is a basic operation on which people will rely very
>>>> heavily).
>>>
>>> Hi Benoit, I noticed that you wrote in your commit log "that elsewhere
>>> in Eigen we dont allow ourselves this approximation" (Newton-Raphson). I
>>> just recalled once stumbling over such an approximation hidden in Eigen. It
>>> is in the SSE computation of the inverse square root.
>>>
>>> The approximation takes place over here, in Packet4f ei_psqrt(Packet4f
>>> _x), at the very bottom.
>>
>> Indeed, I think we agreed a while ago that this optimized version should
>> be enabled only if EIGEN_FAST_MATH==1, but it seems we never made that change.
>> Let me also recall that -ffast-math => EIGEN_FAST_MATH.
>>
>> So is it still OK to use the SSE intrinsic _mm_sqrt_p* by default and use
>> the optimized version only when EIGEN_FAST_MATH==1?
>
>
> Sorry, what I said is wrong. Currently we have EIGEN_FAST_MATH==1 by
> default, and defining EIGEN_FAST_MATH==0 disables the vectorized
> versions of sin and cos. If we want to be safe, I suggest the following rules:
>
> 1- set EIGEN_FAST_MATH==0 by default
>
> 2- if "-ffast-math" is used and EIGEN_FAST_MATH is not defined then we set
> EIGEN_FAST_MATH==1
>
> 3- EIGEN_FAST_MATH==0 => no vectorized version of sin and cos and the use of
> _mm_sqrt_p* for ei_psqrt
>
> 4- EIGEN_FAST_MATH==1 => vectorized version of sin and cos + the use of an
> optimized version of ei_psqrt
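
For illustration, the four rules above could be sketched with the preprocessor roughly as follows. This is only a sketch under assumptions, not Eigen's actual configuration code: it relies on GCC predefining __FAST_MATH__ when -ffast-math is in effect, and the two helper functions psqrt_path() and vectorized_sin_cos_enabled() are hypothetical names introduced here just to make the selection visible.

```cpp
// Rules 1 and 2: default to 0 unless the user defined EIGEN_FAST_MATH
// explicitly or the compiler reports -ffast-math.
#ifndef EIGEN_FAST_MATH
#  ifdef __FAST_MATH__   // predefined by GCC when -ffast-math is in effect
#    define EIGEN_FAST_MATH 1
#  else
#    define EIGEN_FAST_MATH 0
#  endif
#endif

// Rules 3 and 4: hypothetical helpers (not part of Eigen) that report
// which implementation would be selected.
inline const char* psqrt_path() {
#if EIGEN_FAST_MATH
  return "approximate rsqrt + Newton-Raphson";  // faster, loses a few bits
#else
  return "_mm_sqrt_p*";                         // IEEE-compliant square root
#endif
}

inline bool vectorized_sin_cos_enabled() {
  return EIGEN_FAST_MATH != 0;
}
```

With this layout a user can still force either behaviour by passing -DEIGEN_FAST_MATH=0 or -DEIGEN_FAST_MATH=1, because an explicit definition takes priority over the compiler check.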
>
> Also let me recall that the optimized version of ei_psqrt is about 3x faster
> than the IEEE-compliant version for floats, but it loses about 2 bits of
> precision.