Re: [eigen] Re: 4x4 matrix inverse


2009/12/15 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
> 2009/12/15 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>> On Tue, Dec 15, 2009 at 10:15 AM, Gael Guennebaud
>> <gael.guennebaud@xxxxxxxxx> wrote:
>>> On Tue, Dec 15, 2009 at 9:09 AM, Hauke Heibel
>>> <hauke.heibel@xxxxxxxxxxxxxx> wrote:
>>>> On Tue, Dec 15, 2009 at 5:25 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>>>> wrote:
>>>>> There is one thing where I didn't follow Intel's code: they use a
>>>>> RCPSS instruction to compute 1/det approximately, then followed by a
>>>>> Newton-Raphson iteration. This sacrifices up to 2 bits of precision in
>>>>> the mantissa, which already is a bit nontrivial for us (4x4 matrix
>>>>> inversion is a basic operation on which people will rely very
>>>>> heavily).
>>>> Hi Benoit, I noticed that you wrote in your commit log "that elsewhere
>>>> in Eigen we dont allow ourselves this approximation" (Newton-Raphson). I
>>>> just recalled once stumbling over such an approximation hidden in Eigen:
>>>> the SSE computation of the inverse square root.
>>>> The approximation takes place over here, in Packet4f ei_psqrt(Packet4f
>>>> _x), at the very bottom.
>>> Indeed, a while ago I think we agreed that this optimized version should
>>> be enabled only if EIGEN_FAST_MATH==1 but it seems we never did that change.
>>> Let me also recall that -ffast-math => EIGEN_FAST_MATH.
>>> So still OK to use the SSE intrinsic _mm_sqrt_p* by default and use the
>>> optimized version only when EIGEN_FAST_MATH==1 ?
>> Sorry, what I said is wrong. Currently we have EIGEN_FAST_MATH==1 by
>> default, and defining EIGEN_FAST_MATH==0 disables the vectorized
>> versions of sin and cos. If we want to be safe I suggest the following rules:
>> 1 - set EIGEN_FAST_MATH==0 by default
>> 2 - if "-ffast-math" is given and EIGEN_FAST_MATH is not defined, then set
>> EIGEN_FAST_MATH==1
>> 3 - EIGEN_FAST_MATH==0 => no vectorized versions of sin and cos, and use of
>> _mm_sqrt_p* for ei_psqrt
>> 4 - EIGEN_FAST_MATH==1 => vectorized versions of sin and cos + use of an
>> optimized version of ei_psqrt
>> Also let me recall that the optimized version of ei_psqrt is about 3x faster
>> than the IEEE-compliant version for floats, but it loses about 2 bits of
>> precision.
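The proposed rules might be sketched in preprocessor terms like this. This is only an illustration of the proposal, not Eigen's actual configuration code; gcc predefines __FAST_MATH__ under -ffast-math:

```cpp
// Sketch of the proposed rules, not Eigen's actual configuration code.
#ifndef EIGEN_FAST_MATH
  #ifdef __FAST_MATH__          // predefined by gcc under -ffast-math
    #define EIGEN_FAST_MATH 1   // rule 2: -ffast-math implies fast math
  #else
    #define EIGEN_FAST_MATH 0   // rule 1: safe by default
  #endif
#endif

#if EIGEN_FAST_MATH
  // rule 4: vectorized sin/cos + optimized ei_psqrt
#else
  // rule 3: no vectorized sin/cos; use _mm_sqrt_p* for ei_psqrt
#endif
```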
> I agree with your plan but have 2 remarks:
> 1) As I said, I found that on Core i7, the "fast division" using
> RCP+Newton was no faster than a plain DIV anymore. So, at least in
> this case, we should always use DIV when very modern CPUs are
> targeted; a good test for that is EIGEN_VECTORIZE_SSE4.
> 2) I wonder: when an algorithm loses 2 bits of precision, is it true
> that doing one more Newton iteration would give full precision? If
> yes, that would be worth doing for heavy operations like sqrt!
> Theoretically, each Newton iteration doubles the number of accurate
> bits, but there may be bad effects of floating-point arithmetic
> preventing that from happening in practice...

About 2), I just experimented with the division, and the answer is that
even with 2 Newton iterations you don't recover full precision (with
the DIV instruction, inverting a permutation matrix gives exactly its
transpose, which I test for in prec_inverse_4x4).

> Benoit
