Re: [eigen] Re: 4x4 matrix inverse

2009/12/15 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
> 2009/12/15 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>>
>>
>> On Tue, Dec 15, 2009 at 10:15 AM, Gael Guennebaud
>> <gael.guennebaud@xxxxxxxxx> wrote:
>>>
>>>
>>> On Tue, Dec 15, 2009 at 9:09 AM, Hauke Heibel
>>> <hauke.heibel@xxxxxxxxxxxxxx> wrote:
>>>>
>>>> On Tue, Dec 15, 2009 at 5:25 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> There is one thing where I didn't follow Intel's code: they use an
>>>>> RCPSS instruction to compute 1/det approximately, followed by a
>>>>> Newton-Raphson iteration. This sacrifices up to 2 bits of precision
>>>>> in the mantissa, which is already a nontrivial concession for us
>>>>> (4x4 matrix inversion is a basic operation on which people will rely
>>>>> very heavily).
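For reference, that refinement is the standard RCP + one Newton-Raphson
step for the reciprocal; a minimal scalar sketch (my own naming, not
Intel's or Eigen's actual code) would look like this:

    #include <xmmintrin.h>

    // Approximate 1/x: RCPSS gives ~12 accurate bits, then one
    // Newton-Raphson step r' = r*(2 - x*r) roughly doubles that,
    // but still ends up ~2 bits short of a true DIVSS result.
    static inline __m128 approx_recip_ss(__m128 x)
    {
      __m128 r = _mm_rcp_ss(x);
      r = _mm_mul_ss(r, _mm_sub_ss(_mm_set_ss(2.0f), _mm_mul_ss(x, r)));
      return r;
    }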
>>>>
>>>> Hi Benoit, I noticed that you wrote in your commit log that "elsewhere
>>>> in Eigen we don't allow ourselves this approximation" (Newton-Raphson).
>>>> I just recalled once stumbling over such an approximation hidden in
>>>> Eigen: it is in the SSE computation of the inverse square root.
>>>>
>>>> The approximation takes place over here, in Packet4f ei_psqrt(Packet4f
>>>> _x), at the very bottom.
>>>
>>> Indeed, a while ago I think we agreed that this optimized version should
>>> be enabled only if EIGEN_FAST_MATH==1, but it seems we never made that
>>> change. Let me also recall that -ffast-math => EIGEN_FAST_MATH.
>>>
>>> So is it still OK to use the SSE intrinsic _mm_sqrt_p* by default, and
>>> to use the optimized version only when EIGEN_FAST_MATH==1?
>>
>>
>> Sorry, what I said is wrong. Currently we have EIGEN_FAST_MATH==1 by
>> default, and defining EIGEN_FAST_MATH==0 disables the vectorized versions
>> of sin and cos. If we want to be safe, I suggest the following rules:
>>
>> 1-  set EIGEN_FAST_MATH==0 by default
>>
>> 2- if "-ffast-math" is used and EIGEN_FAST_MATH is not defined, then we
>> set EIGEN_FAST_MATH==1
>>
>> 3- EIGEN_FAST_MATH==0 => no vectorized versions of sin and cos, and use
>> of _mm_sqrt_p* for ei_psqrt
>>
>> 4- EIGEN_FAST_MATH==1 => vectorized versions of sin and cos, plus use of
>> the optimized version of ei_psqrt
>>
>> Also, let me recall that the optimized version of ei_psqrt is about 3x
>> faster than the IEEE-compliant version for floats, but it loses about 2
>> bits of precision.
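Concretely, rules 3 and 4 would amount to something like the following for
ei_psqrt. This is only a sketch of the two code paths, not the actual patch:
the fast path mirrors the usual RSQRTPS + one Newton-Raphson step (which is
where the ~2 bits go), and it ignores the x==0 / denormal handling that the
real code needs.

    #include <xmmintrin.h>

    static inline __m128 ei_psqrt_sketch(__m128 x)
    {
    #if EIGEN_FAST_MATH
      // sqrt(x) = x * rsqrt(x); refine the RSQRTPS estimate with one
      // Newton-Raphson step: r' = 0.5 * r * (3 - x*r*r).
      // Note: no special handling of x == 0 here (the real code needs it).
      __m128 r = _mm_rsqrt_ps(x);
      r = _mm_mul_ps(_mm_mul_ps(_mm_set1_ps(0.5f), r),
                     _mm_sub_ps(_mm_set1_ps(3.0f),
                                _mm_mul_ps(x, _mm_mul_ps(r, r))));
      return _mm_mul_ps(x, r);
    #else
      // IEEE-compliant path: plain SQRTPS.
      return _mm_sqrt_ps(x);
    #endif
    }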
>
> I agree with your plan but have 2 remarks:
>
> 1) as I said, I found that on Core i7, the "fast division" using
> RCP+Newton was no longer faster than a plain DIV. So, at least in this
> case, we should just always do DIV when very modern CPUs are targeted,
> a good test for which is EIGEN_VECTORIZE_SSE4.
>
> 2) I wonder: when an algorithm loses 2 bits of precision, is it true that
> doing one more Newton iteration would give full precision? If so, that
> would be worth doing for heavy operations like sqrt! Theoretically, each
> Newton iteration doubles the number of accurate bits, but floating-point
> rounding effects may prevent that from happening in practice...
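About 1), in code that choice would just hinge on the macro; a rough sketch
(hypothetical helper name, taking EIGEN_VECTORIZE_SSE4 at face value as a
"recent CPU" proxy, as suggested above):

    #include <xmmintrin.h>

    // Packet division: exact DIVPS on recent CPUs, RCPPS + one
    // Newton-Raphson refinement step otherwise.
    static inline __m128 pdiv_sketch(__m128 a, __m128 b)
    {
    #ifdef EIGEN_VECTORIZE_SSE4
      return _mm_div_ps(a, b);   // as fast as RCP+NR on Core i7, and exact
    #else
      __m128 r = _mm_rcp_ps(b);
      r = _mm_mul_ps(r, _mm_sub_ps(_mm_set1_ps(2.0f), _mm_mul_ps(b, r)));
      return _mm_mul_ps(a, r);
    #endif
    }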

About 2), I just experimented with the division, and the answer is: even
with 2 Newton iterations you don't recover full precision (with the DIV
instruction, inversion of permutation matrices is exactly the same as
transposition, which is what I test for in prec_inverse_4x4).
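If anyone wants to reproduce that quickly outside the test suite, a
standalone check of just the reciprocal shows the same effect (throwaway
code, not prec_inverse_4x4 itself): RCPPS/RCPSS starts at roughly 12
accurate bits, each Newton-Raphson step squares the relative error, but the
rounding inside the NR arithmetic keeps the result a couple of bits short of
the 24-bit mantissa regardless of how many steps are added.

    #include <xmmintrin.h>
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <cstdlib>

    // Max relative error of RCPSS + 'iters' Newton-Raphson steps,
    // compared against the exact (DIV-based) reciprocal.
    static float max_rel_error(int iters)
    {
      float worst = 0.f;
      for (int i = 0; i < 1000000; ++i)
      {
        float x = 1e-3f + 1e3f * (std::rand() / float(RAND_MAX));
        __m128 vx = _mm_set_ss(x);
        __m128 r = _mm_rcp_ss(vx);
        for (int k = 0; k < iters; ++k)
          r = _mm_mul_ss(r, _mm_sub_ss(_mm_set_ss(2.f), _mm_mul_ss(vx, r)));
        float err = std::abs(_mm_cvtss_f32(r) - 1.f / x) * x;  // relative error
        worst = std::max(worst, err);
      }
      return worst;
    }

    int main()
    {
      for (int iters = 0; iters <= 2; ++iters)
        std::printf("%d NR step(s): max rel error = %g\n",
                    iters, max_rel_error(iters));
      return 0;
    }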

>
> Benoit
>


