Re: [eigen] Re: 4x4 matrix inverse
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Re: 4x4 matrix inverse
- From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- Date: Tue, 15 Dec 2009 07:25:39 -0500
2009/12/15 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
> 2009/12/15 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>> On Tue, Dec 15, 2009 at 10:15 AM, Gael Guennebaud
>> <gael.guennebaud@xxxxxxxxx> wrote:
>>> On Tue, Dec 15, 2009 at 9:09 AM, Hauke Heibel
>>> <hauke.heibel@xxxxxxxxxxxxxx> wrote:
>>>> On Tue, Dec 15, 2009 at 5:25 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>>>>> There is one thing where I didn't follow Intel's code: they use a
>>>>> RCPSS instruction to compute 1/det approximately, then followed by a
>>>>> Newton-Raphson iteration. This sacrifices up to 2 bits of precision in
>>>>> the mantissa, which already is a bit nontrivial for us (4x4 matrix
>>>>> inversion is a basic operation on which people will rely very
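[For reference, the refinement step in question has this shape; a scalar sketch with hypothetical names, not Intel's or Eigen's actual code:]

```cpp
#include <cassert>
#include <cmath>

// Scalar sketch of the RCPSS + Newton-Raphson trick described above
// (hypothetical names). RCPSS returns a ~12-bit estimate of 1/d; one
// Newton-Raphson step x' = x*(2 - d*x) squares the relative error, so
// ~12 accurate bits become ~24 -- minus the 1-2 bits lost to rounding
// in the two multiplies, which is the sacrifice discussed above.
float refine_reciprocal(float d, float estimate)
{
    return estimate * (2.0f - d * estimate);
}
```

[Starting from an estimate off by about 2^-12 relative, one step lands within a couple of ulps of 1/d.]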
>>>> Hi Benoit, I recognized that you wrote in your commit log "that elsewhere
>>>> in Eigen we don't allow ourselves this approximation" (Newton-Raphson). I
>>>> just recalled stumbling once over such an approximation hidden in Eigen. It
>>>> is for the SSE computation of the inverse square root.
>>>> The approximation takes place here, in Packet4f ei_psqrt(Packet4f
>>>> _x) at the very bottom.
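[The overall shape of the approximation Hauke points at; a scalar sketch with assumed names — the real ei_psqrt operates on a whole Packet4f via _mm_rsqrt_ps:]

```cpp
#include <cassert>
#include <cmath>

// Scalar sketch of the fast-sqrt idea (assumed names; the real ei_psqrt
// is vectorized). Start from the hardware's ~12-bit reciprocal-square-
// root estimate (what RSQRTPS delivers), refine it with one
// Newton-Raphson step y' = y*(1.5 - 0.5*x*y*y), then use the identity
// sqrt(x) = x * (1/sqrt(x)).
float fast_sqrt(float x, float rsqrt_estimate)
{
    float y = rsqrt_estimate;
    y = y * (1.5f - 0.5f * x * y * y);  // one Newton-Raphson step
    return x * y;                        // sqrt(x) = x * rsqrt(x)
}
```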
>>> Indeed, a while ago I think we agreed that this optimized version should
>>> be enabled only if EIGEN_FAST_MATH==1 but it seems we never did that change.
>>> Let me also recall that -ffast-math => EIGEN_FAST_MATH.
>>> So still OK to use the SSE intrinsic _mm_sqrt_p* by default and use the
>>> optimized version only when EIGEN_FAST_MATH==1 ?
>> sorry what I said is wrong. So currently we have by default
>> EIGEN_FAST_MATH==1, and defining EIGEN_FAST_MATH==0 disables the vectorized
>> version of sin and cos. If we want to be safe I suggest the following rules:
>> 1- set EIGEN_FAST_MATH==0 by default
>> 2- if "-ffast-math" is passed and EIGEN_FAST_MATH is not defined then we set EIGEN_FAST_MATH==1
>> 3- EIGEN_FAST_MATH==0 => no vectorized version of sin and cos and the use of
>> _mm_sqrt_p* for ei_psqrt
>> 4- EIGEN_FAST_MATH==1 => vectorized version of sin and cos + the use of an
>> optimized version of ei_psqrt
>> Also let me recall that the optimized version of ei_psqrt is about 3x faster
>> than the IEEE-compliant version for floats, but it loses about 2 bits of
>> precision.
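[In preprocessor terms, rules 1 and 2 above could look something like this — a sketch only, not the actual Eigen macro block; __FAST_MATH__ is the macro GCC and Clang predefine under -ffast-math:]

```cpp
// Sketch of rules 1-2 above (not Eigen's actual code).
#ifndef EIGEN_FAST_MATH
  #ifdef __FAST_MATH__          // GCC/Clang predefine this under -ffast-math
    #define EIGEN_FAST_MATH 1   // rule 2: -ffast-math opts in to the fast paths
  #else
    #define EIGEN_FAST_MATH 0   // rule 1: IEEE-safe by default
  #endif
#endif

// Rules 3-4 would then select the sin/cos vectorization and the
// ei_psqrt implementation based on EIGEN_FAST_MATH being 0 or 1.
```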
> I agree with your plan but have 2 remarks:
> 1) as I said, I found that on core i7, the "fast division" using
> RCP+Newton was not faster than a plain DIV anymore. So, at least in
> this case, we see that we should always be doing DIV if very modern
> CPUs are targeted, a good test for which is EIGEN_VECTORIZE_SSE4.
> 2) I wonder: when an algo loses 2 bits of precision, is it true that
> doing one more Newton iteration would give full precision? If yes,
> that would be worth doing for heavy operations like sqrt!
> Theoretically, each Newton iteration doubles the number of accurate
> bits, but there may be bad effects of floating point arithmetic
> preventing that from happening in practice...
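[The doubling claim is easy to check numerically in double precision, where the rounding floor is far away — a standalone sketch, nothing Eigen-specific:]

```cpp
#include <cassert>
#include <cmath>

// Quadratic convergence of the reciprocal iteration x' = x*(2 - d*x):
// if x = (1/d)*(1 + e), then x' = (1/d)*(1 + e)*(1 - e) = (1/d)*(1 - e^2),
// so each step squares the relative error -- in exact arithmetic. In
// floating point, the squaring stalls once e^2 reaches the rounding
// noise of the two multiplies, which is why the last bit or two need
// not come back in practice.
double newton_recip_step(double d, double x)
{
    return x * (2.0 - d * x);
}
```

[Starting from e = 1e-4, one step gives roughly e = 1e-8 and a second roughly e = 1e-16, i.e. right at double's rounding floor.]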
About 2), I just experimented for the division, and the answer is,
even with 2 Newton iterations you don't recover full precision (with
the DIV instruction, inversion of permutation matrices is exactly the
same as transposition, which I test for in prec_inverse_4x4).