Re: [eigen] Re: 4x4 matrix inverse


*To*: eigen@xxxxxxxxxxxxxxxxxxx
*Subject*: Re: [eigen] Re: 4x4 matrix inverse
*From*: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
*Date*: Tue, 15 Dec 2009 09:56:45 +0100

On Tue, Dec 15, 2009 at 5:25 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:


Hi,

To summarize recent commits: all this is now done in the development
branch; it only remains to consider backporting.

The SSE code is 4.5x faster than my plain scalar path! I guess that's
explained not only by SSE intrinsics but also by better ordering of
instructions...

There is one thing where I didn't follow Intel's code: they use an
RCPSS instruction to compute 1/det approximately, followed by a
Newton-Raphson iteration. This sacrifices up to 2 bits of precision in
the mantissa, which is already a bit nontrivial for us (4x4 matrix
inversion is a basic operation on which people will rely very
heavily). To help solve that dilemma (performance vs precision) I
benchmarked it, and it turns out that on my Core i7, DIVSS is
slightly faster! Intel's paper was written for the Pentium III so
that's perhaps not surprising, but I saw forum posts mentioning that
the RCPSS trick is still faster on the Core 2. If you want to test, see
lines 128-130 in Inverse_SSE.h.
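
To make the two options concrete, here is a minimal sketch of both ways
of computing 1/det with SSE intrinsics. The function names
reciprocal_div and reciprocal_rcp_nr are hypothetical and are not taken
from Inverse_SSE.h; the Newton-Raphson step is the standard refinement
x1 = x0 * (2 - det * x0), which may be arranged differently in Intel's
code. It assumes an x86 target with SSE enabled.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 only)
#include <cassert>
#include <cmath>

// Hypothetical helper: full-precision reciprocal using DIVSS.
inline float reciprocal_div(float det) {
    assert(det != 0.0f);
    __m128 d = _mm_set_ss(det);
    return _mm_cvtss_f32(_mm_div_ss(_mm_set_ss(1.0f), d));
}

// Hypothetical helper: approximate reciprocal using RCPSS, refined by
// one Newton-Raphson step x1 = x0 * (2 - det * x0).
inline float reciprocal_rcp_nr(float det) {
    assert(det != 0.0f);
    __m128 d   = _mm_set_ss(det);
    __m128 x0  = _mm_rcp_ss(d);          // roughly 12 bits of precision
    __m128 two = _mm_set_ss(2.0f);
    __m128 x1  = _mm_mul_ss(x0, _mm_sub_ss(two, _mm_mul_ss(d, x0)));
    return _mm_cvtss_f32(x1);
}
```

Which variant is faster depends on the CPU, as the benchmark above
suggests, so it is worth timing both on the target hardware.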

I have a question. I currently get warnings in this code (taken
straight from Intel):

__m128 tmp1;
tmp1 = _mm_loadh_pi(_mm_loadl_pi(tmp1, (__m64*)(src)), (__m64*)(src + 4));

Why don't you use tmp1 = ei_pload(src); since we know src will be aligned?

gael.

The warning claims that tmp1 is used uninitialized here. GCC doesn't
understand that it is only passed to _mm_loadl_pi, which writes into it
and does not read from it. How to fix that warning? I tested initializing
tmp1; this had a not-totally-negligible impact on performance (because
there are 2 more variables that need this). There does not seem to be
an __attribute__ for this.
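
One possible way around the warning, sketched below, is to build the
same register from two full-row loads combined with _mm_movelh_ps, so
that no uninitialized variable is involved. Note that the
loadl/loadh pair gathers src[0], src[1], src[4], src[5], so a single
contiguous load of src alone does not produce the same register. The
helper names are hypothetical, unaligned loads are used here only to
keep the sketch safe for any pointer (with aligned data _mm_load_ps
could be substituted), and whether this is faster than zero-initializing
is an open question that would need benchmarking.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 only)
#include <cassert>

// Hypothetical helper: returns {src[0], src[1], src[4], src[5]} by
// loading two full rows and keeping the low half of each.
inline __m128 load_01_45_full_rows(const float* src) {
    assert(src != nullptr);
    __m128 row0 = _mm_loadu_ps(src);      // src[0..3]
    __m128 row1 = _mm_loadu_ps(src + 4);  // src[4..7]
    return _mm_movelh_ps(row0, row1);     // {row0[0], row0[1], row1[0], row1[1]}
}

// Reference version: the loadl/loadh pattern with an explicit zero
// initialization, which silences the warning at the cost of one more
// instruction.
inline __m128 load_01_45_zero_init(const float* src) {
    assert(src != nullptr);
    __m128 tmp = _mm_setzero_ps();
    tmp = _mm_loadl_pi(tmp, reinterpret_cast<const __m64*>(src));
    return _mm_loadh_pi(tmp, reinterpret_cast<const __m64*>(src + 4));
}
```

Both helpers read at least src[0..5]; the full-row version reads
src[0..7], which is fine for a 16-float 4x4 matrix.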

Benoit

2009/12/4 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:

> Hi,
>
> Long ago I thought it would be a good idea to optimize 4x4 matrix
> inverse using "Euler's trick", which greatly reduces the number of
> operations but relies on some 2x2 block inside the matrix being
> invertible.
>
> The problem is that this gives bad precision, and the best compromise
> that I could find between precision and performance is still:
> - 10x more imprecise in the worst case
> - only 25% faster.
>
> My last reason to cling to this approach is that it was supposedly
> more vectorizable, but reading this,
> ftp://download.intel.com/design/PentiumIII/sml/24504301..pdf
> I realized that Intel engineers actually figured out how to vectorize
> the plain old cofactors approach very efficiently.
>
> So I'll switch to cofactors in both branches, I think. I'll also
> implement SSE at least in the default branch.
>
> Question: do you think that Intel's code is provided free to use? Or
> should I avoid looking at it? Even if I can't look at it, they still
> provide good explanations.
>
> Benoit

>
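
For readers unfamiliar with the "plain old cofactors approach" mentioned
in the quoted message, the following is a minimal scalar sketch: the
inverse is the transposed cofactor matrix (the adjugate) divided by the
determinant. It is an unoptimized illustration, not Eigen's or Intel's
implementation; the Mat4 alias, the row-major layout, and the function
names are assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<float, 16>;  // assumed row-major: m[row * 4 + col]

// Determinant of the 3x3 minor obtained by deleting row r and column c.
inline float minor3(const Mat4& m, int r, int c) {
    int rows[3], cols[3];
    for (int i = 0, k = 0; i < 4; ++i) if (i != r) rows[k++] = i;
    for (int j = 0, k = 0; j < 4; ++j) if (j != c) cols[k++] = j;
    auto a = [&](int i, int j) { return m[rows[i] * 4 + cols[j]]; };
    return a(0, 0) * (a(1, 1) * a(2, 2) - a(1, 2) * a(2, 1))
         - a(0, 1) * (a(1, 0) * a(2, 2) - a(1, 2) * a(2, 0))
         + a(0, 2) * (a(1, 0) * a(2, 1) - a(1, 1) * a(2, 0));
}

// Writes the inverse of m into inv; returns false if det is exactly zero.
inline bool inverse_cofactors(const Mat4& m, Mat4& inv) {
    Mat4 cof;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            cof[r * 4 + c] = (((r + c) % 2) ? -1.0f : 1.0f) * minor3(m, r, c);

    // Laplace expansion along the first row.
    float det = 0.0f;
    for (int c = 0; c < 4; ++c) det += m[c] * cof[c];
    if (det == 0.0f) return false;

    const float inv_det = 1.0f / det;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            inv[c * 4 + r] = cof[r * 4 + c] * inv_det;  // transpose: adjugate / det
    return true;
}
```

A production version would share the 2x2 sub-determinants between
cofactors instead of recomputing each 3x3 minor from scratch.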
