Re: [eigen] Re: 4x4 matrix inverse

*To*: eigen@xxxxxxxxxxxxxxxxxxx
*Subject*: Re: [eigen] Re: 4x4 matrix inverse
*From*: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
*Date*: Tue, 15 Dec 2009 13:23:03 +0100

On Tue, Dec 15, 2009 at 12:52 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:

2009/12/15 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:

You're right, I didn't look carefully at this line from Intel, and

>

> On Tue, Dec 15, 2009 at 5:25 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>

> wrote:

>>

>> Hi,

>>

>> To summarize recent commits: all this is now done in the development

>> branch, it only remains to consider backporting.

>>

>> The SSE code is 4.5x faster than my plain scalar path! I guess that's

>> explained not only by SSE intrinsics but also by better ordering of

>> instructions...

>>

>> There is one thing where I didn't follow Intel's code: they use a

>> RCPSS instruction to compute 1/det approximately, then followed by a

>> Newton-Raphson iteration. This sacrifices up to 2 bits of precision in

>> the mantissa, which already is a bit nontrivial for us (4x4 matrix

>> inversion is a basic operation on which people will rely very

>> heavily). To help solve that dilemma (performance vs precision) I

>> benchmarked it, and it turns out that on my Core i7, DIVSS is

>> slightly faster! Intel's paper was written for the Pentium III so

>> that's perhaps not surprising, but I saw forum posts mentioning that

>> the RCPSS trick is still faster on the Core2. If you want to test, see

>> lines 128-130 in Inverse_SSE.h.
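The two ways of computing 1/det discussed above can be sketched as follows. This is a minimal illustration, not the code from Inverse_SSE.h: the function names `recip_rcpss_nr` and `recip_divss` are hypothetical, and the Newton-Raphson step shown is the standard refinement x1 = x0 * (2 - d * x0).

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_rcp_ss, _mm_div_ss, ...

// Approximate reciprocal: RCPSS estimate refined by one Newton-Raphson step.
// (Hypothetical helper name, for illustration only.)
inline float recip_rcpss_nr(float d) {
    __m128 vd = _mm_set_ss(d);
    __m128 x0 = _mm_rcp_ss(vd);  // low-precision estimate of 1/d
    // x1 = x0 * (2 - d * x0)
    __m128 x1 = _mm_mul_ss(x0, _mm_sub_ss(_mm_set_ss(2.0f), _mm_mul_ss(vd, x0)));
    return _mm_cvtss_f32(x1);
}

// Full-precision reciprocal using a single DIVSS.
// (Hypothetical helper name, for illustration only.)
inline float recip_divss(float d) {
    return _mm_cvtss_f32(_mm_div_ss(_mm_set_ss(1.0f), _mm_set_ss(d)));
}
```

The refined estimate is close to the DIVSS result but may differ in the last bits of the mantissa, which is the precision trade-off described above. Which variant is faster depends on the CPU, so it is worth benchmarking both.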

>>

>> I have a question. I currently get warnings in this code (taken

>> straight from Intel):

>>

>> __m128 tmp1;

>> tmp1 = _mm_loadh_pi(_mm_loadl_pi(tmp1, (__m64*)(src)), (__m64*)(src+ 4));

>

> Why don't you use tmp1 = ei_pload(src); since we know src will be aligned?

just below, there are these lines:

tmp1 = _mm_loadh_pi(_mm_loadl_pi(tmp1, (__m64*)(src+ 2)),

(__m64*)(src+ 6));

row3 = _mm_loadh_pi(_mm_loadl_pi(row3, (__m64*)(src+10)),

(__m64*)(src+14));

which cannot use ei_pload since their byte offsets are not multiples of 16,

and I was confused from there.

By the way, is this trick (to load from 64-bit aligned addresses)

worth abstracting into a "ei_pload8" function? It's probably faster

than completely unaligned loads...

This is more or less what we do in ei_ploadu, but using a pair of movsd/movhps and inline assembly to prevent GCC from messing it up. Currently, for MSVC, we use raw unaligned loads, so we should probably switch to a pair of intrinsics.

I chose a pair of movsd/movhps because it appeared to be the fastest option on my CPU.

Finally, to be clear, I think you can replace these 2 lines with ei_ploadu and the performance should be the same.

gael
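As a rough sketch of the "pair of intrinsics" idea, an unaligned load of four floats can be written as two 64-bit loads. This is not Eigen's ei_ploadu: the name `loadu_pair` is hypothetical, and it uses the `_mm_loadl_pi`/`_mm_loadh_pi` intrinsics (MOVLPS/MOVHPS) rather than the movsd/movhps inline assembly mentioned above.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Load 4 consecutive floats from a possibly unaligned address
// using two 64-bit loads. (Hypothetical helper, for illustration only.)
inline __m128 loadu_pair(const float* p) {
    __m128 v = _mm_setzero_ps();  // initialized, so no "uninitialized" warning
    v = _mm_loadl_pi(v, reinterpret_cast<const __m64*>(p));      // floats 0..1
    v = _mm_loadh_pi(v, reinterpret_cast<const __m64*>(p + 2));  // floats 2..3
    return v;
}
```

The result should match `_mm_loadu_ps(p)`. Whether the pair is faster than a single unaligned load depends on the CPU and compiler.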

Thanks for the tip,

Benoit

>

> gael.

>

>>

>> The warning claims that tmp1 is used uninitialized here. GCC doesn't

>> understand that it only is passed to _mm_loadl_pi that writes into it,

>> does not read from it. How to fix that warning? I tested initializing

>> tmp1, this had a not-totally-negligible impact on performance (because

>> there are 2 more variables that need this). There does not seem to be

>> an __attribute__ for this.
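One possible way to sidestep this warning on newer compilers is the `_mm_undefined_ps()` intrinsic, which was not available when this thread was written. The sketch below assumes a compiler that provides it (recent GCC, Clang, and MSVC do); the helper name `load_two_halves` is hypothetical.

```cpp
#include <immintrin.h>  // _mm_undefined_ps, _mm_loadl_pi, _mm_loadh_pi

// Fill one __m128 from two separate 64-bit locations (2 floats each)
// without first zeroing the register. (Hypothetical helper name.)
inline __m128 load_two_halves(const float* lo, const float* hi) {
    __m128 v = _mm_undefined_ps();  // contents unspecified; both halves are overwritten below
    v = _mm_loadl_pi(v, reinterpret_cast<const __m64*>(lo));  // low  64 bits
    v = _mm_loadh_pi(v, reinterpret_cast<const __m64*>(hi));  // high 64 bits
    return v;
}
```

With `src` holding a 4x4 matrix, `load_two_halves(src, src + 4)` corresponds to the quoted Intel line. Whether this removes the initialization cost mentioned above would need to be measured.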

>>

>> Benoit

>>

>> 2009/12/4 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:

>> > Hi,

>> >

>> > Long ago I thought it would be a good idea to optimize 4x4 matrix

>> > inverse using "Euler's trick", which greatly reduces the number of

>> > operations but relies on some 2x2 block inside the matrix being

>> > invertible.
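For reference, and assuming that "Euler's trick" here means blockwise inversion through the Schur complement (the thread does not spell this out), the formula for a 4x4 matrix split into 2x2 blocks P, Q, R, S is:

```latex
M = \begin{pmatrix} P & Q \\ R & S \end{pmatrix},
\qquad
S' = \left(S - R P^{-1} Q\right)^{-1},
\qquad
M^{-1} = \begin{pmatrix}
  P^{-1} + P^{-1} Q \, S' \, R P^{-1} & -P^{-1} Q \, S' \\
  -S' \, R P^{-1} & S'
\end{pmatrix}
```

The formula requires P to be invertible, which is the dependence on a 2x2 block mentioned above. A P that is invertible but nearly singular is one plausible source of lost precision.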

>> >

>> > The problem is that this gives bad precision, and the best compromise

>> > that I could find between precision and performance is still:

>> > - 10x more imprecise in the worst case

>> > - only 25% faster.

>> >

>> > My last reason to cling to this approach is that it was supposedly

>> > more vectorizable, but reading this,

>> > ftp://download.intel.com/design/PentiumIII/sml/24504301.pdf

>> > I realized that Intel engineers actually figured out how to vectorize the

>> > plain old cofactors approach very efficiently.

>> >

>> > So I'll switch to cofactors in both branches, I think. I'll also

>> > implement SSE at least in the default branch.
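For readers unfamiliar with the cofactors approach, here is a plain scalar sketch. It is neither Eigen's implementation nor Intel's vectorized code: the names `Mat4`, `minor3`, and `inverse_cofactors` are hypothetical, and the code favours clarity over speed.

```cpp
#include <array>

using Mat4 = std::array<std::array<double, 4>, 4>;

// Determinant of the 3x3 submatrix obtained by deleting row r and column c.
inline double minor3(const Mat4& m, int r, int c) {
    int ri[3], ci[3];
    for (int i = 0, k = 0; i < 4; ++i) if (i != r) ri[k++] = i;
    for (int j = 0, k = 0; j < 4; ++j) if (j != c) ci[k++] = j;
    return m[ri[0]][ci[0]] * (m[ri[1]][ci[1]] * m[ri[2]][ci[2]] - m[ri[1]][ci[2]] * m[ri[2]][ci[1]])
         - m[ri[0]][ci[1]] * (m[ri[1]][ci[0]] * m[ri[2]][ci[2]] - m[ri[1]][ci[2]] * m[ri[2]][ci[0]])
         + m[ri[0]][ci[2]] * (m[ri[1]][ci[0]] * m[ri[2]][ci[1]] - m[ri[1]][ci[1]] * m[ri[2]][ci[0]]);
}

// Inverse via cofactors: inv = adjugate(m) / det(m).
// Returns false (leaving inv untouched) when the determinant is exactly zero.
inline bool inverse_cofactors(const Mat4& m, Mat4& inv) {
    double cof[4][4];
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            cof[i][j] = (((i + j) % 2) ? -1.0 : 1.0) * minor3(m, i, j);

    // Laplace expansion along the first row.
    double det = 0.0;
    for (int j = 0; j < 4; ++j) det += m[0][j] * cof[0][j];
    if (det == 0.0) return false;

    // The adjugate is the transpose of the cofactor matrix.
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            inv[i][j] = cof[j][i] / det;
    return true;
}
```

Unlike the block-based method, this needs no invertible 2x2 sub-block; it only requires the full determinant to be nonzero.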

>> >

>> > Question: do you think that Intel's code is provided free to use? Or

>> > should I avoid looking at it? Even if I can't look at it, they still

>> > provide good explanations.

>> >

>> > Benoit

>> >

>
