Re: [eigen] Re: 4x4 matrix inverse



B. Ober told us about a more recent version of Intel's SSE fast
inversion code which can be found here:

http://software.intel.com/en-us/articles/optimized-matrix-library-for-use-with-the-intel-pentiumr-4-processors-sse2-instructions/

It takes advantage of SSE2 instructions, includes a version for
double, and comes with a clearer license.

Both versions (float and double) are already in the devel branch. In
case you are wondering, here are some timings for 10,000,000 inversions:

float, no SSE: 1.72s
float, SSE (previous version): 0.29s
float, SSE2 (new version): 0.26s

double, no SSE: 1.72s
double, SSE2: 0.45s

(Core 2 at 2.66GHz, gcc 4.4, 64-bit Linux)

gael


On Tue, Dec 15, 2009 at 2:38 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2009/12/15 mmoll <Markus.Moll@xxxxxxxxxxxxxxxx>:
>> Hi
>>
>> Quoting Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
>>> Actually, these lines were not equivalent to plain loads!
>>>
>>> When you look at this,
>>>
>>>    tmp1  = _mm_loadh_pi(_mm_loadl_pi(tmp1, (__m64*)(src)), (__m64*)(src + 4));
>>>
>>> The second half is loaded from src+4, not src+2.
>>>
>>> What is being loaded here is the top-left 2x2 corner of the matrix.
>>
>> Ah, I was wondering what the purpose was. But can't the same be achieved
>> by a combination of
>>
>> 1. aligned loads of the matrix rows into say a, b, c, d (a=[a4,a3,a2,a1]
>> and so on)
>> 2. unpack quad word pairs: _mm_unpackhi_epi64(b, a) apparently yields
>> [a4,a3,b4,b3] (upper left) and _mm_unpacklo_epi64(b, a) yields [a2, a1,
>> b2, b1] (upper right)? (this is SSE2, though)
>>
>> I have no idea how the performance compares, though. (or whether it
>> works at all)
>
> You know this much better than me (honest), why don't you try it? If
> it's faster, we'll use it. SSE2 is OK, we require it anyway for any
> SSE code.
>
> Benoit
>
>
>
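For anyone who wants to try the two variants discussed above side by side, here is a minimal sketch. It assumes a row-major 4x4 float matrix, so that src+4 is the start of the second row, as in the quoted line. The function names (top_left_half_loads, top_left_unpack, same_lanes) are made up for illustration and are not taken from Eigen or from Intel's code. Nothing here has been timed, so it says nothing about which variant is faster.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics; also provides the SSE ones used below

// Variant 1: the two 64-bit half loads from the quoted line.
// Result lanes (low to high): src[0], src[1], src[4], src[5].
inline __m128 top_left_half_loads(const float* src) {
    __m128 t = _mm_setzero_ps();  // placeholder value, fully overwritten below
    t = _mm_loadl_pi(t, reinterpret_cast<const __m64*>(src));      // lanes 0-1
    t = _mm_loadh_pi(t, reinterpret_cast<const __m64*>(src + 4));  // lanes 2-3
    return t;
}

// Variant 2: two aligned row loads plus one SSE2 64-bit unpack.
// Assumption: src is 16-byte aligned.
inline __m128 top_left_unpack(const float* src) {
    const __m128i a = _mm_castps_si128(_mm_load_ps(src));      // row 0
    const __m128i b = _mm_castps_si128(_mm_load_ps(src + 4));  // row 1
    // Low 64 bits of a go to the low half, low 64 bits of b to the high half.
    return _mm_castsi128_ps(_mm_unpacklo_epi64(a, b));
}

// Hypothetical helper: true if all four lanes of x and y compare equal.
inline bool same_lanes(__m128 x, __m128 y) {
    return _mm_movemask_ps(_mm_cmpeq_ps(x, y)) == 0xF;
}
```

Note the operand order: _mm_unpacklo_epi64(a, b) puts row 0 in the low lanes, which matches the half-load version. With the operands as (b, a), the same two rows come out swapped. Under the same row-major assumption, _mm_unpackhi_epi64(a, b) yields columns 2 and 3 of rows 0 and 1.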


