Re: [eigen] Re: 4x4 matrix inverse


2009/12/15 mmoll <Markus.Moll@xxxxxxxxxxxxxxxx>:
> Hi
>
> Quoting Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
>> Actually, these lines were not equivalent to plain loads!
>>
>> When you look at this,
>>
>>    tmp1 = _mm_loadh_pi(_mm_loadl_pi(tmp1, (__m64*)(src)),
>>                        (__m64*)(src + 4));
>>
>> The second half is loaded from src+4, not src+2.
>>
>> What is being loaded here is the top-left 2x2 corner of the matrix.
>
> Ah, I was wondering what the purpose was. But can't the same be achieved
> by a combination of
>
> 1. aligned loads of the matrix rows into say a, b, c, d (a=[a4,a3,a2,a1]
> and so on)
> 2. unpack quad word pairs: _mm_unpackhi_epi64(b, a) apparently yields
> [a4,a3,b4,b3] (upper left) and _mm_unpacklo_epi64(b, a) yields [a2, a1,
> b2, b1] (upper right)? (this is SSE2, though)
>
> I have no idea how the performance compares, though. (or whether it
> works at all)

You know this much better than I do (honest), so why don't you try it?
If it's faster, we'll use it. SSE2 is OK; we require it anyway for any
SSE code.
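
For anyone who wants to try the comparison, here is a minimal sketch of
both variants side by side. The function names are made up for
illustration, and it assumes src points to 16 floats in row-major order.
It uses unaligned loads so that it works with any pointer; the proposal
above would use aligned loads (_mm_load_ps) instead. Note that the
unpack operands are in the order (a, b) here, so that the lanes come out
in the same order as in the loadl/loadh line; with (b, a) the two rows
end up swapped.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */

/* Top-left 2x2 block via two 64-bit loads, as in the quoted line.
   Resulting lanes, low to high: src[0], src[1], src[4], src[5]. */
static __m128 top_left_loads(const float *src)
{
    __m128 t = _mm_setzero_ps();
    t = _mm_loadl_pi(t, (const __m64 *)(src));      /* lanes 0,1 */
    t = _mm_loadh_pi(t, (const __m64 *)(src + 4));  /* lanes 2,3 */
    return t;
}

/* Same block from two full-row loads plus one 64-bit unpack (SSE2).
   Unaligned loads are an assumption of this sketch, see above. */
static __m128 top_left_unpack(const float *src)
{
    __m128i a = _mm_castps_si128(_mm_loadu_ps(src));      /* row 0 */
    __m128i b = _mm_castps_si128(_mm_loadu_ps(src + 4));  /* row 1 */
    /* low 64 bits of a go to the low half, low 64 bits of b to the high half */
    return _mm_castsi128_ps(_mm_unpacklo_epi64(a, b));
}
```

With the rows already in registers, _mm_unpackhi_epi64(a, b) gives the
other half of the two rows (src[2], src[3], src[6], src[7]) in the same
way. Whether this is actually faster than the two half loads would have
to be measured.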

Benoit


