|Re: [eigen] Re: 4x4 matrix inverse|
Quoting Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
> Actually, these lines were not equivalent to loads!
> When you look at this,
> tmp1 = _mm_loadh_pi(_mm_loadl_pi(tmp1, (__m64*)(src)),
> (__m64*)(src+ 4));
> The second half is loaded from src+4, not src+2.
> What is being loaded here is the top-left 2x2 corner of the matrix.
Ah, I was wondering what the purpose was. But can't the same be achieved
by a combination of
1. aligned loads of the matrix rows into, say, a, b, c, d (a=[a4,a3,a2,a1]
and so on)
2. unpacking quad word pairs: _mm_unpackhi_epi64(b, a) apparently yields
[a4,a3,b4,b3] (upper left) and _mm_unpacklo_epi64(b, a) yields [a2,a1,
b2,b1] (upper right)? (This is SSE2, though.)
I have no idea how the performance compares, or whether it
works at all.
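
To make the lane ordering concrete, here is a minimal sketch of both approaches, not taken from the Eigen sources. The helper names are hypothetical, and it assumes a row-major float matrix with lane 0 holding the first element of each row. Since the rows are floats, the epi64 unpacks need casts; _mm_movelh_ps and _mm_movehl_ps are shown as an SSE1-only way to get the same halves. Note that with arguments (a, b) the row from a lands in the low half, so (b, a) as written above gives the same elements with the two rows swapped.

```c
#include <emmintrin.h>  /* SSE2; also pulls in the SSE1 intrinsics */

/* The two half-loads from the quoted code: result in memory order is
   {a_row[0], a_row[1], b_row[0], b_row[1]}. */
static __m128 low_halves_loads(const float *a_row, const float *b_row) {
    __m128 tmp = _mm_setzero_ps();
    tmp = _mm_loadl_pi(tmp, (const __m64 *)a_row);   /* fills lanes 0,1 */
    return _mm_loadh_pi(tmp, (const __m64 *)b_row);  /* fills lanes 2,3 */
}

/* SSE2: same result from two already-loaded rows, {a0, a1, b0, b1}. */
static __m128 low_halves_unpack(__m128 a, __m128 b) {
    return _mm_castsi128_ps(
        _mm_unpacklo_epi64(_mm_castps_si128(a), _mm_castps_si128(b)));
}

/* SSE2: upper halves of both rows, {a2, a3, b2, b3}. */
static __m128 high_halves_unpack(__m128 a, __m128 b) {
    return _mm_castsi128_ps(
        _mm_unpackhi_epi64(_mm_castps_si128(a), _mm_castps_si128(b)));
}

/* SSE1 alternatives that avoid the integer unpacks. */
static __m128 low_halves_sse1(__m128 a, __m128 b) {
    return _mm_movelh_ps(a, b);   /* {a0, a1, b0, b1} */
}
static __m128 high_halves_sse1(__m128 a, __m128 b) {
    return _mm_movehl_ps(b, a);   /* {a2, a3, b2, b3}; note swapped arguments */
}
```

With rows loaded via _mm_load_ps(src) and _mm_load_ps(src + 4), low_halves_unpack(a, b) should hold the same four values as the quoted _mm_loadl_pi/_mm_loadh_pi pair. Whether this is faster is something only a benchmark can tell.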