Re: [eigen] Issues regarding Quaternion-alignment and const Maps

I have made a patch letting ei_pset1 use _mm_loaddup_pd when we have SSE3:

template<> EIGEN_STRONG_INLINE Packet2d ei_pset1<double>(const double&  from) {
#ifdef EIGEN_VECTORIZE_SSE3
  return _mm_loaddup_pd(&from);
#else
  Packet2d res = _mm_set_sd(from);
  return ei_vec2d_swizzle1(res, 0, 0);
#endif
}

But guess what? It's actually not faster (perhaps even a bit slower)
than our ei_vec2d_swizzle1!

So let's just forget about it.

Christoph, is _mm_loaddup_pd the only SSE3 intrinsic your code is
using? If so, by using ei_pset1 instead of _mm_loaddup_pd, you can
make your code work on SSE2!
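
A minimal standalone sketch of that fallback, outside of Eigen. The names
broadcast_double and check_broadcast are made up for illustration, the 0x44
immediate is ei_vec2d_swizzle1(res, 0, 0) expanded by hand, and the code
assumes an x86 target whose compiler defines __SSE3__ when SSE3 is enabled
(as GCC does with -msse3):

```cpp
#include <cassert>
#include <emmintrin.h>   // SSE2 intrinsics
#ifdef __SSE3__
#include <pmmintrin.h>   // SSE3: _mm_loaddup_pd
#endif

// Hypothetical stand-in for ei_pset1<double>: fill both lanes with `from`.
static inline __m128d broadcast_double(const double& from) {
#ifdef __SSE3__
  return _mm_loaddup_pd(&from);            // single MOVDDUP load
#else
  __m128d lo = _mm_set_sd(from);           // [from, 0.0]
  // Copy the low double into both halves (same as swizzle1(lo, 0, 0)).
  return _mm_castsi128_pd(_mm_shuffle_epi32(_mm_castpd_si128(lo), 0x44));
#endif
}

// Self-check: aborts if either lane differs from x.
static inline void check_broadcast(double x) {
  double out[2];
  _mm_storeu_pd(out, broadcast_double(x));
  assert(out[0] == x && out[1] == x);
}
```

Code that calls such a wrapper instead of _mm_loaddup_pd directly builds
with plain SSE2 and still picks up the SSE3 load when it is enabled.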

For the record,  ei_vec2d_swizzle1 is:

#define ei_vec2d_swizzle1(v,p,q) \
  (_mm_castsi128_pd(_mm_shuffle_epi32( _mm_castpd_si128(v), \
    ((q*2+1)<<6|(q*2)<<4|(p*2+1)<<2|(p*2)))))
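
To make the immediate easier to read: _mm_shuffle_epi32 works on four
32-bit lanes, and double k occupies lanes 2k and 2k+1. The 8-bit immediate
holds four 2-bit source-lane selectors, so result double 0 is taken from
source double p and result double 1 from source double q. A small sketch
(swizzle1_imm is a hypothetical helper, not part of Eigen) that evaluates
the same expression at compile time:

```cpp
// Hypothetical helper: the immediate that ei_vec2d_swizzle1(v, p, q)
// hands to _mm_shuffle_epi32, computed with the macro's own expression.
constexpr int swizzle1_imm(int p, int q) {
  return (q * 2 + 1) << 6 | (q * 2) << 4 | (p * 2 + 1) << 2 | (p * 2);
}

static_assert(swizzle1_imm(0, 0) == 0x44, "both lanes = low double");
static_assert(swizzle1_imm(1, 1) == 0xEE, "both lanes = high double");
static_assert(swizzle1_imm(1, 0) == 0x4E, "swap the two doubles");
static_assert(swizzle1_imm(0, 1) == 0xE4, "identity");
```

So the (v, 0, 0) case used in ei_pset1 is a shuffle with immediate 0x44.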


Benoit

2010/7/9 Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx>:
> Benoit Jacob wrote:
>>
>> 2010/7/9 Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx>:
>>>
>>> Benoit Jacob wrote:
>>>>
>>>> Wow, very good work.
>>>>
>>>> I can indeed confirm the 2x speed improvement, and once I moved your
>>>> benchmarking code to a non-inlinable function called from main(), it
>>>> got even a bit higher (GCC indeed fails to properly optimize code in
>>>> the main() function).
>>>>
>>>> Could you make a patch against the development branch? (We're not
>>>> going to add features to 2.0 at this point).
>>>
>>> I think I can do that, but most likely not before Monday/Tuesday.
>>>
>>>>
>>>> http://eigen.tuxfamily.org/index.php?title=Developer%27s_Corner#Generating_a_patch
>>>>
>>>> Also, I didn't know about that loaddup instruction in SSE3. It's
>>>> great! I'll have a look at using it in ei_pset1 when SSE3 is
>>>> available.
>>>
>>> It's actually a pity that there is no complete list of *just* the
>>> SSE instructions (not mixed with every other x86 instruction), including
>>> a short description, maybe a usage example, and intrinsics for some
>>> common compilers. At least I didn't find any ...
>>
>> Yes, I've been trying to see if there is a single-precision equivalent
>> for MOVDDUP and I still don't know...
>
>
> I just searched every <*mmintrin.h> for float and found in <xmmintrin.h>:
>
> /* Create a vector with all four elements equal to *P.  */
> extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
> __artificial__))
> _mm_load1_ps (float const *__P)
> {
>  return _mm_set1_ps (*__P);
> }
>
> but looking at _mm_set1_ps, it doesn't really look like this is actually an
> SSE instruction ...
>
>
> --
> ----------------------------------------------
> Dipl.-Inf. Christoph Hertzberg
> Cartesium 0.051
> Universität Bremen
> Enrique-Schmidt-Straße 5
> 28359 Bremen
>
> Tel: (+49) 421-218-64252
> ----------------------------------------------
>
>
>
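
On the single-precision question in the quoted text: SSE3's MOVSLDUP and
MOVSHDUP only duplicate the even or odd floats, so they are not a 4-way
counterpart of MOVDDUP. The usual SSE idiom is a scalar load followed by a
shuffle, which is typically what compilers generate for _mm_set1_ps. A
minimal sketch with made-up names (broadcast_float, check_broadcast_float),
assuming an x86 target with SSE:

```cpp
#include <cassert>
#include <xmmintrin.h>   // SSE intrinsics

// Hypothetical helper: fill all four float lanes with *p.
static inline __m128 broadcast_float(const float* p) {
  __m128 lo = _mm_load_ss(p);              // [*p, 0, 0, 0]  (MOVSS)
  return _mm_shuffle_ps(lo, lo, 0x00);     // lane 0 into every lane (SHUFPS)
}

// Self-check: aborts if any lane differs from x.
static inline void check_broadcast_float(float x) {
  float out[4];
  _mm_storeu_ps(out, broadcast_float(&x));
  assert(out[0] == x && out[1] == x && out[2] == x && out[3] == x);
}
```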


