Re: [eigen] Issues regarding Quaternion-alignment and const Maps



2010/7/9 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
> I have made a patch letting ei_pset1 use _mm_loaddup_pd when we have SSE3:
>
> template<> EIGEN_STRONG_INLINE Packet2d ei_pset1<double>(const double& from) {
> #ifdef EIGEN_VECTORIZE_SSE3
>  return _mm_loaddup_pd(&from);
> #else
>  Packet2d res = _mm_set_sd(from);
>  return ei_vec2d_swizzle1(res, 0, 0);
> #endif
> }
>
> But guess what? It's actually not faster (perhaps even a bit slower)

OK, it is perhaps 0.001% faster. But since it replaces 3 instructions with 1,
I guess it is still a win. So I have committed this: ei_pset1 now uses
_mm_loaddup_pd when possible.

Benoit

> than our ei_vec2d_swizzle1!
>
> So let's just forget about it.
>
> Christoph, is _mm_loaddup_pd the only SSE3 intrinsic your code is
> using? If yes, then by using ei_pset1 instead of _mm_loaddup_pd, you can
> make your code work on SSE2!
>
> For the record, ei_vec2d_swizzle1 is:
>
> #define ei_vec2d_swizzle1(v,p,q) \
>  (_mm_castsi128_pd(_mm_shuffle_epi32( _mm_castpd_si128(v), \
> ((q*2+1)<<6|(q*2)<<4|(p*2+1)<<2|(p*2)))))
>
>
> Benoit
>
> 2010/7/9 Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx>:
>> Benoit Jacob wrote:
>>>
>>> 2010/7/9 Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx>:
>>>>
>>>> Benoit Jacob wrote:
>>>>>
>>>>> Wow, very good work.
>>>>>
>>>>> I indeed confirm the 2x speed improvement, and once I moved your
>>>>> benchmarking code to a non-inlinable function called from main(), it
>>>>> got even a bit higher (indeed, GCC fails to optimize code inside the
>>>>> main() function correctly).
>>>>>
>>>>> Could you make a patch against the development branch? (We're not
>>>>> going to add features to 2.0 at this point).
>>>>
>>>> I think I can do that, but most likely not before Monday/Tuesday.
>>>>
>>>>>
>>>>> http://eigen.tuxfamily.org/index.php?title=Developer%27s_Corner#Generating_a_patch
>>>>>
>>>>> Also, I didn't know about that loaddup instruction in SSE3. It's
>>>>> great! I'll have a look at using it in ei_pset1 when SSE3 is
>>>>> available.
>>>>
>>>> It's actually a pity that there is no complete list of *just* the
>>>> SSE instructions (not mixed with every other x86 instruction), including
>>>> a short description, maybe a usage example, and the intrinsics for some
>>>> common compilers. At least I didn't find any ...
>>>
>>> Yes, I've been trying to see if there is a single-precision equivalent
>>> for MOVDDUP and I still don't know...
>>
>>
>> I just searched every <*mmintrin.h> for float and found in <xmmintrin.h>:
>>
>> /* Create a vector with all four elements equal to *P.  */
>> extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
>> __artificial__))
>> _mm_load1_ps (float const *__P)
>> {
>>  return _mm_set1_ps (*__P);
>> }
>>
>> but looking at _mm_set1_ps, it doesn't really look like this is actually a
>> single SSE instruction ...
>>
>>
>> --
>> ----------------------------------------------
>> Dipl.-Inf. Christoph Hertzberg
>> Cartesium 0.051
>> Universität Bremen
>> Enrique-Schmidt-Straße 5
>> 28359 Bremen
>>
>> Tel: (+49) 421-218-64252
>> ----------------------------------------------
>>
>>
>>
>


