Re: [eigen] Issues regarding Quaternion-alignment and const Maps


To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: Re: [eigen] Issues regarding Quaternion-alignment and const Maps
From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
Date: Fri, 9 Jul 2010 18:51:24 -0400

2010/7/9 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
> I have made a patch letting ei_pset1 use _mm_loaddup_pd when we have SSE3:
>
> template<> EIGEN_STRONG_INLINE Packet2d ei_pset1<double>(const double&  from) {
> #ifdef EIGEN_VECTORIZE_SSE3
>  return _mm_loaddup_pd(&from);
> #else
>  Packet2d res = _mm_set_sd(from);
>  return ei_vec2d_swizzle1(res, 0, 0);
> #endif
> }
>
> But guess what? It's actually not faster (perhaps even a bit slower)

OK, it is perhaps 0.001% faster. But since it replaces 3 instructions
with 1, I guess it is still a win. So I have committed this, and
ei_pset1 now uses _mm_loaddup_pd when SSE3 is available.
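
For readers who want to try the two code paths outside of Eigen, here is a
minimal standalone sketch, not the committed patch. The names
swizzle_imm, kDupLow and broadcast_double are hypothetical and exist only
for this illustration; the SSE3 check uses the compiler's __SSE3__ macro
instead of EIGEN_VECTORIZE_SSE3, and a scalar fallback is assumed for
targets without SSE2.

```cpp
#if defined(__SSE2__) || defined(_M_X64)
  #include <emmintrin.h>   // SSE2 intrinsics
  #define SKETCH_SSE2 1
#endif
#if defined(__SSE3__)
  #include <pmmintrin.h>   // SSE3 intrinsics, declares _mm_loaddup_pd
  #define SKETCH_SSE3 1
#endif

// Same immediate as the quoted ei_vec2d_swizzle1 macro: each double
// (index p or q) is selected as a pair of adjacent 32-bit lanes.
constexpr int swizzle_imm(int p, int q) {
  return ((q * 2 + 1) << 6) | ((q * 2) << 4) | ((p * 2 + 1) << 2) | (p * 2);
}

// Immediate that copies the low double into both halves.
constexpr int kDupLow = swizzle_imm(0, 0);

// Hypothetical helper: writes {from, from} into out[0] and out[1].
inline void broadcast_double(const double& from, double out[2]) {
#if defined(SKETCH_SSE3)
  // Single instruction: load one double and duplicate it.
  __m128d v = _mm_loaddup_pd(&from);
  _mm_storeu_pd(out, v);
#elif defined(SKETCH_SSE2)
  // SSE2 path: scalar load into the low half, then an integer shuffle.
  __m128d lo = _mm_set_sd(from);
  __m128d v = _mm_castsi128_pd(
      _mm_shuffle_epi32(_mm_castpd_si128(lo), kDupLow));
  _mm_storeu_pd(out, v);
#else
  // Assumed scalar fallback for targets without SSE2.
  out[0] = from;
  out[1] = from;
#endif
}
```

Calling broadcast_double(x, out) fills both elements of out with x on
every path, so the SSE2 and SSE3 variants can be timed against each
other by compiling with and without SSE3 enabled (for GCC, -msse3).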

Benoit

> than our ei_vec2d_swizzle1!
>
> So let's just forget about it.
>
> Christoph, is _mm_loaddup_pd the only SSE3 intrinsic your code is
> using? If yes, by using ei_pset1 instead of _mm_loaddup_pd, you can
> make your code work on SSE2!
>
> For the record,  ei_vec2d_swizzle1 is:
>
> #define ei_vec2d_swizzle1(v,p,q) \
>  (_mm_castsi128_pd(_mm_shuffle_epi32( _mm_castpd_si128(v), \
>  ((q*2+1)<<6|(q*2)<<4|(p*2+1)<<2|(p*2)))))
>
>
> Benoit
>
> 2010/7/9 Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx>:
>> Benoit Jacob wrote:
>>>
>>> 2010/7/9 Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx>:
>>>>
>>>> Benoit Jacob wrote:
>>>>>
>>>>> Wow, very good work.
>>>>>
>>>>> I can indeed confirm the 2x speed improvement, and once I moved your
>>>>> benchmarking code to a non-inlinable function called from main(), it
>>>>> even got a bit higher (indeed GCC fails to properly optimize code in
>>>>> the main() function).
>>>>>
>>>>> Could you make a patch against the development branch? (We're not
>>>>> going to add features to 2.0 at this point).
>>>>
>>>> I think I can do that, but most likely not before Monday/Tuesday.
>>>>
>>>>>
>>>>> http://eigen.tuxfamily.org/index.php?title=Developer%27s_Corner#Generating_a_patch
>>>>>
>>>>> Also, I didn't know about that loaddup instruction in SSE3. It's
>>>>> great! I'll have a look at using it in ei_pset1 when SSE3 is
>>>>> available.
>>>>
>>>> It's actually a pity that there is no complete list with *just* all
>>>> SSE-instructions (not mixed with every other x86-instruction), including
>>>> a short description, maybe a usage example, and intrinsics for some
>>>> common compilers. At least I didn't find any ...
>>>
>>> Yes, I've been trying to see if there is a single-precision equivalent
>>> for MOVDDUP and I still don't know...
>>
>>
>> I just searched every <*mmintrin.h> for float and found in <xmmintrin.h>:
>>
>> /* Create a vector with all four elements equal to *P.  */
>> extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
>> __artificial__))
>> _mm_load1_ps (float const *__P)
>> {
>>  return _mm_set1_ps (*__P);
>> }
>>
>> but looking at _mm_set1_ps, it doesn't really look like this is actually an
>> SSE instruction ...
>>
>>
>> --
>> ----------------------------------------------
>> Dipl.-Inf. Christoph Hertzberg
>> Cartesium 0.051
>> Universität Bremen
>> Enrique-Schmidt-Straße 5
>> 28359 Bremen
>>
>> Tel: (+49) 421-218-64252
>> ----------------------------------------------
>>
>>
>>
>
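
On the single-precision question quoted above: with plain SSE,
_mm_set1_ps / _mm_load1_ps are, as far as I know, usually expanded by the
compiler into a scalar load followed by a shuffle rather than one
dedicated instruction. The sketch below spells that sequence out by hand.
The helper name broadcast_float is hypothetical, and the scalar fallback
is an assumption for targets without SSE.

```cpp
#if defined(__SSE__) || defined(_M_X64)
  #include <xmmintrin.h>   // SSE intrinsics, also defines _MM_SHUFFLE
  #define SKETCH_SSE 1
#endif

// Hypothetical helper: writes {from, from, from, from} into out[0..3].
inline void broadcast_float(const float& from, float out[4]) {
#if defined(SKETCH_SSE)
  // Scalar load: low lane = from, upper three lanes = 0.
  __m128 lo = _mm_load_ss(&from);
  // Shuffle lane 0 into all four lanes.
  __m128 v = _mm_shuffle_ps(lo, lo, _MM_SHUFFLE(0, 0, 0, 0));
  _mm_storeu_ps(out, v);
#else
  // Assumed scalar fallback for targets without SSE.
  for (int i = 0; i < 4; ++i) out[i] = from;
#endif
}
```

This is two instructions instead of the single one MOVDDUP gives for
doubles, which is consistent with the observation that _mm_set1_ps does
not look like a single SSE instruction.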
