Re: [eigen] Issues regarding Quaternion-alignment and const Maps
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Issues regarding Quaternion-alignment and const Maps
- From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- Date: Fri, 9 Jul 2010 18:47:00 -0400
I have made a patch letting ei_pset1 use _mm_loaddup_pd when we have SSE3:
template<> EIGEN_STRONG_INLINE Packet2d ei_pset1<double>(const double& from) {
#ifdef EIGEN_VECTORIZE_SSE3
  return _mm_loaddup_pd(&from);
#else
  Packet2d res = _mm_set_sd(from);
  return ei_vec2d_swizzle1(res, 0, 0);
#endif
}
But guess what? It's actually not faster (perhaps even a bit slower)
than our ei_vec2d_swizzle1!
So let's just forget about it.
Christoph, is _mm_loaddup_pd the only SSE3 intrinsic your code is
using? If so, by using ei_pset1 instead of _mm_loaddup_pd, you can
make your code work on SSE2!
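To make the two code paths concrete outside of Eigen, here is a minimal stand-alone sketch. The helper names broadcast_pd and broadcast_pd_sse2 are hypothetical (they are not Eigen functions), and the __SSE3__ check assumes a GCC- or Clang-style compiler that defines that macro when SSE3 is enabled.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#ifdef __SSE3__
#include <pmmintrin.h>  // SSE3: _mm_loaddup_pd
#endif

// Hypothetical helper, not part of Eigen: SSE2-only broadcast of one double.
inline __m128d broadcast_pd_sse2(const double& from) {
  __m128d lo = _mm_set_sd(from);  // {from, 0.0}
  // 0x44 repeats 32-bit lanes 0 and 1 (the low double) in both halves.
  return _mm_castsi128_pd(_mm_shuffle_epi32(_mm_castpd_si128(lo), 0x44));
}

// Hypothetical helper: takes the MOVDDUP path only when SSE3 is enabled,
// and otherwise falls back to the SSE2 shuffle above.
inline __m128d broadcast_pd(const double& from) {
#ifdef __SSE3__
  return _mm_loaddup_pd(&from);
#else
  return broadcast_pd_sse2(from);
#endif
}
```

Both helpers should return the same packet, so code that calls the SSE2 variant everywhere loses nothing in correctness and, per the measurement above, probably nothing in speed either.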
For the record, ei_vec2d_swizzle1 is:
#define ei_vec2d_swizzle1(v,p,q) \
  (_mm_castsi128_pd(_mm_shuffle_epi32( _mm_castpd_si128(v), \
    ((q*2+1)<<6|(q*2)<<4|(p*2+1)<<2|(p*2)))))
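In case the immediate in that macro looks cryptic: each double occupies two 32-bit lanes, so selecting double p means selecting 32-bit lanes 2p and 2p+1. The following is only a scalar model to illustrate this; swizzle1_imm and shuffle32_model are hypothetical names, not Eigen or intrinsic functions.

```cpp
#include <array>
#include <cstdint>

// Same arithmetic as the immediate in ei_vec2d_swizzle1(v, p, q).
constexpr int swizzle1_imm(int p, int q) {
  return ((q * 2 + 1) << 6) | ((q * 2) << 4) | ((p * 2 + 1) << 2) | (p * 2);
}

// Scalar model of a 4 x 32-bit shuffle: result lane i takes the source
// lane whose index is stored in bits [2i+1:2i] of the immediate.
inline std::array<std::uint32_t, 4> shuffle32_model(
    const std::array<std::uint32_t, 4>& v, int imm) {
  std::array<std::uint32_t, 4> r{};
  for (int i = 0; i < 4; ++i) {
    r[i] = v[(imm >> (2 * i)) & 3];
  }
  return r;
}

// p = q = 0 duplicates the low double; p = 0, q = 1 is the identity.
static_assert(swizzle1_imm(0, 0) == 0x44, "broadcast of the low double");
static_assert(swizzle1_imm(0, 1) == 0xE4, "identity shuffle");
```

So ei_vec2d_swizzle1(res, 0, 0) in ei_pset1 amounts to a 32-bit shuffle with immediate 0x44, which copies the low double into both halves of the packet.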
Benoit
2010/7/9 Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx>:
> Benoit Jacob wrote:
>>
>> 2010/7/9 Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx>:
>>>
>>> Benoit Jacob wrote:
>>>>
>>>> Wow, very good work.
>>>>
>>>> I indeed confirm the 2x speed improvement, and once I moved your
>>>> benchmarking code to a non-inlinable function called from main(), it
>>>> even got a bit higher (indeed GCC fails to properly optimize code in
>>>> the main() function).
>>>>
>>>> Could you make a patch against the development branch? (We're not
>>>> going to add features to 2.0 at this point).
>>>
>>> I think I can do that, but most likely not before Monday/Tuesday.
>>>
>>>>
>>>> http://eigen.tuxfamily.org/index.php?title=Developer%27s_Corner#Generating_a_patch
>>>>
>>>> Also, I didn't know about that loaddup instruction in SSE3. It's
>>>> great! I'll have a look at using it in ei_pset1 when SSE3 is
>>>> available.
>>>
>>> It's actually a pity that there is no complete list with *just* all
>>> SSE-instructions (not mixed with every other x86-instruction), including
>>> a short description, maybe a usage example, and intrinsics for some
>>> common compilers. At least I didn't find any ...
>>
>> Yes, I've been trying to see if there is a single-precision equivalent
>> for MOVDDUP and I still don't know...
>
>
> I just searched every <*mmintrin.h> for float and found in <xmmintrin.h>:
>
> /* Create a vector with all four elements equal to *P. */
> extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
> __artificial__))
> _mm_load1_ps (float const *__P)
> {
> return _mm_set1_ps (*__P);
> }
>
> but looking at _mm_set1_ps, it doesn't really look like this is actually an
> SSE instruction ...
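On the single-precision question, my understanding is that with plain SSE a broadcast like _mm_set1_ps is typically expanded by the compiler into a scalar load followed by a shuffle, and that a single-instruction float broadcast from memory only arrived later with AVX (_mm_broadcast_ss). A minimal sketch of the two-step version, where broadcast_ps is a hypothetical helper name:

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper, not part of Eigen: broadcast *p to all four lanes
// using a scalar load followed by a shuffle.
inline __m128 broadcast_ps(const float* p) {
  __m128 lo = _mm_load_ss(p);  // {*p, 0, 0, 0}
  // Select lane 0 of the source for every result lane.
  return _mm_shuffle_ps(lo, lo, _MM_SHUFFLE(0, 0, 0, 0));
}
```

This should produce the same packet as _mm_load1_ps(p); whether it is any faster would need to be measured, as with the double-precision case.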