Re: [eigen] Issues regarding Quaternion-alignment and const Maps


- *To*: eigen@xxxxxxxxxxxxxxxxxxx
- *Subject*: Re: [eigen] Issues regarding Quaternion-alignment and const Maps
- *From*: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- *Date*: Mon, 12 Jul 2010 19:19:40 -0400

(Letting Gael handle this since he now looked at this topic closer than I did).

2010/7/12 Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx>:
> Here comes the patch,
>
> genericity (is that a word?) is based on Gael's suggestions (I just
> replaced ei_por by ei_pxor, and used the same mask for both addsub
> replacements).
>
> I was surprised at first that I only get a speedup of about ~1.6x
> against non-vectorized version, but then found out that originally
> -msse2 was actually slower than the version without vectorization.
> Anyways, now -msse2 and -msse3 both run faster than just -O2 (on my
> Core2 Duo).
> On my (rather archaic) Athlon64 the current sse2 version (it does not
> support sse3) is *slower* than just using -O2 :(
> Which, by the way, reminds me of the original topic of this thread ;)
>
> Christoph
>
>
> Gael Guennebaud wrote:
>> here a variant using at most as possible generic code:
>>
>> const __m128d mask1 = _mm_castsi128_pd(_mm_set_epi32(0x0,0x0,0x80000000,0x0));
>> const __m128d mask2 = _mm_castsi128_pd(_mm_set_epi32(0x80000000,0x0,0x0,0x0));
>>
>> Quaternion<double> res;
>>
>> typedef ei_packet_traits<double>::type Packet;
>>
>> const double* a = _a.coeffs().data();
>> Packet b_xy = _b.coeffs().template packet<Aligned>(0);
>> Packet b_zw = _b.coeffs().template packet<Aligned>(2);
>> Packet a_xx = ei_pset1(a[0]);
>> Packet a_yy = ei_pset1(a[1]);
>> Packet a_zz = ei_pset1(a[2]);
>> Packet a_ww = ei_pset1(a[3]);
>> Packet t1, t2;
>>
>> t1 = ei_padd(ei_pmul(a_ww, b_xy), ei_pmul(a_yy, b_zw));
>> t2 = ei_psub(ei_pmul(a_zz, b_xy), ei_pmul(a_xx, b_zw));
>>
>> #ifdef __SSE3__
>> ei_pstore(&res.x(), _mm_addsub_pd(t1, ei_preverse(t2)));
>> #else
>> ei_pstore(&res.x(), ei_padd(t1, ei_por(mask1,ei_preverse(t2))));
>> #endif
>>
>> t1 = ei_psub(ei_pmul(a_ww, b_zw), ei_pmul(a_yy, b_xy));
>> t2 = ei_padd(ei_pmul(a_zz, b_zw), ei_pmul(a_xx, b_xy));
>> #ifdef __SSE3__
>> ei_pstore(&res.z(), ei_preverse(_mm_addsub_pd(ei_preverse(t1), t2)));
>> #else
>> ei_pstore(&res.z(), ei_padd(t1, ei_por(mask2,ei_preverse(t2))));
>> #endif
>>
>> return res;
>>
>> Actually, my recent work on the vectorization of complexes, and this
>> code, let me thought that it would be a good idea to add ei_paddsub
>> and ei_psubadd functions such that we could write generic vectorized
>> code for complex and quaternions (generic in the sense it would work
>> for all vector engine).
>>
>> Here is how I see it. For instance let's take the example of the
>> quaternion multiplication. We could have a generic
>>
>> template<typename Quat> Quat ei_quatmul(Quat& a, Quat& b);
>>
>> function calling a ei_quatmul_selector which would be specialized for
>> the 3 following configurations:
>>
>> 1 - ei_packet_traits<Quat::Scalar>::size == 2 => the above code
>> 2 - ei_packet_traits<Quat::Scalar>::size == 4 => the code we already
>> have but written in a generic way
>> 3 - otherwise => scalar path
>>
>> And we should make sure that one can specialize this function for a
>> given scalar type/vector engine in the case some specific
>> optimizations can be done.
>>
>> gael.
>>
>> On Sat, Jul 10, 2010 at 1:11 AM, Christoph Hertzberg
>> <chtz@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>> Benoit Jacob wrote:
>>>> I have made a patch letting ei_pset1 use _mm_loaddup_pd when we have SSE3:
>>>>
>>>> template<> EIGEN_STRONG_INLINE Packet2d ei_pset1<double>(const double&
>>>> from) {
>>>> #ifdef EIGEN_VECTORIZE_SSE3
>>>>   return _mm_loaddup_pd(&from);
>>>> #else
>>>>   Packet2d res = _mm_set_sd(from);
>>>>   return ei_vec2d_swizzle1(res, 0, 0);
>>>> #endif
>>>> }
>>>>
>>>> But guess what? It's actually not faster (perhaps even a bit slower)
>>>> than our ei_vec2d_swizzle1!
>>>>
>>>> So let's just forget about it.
>>>>
>>>> Christoph, is _mm_loaddup_pd the only SSE3 intrinsic your code is
>>>> using ? If yes, by using ei_pset1 instead of _mm_loaddup_pd, you can
>>>> make your code work on SSE2 !
>>>
>>> I guess the most important SSE3 instruction is _mm_addsub_pd which adds the
>>> first and subtracts the second element. If there is a code which negates
>>> just one element, this could be replaced.
>>>
>>> Googleing a bit implies that the SSE-way to do it is to XOR with
>>> {-0.0, 0.0} (or the other way around). I will try that ...
>>>
>>> Christoph
>
>
> --
> ----------------------------------------------
> Dipl.-Inf. Christoph Hertzberg
> Cartesium 0.051
> Universität Bremen
> Enrique-Schmidt-Straße 5
> 28359 Bremen
>
> Tel: (+49) 421-218-64252
> ----------------------------------------------
>
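
The sign-flip idea quoted above (XOR with {-0.0, 0.0} followed by a plain add, in place of `_mm_addsub_pd`) can be sketched in portable scalar C++. This is only an illustration, not the patch from the thread: the names `Pair`, `flip_sign`, `addsub_reference` and `addsub_via_xor` are made up here, a two-element array stands in for a `__m128d` packet, and 64-bit IEEE-754 doubles are assumed. For reference, `_mm_addsub_pd(a, b)` subtracts in the low lane and adds in the high lane.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a two-double SSE packet (lane 0 = low, lane 1 = high).
using Pair = std::array<double, 2>;

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes 64-bit doubles");

// Flip the sign of x by XOR-ing its bit pattern with that of -0.0
// (a word with only the sign bit set).
inline double flip_sign(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits ^= 0x8000000000000000ULL;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// What _mm_addsub_pd computes: subtract in lane 0, add in lane 1.
inline Pair addsub_reference(const Pair& a, const Pair& b) {
    return {a[0] - b[0], a[1] + b[1]};
}

// SSE2-style replacement: negate lane 0 of b (XOR with {-0.0, 0.0}),
// then perform an ordinary lane-wise add.
inline Pair addsub_via_xor(const Pair& a, const Pair& b) {
    const Pair masked = {flip_sign(b[0]), b[1]};
    return {a[0] + masked[0], a[1] + masked[1]};
}
```

The XOR matters here: OR-ing with the same mask would force the sign bit on rather than toggle it, so an already negative lane would stay negative instead of becoming positive. That is presumably why the patch swaps `ei_por` for `ei_pxor`.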

