[eigen] Vectorization of complex

To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: [eigen] Vectorization of complex
From: David Luitz <tux008@xxxxxxxxxxxxxx>
Date: Thu, 20 Jan 2011 23:11:56 +0100

Hi all,

I experimented a bit with vectorizing complex multiplication and am quite surprised by what I found:

First of all, I tried to implement complex multiplication using the SSE4_1 intrinsics _mm_dp_pd and _mm_blend_pd.

I came up with the following implementation in Eigen/src/Core/arch/SSE/Complex.h:

template<> EIGEN_STRONG_INLINE Packet1cd pmul<Packet1cd>(const Packet1cd& a, const Packet1cd& b)
{
  #ifdef EIGEN_VECTORIZE_SSE4_1
  // Flip the sign of a's imaginary part, then use two dot products:
  // one for the real part (lane 0), one for the imaginary part (lane 1).
  const __m128d mask = _mm_castsi128_pd(_mm_set_epi32(0x80000000,0x0,0x0,0x0));
  return Packet1cd(_mm_blend_pd(_mm_dp_pd(_mm_xor_pd(a.v, mask), b.v, 0xF1),
                                _mm_dp_pd(vec2d_swizzle1(a.v, 1, 0), b.v, 0xF2),
                                0x02));
  #else
  #ifdef EIGEN_VECTORIZE_SSE3
  // Existing SSE3 version: addsub subtracts in lane 0 and adds in lane 1.
  return Packet1cd(_mm_addsub_pd(_mm_mul_pd(vec2d_swizzle1(a.v, 0, 0), b.v),
                                 _mm_mul_pd(vec2d_swizzle1(a.v, 1, 1),
                                            vec2d_swizzle1(b.v, 1, 0))));
  #else
  // SSE2 version: flip the sign of lane 0 of the second product, then add.
  const __m128d mask = _mm_castsi128_pd(_mm_set_epi32(0x0,0x0,0x80000000,0x0));
  return Packet1cd(_mm_add_pd(_mm_mul_pd(vec2d_swizzle1(a.v, 0, 0), b.v),
                              _mm_xor_pd(_mm_mul_pd(vec2d_swizzle1(a.v, 1, 1),
                                                    vec2d_swizzle1(b.v, 1, 0)), mask)));
  #endif // SSE3
  #endif // SSE4_1
}
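
To make the lane arithmetic of the three variants easier to follow, here is a minimal portable sketch that models each one with a plain two-element array instead of __m128d. The type Lanes and all helper names (swizzle, dot_into, blend_hi, negate_lane, pmul_*_model) are made up for this illustration and are not part of Eigen. It only mirrors what the intrinsics compute per lane and says nothing about their speed.

```cpp
#include <array>

// Two-lane stand-in for __m128d: lane 0 = real part, lane 1 = imaginary part.
using Lanes = std::array<double, 2>;

// Models vec2d_swizzle1(v, p, q): result = {v[p], v[q]}.
inline Lanes swizzle(const Lanes& v, int p, int q) { return {v[p], v[q]}; }

// Models _mm_mul_pd and _mm_add_pd (lane-wise multiply and add).
inline Lanes mul(const Lanes& x, const Lanes& y) { return {x[0] * y[0], x[1] * y[1]}; }
inline Lanes add(const Lanes& x, const Lanes& y) { return {x[0] + y[0], x[1] + y[1]}; }

// Models _mm_addsub_pd: subtract in lane 0, add in lane 1.
inline Lanes addsub(const Lanes& x, const Lanes& y) { return {x[0] - y[0], x[1] + y[1]}; }

// Models _mm_xor_pd with a sign-bit mask on one lane.
inline Lanes negate_lane(Lanes v, int lane) { v[lane] = -v[lane]; return v; }

// Models _mm_dp_pd with both products enabled and the sum written
// to a single lane (0xF1 -> lane 0, 0xF2 -> lane 1); the other lane is zero.
inline Lanes dot_into(const Lanes& x, const Lanes& y, int lane) {
  Lanes r{0.0, 0.0};
  r[lane] = x[0] * y[0] + x[1] * y[1];
  return r;
}

// Models _mm_blend_pd(x, y, 0x02): lane 0 from x, lane 1 from y.
inline Lanes blend_hi(const Lanes& x, const Lanes& y) { return {x[0], y[1]}; }

// SSE2 variant: {ar*br, ar*bi} + {-ai*bi, ai*br}.
inline Lanes pmul_sse2_model(const Lanes& a, const Lanes& b) {
  return add(mul(swizzle(a, 0, 0), b),
             negate_lane(mul(swizzle(a, 1, 1), swizzle(b, 1, 0)), 0));
}

// SSE3 variant: the sign flip is folded into addsub.
inline Lanes pmul_sse3_model(const Lanes& a, const Lanes& b) {
  return addsub(mul(swizzle(a, 0, 0), b),
                mul(swizzle(a, 1, 1), swizzle(b, 1, 0)));
}

// SSE4_1 variant: two dot products, blended into one result.
inline Lanes pmul_sse41_model(const Lanes& a, const Lanes& b) {
  return blend_hi(dot_into(negate_lane(a, 1), b, 0),
                  dot_into(swizzle(a, 1, 0), b, 1));
}
```

For a = 1 + 2i and b = 3 + 4i, all three models should return the same lanes, namely the real and imaginary parts of a*b.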

I then started testing the code and realized that unfortunately my implementation is a bit slower than the SSE2 version. Even more puzzling: the already existing SSE3 implementation is ALSO SLOWER than the SSE2 code! Does anybody have an idea why my SSE4_1 code is even slower than the SSE3 code?

By the way, we are talking about something like a 1 percent run time difference in my tests, but still, if the SSE3 and SSE4 codes are not really faster than SSE2, I think they should be removed...


I only tested this for complex matrix-matrix products of Eigen::Matrix<std::complex<float>, Eigen::Dynamic, Eigen::Dynamic> and Eigen::Matrix<std::complex<double>, Eigen::Dynamic, Eigen::Dynamic>, so maybe I missed something and there are cases where the SSE3 code is useful.
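
Since a difference of about 1 percent can easily be within run-to-run noise, one common way to compare such variants is to repeat the measurement and keep the shortest time. The following is only a sketch of that idea, not the benchmark used for the numbers above: best_of and naive_product are hypothetical names, and the naive triple loop is a stand-in workload that would be replaced by the actual Eigen product when testing for real.

```cpp
#include <algorithm>
#include <chrono>
#include <complex>
#include <cstddef>
#include <limits>
#include <vector>

using cd = std::complex<double>;

// Stand-in workload (not Eigen's kernel): naive product of two
// n x n complex matrices stored in row-major order.
std::vector<cd> naive_product(const std::vector<cd>& a,
                              const std::vector<cd>& b,
                              std::size_t n) {
  std::vector<cd> c(n * n, cd(0.0, 0.0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t k = 0; k < n; ++k)
      for (std::size_t j = 0; j < n; ++j)
        c[i * n + j] += a[i * n + k] * b[k * n + j];
  return c;
}

// Runs `work` `repeats` times and returns the shortest wall-clock
// time in seconds. The minimum is less sensitive to interruptions
// from other processes than the mean.
template <typename Work>
double best_of(int repeats, Work work) {
  double best = std::numeric_limits<double>::infinity();
  for (int r = 0; r < repeats; ++r) {
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
  }
  return best;
}
```

The callable passed to best_of should store its result somewhere visible outside the lambda, otherwise the compiler may optimize the work away. Building the same harness once per instruction set (for example with and without the SSE3 and SSE4.1 compiler flags) and comparing the minima gives a more stable picture than a single run.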


Greetings,
David Luitz
