Re: [eigen] Non-optimal sse assembly code with gcc


To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: Re: [eigen] Non-optimal sse assembly code with gcc
From: Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 23 Jan 2012 10:39:38 +0100

On 22.01.2012 17:32, Benjamin Schindler wrote:

> Hi
>
> I just had a close look at the assembly generated by the following
> function:
>
> bool particleCheckSpheric(Eigen::AlignedVector3<float> pos1,
>                           Eigen::AlignedVector3<float> pos2, float particleSize)
> {
>     return particleSize*particleSize > (pos1-pos2).squaredNorm();
> }
>
> The assembly I got is the following (compiled on an amd64 machine using
> gcc 4.5.3, with -O2 -DNDEBUG):
>
> 01: movaps (%rdi), %xmm1
> 03: mulss %xmm0, %xmm0
> 04: subps (%rsi), %xmm1
> 05: mulps %xmm1, %xmm1
> 06: movaps %xmm1, %xmm2
> 07: movhlps %xmm1, %xmm2
> 08: addps %xmm1, %xmm2
> 09: movaps %xmm2, %xmm1
> 10: shufps $0x1, %xmm2, %xmm1
> 11: addss %xmm1, %xmm2
> 12: ucomiss %xmm2, %xmm0
> 13: seta %al
> 14: retq
>
> Notice line 6 (and 9): it seems to me that these copies are unnecessary,
> as only the low quadword is really used. Is this a problem of the
> compiler, or is this an Eigen issue?
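
For reference, here is a hypothetical hand-written SSE version of the same check (the name particleCheckSphericSse and the raw-pointer interface are illustrative, not Eigen API). It assumes each point is stored as four 16-byte-aligned floats whose unused fourth lane is zero, so that the fourth lane contributes nothing to the four-lane subtraction and sum. The reduction uses the same intrinsics as Eigen's SSE2 predux; the comments map each step only roughly to the numbered instructions in the listing, since gcc is free to schedule them differently.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics (baseline on x86-64)

// Hypothetical SSE equivalent of particleCheckSpheric.
// Assumes pos1/pos2 each point to 4 floats, 16-byte aligned,
// with the unused fourth lane equal to 0.
bool particleCheckSphericSse(const float* pos1, const float* pos2,
                             float particleSize)
{
    // _mm_load_ps (movaps) faults on unaligned addresses.
    assert(reinterpret_cast<std::uintptr_t>(pos1) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(pos2) % 16 == 0);

    float r2 = particleSize * particleSize;        // ~ line 03 (mulss)
    __m128 d = _mm_sub_ps(_mm_load_ps(pos1),
                          _mm_load_ps(pos2));      // ~ lines 01, 04
    __m128 sq = _mm_mul_ps(d, d);                  // ~ line 05

    // Horizontal sum, same steps as Eigen's SSE2 predux (~ lines 06-11)
    __m128 tmp = _mm_add_ps(sq, _mm_movehl_ps(sq, sq));
    __m128 sum = _mm_add_ss(tmp, _mm_shuffle_ps(tmp, tmp, 1));

    return r2 > _mm_cvtss_f32(sum);                // ~ lines 12-13
}
```

With pos1 = {1, 2, 3, 0} and pos2 = {1, 2, 5, 0} the squared distance is 4, so the check is true for particleSize = 2.5 and false for 2.0, because the comparison is strict.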

Side-note: I guess if you activate SSE3, lines 6 to 11 will be replaced by
just two haddps %xmm1, %xmm1 instructions.
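
A sketch of that SSE3 variant: two _mm_hadd_ps calls (which gcc compiles to haddps) collapse the four lanes into one. hsumSse3 is a hypothetical helper; the __attribute__((target("sse3"))) enables SSE3 for this one function under a reasonably recent GCC or Clang, so the file builds without -msse3, but the CPU must still support SSE3. Fewer instructions does not necessarily mean faster here, since haddps is itself a multi-uop instruction on many CPUs.

```cpp
#include <cassert>
#include <pmmintrin.h>  // SSE3 intrinsics (_mm_hadd_ps); also pulls in SSE/SSE2

// Hypothetical helper: horizontal sum of four floats using two haddps.
// target("sse3") (GCC/Clang) enables SSE3 for this function only.
__attribute__((target("sse3")))
float hsumSse3(__m128 v)
{
    // {v0+v1, v2+v3, v0+v1, v2+v3}
    __m128 t = _mm_hadd_ps(v, v);
    // every lane now holds (v0+v1) + (v2+v3)
    t = _mm_hadd_ps(t, t);
    return _mm_cvtss_f32(t);
}
```

For example, hsumSse3(_mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f)) returns 10.0f.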

And I think gcc does everything correctly, as the Eigen source (w/o SSE3)
says:


template<> EIGEN_STRONG_INLINE float predux<Packet4f>(const Packet4f& a)
{
  Packet4f tmp = _mm_add_ps(a, _mm_movehl_ps(a,a));
  return pfirst(_mm_add_ss(tmp, _mm_shuffle_ps(tmp,tmp, 1)));
}

So _mm_movehl_ps(a,a) actually requires that the upper half of the result
is copied from a (i.e. xmm1). And I guess it can make a difference: if the
upper half of xmm2 happens to contain a denormalized number, the addps
instruction might be slower on some hardware (not sure about that, though).
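
The point about _mm_movehl_ps can be checked directly. _mm_movehl_ps(hi, a) returns {a2, a3, hi2, hi3}, so Eigen's call _mm_movehl_ps(a,a) asks for the upper half of a to be preserved in the result; since movhlps overwrites only the low half of its destination and a is still needed afterwards, gcc copies a into a fresh register first (line 06). The hypothetical helper preduxWithUpper below repeats the predux steps with an arbitrary first operand, showing that the returned float does not depend on it. This checks the arithmetic only and says nothing about the possible denormal slowdown.

```cpp
#include <cassert>
#include <xmmintrin.h>

// Copy all four lanes of an SSE register into a plain array.
inline void lanes(__m128 v, float out[4]) { _mm_storeu_ps(out, v); }

// Same steps as the SSE2 predux<Packet4f> quoted above, but with the
// first operand of _mm_movehl_ps made explicit (hypothetical helper).
// _mm_movehl_ps(hi, a) == {a2, a3, hi2, hi3}; only lanes 0-1 of the
// sum feed into lane 0 of the final result.
inline float preduxWithUpper(__m128 a, __m128 hi)
{
    __m128 tmp = _mm_add_ps(a, _mm_movehl_ps(hi, a));
    return _mm_cvtss_f32(_mm_add_ss(tmp, _mm_shuffle_ps(tmp, tmp, 1)));
}
```

With a = {1, 2, 3, 4}, _mm_movehl_ps(a, a) holds {3, 4, 3, 4}, and preduxWithUpper returns 10 whether hi is a itself (Eigen's choice), zero, or a vector of denormals.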


Christoph


--
----------------------------------------------
Dipl.-Inf. Christoph Hertzberg
Cartesium 0.051
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen

Tel: (+49) 421-218-64252
----------------------------------------------

Follow-Ups:
- Re: [eigen] Non-optimal sse assembly code with gcc
  - From: Benjamin Schindler

References:
- [eigen] Non-optimal sse assembly code with gcc
  - From: Benjamin Schindler
