[eigen] Non-optimal sse assembly code with gcc

To: <eigen@xxxxxxxxxxxxxxxxxxx>
Subject: [eigen] Non-optimal sse assembly code with gcc
From: Benjamin Schindler <bschindler@xxxxxxxxxxx>
Date: Sun, 22 Jan 2012 17:32:53 +0100

Hi

I just had a close look at the assembly generated for the following function:

bool particleCheckSpheric(Eigen::AlignedVector3<float> pos1, Eigen::AlignedVector3<float> pos2, float particleSize)
{
    return particleSize*particleSize > (pos1-pos2).squaredNorm();
}

The assembly I got is the following (compiled on an amd64 machine using gcc 4.5.3, with -O2 -DNDEBUG):


01: movaps  (%rdi), %xmm1
03: mulss   %xmm0,  %xmm0
04: subps   (%rsi), %xmm1
05: mulps   %xmm1,  %xmm1
06: movaps  %xmm1,  %xmm2
07: movhlps %xmm1,  %xmm2
08: addps   %xmm1,  %xmm2
09: movaps  %xmm2,  %xmm1
10: shufps  $0x1,   %xmm2, %xmm1
11: addss   %xmm1,  %xmm2
12: ucomiss %xmm2,  %xmm0
13: seta    %al
14: retq

Notice line 6 (and 9): It seems to me that these copies are unnecessary, as only the low quadword is really used. Is this a problem of the compiler, or is this an Eigen issue?
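
For readers mapping the listing back to source, here is a minimal sketch of the same computation written directly with SSE intrinsics. It is an illustration under stated assumptions, not Eigen's actual implementation: the name particleCheckSphericSse and the raw float[4] arguments (x, y, z plus a zero padding lane, 16-byte aligned) are hypothetical stand-ins for Eigen::AlignedVector3<float>. The comments note which instruction of the listing each step roughly corresponds to.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 / amd64 only)

// Hypothetical stand-in for the Eigen-based function above.
// Assumption: pos1 and pos2 each point to 4 floats (x, y, z, 0),
// aligned to 16 bytes, mirroring the layout of an aligned 3-vector.
bool particleCheckSphericSse(const float* pos1, const float* pos2, float particleSize)
{
    // Component-wise difference and square (movaps, subps, mulps).
    __m128 d  = _mm_sub_ps(_mm_load_ps(pos1), _mm_load_ps(pos2));
    __m128 sq = _mm_mul_ps(d, d);

    // Move the upper two lanes down (movhlps); written this way,
    // the source register is not overwritten, hence a copy may be needed.
    __m128 hi = _mm_movehl_ps(sq, sq);

    // Lane 0 now holds sq[0] + sq[2], lane 1 holds sq[1] + sq[3] (addps).
    __m128 s  = _mm_add_ps(sq, hi);

    // Bring lane 1 into lane 0 (shufps $0x1) and add the low lanes (addss).
    __m128 s1 = _mm_shuffle_ps(s, s, 0x1);
    float squaredNorm = _mm_cvtss_f32(_mm_add_ss(s, s1));

    // Same comparison as in the original function.
    return particleSize * particleSize > squaredNorm;
}
```

Called with two aligned arrays such as {0, 0, 0, 0} and {1, 2, 2, 0}, the squared distance is 9, so the function returns true for particleSize = 3.5f and false for particleSize = 2.5f. Whether a compiler emits the extra movaps copies for this form depends on its register allocation.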


Thank you
Benjamin
