[eigen] Non-optimal sse assembly code with gcc



I just had a close look at the assembly generated by the following function:

bool particleCheckSpheric(Eigen::AlignedVector3<float> pos1, Eigen::AlignedVector3<float> pos2, float particleSize)
{
    // True if the squared distance between the two positions is below particleSize squared.
    return particleSize*particleSize > (pos1-pos2).squaredNorm();
}

The assembly I got is the following (compiled on an amd64 machine using gcc 4.5.3, with -O2 -DNDEBUG):

01: movaps  (%rdi), %xmm1
03: mulss   %xmm0,  %xmm0
04: subps   (%rsi), %xmm1
05: mulps   %xmm1,  %xmm1
06: movaps  %xmm1,  %xmm2
07: movhlps %xmm1,  %xmm2
08: addps   %xmm1,  %xmm2
09: movaps  %xmm2,  %xmm1
10: shufps  $0x1,   %xmm2, %xmm1
11: addss   %xmm1,  %xmm2
12: ucomiss %xmm2,  %xmm0
13: seta    %al
14: retq

Notice lines 06 and 09: it seems to me that these copies are unnecessary, as only the low quadword is really used afterwards. Is this a problem with the compiler, or is this an Eigen issue?
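For reference, here is how I read the reduction in lines 06 to 11, written out as SSE intrinsics. This is only my own sketch: the names horizontalSum and particleCheckSphericSketch are hypothetical, and I am assuming (not claiming) that the code behind squaredNorm() looks roughly like this. It uses unaligned loads so it can be tried on plain float arrays.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: sums the four lanes of v.
// Mirrors the movhlps / addps / shufps $0x1 / addss sequence in the listing.
inline float horizontalSum(__m128 v)
{
    // Low two lanes of hi hold the high two lanes of v: [v2, v3, v2, v3].
    __m128 hi = _mm_movehl_ps(v, v);
    // Lane 0 = v0+v2, lane 1 = v1+v3 (the upper lanes are not needed).
    __m128 pair = _mm_add_ps(v, hi);
    // Bring lane 1 of pair down into lane 0.
    __m128 odd = _mm_shuffle_ps(pair, pair, 0x1);
    // Only lane 0 of the final sum is read.
    return _mm_cvtss_f32(_mm_add_ss(pair, odd));
}

// Hypothetical stand-in for particleCheckSpheric: each position is four
// floats, with the fourth (padding) component assumed to be zero.
inline bool particleCheckSphericSketch(const float* pos1, const float* pos2,
                                       float particleSize)
{
    __m128 d = _mm_sub_ps(_mm_loadu_ps(pos1), _mm_loadu_ps(pos2));
    return particleSize * particleSize > horizontalSum(_mm_mul_ps(d, d));
}
```

In this form, _mm_movehl_ps(v, v) and _mm_shuffle_ps(pair, pair, 0x1) each produce a new value while the source is still needed for the following add. Since movhlps and shufps overwrite their destination register, my guess is that this is where the two movaps copies come from, even though the upper lanes of the results are never read.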

Thank you
