[eigen] Non-optimal SSE assembly code with gcc
Hi
I just had a close look at the assembly generated by the following
function:
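#include <unsupported/Eigen/AlignedVector3>

bool particleCheckSpheric(Eigen::AlignedVector3<float> pos1,
                          Eigen::AlignedVector3<float> pos2, float particleSize)
{
    // True if the squared distance between the positions is below particleSize squared.
    return particleSize*particleSize > (pos1-pos2).squaredNorm();
}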
The assembly I got is the following (compiled on an amd64 machine using
gcc 4.5.3, with -O2 -DNDEBUG):
01: movaps (%rdi), %xmm1
03: mulss %xmm0, %xmm0
04: subps (%rsi), %xmm1
05: mulps %xmm1, %xmm1
06: movaps %xmm1, %xmm2
07: movhlps %xmm1, %xmm2
08: addps %xmm1, %xmm2
09: movaps %xmm2, %xmm1
10: shufps $0x1, %xmm2, %xmm1
11: addss %xmm1, %xmm2
12: ucomiss %xmm2, %xmm0
13: seta %al
14: retq
Notice line 06 (and 09): it seems to me that these copies are unnecessary,
as only the low quadword is really used. Is this a problem with the
compiler, or is this an Eigen issue?
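To make the point about line 06 concrete, here is a rough lane-by-lane model of lines 06 to 11 in plain C++. It is only a sketch: the `Xmm` type and the helper names are made up for illustration (they are not Eigen or compiler API), lane 0 stands for the low float of a register, and the operand order follows the AT&T syntax of the listing. The check at the end suggests that the final low lane does not depend on what xmm2 held before line 07.

```cpp
#include <array>
#include <cassert>

// Hypothetical model of an xmm register: four floats, index 0 is the low lane.
using Xmm = std::array<float, 4>;

// movhlps src, dst: the two high lanes of src overwrite the two low lanes of dst.
Xmm movhlps(const Xmm& src, Xmm dst) {
    dst[0] = src[2];
    dst[1] = src[3];
    return dst;
}

// addps src, dst: lane-wise addition.
Xmm addps(const Xmm& src, Xmm dst) {
    for (int i = 0; i < 4; ++i) dst[i] += src[i];
    return dst;
}

// shufps $imm, src, dst: the two low result lanes are picked from dst,
// the two high result lanes are picked from src.
Xmm shufps(unsigned imm, const Xmm& src, const Xmm& dst) {
    Xmm out{{dst[imm & 3], dst[(imm >> 2) & 3],
             src[(imm >> 4) & 3], src[(imm >> 6) & 3]}};
    return out;
}

// Lines 07 to 11 of the listing. `squares` is xmm1 after line 05;
// `xmm2` is whatever xmm2 holds before line 07 (a copy of xmm1 if line 06 ran).
float reduceLow(const Xmm& squares, Xmm xmm2) {
    xmm2 = movhlps(squares, xmm2);   // 07
    xmm2 = addps(squares, xmm2);     // 08
    Xmm xmm1 = xmm2;                 // 09
    xmm1 = shufps(0x1, xmm2, xmm1);  // 10
    return xmm2[0] + xmm1[0];        // 11: addss only touches the low lane
}

// Compares the result with and without the line 06 copy.
void checkLine06Copy() {
    Xmm squares{{1.0f, 2.0f, 3.0f, 4.0f}};
    Xmm unrelated{{-7.0f, 99.0f, 123.0f, -0.5f}};
    assert(reduceLow(squares, squares) == 10.0f);    // with the copy from line 06
    assert(reduceLow(squares, unrelated) == 10.0f);  // xmm2 left as it was
}
```

The model covers values only. It says nothing about register dependencies or about what unrelated data in the upper lanes would cost at run time, so it illustrates the question rather than answering it.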
Thank you
Benjamin