Re: [eigen] Non-optimal sse assembly code with gcc



Thanks for the hint about SSE3. I had completely missed enabling it.
With it enabled, the code now looks much better.

I would still like to know whether your reasoning about denormalized
numbers is correct, though. Does anybody else know for sure?
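To make the lanes in question concrete, here is a minimal C++ sketch of the first step of the predux<Packet4f> reduction quoted below. The helper names lanes, reduce_step and count_subnormal_lanes are hypothetical, not Eigen API. Because _mm_movehl_ps(x, y) returns {y2, y3, x2, x3}, calling it with (a, a) fills the upper half from a itself, so after the add the unused upper lanes hold a2+a2 and a3+a3 rather than stale register contents. The sketch only inspects lane values. It does not measure whether a denormal in an unused lane actually slows addps, which would depend on the hardware.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_movehl_ps, _mm_add_ps, _mm_storeu_ps
#include <cmath>        // std::fpclassify, FP_SUBNORMAL

// Hypothetical helper: store the four lanes of v into out (lane 0 first).
inline void lanes(__m128 v, float out[4]) { _mm_storeu_ps(out, v); }

// Hypothetical helper mirroring the first line of the quoted predux:
// tmp = a + movehl(a, a). _mm_movehl_ps(x, y) yields {y2, y3, x2, x3},
// so with x == y == a the result is {a0+a2, a1+a3, a2+a2, a3+a3}.
inline __m128 reduce_step(__m128 a) {
    return _mm_add_ps(a, _mm_movehl_ps(a, a));
}

// Hypothetical helper: count the lanes of v holding subnormal values.
inline int count_subnormal_lanes(__m128 v) {
    float f[4];
    lanes(v, f);
    int n = 0;
    for (int i = 0; i < 4; ++i)
        if (std::fpclassify(f[i]) == FP_SUBNORMAL) ++n;
    return n;
}
```

For example, with a = {1, 2, 3, 4} (lane 0 first), reduce_step gives {4, 6, 6, 8}: only lanes 0 and 1 feed the rest of the reduction, but the upper lanes are still computed from a. A subnormal in the upper half of the input therefore shows up in an unused lane, not just garbage left over in the register.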

Cheers
Benjamin

On 01/23/2012 10:39 AM, Christoph Hertzberg wrote:
> On 22.01.2012 17:32, Benjamin Schindler wrote:
>> Hi
>>
>> I just had a close look at the assembly generated by the following
>> function:
>>
>> bool particleCheckSpheric(Eigen::AlignedVector3<float> pos1,
>> Eigen::AlignedVector3<float> pos2, float particleSize)
>> {
>> return particleSize*particleSize > (pos1-pos2).squaredNorm();
>> }
>>
>> The assembly I got is the following (compiled on an amd64 machine using
>> gcc 4.5.3, with -O2 -DNDEBUG):
>>
>> 01: movaps (%rdi), %xmm1
>> 03: mulss %xmm0, %xmm0
>> 04: subps (%rsi), %xmm1
>> 05: mulps %xmm1, %xmm1
>> 06: movaps %xmm1, %xmm2
>> 07: movhlps %xmm1, %xmm2
>> 08: addps %xmm1, %xmm2
>> 09: movaps %xmm2, %xmm1
>> 10: shufps $0x1, %xmm2, %xmm1
>> 11: addss %xmm1, %xmm2
>> 12: ucomiss %xmm2, %xmm0
>> 13: seta %al
>> 14: retq
>>
>> Notice lines 6 and 9: it seems to me that these copies are
>> unnecessary, as only the low quadword is really used. Is this a
>> compiler problem or an Eigen issue?
> 
> Side note: I guess that if you enable SSE3, lines 6 to 11 will be
> replaced by just two "haddps %xmm1, %xmm1" instructions.
> 
> 
> And I think gcc does everything correctly, since the Eigen source
> (without SSE3) says:
> 
> template<> EIGEN_STRONG_INLINE float predux<Packet4f>(const Packet4f& a)
> {
>   Packet4f tmp = _mm_add_ps(a, _mm_movehl_ps(a,a));
>   return pfirst(_mm_add_ss(tmp, _mm_shuffle_ps(tmp,tmp, 1)));
> }
> 
> So _mm_movehl_ps(a,a) actually requires the upper half of the result
> to be copied from a (i.e. xmm1). And I guess it can make a difference:
> without the copy, the upper half of xmm2 would hold whatever was left in
> that register, and if that happened to be a denormalized number, the
> addps instruction might be slower on some hardware (not sure about
> that, though).
> 
> Christoph
> 
> 
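For comparison, here is a minimal sketch of the two horizontal-sum variants discussed in this thread. hsum_sse follows the quoted non-SSE3 predux, and hsum_sse3 uses two _mm_hadd_ps calls, as suggested for SSE3 builds. All function names are hypothetical. particle_check_spheric is a raw-intrinsics stand-in for the original function and assumes that the fourth (padding) lane of both positions is zero, since the reduction sums all four lanes. The SSE3 variant is only compiled when __SSE3__ is defined (e.g. with -msse3).

```cpp
#include <xmmintrin.h>   // SSE: _mm_movehl_ps, _mm_add_ps, _mm_add_ss, _mm_shuffle_ps, _mm_cvtss_f32
#ifdef __SSE3__
#include <pmmintrin.h>   // SSE3: _mm_hadd_ps
#endif

// Horizontal sum without SSE3, following the quoted predux<Packet4f>:
// fold the upper half onto the lower half, then add lane 1 to lane 0.
inline float hsum_sse(__m128 a) {
    __m128 tmp = _mm_add_ps(a, _mm_movehl_ps(a, a));            // {a0+a2, a1+a3, a2+a2, a3+a3}
    __m128 sum = _mm_add_ss(tmp, _mm_shuffle_ps(tmp, tmp, 1));  // lane 0: (a0+a2)+(a1+a3)
    return _mm_cvtss_f32(sum);                                  // extract lane 0
}

#ifdef __SSE3__
// With SSE3, two haddps instructions do the same reduction.
inline float hsum_sse3(__m128 a) {
    __m128 t = _mm_hadd_ps(a, a);  // {a0+a1, a2+a3, a0+a1, a2+a3}
    t = _mm_hadd_ps(t, t);         // every lane: (a0+a1)+(a2+a3)
    return _mm_cvtss_f32(t);
}
#endif

// Hypothetical stand-in for particleCheckSpheric on raw __m128 values.
// Assumes lane 3 of pos1 and pos2 is zero padding.
inline bool particle_check_spheric(__m128 pos1, __m128 pos2, float particleSize) {
    __m128 d = _mm_sub_ps(pos1, pos2);  // subps
    __m128 sq = _mm_mul_ps(d, d);       // mulps
    return particleSize * particleSize > hsum_sse(sq);
}
```

The two variants add the lanes in a different order ((a0+a2)+(a1+a3) versus (a0+a1)+(a2+a3)), so for general float inputs their results may differ in the last bit.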



