| Re: [eigen] two things |
On Thursday 26 June 2008 18:55:22 Gael Guennebaud wrote:
> Yes, exactly. But I'm still puzzled by these results, since on a 2 GHz
> Core 2 we could expect a peak performance of 8 GFlops and we are far,
> far away from that. I've also tried c = a + b; => even slower. On the
> other hand, with a += a; I could reach ~4.5 GFlops. For comparison, our
> optimized matrix product on 1024x1024 matrices achieves ~9 GFlops! Yes,
> 9! This is because the CPU can do an "add" and a "mul" at the same
> time... I guess the trick would be to do some prefetching, but I have
> not managed to get any improvement so far...
I was thinking the same.
Here is what the critical loop looks like in assembly:
.L68:
movaps (%edx,%eax,4), %xmm0
addps (%esi,%eax,4), %xmm0
movaps %xmm0, (%edx,%eax,4)
addl $4, %eax
cmpl %eax, %ecx
jg .L68
So, for one productive instruction (the addps) there are 2 mov instructions
(and I am not counting the last 3 instructions, which go away once we peel
the loop). Could that somehow be improved?
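For reference, here is a minimal sketch of what that loop computes, written
with SSE intrinsics. It is not Eigen's code: the name add_inplace_sse is
hypothetical, and it assumes both pointers are 16-byte aligned and the length
is a multiple of 4. The comments give the instruction each intrinsic
typically maps to; the compiler is free to choose differently.

```cpp
#include <xmmintrin.h>  // SSE: _mm_load_ps, _mm_add_ps, _mm_store_ps
#include <cassert>
#include <cstddef>

// Hypothetical helper for illustration: a[i] += b[i] for n floats.
// Assumes a and b are 16-byte aligned and n is a multiple of 4.
void add_inplace_sse(float* a, const float* b, std::size_t n)
{
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(a + i);  // movaps: load 4 floats of a
        __m128 vb = _mm_load_ps(b + i);  // usually folded into addps as a memory operand
        va = _mm_add_ps(va, vb);         // addps: 4 additions at once
        _mm_store_ps(a + i, va);         // movaps: store 4 floats back to a
    }
}
```

Each iteration moves 48 bytes (two 16-byte loads and one 16-byte store) for
4 additions, so the load and the store of a are required by the operation
itself and cannot simply be dropped.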
By the way, I tried this benchmark without vectorization, and got 0.4 GFlops
at 400x400 size (where the cost of not linearizing is negligible) so the
benefit of vectorization here is somewhere between +25% and +50%.
By comparison, I made a simple benchmark for sum() of a big float vector
(really just modifying vdw_new). There, vectorization gives a 4x speedup, and
with it enabled I get 1.7 GFlop (counting 1G = 10^9) on my 1.66 GHz CPU.
So, much better. Not the theoretical maximum, but since this benchmark is
memory intensive, doing only one add per loaded number, I can believe that
1.7 GFlop is all that my laptop's memory allows. Perhaps the better GFlops
figure for the matrix product is because (especially with your cache-friendly
code) it is more computation intensive relative to the amount of memory
accesses.
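To put rough numbers on that, here is a small sketch of an idealised model:
flops divided by bytes moved, assuming 4-byte floats and that every element
crosses the memory bus exactly once (so no cache reuse for the vector
operations, perfect reuse for the matrix product). The function names are
hypothetical and the model is only an assumption for illustration.

```cpp
#include <cassert>

// Idealised flops-per-byte model with 4-byte floats. Illustration only.

// sum() of n floats: about n additions, n floats read.
double intensity_sum(double n)
{
    assert(n > 0);
    return n / (4.0 * n);
}

// a += b on n floats: n additions, 2n floats read, n floats written.
double intensity_add_inplace(double n)
{
    assert(n > 0);
    return n / (4.0 * 3.0 * n);
}

// n x n matrix product: 2*n^3 flops (n^3 mul + n^3 add),
// three matrices of n^2 floats each moved once.
double intensity_matmul(double n)
{
    assert(n > 0);
    return (2.0 * n * n * n) / (4.0 * 3.0 * n * n);
}

// Memory traffic (bytes/s) implied by a measured rate at a given intensity.
double implied_bandwidth(double flops_per_second, double intensity)
{
    assert(intensity > 0);
    return flops_per_second / intensity;
}
```

Under these assumptions sum() does 0.25 flop per byte, a += b does 1/12 flop
per byte, and a 1024x1024 product does about 170 flops per byte. The 1.7 GFlop
measured for sum() would then correspond to about 6.8 GB/s of reads.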
Here is the performance-critical part of that sum() benchmark:
.L18:
addps (%ebx,%eax,4), %xmm1
addl $4, %eax
cmpl %eax, %edx
jg .L18
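A minimal intrinsics sketch that would typically compile to a loop of this
shape is given below. It is not the benchmark's actual source: sum_sse is a
hypothetical name, and the code assumes a 16-byte-aligned input whose length
is a multiple of 4.

```cpp
#include <xmmintrin.h>  // SSE: _mm_setzero_ps, _mm_load_ps, _mm_add_ps, _mm_store_ps
#include <cassert>
#include <cstddef>

// Hypothetical helper for illustration: sum of n floats.
// Assumes x is 16-byte aligned and n is a multiple of 4.
float sum_sse(const float* x, std::size_t n)
{
    assert(n % 4 == 0);
    __m128 acc = _mm_setzero_ps();                  // four partial sums, all zero
    for (std::size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(x + i));  // addps with a memory operand

    // Horizontal reduction of the four partial sums, done once after the loop.
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, acc);
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}
```

Note that the four partial sums change the order of the additions, so the
result can differ in the last bits from a plain scalar loop.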
Cheers,
Benoit