Re: [eigen] two things



On Thu, Jun 26, 2008 at 6:12 PM, Benoît Jacob <jacob@xxxxxxxxxxxxxxx> wrote:
> Here is the result after revision 824739, which introduces packet(int):
>
> 2.30471s  0.646553 GFlops
> 400 x 400  2.51917s   0.59151 GFlops
> 320 x 500  3.00975s   0.495097 GFlops
> 256 x 625  2.92007s   0.510302 GFlops
> 250 x 640  2.9007s   0.513709 GFlops
> 200 x 800  2.9415s   0.506583 GFlops
> 160 x 1000  2.90815s   0.512393 GFlops
> 128 x 1250  2.73835s   0.544166 GFlops
> 125 x 1280  2.889s   0.515789 GFlops
> 100 x 1600  2.92752s   0.509003 GFlops
> 80 x 2000  2.86383s   0.520322 GFlops
> 64 x 2500  2.9053s   0.512896 GFlops
> 50 x 3200  2.90992s   0.512081 GFlops
> 40 x 4000  2.90439s   0.513056 GFlops
> 32 x 5000  2.80648s   0.530955 GFlops
> 25 x 6400  2.85519s   0.521896 GFlops
> 20 x 8000  2.79833s   0.532503 GFlops
> 16 x 10000  2.8511s   0.522646 GFlops
> 10 x 16000  2.81542s   0.52927 GFlops
> 8 x 20000  2.80733s   0.530795 GFlops
> 5 x 32000  2.76623s   0.53868 GFlops
> 4 x 40000  2.85234s   0.522418 GFlops
> 2.80785s   0.530697 GFlops
>
> So, as expected, this problem is solved.

Great!

>> hand coded vector with loop peeling:
>> 1.0101 sec 1.47521 GFlops
>>
>> VectorXf(400*400):
>> 1.50368 0.990978 GFlops
>
> So loop peeling would be well worth it. Now that we have a real linear path
> (and could also write one for the non-vectorized case), this will be much
> easier and more efficient.
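
For readers unfamiliar with the term, loop peeling over a linear (1D) traversal can be sketched roughly as below. This is only an illustration under stated assumptions, not the Eigen implementation: the function name add_assign_peeled is hypothetical, the target is assumed to be x86 with SSE, and only the destination is brought to 16-byte alignment while the source is read with unaligned loads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Hypothetical helper, not Eigen code: dst[i] += src[i] for i in [0, n).
void add_assign_peeled(float* dst, const float* src, std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    std::size_t i = 0;

    // Peeled prologue: scalar iterations until dst + i is 16-byte aligned.
    while (i < n && reinterpret_cast<std::uintptr_t>(dst + i) % 16 != 0) {
        dst[i] += src[i];
        ++i;
    }

    // Main loop: one packet of 4 floats per iteration.
    for (; i + 4 <= n; i += 4) {
        __m128 a = _mm_load_ps(dst + i);   // aligned load, valid after the prologue
        __m128 b = _mm_loadu_ps(src + i);  // assumption: src alignment is unknown
        _mm_store_ps(dst + i, _mm_add_ps(a, b));
    }

    // Peeled epilogue: the remaining (fewer than 4) elements.
    for (; i < n; ++i) {
        dst[i] += src[i];
    }
}
```

The point of the prologue is that the main loop can then use aligned packet access on the destination regardless of where the data starts.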

Yes, exactly. But I'm still puzzled by these results: on a 2 GHz
Core 2 we could expect a peak performance of 8 GFlops, and we are far,
far away from that. I've also tried c = a + b; which is even slower. On
the other hand, with a += a; I could reach ~4.5 GFlops. For comparison,
our optimized matrix product on 1024x1024 matrices achieves ~9 GFlops!
Yes, 9! This is because the CPU can do an "add" and a "mul" at the same
time... I guess the trick would be to do some prefetching, but I have
not managed to get any improvement so far...
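
The peak estimate above can be reproduced with a small back-of-the-envelope sketch. It rests on assumptions not stated in the message: 4 single-precision lanes per SSE packet and one packed instruction of a given kind issued per cycle. The names peak_gflops and fraction_of_peak are hypothetical.

```cpp
// Theoretical peak in GFlops, assuming every cycle issues
// `packed_ops_per_cycle` SIMD instructions of `lanes` floats each.
constexpr double peak_gflops(double clock_ghz, int lanes, int packed_ops_per_cycle) {
    return clock_ghz * lanes * packed_ops_per_cycle;
}

// Fraction of a theoretical peak reached by a measured rate.
constexpr double fraction_of_peak(double measured_gflops, double peak) {
    return measured_gflops / peak;
}

// Add-only workload on a 2 GHz core: 2 * 4 * 1 = 8 GFlops.
constexpr double add_only_peak = peak_gflops(2.0, 4, 1);

// Add and mul issued together (assumed): 2 * 4 * 2 = 16 GFlops.
constexpr double add_mul_peak = peak_gflops(2.0, 4, 2);
```

Under these assumptions, a product that overlaps adds and muls can exceed the 8 GFlops add-only bound, which is consistent with the ~9 GFlops figure, while a plain vector sum cannot.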

Gael.

