Re: [eigen] two things
Here are the results here after revision 824739, which introduces packet(int):

2.30471s  0.646553 GFlops
400 x 400
2.51917s  0.59151 GFlops
320 x 500
3.00975s  0.495097 GFlops
256 x 625
2.92007s  0.510302 GFlops
250 x 640
2.9007s  0.513709 GFlops
200 x 800
2.9415s  0.506583 GFlops
160 x 1000
2.90815s  0.512393 GFlops
128 x 1250
2.73835s  0.544166 GFlops
125 x 1280
2.889s  0.515789 GFlops
100 x 1600
2.92752s  0.509003 GFlops
80 x 2000
2.86383s  0.520322 GFlops
64 x 2500
2.9053s  0.512896 GFlops
50 x 3200
2.90992s  0.512081 GFlops
40 x 4000
2.90439s  0.513056 GFlops
32 x 5000
2.80648s  0.530955 GFlops
25 x 6400
2.85519s  0.521896 GFlops
20 x 8000
2.79833s  0.532503 GFlops
16 x 10000
2.8511s  0.522646 GFlops
10 x 16000
2.81542s  0.52927 GFlops
8 x 20000
2.80733s  0.530795 GFlops
5 x 32000
2.76623s  0.53868 GFlops
4 x 40000
2.85234s  0.522418 GFlops
2.80785s  0.530697 GFlops

So, as expected, this problem is solved: the timings are now essentially flat across the different shapes.

> hand coded vector with loop peeling:
> 1.0101 sec  1.47521 GFlops
>
> VectorXf(400*400):
> 1.50368  0.990978 GFlops

So peeling loops would be well worth it. Now that we have a real linear path (and could also write a linear path for the non-vectorized case), this will be much easier and more efficient.

Cheers,
Benoit

PS. There was a crash in benchVecAdd caused by ei_pload on a non-aligned address. It is tricky because, of course, with a bit of luck you could get three aligned addresses in a row, especially on a 64-bit system like yours... Fixed in bench/.
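For readers unfamiliar with the term, here is a minimal sketch of what loop peeling means for a vector add. It is not Eigen code: first_aligned_index and add_peeled are hypothetical names, the packet size of 4 floats (16 bytes) is an assumption matching SSE, and the inner scalar loop only stands in for one packet operation. The idea is to handle the leading elements one by one until the destination is aligned, run the bulk in whole packets, then finish the tail one by one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Eigen API): how many leading floats must be
// handled one by one before `p` reaches a 16-byte boundary, capped at n.
// Assumes `p` is at least float-aligned.
std::size_t first_aligned_index(const float* p, std::size_t n) {
    const std::size_t packet = 4;  // assumed packet size: 4 floats = 16 bytes
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    const std::size_t misalign = (addr / sizeof(float)) % packet;
    const std::size_t peel = (packet - misalign) % packet;
    return peel < n ? peel : n;
}

// dst[i] = a[i] + b[i], with the loop split into three parts.
void add_peeled(float* dst, const float* a, const float* b, std::size_t n) {
    const std::size_t packet = 4;
    const std::size_t start = first_aligned_index(dst, n);
    const std::size_t end = start + ((n - start) / packet) * packet;

    // 1) Peeled head: scalar operations until dst is 16-byte aligned.
    for (std::size_t i = 0; i < start; ++i) dst[i] = a[i] + b[i];

    // From here on, dst + start sits on a packet boundary (if anything is left).
    assert(start == n ||
           reinterpret_cast<std::uintptr_t>(dst + start) % 16 == 0);

    // 2) Bulk: one full packet per iteration. In real vectorized code this
    //    body would be a packet load/add/store; only dst is known to be
    //    aligned here, so a and b would still need unaligned loads.
    for (std::size_t i = start; i < end; i += packet)
        for (std::size_t k = 0; k < packet; ++k)
            dst[i + k] = a[i + k] + b[i + k];

    // 3) Tail: the remaining n - end elements, fewer than one packet.
    for (std::size_t i = end; i < n; ++i) dst[i] = a[i] + b[i];
}
```

Note that peeling only aligns the destination. The two sources can have any alignment, which is the kind of situation where an aligned load on an arbitrary address can crash, as in the benchVecAdd case mentioned in the PS.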