[eigen] a record for Eigen: 250 GFLOPS !!


Hi,

this morning I played with a 48-core AMD SMP server (8 AMD Opteron
8439 SE processors, 6 cores each @ 2.8 GHz) and a dual-processor
Intel X5570 machine @ 2.93 GHz (4 hyper-threaded cores each => a total
of 8 cores, 16 threads). Here are the results for a product of two
2048x2048 matrices of floats:

** Intel **

16 threads (hyper-threading)
eigen real        0.158446s     108.427 GFLOPS  (2.22212s)
mt speed up x5.55349 => 34.7093%

8 threads
eigen real        0.125598s     136.785 GFLOPS  (1.2581s)
mt speed up x7.0835 => 88.5438%

4 threads
eigen real        0.228977s     75.0287 GFLOPS  (2.37034s)
mt speed up x3.88544 => 97.136%

2 threads
eigen real        0.449604s     38.2111 GFLOPS  (4.72754s)
mt speed up x1.98317 => 99.1583%

1 thread
eigen mono cpu    0.891639s     19.2677 GFLOPS  (8.9178s)


A speedup factor of ~7 on 8 cores is very nice scaling IMO.


** AMD **


1 thread
eigen mono cpu    1.54084s      11.1496 GFLOPS  (15.4136s)

2 threads
eigen real        0.817967s     21.0031 GFLOPS  (8.18607s)
mt speed up x1.88375 => 94.1874%

4 threads
eigen real        0.41879s      41.0226 GFLOPS  (4.1911s)
mt speed up x3.73174 => 93.2936%

8 threads
eigen real        0.214083s     80.2485 GFLOPS  (2.15697s)
mt speed up x7.49282 => 93.6602%

16 threads
eigen real        0.115521s     148.716 GFLOPS  (1.26385s)
mt speed up x13.4568 => 84.1048%

24 threads
eigen real        0.168208s     102.135 GFLOPS  (1.75357s)
mt speed up x9.55177 => 39.7991%

32 threads
eigen real        0.0686023s    250.427 GFLOPS  (1.19708s)
mt speed up x23.001 => 71.8781%

42 threads
eigen real        0.0799503s    214.882 GFLOPS  (0.938163s)
mt speed up x19.9015 => 47.3844%

48 threads
eigen real        0.143299s     119.888 GFLOPS  (1.62653s)
mt speed up x11.2097 => 23.3536%


We can see that AMD's SSE implementation is roughly half the speed of
Intel's (11.1 vs 19.3 GFLOPS on a single thread). This architecture
seems tricky to control: peak performance is reached with 32 threads,
with a speedup factor of x23, which is not bad. With more threads,
performance drops significantly (down to 120 GFLOPS with 48 threads).
There is also a slowdown with 24 threads.

that's all folks.

gael


