Re: [eigen] benchmarking clang, part 1

On Friday 09 July 2010 08:04:01, Gael Guennebaud wrote:
> by default bench_gemm is currently configured for complex<double>, so
> the results you get are not really meaningful. 

Oops, indeed. Here are the results with double. They are more like what I expected.

> Also, for gcc, -O2 is usually much better than -O3.

Yes, I usually use -O2, but the examples I copy/pasted into bench_multi_compilers.sh were using -O3!
All tests are now done with -O2.

> 
> Also:
> L1 = 64 KB
> L2/L3 cache size  = 512 KB
> large L1 but very small L2, what is your CPU ?

/proc/cpuinfo reports "AMD Athlon(tm) II X4 620"

I think it's the one described on http://en.wikipedia.org/wiki/List_of_AMD_Phenom_microprocessors (search for "X4 620"). L2 is 512 KB for _each_ core (I don't know whether that is usual or not).

Thomas
-- 
Thomas Capricelli <orzel@xxxxxxxxxxxxxxx>
http://www.freehackers.org/thomas

orzel@berlioz hg/eigen/bench% ./bench_multi_compilers.sh basicbench.cxxlist  bench_gemm.cpp
/home/orzel/svn/llvm/Release/bin/clang++ -O2 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
L1 cache size     = 64 KB
L2/L3 cache size  = 512 KB
Register blocking = 2 x 4
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 16 x 1024
eigen cpu         6.42778s      2.67275 GFLOPS  (13.0109s)
eigen real        6.43462s      2.66991 GFLOPS  (13.0245s)

/home/orzel/svn/llvm/Release/bin/clang++ -O2 -DNDEBUG -lrt
L1 cache size     = 64 KB
L2/L3 cache size  = 512 KB
Register blocking = 4 x 4
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 32 x 512
../bench_multi_compilers.sh: line 28:  3555 Segmentation fault      (core dumped) ./.bench 2> /dev/null

g++-4.4.4 -O2 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
L1 cache size     = 64 KB
L2/L3 cache size  = 512 KB
Register blocking = 2 x 4
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 16 x 1024
eigen cpu         5.28475s      3.25084 GFLOPS  (10.8045s)
eigen real        5.28508s      3.25063 GFLOPS  (10.8053s)

g++-4.4.4 -O2 -DNDEBUG -lrt
L1 cache size     = 64 KB
L2/L3 cache size  = 512 KB
Register blocking = 4 x 4
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 32 x 512
eigen cpu         2.70565s      6.34964 GFLOPS  (5.55716s)
eigen real        2.70594s      6.34895 GFLOPS  (5.57305s)

g++-4.5.0 -O2 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
L1 cache size     = 64 KB
L2/L3 cache size  = 512 KB
Register blocking = 2 x 4
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 16 x 1024
eigen cpu         5.1539s       3.33337 GFLOPS  (10.3333s)
eigen real        5.1568s       3.3315 GFLOPS   (10.3599s)

g++-4.5.0 -O2 -DNDEBUG -lrt
L1 cache size     = 64 KB
L2/L3 cache size  = 512 KB
Register blocking = 4 x 4
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 32 x 512
eigen cpu         2.65614s      6.46799 GFLOPS  (5.33532s)
eigen real        2.65809s      6.46324 GFLOPS  (5.33979s)


