Re: [eigen] benchmarking clang, part 1
On Friday 09 July 2010 08:04:01, Gael Guennebaud wrote:
> by default bench_gemm is currently configured for complex<double>, so
> the results you get are not really meaningful.
Oops, indeed. Here are the results with double. They are closer to what I expected.
> Also, for gcc, -O2 is usually much better than -O3.
Yes, I usually use -O2, but the examples I copied and pasted into bench_multi_compilers.sh were using -O3!
All tests are now done with -O2.
>
> Also:
> L1 = 64 KB
> L2/L3 cache size = 512 KB
> large L1 but very small L2, what is your CPU ?
/proc/cpuinfo reports "AMD Athlon(tm) II X4 620"
I think it's the one described at http://en.wikipedia.org/wiki/List_of_AMD_Phenom_microprocessors (search for "X4 620"). L2 is 512 KB for _each_ core (I don't know whether that is usual or not).
Thomas
--
Thomas Capricelli <orzel@xxxxxxxxxxxxxxx>
http://www.freehackers.org/thomas
orzel@berlioz hg/eigen/bench% ./bench_multi_compilers.sh basicbench.cxxlist bench_gemm.cpp
/home/orzel/svn/llvm/Release/bin/clang++ -O2 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
L1 cache size = 64 KB
L2/L3 cache size = 512 KB
Register blocking = 2 x 4
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 16 x 1024
eigen cpu 6.42778s 2.67275 GFLOPS (13.0109s)
eigen real 6.43462s 2.66991 GFLOPS (13.0245s)
/home/orzel/svn/llvm/Release/bin/clang++ -O2 -DNDEBUG -lrt
L1 cache size = 64 KB
L2/L3 cache size = 512 KB
Register blocking = 4 x 4
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 32 x 512
../bench_multi_compilers.sh: line 28: 3555 Segmentation fault (core dumped) ./.bench 2> /dev/null
g++-4.4.4 -O2 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
L1 cache size = 64 KB
L2/L3 cache size = 512 KB
Register blocking = 2 x 4
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 16 x 1024
eigen cpu 5.28475s 3.25084 GFLOPS (10.8045s)
eigen real 5.28508s 3.25063 GFLOPS (10.8053s)
g++-4.4.4 -O2 -DNDEBUG -lrt
L1 cache size = 64 KB
L2/L3 cache size = 512 KB
Register blocking = 4 x 4
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 32 x 512
eigen cpu 2.70565s 6.34964 GFLOPS (5.55716s)
eigen real 2.70594s 6.34895 GFLOPS (5.57305s)
g++-4.5.0 -O2 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
L1 cache size = 64 KB
L2/L3 cache size = 512 KB
Register blocking = 2 x 4
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 16 x 1024
eigen cpu 5.1539s 3.33337 GFLOPS (10.3333s)
eigen real 5.1568s 3.3315 GFLOPS (10.3599s)
g++-4.5.0 -O2 -DNDEBUG -lrt
L1 cache size = 64 KB
L2/L3 cache size = 512 KB
Register blocking = 4 x 4
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 32 x 512
eigen cpu 2.65614s 6.46799 GFLOPS (5.33532s)
eigen real 2.65809s 6.46324 GFLOPS (5.33979s)