[eigen] benchmarking clang, part 1
Hi,
Since a recent patch from Gael (rev 82bf6ba3c9d5), clang has not been able to compile Eigen anymore. I tried to fix that, but my conclusion was that clang was in the wrong. So I narrowed the problem down to a single file of a few lines and reported it to clang (http://llvm.org/bugs/show_bug.cgi?id=7587): they fixed it quickly. We were happy with the responsiveness of the gcc team, and it seems that the clang people are doing well too :-)
They also added (a modified version of) my reduced code as a regression test, so we should really be OK with this in the future.
Anyway, now that clang is usable again, I'd like to benchmark it. I've noticed bench/README.txt and bench/bench_multi_compilers.sh.
It seems that this code is quite old; basicbenchmark.cpp does not compile, for example. I've fixed some of those problems, but the most interesting result so far comes from bench_gemm.cpp. The results are attached.
The clang-built benchmark crashes when vectorization is enabled. It has done so for a long time, so I'm not surprised. I think they know about it and it should be fixed at some point in the future.
The results for g++-4.5 are really strange. Its non-vectorized test is faster than the vectorized test with g++-4.4... is that possible? The vectorized g++-4.5 test is really slow, which surprises me as well. Do you have any idea why?
As for the clang result by itself, it is not really impressive... sure :-)
Regards,
Thomas
--
Thomas Capricelli <orzel@xxxxxxxxxxxxxxx>
http://www.freehackers.org/thomas
orzel@berlioz hg/eigen/bench% ./bench_multi_compilers.sh basicbench.cxxlist bench_gemm.cpp
/home/orzel/svn/llvm/Release/bin/clang++ -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
L1 cache size = 64 KB
L2/L3 cache size = 512 KB
Register blocking = 2 x 2
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 8 x 1024
eigen cpu 31.3207s 0.548514 GFLOPS (63.5633s)
eigen real 31.3953s 0.547211 GFLOPS (63.7008s)
/home/orzel/svn/llvm/Release/bin/clang++ -O3 -DNDEBUG -lrt
L1 cache size = 64 KB
L2/L3 cache size = 512 KB
Register blocking = 2 x 2
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 8 x 1024
../bench_multi_compilers.sh: line 28: 16060 Segmentation fault (core dumped) ./.bench 2> /dev/null
g++-4.4.4 -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
L1 cache size = 64 KB
L2/L3 cache size = 512 KB
Register blocking = 2 x 2
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 8 x 1024
eigen cpu 25.8037s 0.665791 GFLOPS (52.288s)
eigen real 25.8053s 0.66575 GFLOPS (52.3121s)
g++-4.4.4 -O3 -DNDEBUG -lrt
L1 cache size = 64 KB
L2/L3 cache size = 512 KB
Register blocking = 2 x 2
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 8 x 1024
eigen cpu 21.5275s 0.798043 GFLOPS (47.1191s)
eigen real 21.5414s 0.797528 GFLOPS (47.1556s)
g++-4.5.0 -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
L1 cache size = 64 KB
L2/L3 cache size = 512 KB
Register blocking = 2 x 2
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 8 x 1024
eigen cpu 20.1941s 0.850737 GFLOPS (42.8769s)
eigen real 20.3274s 0.845158 GFLOPS (43.019s)
g++-4.5.0 -O3 -DNDEBUG -lrt
L1 cache size = 64 KB
L2/L3 cache size = 512 KB
Register blocking = 2 x 2
Matrix sizes = 2048x2048 * 2048x2048
blocking size = 8 x 1024
eigen cpu 42.6044s 0.403242 GFLOPS (85.6438s)
eigen real 42.6085s 0.403203 GFLOPS (85.6619s)
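
For readers comparing the runs above, here is a small sketch that recomputes the GFLOPS column from the "eigen cpu" times. It assumes the usual count of 2*n^3 floating-point operations for an n x n by n x n product, which reproduces the figures in the log; the function name and the dictionary labels are made up for illustration and are not part of bench_gemm.cpp. With these numbers, the vectorized g++-4.5.0 run takes roughly 2.1 times as long as the non-vectorized one.

```python
# Sketch only: recompute the GFLOPS figures from the "eigen cpu" times in the log.
# Assumption: one n x n by n x n product costs 2 * n^3 floating-point operations.

def gemm_gflops(n, seconds):
    """GFLOPS for one n x n by n x n matrix product that took `seconds`."""
    return 2.0 * n ** 3 / seconds / 1e9

# "eigen cpu" times in seconds, copied from the log (labels are illustrative).
# The vectorized clang++ run is absent because it segfaulted.
cpu_seconds = {
    "clang++ novec":   31.3207,
    "g++-4.4.4 novec": 25.8037,
    "g++-4.4.4 vec":   21.5275,
    "g++-4.5.0 novec": 20.1941,
    "g++-4.5.0 vec":   42.6044,
}

if __name__ == "__main__":
    n = 2048  # matrix size reported by the benchmark
    for name, secs in cpu_seconds.items():
        print(f"{name:16s} {secs:8.4f}s  {gemm_gflops(n, secs):.6f} GFLOPS")

    # Slowdown of the vectorized g++-4.5.0 build relative to the non-vectorized one.
    ratio = cpu_seconds["g++-4.5.0 vec"] / cpu_seconds["g++-4.5.0 novec"]
    print(f"g++-4.5.0 vec / novec time ratio: {ratio:.2f}")
```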