[eigen] benchmarking clang, part 2
Here are some more results from eigen/bench/*:
clang performs pretty well on benchVecAdd, and vectorization is actually slower with g++-4.4 (??).
Results for bench_reverse are difficult to read and analyse, but clang does not perform well. Here too, vectorization makes things mostly slower with g++-4.5, though not with g++-4.4.
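Since the bench_reverse tables are hard to compare by eye, here is a minimal Python sketch of one way to line two runs up. It is not part of eigen/bench: the function names are made up for illustration, and the regular expression only assumes the "dyn R x C <time>s (<rate> MFLOPS)" line shape seen in the attached output. The sample lines are three rows copied from the two g++-4.5.0 runs.

```python
import re

# Matches result lines such as "dyn 4 x 4 0.00339225s (471.663 MFLOPS)".
LINE_RE = re.compile(r"dyn\s+(\d+) x (\d+)\s+([\d.]+)s \(([\d.]+) MFLOPS\)")


def parse_mflops(log_text):
    """Return {(rows, cols): MFLOPS} for every result line in log_text."""
    results = {}
    for match in LINE_RE.finditer(log_text):
        rows, cols, _seconds, mflops = match.groups()
        results[(int(rows), int(cols))] = float(mflops)
    return results


def vectorization_ratio(scalar_log, vector_log):
    """Vectorized MFLOPS divided by non-vectorized MFLOPS, per size.

    A ratio below 1.0 means the vectorized build was slower.
    """
    scalar = parse_mflops(scalar_log)
    vector = parse_mflops(vector_log)
    return {size: vector[size] / scalar[size] for size in scalar if size in vector}


# g++-4.5.0 with -DEIGEN_DONT_VECTORIZE (rows copied from the attached output).
SCALAR_LOG = """dyn 4 x 4 0.0035243s (453.991 MFLOPS)
dyn 16 x 16 0.0388849s (658.354 MFLOPS)
dyn 128 x 128 2.01086s (814.777 MFLOPS)"""

# g++-4.5.0 with vectorization enabled (same sizes).
VECTOR_LOG = """dyn 4 x 4 0.00367756s (435.071 MFLOPS)
dyn 16 x 16 0.0510978s (501 MFLOPS)
dyn 128 x 128 3.49944s (468.189 MFLOPS)"""

for size, ratio in sorted(vectorization_ratio(SCALAR_LOG, VECTOR_LOG).items()):
    print(f"{size[0]} x {size[1]}: vectorized/scalar = {ratio:.2f}")
```

Feeding it the full text of two runs instead of the three-row samples gives one ratio per matrix size, which is easier to scan than the raw tables.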
I forgot to mention tests of compilation speed and RAM usage. I compiled a project of mine with both g++-4.5 and clang (current svn), both using '-pipe -ansi -msse3 -Wall -Wextra -O2', and timed the build with 'time make':
clang : 158.721u 7.528s 2:48.30 98.7% 0+0k 0+62432io 0pf+0w
g++-4.5 : 254.255u 13.820s 4:30.60 99.0% 0+0k 81136+ 6272io 16pf+0w
clang is really faster (2:48 versus 4:30 of wall-clock time, about 1.6x), and you can feel it :)
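To put a number on the two timings above, here is a small Python sketch that reads the csh-style 'time' summary lines. The helper names are hypothetical, and it assumes the field order shown above (user, system, elapsed, then the rest).

```python
def elapsed_seconds(time_line):
    """Wall-clock seconds from a csh-style 'time' summary line."""
    # The third field is the elapsed time as [h:]m:s, e.g. "2:48.30".
    elapsed = time_line.split()[2]
    seconds = 0.0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def cpu_seconds(time_line):
    """User plus system CPU seconds from the same kind of line."""
    fields = time_line.split()
    return float(fields[0].rstrip("u")) + float(fields[1].rstrip("s"))


# The two summary lines from the builds above, without the compiler label.
CLANG = "158.721u 7.528s 2:48.30 98.7% 0+0k 0+62432io 0pf+0w"
GCC45 = "254.255u 13.820s 4:30.60 99.0% 0+0k 81136+ 6272io 16pf+0w"

wall_ratio = elapsed_seconds(GCC45) / elapsed_seconds(CLANG)
cpu_ratio = cpu_seconds(GCC45) / cpu_seconds(CLANG)
print(f"g++-4.5 / clang, wall clock: {wall_ratio:.2f}x")
print(f"g++-4.5 / clang, CPU time:   {cpu_ratio:.2f}x")
```

Both ratios should be read as a single data point: one project, one machine, one run each.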
I could not find a way to measure RAM usage, but I can tell you from watching 'top' that clang does much better than g++.
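One possible way to get a number instead of watching 'top' is sketched below in Python, using the standard resource module. This was not used for the measurements above. The function name is made up, and the command in the example is a stand-in to be replaced by the real build, e.g. ["make"]. Note the limits: on Linux ru_maxrss is reported in kilobytes, and for RUSAGE_CHILDREN it is the peak of the single largest waited-for descendant, not the sum over all compiler processes. GNU time ('/usr/bin/time -v') reports a similar "Maximum resident set size" figure.

```python
import resource
import subprocess
import sys


def peak_child_rss_kib(command):
    """Run command to completion and return the peak resident set size.

    The value comes from getrusage(RUSAGE_CHILDREN): it covers children
    that have terminated and been waited for, and reflects the largest
    single process among them (kilobytes on Linux).
    """
    subprocess.run(command, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


# Stand-in workload (assumption): a child Python process that allocates
# and touches some memory. Replace with the real build command.
cmd = [sys.executable, "-c", "x = bytearray(b'a' * (20 * 1024 * 1024))"]
print(f"peak RSS of largest child: {peak_child_rss_kib(cmd)} KiB")
```

Because the counter is cumulative for the calling process, each compiler is best measured in a fresh interpreter so that an earlier, larger child does not mask a later one.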
regards,
Thomas
--
Thomas Capricelli <orzel@xxxxxxxxxxxxxxx>
http://www.freehackers.org/thomas
orzel@berlioz hg/eigen/bench% ./bench_multi_compilers.sh basicbench.cxxlist benchVecAdd.cpp
/home/orzel/svn/llvm/Release/bin/clang++ -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
2.44146s 0.610339 GFlops
/home/orzel/svn/llvm/Release/bin/clang++ -O3 -DNDEBUG -lrt
../bench_multi_compilers.sh: line 28: 16924 Segmentation fault (core dumped) ./.bench 2> /dev/null
g++-4.4.4 -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
2.94578s 0.505847 GFlops
g++-4.4.4 -O3 -DNDEBUG -lrt
2.40424s 0.619787 GFlops
g++-4.5.0 -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
3.03161s 0.491527 GFlops
g++-4.5.0 -O3 -DNDEBUG -lrt
2.47257s 0.602658 GFlops
orzel@berlioz hg/eigen/bench% ./bench_multi_compilers.sh basicbench.cxxlist benchmark.cpp
/home/orzel/svn/llvm/Release/bin/clang++ -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
real 0m1.833s
user 0m1.812s
sys 0m0.000s
/home/orzel/svn/llvm/Release/bin/clang++ -O3 -DNDEBUG -lrt
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
real 0m1.750s
user 0m1.748s
sys 0m0.000s
g++-4.4.4 -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
real 0m1.292s
user 0m1.276s
sys 0m0.000s
g++-4.4.4 -O3 -DNDEBUG -lrt
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
real 0m1.347s
user 0m1.336s
sys 0m0.000s
g++-4.5.0 -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
real 0m1.069s
user 0m1.056s
sys 0m0.000s
g++-4.5.0 -O3 -DNDEBUG -lrt
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
real 0m1.064s
user 0m1.056s
sys 0m0.000s
orzel@berlioz hg/eigen/bench% ./bench_multi_compilers.sh basicbench.cxxlist bench_reverse.cpp
/home/orzel/svn/llvm/Release/bin/clang++ -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
size no sqrt standard
dyn 4 x 4 0.00339225s (471.663 MFLOPS)
dyn 16 x 1 0.00685774s (233.313 MFLOPS)
dyn 6 x 6 0.00532467s (676.098 MFLOPS)
dyn 36 x 1 0.0113004s (318.572 MFLOPS)
dyn 8 x 8 0.00854191s (749.247 MFLOPS)
dyn 64 x 1 0.0189229s (338.214 MFLOPS)
dyn 16 x 16 0.0398758s (641.993 MFLOPS)
dyn 256 x 1 0.0725934s (352.649 MFLOPS)
dyn 24 x 24 0.081053s (710.646 MFLOPS)
dyn 576 x 1 0.159585s (360.937 MFLOPS)
dyn 32 x 32 0.137402s (745.257 MFLOPS)
dyn 1024 x 1 0.282974s (361.871 MFLOPS)
dyn 49 x 49 0.305111s (786.926 MFLOPS)
dyn 2401 x 1 0.660054s (363.758 MFLOPS)
dyn 64 x 64 0.510809s (801.865 MFLOPS)
dyn 4096 x 1 1.13061s (362.283 MFLOPS)
dyn 128 x 128 1.9904s (823.152 MFLOPS)
dyn 16384 x 1 5.95667s (275.053 MFLOPS)
dyn 256 x 256 19.1453s (342.309 MFLOPS)
dyn 65536 x 1 24.546s (266.993 MFLOPS)
dyn 512 x 512 79.6244s (329.226 MFLOPS)
dyn 262144 x 1 100.907s (259.786 MFLOPS)
dyn 900 x 900 272.849s (296.867 MFLOPS)
dyn 810000 x 1 324.717s (249.448 MFLOPS)
/home/orzel/svn/llvm/Release/bin/clang++ -O3 -DNDEBUG -lrt
size no sqrt standard
../bench_multi_compilers.sh: line 28: 17225 Segmentation fault (core dumped) ./.bench 2> /dev/null
g++-4.4.4 -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
size no sqrt standard
dyn 4 x 4 0.00438854s (364.586 MFLOPS)
dyn 16 x 1 0.00241318s (663.026 MFLOPS)
dyn 6 x 6 0.0050179s (717.432 MFLOPS)
dyn 36 x 1 0.00471128s (764.124 MFLOPS)
dyn 8 x 8 0.0144499s (442.911 MFLOPS)
dyn 64 x 1 0.0142942s (447.734 MFLOPS)
dyn 16 x 16 0.0346651s (738.494 MFLOPS)
dyn 256 x 1 0.052478s (487.823 MFLOPS)
dyn 24 x 24 0.119559s (481.77 MFLOPS)
dyn 576 x 1 0.0667638s (862.742 MFLOPS)
dyn 32 x 32 0.114914s (891.103 MFLOPS)
dyn 1024 x 1 0.134462s (761.552 MFLOPS)
dyn 49 x 49 0.454647s (528.102 MFLOPS)
dyn 2401 x 1 0.276487s (868.395 MFLOPS)
dyn 64 x 64 0.414197s (988.902 MFLOPS)
dyn 4096 x 1 0.830039s (493.471 MFLOPS)
dyn 128 x 128 1.64026s (998.866 MFLOPS)
dyn 16384 x 1 2.34454s (698.816 MFLOPS)
dyn 256 x 256 18.4155s (355.873 MFLOPS)
dyn 65536 x 1 23.0011s (284.925 MFLOPS)
dyn 512 x 512 85.8939s (305.195 MFLOPS)
dyn 262144 x 1 77.528s (338.128 MFLOPS)
dyn 900 x 900 301.802s (268.387 MFLOPS)
dyn 810000 x 1 265.313s (305.3 MFLOPS)
g++-4.4.4 -O3 -DNDEBUG -lrt
size no sqrt standard
dyn 4 x 4 0.00148596s (1076.74 MFLOPS)
dyn 16 x 1 0.00135985s (1176.6 MFLOPS)
dyn 6 x 6 0.0036389s (989.309 MFLOPS)
dyn 36 x 1 0.00340932s (1055.93 MFLOPS)
dyn 8 x 8 0.00873399s (732.769 MFLOPS)
dyn 64 x 1 0.00540127s (1184.91 MFLOPS)
dyn 16 x 16 0.0199676s (1282.08 MFLOPS)
dyn 256 x 1 0.0190172s (1346.15 MFLOPS)
dyn 24 x 24 0.0418322s (1376.93 MFLOPS)
dyn 576 x 1 0.0417535s (1379.52 MFLOPS)
dyn 32 x 32 0.0752364s (1361.04 MFLOPS)
dyn 1024 x 1 0.0760524s (1346.44 MFLOPS)
dyn 49 x 49 0.176386s (1361.22 MFLOPS)
dyn 2401 x 1 0.178905s (1342.06 MFLOPS)
dyn 64 x 64 0.301533s (1358.39 MFLOPS)
dyn 4096 x 1 0.358231s (1143.4 MFLOPS)
dyn 128 x 128 1.36444s (1200.79 MFLOPS)
dyn 16384 x 1 1.36642s (1199.05 MFLOPS)
dyn 256 x 256 18.6793s (350.849 MFLOPS)
dyn 65536 x 1 19.3172s (339.262 MFLOPS)
dyn 512 x 512 104.411s (251.07 MFLOPS)
dyn 262144 x 1 77.8001s (336.946 MFLOPS)
dyn 900 x 900 312.607s (259.111 MFLOPS)
dyn 810000 x 1 296.46s (273.224 MFLOPS)
g++-4.5.0 -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
size no sqrt standard
dyn 4 x 4 0.0035243s (453.991 MFLOPS)
dyn 16 x 1 0.0026187s (610.991 MFLOPS)
dyn 6 x 6 0.00559286s (643.678 MFLOPS)
dyn 36 x 1 0.00417522s (862.23 MFLOPS)
dyn 8 x 8 0.00877162s (729.626 MFLOPS)
dyn 64 x 1 0.0065499s (977.114 MFLOPS)
dyn 16 x 16 0.0388849s (658.354 MFLOPS)
dyn 256 x 1 0.0234392s (1092.19 MFLOPS)
dyn 24 x 24 0.0799013s (720.889 MFLOPS)
dyn 576 x 1 0.0496777s (1159.47 MFLOPS)
dyn 32 x 32 0.135532s (755.544 MFLOPS)
dyn 1024 x 1 0.0874438s (1171.04 MFLOPS)
dyn 49 x 49 0.302534s (793.63 MFLOPS)
dyn 2401 x 1 0.45267s (530.408 MFLOPS)
dyn 64 x 64 0.509065s (804.612 MFLOPS)
dyn 4096 x 1 0.341029s (1201.07 MFLOPS)
dyn 128 x 128 2.01086s (814.777 MFLOPS)
dyn 16384 x 1 1.53246s (1069.13 MFLOPS)
dyn 256 x 256 19.7178s (332.37 MFLOPS)
dyn 65536 x 1 25.6473s (255.528 MFLOPS)
dyn 512 x 512 78.2921s (334.828 MFLOPS)
dyn 262144 x 1 121s (216.648 MFLOPS)
dyn 900 x 900 290.862s (278.482 MFLOPS)
dyn 810000 x 1 240.644s (336.597 MFLOPS)
g++-4.5.0 -O3 -DNDEBUG -lrt
size no sqrt standard
dyn 4 x 4 0.00367756s (435.071 MFLOPS)
dyn 16 x 1 0.00287286s (556.937 MFLOPS)
dyn 6 x 6 0.00846585s (425.238 MFLOPS)
dyn 36 x 1 0.0118372s (304.126 MFLOPS)
dyn 8 x 8 0.0138652s (461.588 MFLOPS)
dyn 64 x 1 0.0113374s (564.504 MFLOPS)
dyn 16 x 16 0.0510978s (501 MFLOPS)
dyn 256 x 1 0.0421342s (607.583 MFLOPS)
dyn 24 x 24 0.1145s (503.058 MFLOPS)
dyn 576 x 1 0.0940034s (612.744 MFLOPS)
dyn 32 x 32 0.211483s (484.199 MFLOPS)
dyn 1024 x 1 0.256332s (399.481 MFLOPS)
dyn 49 x 49 0.479354s (500.882 MFLOPS)
dyn 2401 x 1 0.393788s (609.719 MFLOPS)
dyn 64 x 64 0.808482s (506.629 MFLOPS)
dyn 4096 x 1 0.665215s (615.74 MFLOPS)
dyn 128 x 128 3.49944s (468.189 MFLOPS)
dyn 16384 x 1 2.86992s (570.887 MFLOPS)
dyn 256 x 256 21.0292s (311.642 MFLOPS)
dyn 65536 x 1 20.9608s (312.66 MFLOPS)
dyn 512 x 512 86.1382s (304.33 MFLOPS)
dyn 262144 x 1 81.1934s (322.864 MFLOPS)
dyn 900 x 900 395.549s (204.779 MFLOPS)
dyn 810000 x 1 369.77s (219.055 MFLOPS)