[eigen] benchmarking clang, part 2



Here are some more results from eigen/bench/*:

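clang performs pretty well on benchVecAdd without vectorization (2.44s, against 2.95s for g++-4.4 and 3.03s for g++-4.5), but its vectorized build segfaults. In the benchmark.cpp run, vectorization is actually slightly slower with g++-4.4 (1.347s vs 1.292s) (??).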

Results for bench_reverse are difficult to read/analyse, but clang does not perform well. Here too, vectorization makes things mostly slower with g++-4.5, though not with g++-4.4.
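Since the bench_reverse tables are hard to compare by eye, here is a minimal sketch of how the two runs of one compiler could be compared row by row. It is not part of eigen/bench; the names `parse` and `vectorization_ratio` are made up for illustration, and the sample rows are a few lines copied from the g++-4.5.0 output attached below.

```python
import re

# A few rows copied from the g++-4.5.0 bench_reverse runs in this message.
NO_VEC = """\
dyn   4 x 4     0.0035243s (453.991 MFLOPS)
dyn   16 x 1    0.0026187s (610.991 MFLOPS)
dyn   900 x 900         290.862s (278.482 MFLOPS)
"""
VEC = """\
dyn   4 x 4     0.00367756s (435.071 MFLOPS)
dyn   16 x 1    0.00287286s (556.937 MFLOPS)
dyn   900 x 900         395.549s (204.779 MFLOPS)
"""

# One result row: "dyn  <rows> x <cols>  <seconds>s (<rate> MFLOPS)".
ROW = re.compile(r"dyn\s+(\d+) x (\d+)\s+([\d.]+)s \(([\d.]+) MFLOPS\)")


def parse(text):
    """Map (rows, cols) -> MFLOPS for every result row found in text."""
    return {(int(r), int(c)): float(mflops)
            for r, c, _secs, mflops in ROW.findall(text)}


def vectorization_ratio(no_vec_text, vec_text):
    """Return {size: vectorized MFLOPS / non-vectorized MFLOPS}."""
    base, vec = parse(no_vec_text), parse(vec_text)
    return {size: vec[size] / base[size] for size in base if size in vec}


# A ratio below 1 means the vectorized build was slower for that size.
for (rows, cols), ratio in vectorization_ratio(NO_VEC, VEC).items():
    print(f"{rows} x {cols}: {ratio:.2f}x")
```

Pasting the full tables into the two strings would give the ratio for every size at once.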


I forgot to mention compilation speed and RAM usage. I compiled a project of mine with both g++-4.5 and clang (current svn), both using '-pipe -ansi -msse3 -Wall -Wextra -O2', and timed the build with 'time make':
	clang   : 158.721u  7.528s 2:48.30 98.7%  0+0k     0+62432io  0pf+0w
	g++-4.5 : 254.255u 13.820s 4:30.60 99.0%  0+0k 81136+ 6272io 16pf+0w
clang is much faster (about 1.6x in both CPU time and wall-clock time), and you can feel it :)
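For anyone who wants to check the ratio, the two timing lines above can be compared with a short script. This is only a sketch: it assumes the csh/tcsh 'time' layout shown above (user seconds, system seconds, then elapsed as m:ss), does not handle an h:mm:ss elapsed field, and the name `parse_time_line` is made up.

```python
import re

# Leading fields of the two timing lines quoted in the message.
TIMINGS = {
    "clang": "158.721u  7.528s 2:48.30 98.7%",
    "g++-4.5": "254.255u 13.820s 4:30.60 99.0%",
}

# <user>u <system>s [<minutes>:]<seconds>
PATTERN = re.compile(r"([\d.]+)u\s+([\d.]+)s\s+(?:(\d+):)?([\d.]+)")


def parse_time_line(line):
    """Return (cpu_seconds, elapsed_seconds) from a csh-style time line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError(f"unrecognised time line: {line!r}")
    user, system, minutes, seconds = m.groups()
    cpu = float(user) + float(system)
    elapsed = 60 * int(minutes or 0) + float(seconds)
    return cpu, elapsed


clang_cpu, clang_wall = parse_time_line(TIMINGS["clang"])
gcc_cpu, gcc_wall = parse_time_line(TIMINGS["g++-4.5"])
print(f"CPU time ratio (g++ / clang):   {gcc_cpu / clang_cpu:.2f}")
print(f"wall-clock ratio (g++ / clang): {gcc_wall / clang_wall:.2f}")
```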

I could not find a way to measure RAM usage, but I can tell you, from watching 'top', that clang does much better than g++.
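One possible way to get a number instead of watching 'top' is to ask the kernel for the peak resident set size of the child processes after the build finishes. The sketch below was not used for the results in this message; it assumes a Unix system with Python's standard `resource` module, and the function name `run_and_peak_rss` is made up. On Linux the value is in kilobytes and reflects the largest single waited-for descendant (for a build, typically the hungriest compiler process), not the sum over all of them.

```python
import resource
import subprocess
import sys


def run_and_peak_rss(cmd):
    """Run cmd to completion and return (exit_code, peak_rss).

    peak_rss is ru_maxrss for RUSAGE_CHILDREN: kilobytes on Linux,
    bytes on macOS. It is a running maximum over every child this
    Python process has waited for so far, so use a fresh interpreter
    for each measurement.
    """
    proc = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is the command to measure.
    code, rss = run_and_peak_rss(sys.argv[1:])
    print(f"exit code {code}, peak RSS of largest child: {rss}")
```

Saved under a hypothetical name such as peak_rss.py, it would be run once per compiler as `python peak_rss.py make` after a clean, with the compiler selected however the project's build normally does it.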

regards,
Thomas
-- 
Thomas Capricelli <orzel@xxxxxxxxxxxxxxx>
http://www.freehackers.org/thomas
orzel@berlioz hg/eigen/bench% ./bench_multi_compilers.sh basicbench.cxxlist benchVecAdd.cpp
/home/orzel/svn/llvm/Release/bin/clang++ -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
2.44146s  0.610339 GFlops

/home/orzel/svn/llvm/Release/bin/clang++ -O3 -DNDEBUG -lrt
../bench_multi_compilers.sh: line 28: 16924 Segmentation fault      (core dumped) ./.bench 2> /dev/null

g++-4.4.4 -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
2.94578s  0.505847 GFlops

g++-4.4.4 -O3 -DNDEBUG -lrt
2.40424s  0.619787 GFlops

g++-4.5.0 -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
3.03161s  0.491527 GFlops

g++-4.5.0 -O3 -DNDEBUG -lrt
2.47257s  0.602658 GFlops

orzel@berlioz hg/eigen/bench% ./bench_multi_compilers.sh basicbench.cxxlist benchmark.cpp
/home/orzel/svn/llvm/Release/bin/clang++ -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002

real    0m1.833s
user    0m1.812s
sys     0m0.000s

/home/orzel/svn/llvm/Release/bin/clang++ -O3 -DNDEBUG -lrt
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002

real    0m1.750s
user    0m1.748s
sys     0m0.000s

g++-4.4.4 -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002

real    0m1.292s
user    0m1.276s
sys     0m0.000s

g++-4.4.4 -O3 -DNDEBUG -lrt
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002

real    0m1.347s
user    0m1.336s
sys     0m0.000s

g++-4.5.0 -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002

real    0m1.069s
user    0m1.056s
sys     0m0.000s

g++-4.5.0 -O3 -DNDEBUG -lrt
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002
1.0002 1.0002 1.0002

real    0m1.064s
user    0m1.056s
sys     0m0.000s


orzel@berlioz hg/eigen/bench% ./bench_multi_compilers.sh basicbench.cxxlist bench_reverse.cpp
/home/orzel/svn/llvm/Release/bin/clang++ -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
size            no sqrt                           standard
dyn   4 x 4     0.00339225s (471.663 MFLOPS)
dyn   16 x 1    0.00685774s (233.313 MFLOPS)
dyn   6 x 6     0.00532467s (676.098 MFLOPS)
dyn   36 x 1    0.0113004s (318.572 MFLOPS)
dyn   8 x 8     0.00854191s (749.247 MFLOPS)
dyn   64 x 1    0.0189229s (338.214 MFLOPS)
dyn   16 x 16   0.0398758s (641.993 MFLOPS)
dyn   256 x 1   0.0725934s (352.649 MFLOPS)
dyn   24 x 24   0.081053s (710.646 MFLOPS)
dyn   576 x 1   0.159585s (360.937 MFLOPS)
dyn   32 x 32   0.137402s (745.257 MFLOPS)
dyn   1024 x 1  0.282974s (361.871 MFLOPS)
dyn   49 x 49   0.305111s (786.926 MFLOPS)
dyn   2401 x 1  0.660054s (363.758 MFLOPS)
dyn   64 x 64   0.510809s (801.865 MFLOPS)
dyn   4096 x 1  1.13061s (362.283 MFLOPS)
dyn   128 x 128         1.9904s (823.152 MFLOPS)
dyn   16384 x 1         5.95667s (275.053 MFLOPS)
dyn   256 x 256         19.1453s (342.309 MFLOPS)
dyn   65536 x 1         24.546s (266.993 MFLOPS)
dyn   512 x 512         79.6244s (329.226 MFLOPS)
dyn   262144 x 1        100.907s (259.786 MFLOPS)
dyn   900 x 900         272.849s (296.867 MFLOPS)
dyn   810000 x 1        324.717s (249.448 MFLOPS)

/home/orzel/svn/llvm/Release/bin/clang++ -O3 -DNDEBUG -lrt
size            no sqrt                           standard
../bench_multi_compilers.sh: line 28: 17225 Segmentation fault      (core dumped) ./.bench 2> /dev/null

g++-4.4.4 -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
size            no sqrt                           standard
dyn   4 x 4     0.00438854s (364.586 MFLOPS)
dyn   16 x 1    0.00241318s (663.026 MFLOPS)
dyn   6 x 6     0.0050179s (717.432 MFLOPS)
dyn   36 x 1    0.00471128s (764.124 MFLOPS)
dyn   8 x 8     0.0144499s (442.911 MFLOPS)
dyn   64 x 1    0.0142942s (447.734 MFLOPS)
dyn   16 x 16   0.0346651s (738.494 MFLOPS)
dyn   256 x 1   0.052478s (487.823 MFLOPS)
dyn   24 x 24   0.119559s (481.77 MFLOPS)
dyn   576 x 1   0.0667638s (862.742 MFLOPS)
dyn   32 x 32   0.114914s (891.103 MFLOPS)
dyn   1024 x 1  0.134462s (761.552 MFLOPS)
dyn   49 x 49   0.454647s (528.102 MFLOPS)
dyn   2401 x 1  0.276487s (868.395 MFLOPS)
dyn   64 x 64   0.414197s (988.902 MFLOPS)
dyn   4096 x 1  0.830039s (493.471 MFLOPS)
dyn   128 x 128         1.64026s (998.866 MFLOPS)
dyn   16384 x 1         2.34454s (698.816 MFLOPS)
dyn   256 x 256         18.4155s (355.873 MFLOPS)
dyn   65536 x 1         23.0011s (284.925 MFLOPS)
dyn   512 x 512         85.8939s (305.195 MFLOPS)
dyn   262144 x 1        77.528s (338.128 MFLOPS)
dyn   900 x 900         301.802s (268.387 MFLOPS)
dyn   810000 x 1        265.313s (305.3 MFLOPS)

g++-4.4.4 -O3 -DNDEBUG -lrt
size            no sqrt                           standard
dyn   4 x 4     0.00148596s (1076.74 MFLOPS)
dyn   16 x 1    0.00135985s (1176.6 MFLOPS)
dyn   6 x 6     0.0036389s (989.309 MFLOPS)
dyn   36 x 1    0.00340932s (1055.93 MFLOPS)
dyn   8 x 8     0.00873399s (732.769 MFLOPS)
dyn   64 x 1    0.00540127s (1184.91 MFLOPS)
dyn   16 x 16   0.0199676s (1282.08 MFLOPS)
dyn   256 x 1   0.0190172s (1346.15 MFLOPS)
dyn   24 x 24   0.0418322s (1376.93 MFLOPS)
dyn   576 x 1   0.0417535s (1379.52 MFLOPS)
dyn   32 x 32   0.0752364s (1361.04 MFLOPS)
dyn   1024 x 1  0.0760524s (1346.44 MFLOPS)
dyn   49 x 49   0.176386s (1361.22 MFLOPS)
dyn   2401 x 1  0.178905s (1342.06 MFLOPS)
dyn   64 x 64   0.301533s (1358.39 MFLOPS)
dyn   4096 x 1  0.358231s (1143.4 MFLOPS)
dyn   128 x 128         1.36444s (1200.79 MFLOPS)
dyn   16384 x 1         1.36642s (1199.05 MFLOPS)
dyn   256 x 256         18.6793s (350.849 MFLOPS)
dyn   65536 x 1         19.3172s (339.262 MFLOPS)
dyn   512 x 512         104.411s (251.07 MFLOPS)
dyn   262144 x 1        77.8001s (336.946 MFLOPS)
dyn   900 x 900         312.607s (259.111 MFLOPS)
dyn   810000 x 1        296.46s (273.224 MFLOPS)

g++-4.5.0 -O3 -DNDEBUG -DEIGEN_DONT_VECTORIZE -lrt
size            no sqrt                           standard
dyn   4 x 4     0.0035243s (453.991 MFLOPS)
dyn   16 x 1    0.0026187s (610.991 MFLOPS)
dyn   6 x 6     0.00559286s (643.678 MFLOPS)
dyn   36 x 1    0.00417522s (862.23 MFLOPS)
dyn   8 x 8     0.00877162s (729.626 MFLOPS)
dyn   64 x 1    0.0065499s (977.114 MFLOPS)
dyn   16 x 16   0.0388849s (658.354 MFLOPS)
dyn   256 x 1   0.0234392s (1092.19 MFLOPS)
dyn   24 x 24   0.0799013s (720.889 MFLOPS)
dyn   576 x 1   0.0496777s (1159.47 MFLOPS)
dyn   32 x 32   0.135532s (755.544 MFLOPS)
dyn   1024 x 1  0.0874438s (1171.04 MFLOPS)
dyn   49 x 49   0.302534s (793.63 MFLOPS)
dyn   2401 x 1  0.45267s (530.408 MFLOPS)
dyn   64 x 64   0.509065s (804.612 MFLOPS)
dyn   4096 x 1  0.341029s (1201.07 MFLOPS)
dyn   128 x 128         2.01086s (814.777 MFLOPS)
dyn   16384 x 1         1.53246s (1069.13 MFLOPS)
dyn   256 x 256         19.7178s (332.37 MFLOPS)
dyn   65536 x 1         25.6473s (255.528 MFLOPS)
dyn   512 x 512         78.2921s (334.828 MFLOPS)
dyn   262144 x 1        121s (216.648 MFLOPS)
dyn   900 x 900         290.862s (278.482 MFLOPS)
dyn   810000 x 1        240.644s (336.597 MFLOPS)

g++-4.5.0 -O3 -DNDEBUG -lrt
size            no sqrt                           standard
dyn   4 x 4     0.00367756s (435.071 MFLOPS)
dyn   16 x 1    0.00287286s (556.937 MFLOPS)
dyn   6 x 6     0.00846585s (425.238 MFLOPS)
dyn   36 x 1    0.0118372s (304.126 MFLOPS)
dyn   8 x 8     0.0138652s (461.588 MFLOPS)
dyn   64 x 1    0.0113374s (564.504 MFLOPS)
dyn   16 x 16   0.0510978s (501 MFLOPS)
dyn   256 x 1   0.0421342s (607.583 MFLOPS)
dyn   24 x 24   0.1145s (503.058 MFLOPS)
dyn   576 x 1   0.0940034s (612.744 MFLOPS)
dyn   32 x 32   0.211483s (484.199 MFLOPS)
dyn   1024 x 1  0.256332s (399.481 MFLOPS)
dyn   49 x 49   0.479354s (500.882 MFLOPS)
dyn   2401 x 1  0.393788s (609.719 MFLOPS)
dyn   64 x 64   0.808482s (506.629 MFLOPS)
dyn   4096 x 1  0.665215s (615.74 MFLOPS)
dyn   128 x 128         3.49944s (468.189 MFLOPS)
dyn   16384 x 1         2.86992s (570.887 MFLOPS)
dyn   256 x 256         21.0292s (311.642 MFLOPS)
dyn   65536 x 1         20.9608s (312.66 MFLOPS)
dyn   512 x 512         86.1382s (304.33 MFLOPS)
dyn   262144 x 1        81.1934s (322.864 MFLOPS)
dyn   900 x 900         395.549s (204.779 MFLOPS)
dyn   810000 x 1        369.77s (219.055 MFLOPS)


