Re: [eigen] Performance gap between gcc and msvc ?
In fact, after some investigation, I found out that the Core i5 performs
very badly with gcc-generated code (there are some benchmarks on phoronix.com
that show this).
It is actually slower than an 8-year-old Pentium 4, which does the
computation in 4 s.
I've tried several optimisation flags (graphite, loop
parallelisation...) with no success. I always get a computation time of
about 10 s; at best I get 9.39 s.
I might try with icc later...
On Fri, 18 Jun 2010 11:22:59 +0200, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> On Fri, Jun 18, 2010 at 10:07 AM, Hauke Heibel
> <hauke.heibel@xxxxxxxxxxxxxx> wrote:
>> On Fri, Jun 18, 2010 at 9:32 AM, <vincent.lejeune@xxxxxxxxxx> wrote:
>>> I've done some performance comparisons between Windows and Linux, using
>>> the blocked QR function.
>>> I was using a Core i5 with 3 GB of memory, and I ran the decomposition on
>>> a 2048x2048 random double matrix on two operating systems:
>>> - The first one is openSUSE 11.3 RC1 64-bit, shipped with gcc 4.5. I
>>> got the computation done in 10 s in release mode (that is, with -O3).
>>> - The second one is Windows 7 64-bit, using Visual C++ 2010 Express. It
>>> ships with the 32-bit version of the compiler, and I've heard that some
>>> features like OpenMP are disabled. However, the computation was done in
>>> 6 s in release mode...
>>
>>> I've got something like a 40% performance drop for gcc in comparison to
>>> VC++ 2010. I've heard that gcc-generated code is marginally slower than
>>> MSVC's in some cases, but 40% is not negligible in my opinion.
>>
>> Typically it is vice versa, i.e. normally GCC produces faster code. In
>> particular 32bit builds with MSVC are rather bad since the register
>> handling of MSVC's 32bit compiler is far from optimal. So you really
>> seem to be missing some important flag for GCC. Could it be that you
>> still have debug symbols enabled?
>
> I'm also surprised by your results because with Eigen we always found
> that GCC outperformed MSVC.
>
> On my computer (Core2, 2.66GHz, 64bits system, gcc 4.4), and with many
> compilations in the background, I get the following timings (block
> size = 128):
>
> 2048^2, float : 1.1 sec
> 2048^2, double : 2.5 sec
>
> the compilation command:
>
> g++-4.4 -I.. -O3 -lrt -DNDEBUG bench_qr.cpp && ./a.out
>
> the test program:
>
> #include <Eigen/QR>
> #include "BenchTimer.h"
> #include <iostream>
>
> using namespace Eigen;
>
> int main()
> {
>   typedef MatrixXd Mat;
>   int s = 2048;
>   Mat m = Mat::Random(s,s);
>   BenchTimer t;
>   HouseholderQR<Mat> qr(m);
>   BENCH(t, 4, 1, qr.compute(m));
>   std::cout << t.value() << "s\n";
> }
>
> With -O3 it is a bit slower. Actually, with Eigen the recommended
> flags are simply -O2 -DNDEBUG for a 64bits system, and add -msse2 for
> a 32 bits system to enable SSE optimizations.
>
> Regarding openmp, with gcc you can enable it with -fopenmp, however
> here it seems there is no gain because the blocks are too small...
>
> gael
>
>>
>>> On another note, I ran a QR decomposition of a 2048x2048 random matrix
>>> under Scilab on Windows, because Scilab ships with a (binary-only) MKL
>>> on Windows. The computation is done in 2 s.
>>
>>> I think the difference may be explained by MSVC disabling OpenMP in the
>>> Express version of the compiler, as the Core i5 has 4 logical cores
>>> (2 physical + 2 hyperthreaded, I think), hence a performance
>>> improvement. I would like to know whether Eigen uses OpenMP for matrix
>>> products, simultaneously with vectorisation.
>>
>> OpenMP is always (!) disabled by default on MSVC, as is vectorization
>> (the latter only in 32-bit builds). In 64-bit builds vectorization is
>> always enabled. OpenMP can be enabled under "Properties -> C/C++ ->
>> Language -> Open MP Support". SSE is enabled via "Properties -> C/C++ ->
>> Code Generation -> Enable Enhanced Instruction Set". Regarding the
>> simultaneous usage, it is possible to use OpenMP and SSE.
>>
>> Regards,
>> Hauke
>>
>>
>>