Re: [eigen] Performance gap between gcc and msvc ?


On Fri, Jun 18, 2010 at 10:07 AM, Hauke Heibel
<hauke.heibel@xxxxxxxxxxxxxx> wrote:
> On Fri, Jun 18, 2010 at 9:32 AM,  <vincent.lejeune@xxxxxxxxxx> wrote:
>> I've done some performance comparisons between Windows and Linux, using
>> the blocked QR function.
>> I was using a Core i5 with 3 GB of memory, and I ran the decomposition on a
>> 2048x2048 random double matrix on 2 operating systems:
>> - The first one is openSUSE 11.3 RC1 64-bit, shipped with gcc 4.5. I
>> got the computation done in 10 s in release mode (that is, with -O3).
>> - The second one is Windows 7 64-bit, using Visual C++ 2010 Express. It
>> ships with the 32-bit version of the compiler, and I've heard that some
>> features like OpenMP are disabled. However, the computation was done in 6 s
>> in release mode...
>> I've got something like a 40% performance drop for gcc in comparison to
>> VC++ 2010. I've heard that gcc-generated code was marginally slower than
>> MSVC's in some cases, but 40% is not negligible in my opinion.
> Typically it is the other way around, i.e. normally GCC produces faster code.
> In particular, 32-bit builds with MSVC are rather bad, since the register
> handling of MSVC's 32-bit compiler is far from optimal. So you really
> seem to be missing some important flag for GCC. Could it be that you
> still have debug symbols enabled?

I'm also surprised by your results because with Eigen we always found
that GCC outperformed MSVC.

On my computer (Core2, 2.66 GHz, 64-bit system, gcc 4.4), and with many
compilations running in the background, I get the following timings (block
size = 128):

2048^2, float : 1.1 sec
2048^2, double : 2.5 sec

The compilation command:

g++-4.4 -I.. -O3 -DNDEBUG bench_qr.cpp -lrt && ./a.out

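The test program:

#include <Eigen/QR>
#include "BenchTimer.h"  // helper from Eigen's bench/ directory
#include <iostream>

using namespace Eigen;

int main()
{
  typedef MatrixXd Mat;  // use MatrixXf for the float timing
  int s = 2048;
  Mat m = Mat::Random(s,s);
  BenchTimer t;
  HouseholderQR<Mat> qr(m);         // initial factorization (also a warm-up)
  BENCH(t, 4, 1, qr.compute(m));    // 4 tries, 1 repetition per try
  std::cout << t.value() << "s\n";  // best time over the tries
  return 0;
}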

With -O3 it is a bit slower. Actually, with Eigen the recommended
flags are simply -O2 -DNDEBUG on a 64-bit system; on a 32-bit system,
also add -msse2 to enable the SSE optimizations.

Regarding OpenMP, with gcc you can enable it with -fopenmp; however,
here it seems there is no gain because the blocks are too small...


>> On another note, I ran a QR decomposition of a 2048x2048 random matrix under
>> Scilab on Windows, because Scilab ships with a (binary-only) MKL on
>> Windows. The computation is done in 2 s.
>> I think that the difference may be explained by MSVC disabling OpenMP in
>> the Express version of the compiler, as the Core i5 has 4 logical cores
>> (2 physical + 2 hyperthreaded, I think), hence a performance improvement. I
>> would like to know if Eigen uses OpenMP for the matrix product,
>> simultaneously with vectorization.
> OpenMP is always (!) disabled by default on MSVC, as is vectorization
> (only under 32-bit builds). For 64-bit builds vectorization always takes
> place. OpenMP can be enabled under "Properties -> C/C++ -> Language ->
> Open MP Support". SSE is enabled via "Properties -> C/C++ -> Code
> Generation -> Enable Enhanced Instruction Set". Regarding the
> simultaneous usage, it is possible to use OpenMP and SSE together.
> Regards,
> Hauke
