Re: [eigen] benchmarks for large matrices?



Yep, actually I've just tried compiling the latest ATLAS myself. Even
though it seems to be a bit faster than the older one I used for the
benchmark, Eigen is still faster, especially for matrix sizes that are
not a multiple of 4.

I attached a small benchmark that you can easily try:

compilation:

g++ -O2 -ffast-math -DNDEBUG gemm.cpp -latlas -lcblas -o gemm

then:


time ./gemm

and I get:

Eigen: 0.79 s
ATLAS: 1.28 s
MKL:   0.44 s

In each case I used a single thread. My CPU is:
Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz

So the peak performance is 21 GFLOPS; MKL reaches ~18.2 GFLOPS, Eigen
~10.2 GFLOPS, and ATLAS ~6.25 GFLOPS.



On Wed, Feb 18, 2009 at 4:37 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2009/2/18 David Roundy <daveroundy@xxxxxxxxx>:
>> If you're using an ATLAS tuned for a
>> machine with a larger cache, it'd be no surprise that you'd get poor
>> numbers...
>
> I wouldn't expect that, because Gael's CPU is a Core 2 duo T7200, and
> those have 4 MB of cache.
>
> Benoit
>
>
>
#include "Eigen/Array"
using namespace Eigen;

extern "C" {
#include <cblas.h>

void sgemm_(const char *transa, const char *transb, const int *m, const int *n, const int *k,
           const float *alpha, const float *a, const int *lda, const float *b, const int *ldb,
           const float *beta, float *c, const int *ldc);

}

EIGEN_DONT_INLINE void eigenprod(const MatrixXf& a, const MatrixXf& b, MatrixXf& c)
{
  c += a * b;
}

EIGEN_DONT_INLINE void blasprod(const MatrixXf& a, const MatrixXf& b, MatrixXf& c)
{
  // Constants used only by the commented-out Fortran-style sgemm_ call below.
  static const float fone = 1;
  static const float fzero = 0;
  static const char notrans = 'N';

  // Assumes square, column-major matrices (Eigen's default storage order).
  int N = a.rows();
  // Computes c = 1.0 * a * b + 0.0 * c, so c is overwritten here,
  // whereas eigenprod() accumulates into c with +=.
  cblas_sgemm(CblasColMajor,CblasNoTrans,CblasNoTrans,N,N,N,1.0,a.data(),N,b.data(),N,0.0,c.data(),N);
  //sgemm_(&notrans,&notrans,&N,&N,&N,&fone,a.data(),&N,b.data(),&N,&fzero,c.data(),&N);
}

int main(int argc, char ** argv)
{
  MatrixXf a = MatrixXf::Ones(1257,1257);
  MatrixXf b = MatrixXf::Ones(1257,1257);
  MatrixXf c = MatrixXf::Ones(1257,1257);
  
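  // Two products per run. Swap which call is commented out to time
  // Eigen instead of the BLAS backend.
  for (int k=0; k<2; ++k)
  {
    blasprod(a,b,c);
    //eigenprod(a,b,c);
  }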
  return 0;
}

