Re: [eigen] benchmarks for large matrices?


yep, actually I've just tried compiling the latest ATLAS myself. Even
though it seems to be a bit faster than the older build I used for the
benchmark, Eigen is still faster, especially for matrix sizes that are
not multiples of 4.

I attached a small benchmark that you can easily try:


g++ -O2 -ffast-math -DNDEBUG gemm.cpp -latlas -lcblas -o gemm


time ./gemm

and I get:

eigen: 0.79 s
ATLAS: 1.28 s
MKL:   0.44 s

In each case I used a single thread, my CPU is:
Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz

so the peak performance is 21 GFLOPS; MKL reaches ~18.2 GFLOPS, Eigen
~10.2, and ATLAS ~6.25.

On Wed, Feb 18, 2009 at 4:37 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2009/2/18 David Roundy <daveroundy@xxxxxxxxx>:
>> If you're using an ATLAS tuned for a
>> machine with a larger cache, it'd be no surprise that you'd get poor
>> numbers...
> I wouldn't expect that, because Gael's CPU is a Core 2 duo T7200, and
> those have 4 MB of cache.
> Benoit
#include "Eigen/Array"
using namespace Eigen;

extern "C" {
#include <cblas.h>

void sgemm_(const char *transa, const char *transb, const int *m, const int *n, const int *k,
            const float *alpha, const float *a, const int *lda, const float *b, const int *ldb,
            const float *beta, float *c, const int *ldc);
}

EIGEN_DONT_INLINE void eigenprod(const MatrixXf& a, const MatrixXf& b, MatrixXf& c)
{
  c += a * b;
}

EIGEN_DONT_INLINE void blasprod(const MatrixXf& a, const MatrixXf& b, MatrixXf& c)
{
  static const float fone = 1;
  static const char notrans = 'N';

  int N = a.rows();
  // c += a * b via Fortran BLAS: no transposition, alpha = 1, beta = 1
  sgemm_(&notrans, &notrans, &N, &N, &N, &fone,
         a.data(), &N, b.data(), &N, &fone, c.data(), &N);
}

int main(int argc, char ** argv)
{
  MatrixXf a = MatrixXf::Ones(1257,1257);
  MatrixXf b = MatrixXf::Ones(1257,1257);
  MatrixXf c = MatrixXf::Ones(1257,1257);
  for (int k=0; k<2; ++k)
    eigenprod(a, b, c); // swap in blasprod(a, b, c) to time the BLAS backend
  return 0;
}
