Re: [eigen] benchmarks for large matrices?
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] benchmarks for large matrices?
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Wed, 18 Feb 2009 16:56:17 +0100
yep, actually I've just tried to compile the latest ATLAS myself. Even
though it seems to be a bit faster than the older one I used for the
benchmark, Eigen is still faster, especially for matrix sizes that are
not a multiple of 4 (the attached benchmark uses 1257x1257, which is
such a size).
I attached a small benchmark that you can easily try:
compilation:
g++ -O2 -ffast-math -DNDEBUG gemm.cpp -latlas -lcblas -o gemm
then:
time ./gemm
and I get:
eigen: 0.79 s
ATLAS: 1.28 s
MKL:   0.44 s
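(To reproduce the Eigen number, comment out blasprod and uncomment
eigenprod in main() below, then rebuild and run with the same commands:

g++ -O2 -ffast-math -DNDEBUG gemm.cpp -latlas -lcblas -o gemm
time ./gemm

For MKL, the same blasprod path can be linked against MKL's cblas
interface instead of ATLAS; the exact link line depends on the install.)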
In each case I used a single thread. My CPU is:
Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz
so the single-precision peak of one core is about 21 GFLOPS
(2.66 GHz x 8 flops/cycle); MKL reaches ~18.2 GFLOPS, Eigen ~10.2, and
ATLAS ~6.25.
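For reference, here is how those figures follow from the timings. A
quick standalone sketch (not part of the attached file): the loop in
main() does two products of 1257x1257 single-precision matrices, i.e.
2 * 2*N^3 flops in total, divided by the wall-clock time:

// gflops.cpp: effective GFLOPS for two 1257x1257 sgemm calls,
// using the wall-clock timings quoted above.
#include <cstdio>

int main()
{
  const double N = 1257.0;
  const double flops = 2.0 * (2.0 * N * N * N);   // two products, 2*N^3 flops each
  const double seconds[] = { 0.79, 1.28, 0.44 };  // eigen, ATLAS, MKL
  const char*  names[]   = { "eigen", "ATLAS", "MKL" };
  for (int i = 0; i < 3; ++i)
    std::printf("%-6s %.1f GFLOPS\n", names[i], flops / seconds[i] * 1e-9);
  return 0;
}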
On Wed, Feb 18, 2009 at 4:37 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2009/2/18 David Roundy <daveroundy@xxxxxxxxx>:
>> If you're using an ATLAS tuned for a
>> machine with a larger cache, it'd be no surprise that you'd get poor
>> numbers...
>
> I wouldn't expect that, because Gael's CPU is a Core 2 duo T7200, and
> those have 4 MB of cache.
>
> Benoit
#include "Eigen/Array"
using namespace Eigen;
extern "C" {
#include <cblas.h>
void sgemm_(const char *transa, const char *transb, const int *m, const int *n, const int *k,
const float *alpha, const float *a, const int *lda, const float *b, const int *ldb,
const float *beta, float *c, const int *ldc);
}
EIGEN_DONT_INLINE void eigenprod(const MatrixXf& a, const MatrixXf& b, MatrixXf& c)
{
c += a * b;
}
EIGEN_DONT_INLINE void blasprod(const MatrixXf& a, const MatrixXf& b, MatrixXf& c)
{
static const float fone = 1;
static const float fzero = 0;
static const char notrans = 'N';
static const char trans = 'T';
static const char nonunit = 'N';
static const char lower = 'L';
static const int intone = 1;
int N = a.rows();
cblas_sgemm(CblasColMajor,CblasNoTrans,CblasNoTrans,N,N,N,1.0,a.data(),N,b.data(),N,0.0,c.data(),N);
//sgemm_(¬rans,¬rans,&N,&N,&N,&fone,a.data(),&N,b.data(),&N,&fzero,c.data(),&N);
}
int main(int argc, char ** argv)
{
MatrixXf a = MatrixXf::Ones(1257,1257);
MatrixXf b = MatrixXf::Ones(1257,1257);
MatrixXf c = MatrixXf::Ones(1257,1257);
for (int k=0; k<2; ++k)
{
blasprod(a,b,c);
//eigenprod(a,b,c);
}
return 0;
}