Re: [eigen] benchmarks for large matrices?

On Wed, Feb 18, 2009 at 3:40 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2009/2/17 David Roundy <daveroundy@xxxxxxxxx>:
>> Hello eigen folks,
>> I recently took a look at the very impressive benchmarks shown at:
>> and had a couple of questions.  First, would it be a problem to extend these
>> benchmarks to larger matrices?
> A user benchmarked 4000x4000 matrix products against MKL:
> His result was the same that we got in our benchmark for 1000x1000:
> Namely, Eigen is at about 2/3 of the speed of MKL.

Yes, a matrix size of 1000^2 is enough to reach peak performance and
to stress the caches as much as possible. So basically there is no need
to benchmark larger matrices: the curves should stay flat.

>> multiplication of two (21,000 by 100) matrices (I just took these numbers
>> from a recent calculation), multiplied such that the result is a small,
>> square matrix.
> Ah, we don't have benchmarks for that.

Indeed, we don't have such a benchmark, but I'm 95% sure that Eigen's
matrix product will perform well in this case. Your matrices are large
enough to take advantage of all the caching mechanisms.
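
To make the tall-and-skinny case concrete, here is a minimal timing sketch in plain C++. It is not Eigen's product and not the harness used for the published benchmarks; the names product_flops, at_times_b and measure_gflops are hypothetical, and column-major storage is an assumption. With Eigen the product itself would be written `a.transpose() * b`; the naive loop below only stands in for it so that the flop count and the GFLOPS arithmetic are explicit.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Flop count of an (m x k) * (k x n) product: one multiply and one add
// per inner-loop step.
double product_flops(std::size_t m, std::size_t n, std::size_t k) {
    return 2.0 * static_cast<double>(m) * static_cast<double>(n) *
           static_cast<double>(k);
}

// Naive C = A^T * B, with A (k x m) and B (k x n) stored column-major
// (an assumption). C is m x n, column-major. Each entry of C is the dot
// product of two contiguous columns of length k.
std::vector<float> at_times_b(const std::vector<float>& a,
                              const std::vector<float>& b,
                              std::size_t k, std::size_t m, std::size_t n) {
    assert(a.size() == k * m && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[p + i * k] * b[p + j * k];
            }
            c[i + j * m] = acc;
        }
    }
    return c;
}

// Times a single product and converts the elapsed time to GFLOPS.
double measure_gflops(std::size_t k, std::size_t m, std::size_t n) {
    assert(k > 0 && m > 0 && n > 0);
    const std::vector<float> a(k * m, 1.0f);
    const std::vector<float> b(k * n, 2.0f);
    const auto start = std::chrono::steady_clock::now();
    const std::vector<float> c = at_times_b(a, b, k, m, n);
    const auto stop = std::chrono::steady_clock::now();
    volatile float sink = c[0];  // read the result so it is not optimized away
    (void)sink;
    const double seconds = std::chrono::duration<double>(stop - start).count();
    return product_flops(m, n, k) / seconds / 1e9;
}
```

For the sizes in the question, measure_gflops(21000, 100, 100) times one product of 4.2e8 flops. A serious benchmark would repeat the product several times and keep the best time.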

>> Relating to that, I was curious as to the options given to ATLAS.  If it's
>> running without sse2 support compiled in, then it should be compared with
>> eigen2_novec, and is quite competitive and sometimes wins.  However, if it
>> was compiled with sse2 support, then eigen2 appears much more impressive.
> I can't speak for Gael who did all the benchmarking (as well as all the
> optimization of matrix products), but, as written on this benchmark
> page, his architecture is x86-64, and on this architecture SSE2 is
> enabled by default (because all x86-64 cpus support SSE2). So I would
> be very surprised if his ATLAS didn't use SSE2.

Yes, I can guarantee that SSE2 is used because:
 1 - without SSE the limit for my CPU would be 4 GFLOPS, while ATLAS
reaches 5 on the benchmark,
 2 - the ATLAS lib is full of addps / mulps instructions.
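
The first argument is simple peak-rate arithmetic, sketched below. Only the 4 GFLOPS scalar limit and the measured 5 GFLOPS come from this mail. The decomposition into a 2.0 GHz clock with one add and one multiply issued per cycle is an assumption chosen to reproduce that limit, and peak_gflops and requires_simd are hypothetical helper names.

```cpp
#include <cassert>

// Theoretical peak in GFLOPS: clock rate (GHz) times floating-point
// instructions issued per cycle times values processed per instruction
// (1 for scalar code, 4 for packed single-precision SSE such as addps).
double peak_gflops(double clock_ghz, int instr_per_cycle, int lanes) {
    assert(clock_ghz > 0.0 && instr_per_cycle > 0 && lanes > 0);
    return clock_ghz * instr_per_cycle * lanes;
}

// A measured rate above the scalar peak cannot come from scalar code alone.
bool requires_simd(double measured_gflops, double scalar_peak_gflops) {
    return measured_gflops > scalar_peak_gflops;
}
```

Under those assumptions, peak_gflops(2.0, 2, 1) gives the 4 GFLOPS scalar limit, and requires_simd(5.0, 4.0) is true for the 5 GFLOPS that ATLAS reaches.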

I have to say, those poor ATLAS results puzzle me, because all the
benchmarks I can find show ATLAS close to MKL... However, they
never state the MKL version, and they show only relative percentages...
Actually, the caching mechanism of ATLAS seems to be much better than
mine because 1) valgrind just told me so, and 2) the performance
of Eigen's matrix product does not increase on a faster, similar CPU
while ATLAS's does. So my feeling is that, contrary to Eigen's,
ATLAS's low-level kernels are poorly written (unaligned memory accesses,
bad register caching, etc.). That said, I still hope to be able to
reach MKL performance by improving the use of caches, and I know how
to do it, I just have to find the time...
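
For readers unfamiliar with what "improving the use of caches" usually means for a matrix product, here is a generic sketch of loop blocking (tiling). It is the textbook technique only, not the change planned for Eigen and not ATLAS's or MKL's kernel. The function name blocked_product, the row-major layout and the square shape are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C += A * B for square n x n row-major matrices, computed tile by tile.
// Each (ii, kk, jj) step touches only three block x block tiles, so a
// suitable `block` keeps the working set inside the cache.
void blocked_product(const std::vector<float>& a,
                     const std::vector<float>& b,
                     std::vector<float>& c,
                     std::size_t n, std::size_t block) {
    assert(block > 0);
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t ii = 0; ii < n; ii += block) {
        const std::size_t i_end = std::min(ii + block, n);
        for (std::size_t kk = 0; kk < n; kk += block) {
            const std::size_t k_end = std::min(kk + block, n);
            for (std::size_t jj = 0; jj < n; jj += block) {
                const std::size_t j_end = std::min(jj + block, n);
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const float aik = a[i * n + k];  // reused across the j loop
                        for (std::size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The result does not depend on `block` apart from floating-point rounding order; only the memory access pattern changes. The best tile size depends on the cache sizes of the target CPU and has to be found by measurement.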

