Re: [eigen] SGEMM benchmark result against ATLAS

On an i7, the peak flop measurement is not straightforward because of
turbo-mode.
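
As a rough sketch of the arithmetic behind the peak estimate quoted below (clock x 4 floats per SSE register x 2 for one mulps plus one addps per cycle), something like the following could be used. The function and parameter names are hypothetical and are not taken from Eigen, ATLAS or benchBlasGemm. The flops_per_madd parameter is only there because the printed figures look consistent with n^3 / time, assuming the first number on each output line is elapsed seconds.

```cpp
// Theoretical peak in GFlop/s for a given clock. Illustrative helper only:
// clock_ghz is the assumed sustained clock, which turbo mode makes uncertain.
constexpr double peak_gflops(double clock_ghz, int simd_lanes, int issue_per_cycle) {
    return clock_ghz * simd_lanes * issue_per_cycle;
}

// Rate for an n x n x n matrix product that took `seconds`.
// flops_per_madd = 1 counts a multiply-add as one operation,
// flops_per_madd = 2 counts the multiply and the add separately.
constexpr double gemm_gflops(double n, double seconds, int flops_per_madd) {
    return flops_per_madd * n * n * n / seconds / 1e9;
}

// Fraction of the theoretical peak actually reached.
constexpr double efficiency(double measured_gflops, double peak) {
    return measured_gflops / peak;
}
```

For example, peak_gflops(1.66, 4, 2) gives 13.28, close to the 13.33 quoted below, and efficiency(8.0, 13.33) gives about 0.60. The result is only as good as the clock_ghz value passed in, which is the part turbo mode makes hard to pin down.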

On Tue, Aug 24, 2010 at 9:15 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> Hi,
>
> Hearing from Keir that he saw untuned ATLAS outperform us by a 30% margin,
> which would be very unusual, I ran our benchBlasGemm a bit. By the way, I
> updated it to make it compile, which involved removing the eigen_..._normal
> path which didn't look useful (?), hope it's OK. Also, it was missing an
> extern "C" around the cblas #include.
>
> So I installed the most optimized ATLAS package that I could on Fedora,
> built with SSE3.
>
> I compiled our benchmark with:
>
> cd eigen/bench/
> g++ -O3 -msse3 -I.. -L /usr/lib64/atlas/ benchBlasGemm.cpp  -o benchBlasGemm
> -lrt -lcblas
>
> And ran it on some 4096x4096 matrices:
>
> [bjacob@cahouette bench]$ ./benchBlasGemm 4096
> 4096 x 4096 x 4096
> cblas: 8.73982 (7.862 GFlops/s)
> eigen : 8.9491 (7.678 GFlops/s)
> [bjacob@cahouette bench]$ ./benchBlasGemm 4096
> 4096 x 4096 x 4096
> cblas: 8.51913 (8.066 GFlops/s)
> eigen : 8.42922 (8.152 GFlops/s)
>
> So _my_ results show Eigen3 and ATLAS running at roughly the same speed,
> albeit with great variability.
>
> This is still perplexing for 2 reasons:
>  - we used to beat ATLAS by a wide margin.
>  - the roughly 8 GFlops here are not too good. My CPU is a Core i7 at 1.66
> GHz. So x4 (because of float) and x2 (pipelining of addps and mulps) we
> should aim at 13.33 GFlops. So we are running here at only 60% of the
> theoretical maximum; I think we used to do much better than that.
>
> So let me ask Gael and Keir:
> * Keir: what do you get on this benchmark? How did you get this result where
> ATLAS outperformed us by 30%?
> * Gael: suppose I want to get deeper into this, where do I start?
>
> Cheers,
> Benoit
>



-- 
Rohit Garg

http://rpg-314.blogspot.com/


