[eigen] Re: SGEMM benchmark result against ATLAS

A question for Benoit: is this running the threaded versions of Eigen and ATLAS?

Keir

On Tue, Aug 24, 2010 at 10:52 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
I too have ATLAS 3.8.3, and am using GCC 4.4 on Linux x86-64. So I
can't really conclude anything, sorry.
Benoit

2010/8/24 Francesco Callari <fgcallari@xxxxxxxxx>:
> Hmmm, I think this is the info I can share:
> ATLAS build configuration.
> ====================
> ATLAS v3.8.3
> GCC 4.<redacted>
> GLIBC 2.<redacted>
> Configuration flags: 64-bit build using the chosen gcc for every C
> compiler setting.
> cc=${TOP}/bin/gcc
> f77=${TOP}/bin/gfortran
> mhz=<redacted>
>
> ./configure \
>     -C xc ${cc} -C gc ${cc} -C ic ${cc} -C dm ${cc} -C sm ${cc} \
>     -C dk ${cc} -C sk ${cc} \
>     -C if ${f77} \
>     -b 64 \
>     -D c -DPentiumCPS=${mhz}
>
>
>
> On Tue, Aug 24, 2010 at 10:39 AM, Franco Callari <fgc@xxxxxxxxxx> wrote:
>>
>>
>> ---------- Forwarded message ----------
>> From: Keir Mierle <mierle@xxxxxxxxx>
>> Date: Tue, Aug 24, 2010 at 1:19 AM
>> Subject: Fwd: SGEMM benchmark result against ATLAS
>>
>>
>> Hey, care to forward any info about how you configured ATLAS?
>>
>> ---------- Forwarded message ----------
>> From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>> Date: Mon, Aug 23, 2010 at 8:45 PM
>> Subject: SGEMM benchmark result against ATLAS
>> To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
>> Cc: Keir Mierle <mierle@xxxxxxxxx>, Gael Guennebaud
>> <gael.guennebaud@xxxxxxxxx>
>>
>>
>> Hi,
>>
>> Hearing from Keir that he saw untuned ATLAS outperform us by a 30% margin,
>> which would be very unusual, I ran our benchBlasGemm a bit. By the way, I
>> updated it to make it compile, which involved removing the eigen_...._normal
>> path which didn't look useful (?), hope it's OK. Also, it was missing an
>> extern "C" around the cblas #include.
>>
>> So I installed the most optimized ATLAS package that I could on Fedora,
>> built with SSE3.
>>
>> I compiled our benchmark with:
>>
>> cd eigen/bench/
>> g++ -O3 -msse3 -I.. -L /usr/lib64/atlas/ benchBlasGemm.cpp -o benchBlasGemm -lrt -lcblas
>>
>> And ran it on some 4096x4096 matrices:
>>
>> [bjacob@cahouette bench]$ ./benchBlasGemm 4096
>> 4096 x 4096 x 4096
>> cblas: 8.73982 (7.862 GFlops/s)
>> eigen : 8.9491 (7.678 GFlops/s)
>> [bjacob@cahouette bench]$ ./benchBlasGemm 4096
>> 4096 x 4096 x 4096
>> cblas: 8.51913 (8.066 GFlops/s)
>> eigen : 8.42922 (8.152 GFlops/s)
>>
>> So _my_ results show Eigen3 and ATLAS running at roughly the same speed,
>> albeit with considerable run-to-run variability.
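
For readers who want to relate the printed times to the printed rates, here is a minimal sketch. The helper names are hypothetical and are not taken from benchBlasGemm.cpp. The rates quoted above are consistent with 4096^3 operations divided by the elapsed seconds, that is, with each multiply-add counted once; this reading is inferred from the numbers in this message and has not been checked against the benchmark source.

```cpp
#include <cassert>

// Number of multiply-add operations in an (m x k) by (k x n) matrix product.
// Hypothetical helper, not part of Eigen or ATLAS.
double gemm_multiply_adds(double m, double n, double k) {
    return m * n * k;
}

// Rate in units of 1e9 operations per second for a measured wall time.
double giga_ops_per_second(double ops, double seconds) {
    assert(seconds > 0.0);
    return ops / seconds / 1e9;
}
```

Calling `giga_ops_per_second(gemm_multiply_adds(4096, 4096, 4096), 8.73982)` gives about 7.86, in line with the first cblas figure. Counting the multiply and the add as two separate operations would double the rate for the same time.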
>>
>> This is still perplexing for 2 reasons:
>>  - we used to beat ATLAS by a wide margin.
>>  - the roughly 8 GFlops here are not too good. My CPU is a Core i7 at 1.66
>> GHz. Multiplying by 4 (four floats per SSE register) and by 2 (pipelining
>> of addps and mulps), we should aim at 13.33 GFlops. So we are running here
>> at only 60% of the theoretical maximum; I think we used to do much better
>> than that.
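
The back-of-the-envelope peak estimate above can be written out as a small sketch. The function and parameter names are illustrative assumptions, not part of Eigen or ATLAS, and the model is the simple single-core one used in the message: clock rate times SIMD lanes times instructions issued per cycle.

```cpp
#include <cassert>

// Rough single-precision peak for one core, in GFlops:
// clock (GHz) x floats per SIMD register x SIMD instructions per cycle.
// Hypothetical helper; it ignores turbo modes, memory effects, and threading.
double peak_gflops(double clock_ghz, int simd_lanes, int issue_per_cycle) {
    assert(clock_ghz > 0.0 && simd_lanes > 0 && issue_per_cycle > 0);
    return clock_ghz * simd_lanes * issue_per_cycle;
}

// Fraction of that peak reached by a measured rate.
double efficiency(double measured_gflops, double peak) {
    assert(peak > 0.0);
    return measured_gflops / peak;
}
```

With the clock taken as 5/3 GHz (the value that reproduces the 13.33 figure; a literal 1.66 gives 13.28), `peak_gflops(5.0 / 3.0, 4, 2)` is about 13.33, and `efficiency(8.0, 13.33)` is about 0.60.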
>>
>> So let me ask Gael and Keir:
>> * Keir: what do you get on this benchmark? How did you get this result
>> where ATLAS outperformed us by 30%?
>> * Gael: suppose I want to get deeper into this, where do I start?
>>
>> Cheers,
>> Benoit
>>
>>
>>
>>
>> --
>> Francesco Callari <fgc@xxxxxxxxxxx>
>>
>>             EC67 BEBE 62AC 8415 7591  2B12 A6CD D5EE D8CB D0ED
>>
>> Violence is the last refuge of the incompetent  (I. Asimov)
>
>
>
> --
> Franco Callari <fgcallari@xxxxxxxxx>
>
>             EC67 BEBE 62AC 8415 7591  2B12 A6CD D5EE D8CB D0ED
>
> I am not bound to win, but I am bound to be true. I am not bound to succeed,
> but I am bound to live by the light that I have. (Abraham Lincoln)
>


