> Hmmm, I think this is the info I can share:
> ATLAS build configuration.
> ====================
> ATLAS v3.8.3
> GCC 4.<redacted>
> GLIBC 2.<redacted>
> Configuration flags: 64-bit build using the chosen gcc compiler for
> everything.
> cc=${TOP}/bin/gcc
> f77=${TOP}/bin/gfortran
> mhz=<redacted>
>
> ./configure \
> -C xc ${cc} -C gc ${cc} -C ic ${cc} -C dm ${cc} -C sm ${cc} \
> -C dk ${cc} -C sk ${cc} \
> -C if ${f77} \
> -b 64 \
> -D c -DPentiumCPS=${mhz}
>
>
>
> On Tue, Aug 24, 2010 at 10:39 AM, Franco Callari <fgc@xxxxxxxxxx> wrote:
>>
>>
>> ---------- Forwarded message ----------
>> From: Keir Mierle <mierle@xxxxxxxxx>
>> Date: Tue, Aug 24, 2010 at 1:19 AM
>> Subject: Fwd: SGEMM benchmark result against ATLAS
>>
>>
>> Hey, care to forward any info about how you configured ATLAS?
>>
>> ---------- Forwarded message ----------
>> From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>> Date: Mon, Aug 23, 2010 at 8:45 PM
>> Subject: SGEMM benchmark result against ATLAS
>> To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
>> Cc: Keir Mierle <mierle@xxxxxxxxx>, Gael Guennebaud
>> <gael.guennebaud@xxxxxxxxx>
>>
>>
>> Hi,
>>
>> Hearing from Keir that he saw untuned ATLAS outperform us by a 30% margin,
>> which would be very unusual, I ran our benchBlasGemm a bit. By the way, I
>> updated it to make it compile, which involved removing the eigen_...._normal
>> path which didn't look useful (?); I hope that's OK. Also, it was missing an
>> extern "C" around the cblas #include.
>>
>> So I installed the most optimized ATLAS package that I could on Fedora,
>> built with SSE3.
>>
>> I compiled our benchmark with:
>>
>> cd eigen/bench/
>> g++ -O3 -msse3 -I.. -L /usr/lib64/atlas/ benchBlasGemm.cpp -o benchBlasGemm -lrt -lcblas
>>
>> And ran it on some 4096x4096 matrices:
>>
>> [bjacob@cahouette bench]$ ./benchBlasGemm 4096
>> 4096 x 4096 x 4096
>> cblas: 8.73982 (7.862 GFlops/s)
>> eigen : 8.9491 (7.678 GFlops/s)
>> [bjacob@cahouette bench]$ ./benchBlasGemm 4096
>> 4096 x 4096 x 4096
>> cblas: 8.51913 (8.066 GFlops/s)
>> eigen : 8.42922 (8.152 GFlops/s)
>>
>> So _my_ results show Eigen3 and ATLAS running at roughly the same speed,
>> albeit with great variability.
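
The printed rates above can be cross-checked against the timings. A minimal C++ sketch follows, under one assumption inferred from the numbers rather than from the benchmark source: the printed figure is consistent with n^3 multiply-adds divided by the elapsed seconds and by 1e9. The names gemm_rate and relative_gap are illustrative and are not part of Eigen or benchBlasGemm.

```cpp
#include <cmath>
#include <cstdint>

// Operation rate, in units of 1e9 operations per second, for an
// n x n x n matrix product that took `seconds` to run.
// ops_per_madd = 1 counts each multiply-add once (assumption: this is what
// reproduces the printed figures); ops_per_madd = 2 counts the multiply and
// the add as separate operations.
inline double gemm_rate(std::int64_t n, double seconds, double ops_per_madd = 1.0) {
    const double madds = static_cast<double>(n) * n * n;  // n^3 multiply-adds
    return ops_per_madd * madds / seconds / 1e9;
}

// Relative difference between two timings, as a fraction of the slower one.
inline double relative_gap(double t_a, double t_b) {
    return std::fabs(t_a - t_b) / std::fmax(t_a, t_b);
}
```

With the timings quoted above, gemm_rate(4096, 8.73982) comes out at about 7.86, the cblas/eigen gap within each run stays below 3%, and the two eigen runs differ from each other by almost 6%.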
>>
>> This is still perplexing for 2 reasons:
>> - we used to beat ATLAS by a wide margin.
>> - the roughly 8 GFlops here are not too good. My CPU is a Core i7 at 1.66
>> GHz. Multiplying by 4 (four floats per SSE register) and by 2 (pipelining
>> of addps and mulps), we should aim at 13.33 GFlops. So we are running here
>> at only 60% of the theoretical maximum; I think we used to do much better
>> than that.
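
The peak estimate in the bullet above is a product of three factors. A minimal C++ sketch of that arithmetic follows. The function names are illustrative, not part of Eigen or the benchmark, and the model (one core, four floats per SSE instruction, one addps and one mulps per cycle, no frequency boost) is the assumption stated in the message. With a clock of exactly 1.66 GHz the product is 13.28; the quoted 13.33 corresponds to a clock of 5/3 GHz.

```cpp
// Theoretical single-precision peak in GFlops for one core, assuming
// `simd_width` floats per vector instruction and `instr_per_cycle`
// vector arithmetic instructions (e.g. one addps plus one mulps) per cycle.
constexpr double peak_gflops(double clock_ghz, int simd_width, int instr_per_cycle) {
    return clock_ghz * simd_width * instr_per_cycle;
}

// Fraction of the theoretical peak that a measured rate reaches.
constexpr double efficiency(double measured_gflops, double peak) {
    return measured_gflops / peak;
}
```

For example, efficiency(8.0, 13.33) is about 0.60, which matches the 60% figure in the message.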
>>
>> So let me ask Gael and Keir:
>> * Keir: what do you get on this benchmark? How did you get this result
>> where ATLAS outperformed us by 30%?
>> * Gael: suppose I want to get deeper into this, where do I start?
>>
>> Cheers,
>> Benoit
>>
>>
>>
>>
>> --
>> Francesco Callari <fgc@xxxxxxxxxxx>
>>
>> EC67 BEBE 62AC 8415 7591 2B12 A6CD D5EE D8CB D0ED
>>
>> Violence is the last refuge of the incompetent (I. Asimov)
>
>
>
> --
> Franco Callari <fgcallari@xxxxxxxxx>
>
> EC67 BEBE 62AC 8415 7591 2B12 A6CD D5EE D8CB D0ED
>
> I am not bound to win, but I am bound to be true. I am not bound to succeed,
> but I am bound to live by the light that I have. (Abraham Lincoln)
>