[eigen] Re: SGEMM benchmark result against ATLAS


2010/8/24 Francesco Callari <fgcallari@xxxxxxxxx>:
> Hi Benoit,
> a few questions:
> 1. Are you building your own ATLAS, or running a prebuilt one?

Pre-built. Using fedora 13 package.

> 2. If building, could you please post the output of 'make time'? It's the
> last step in the usual build sequence and compares the speed ATLAS achieves
> on your machine with the comparable one it was configured with.
> 3. Are you running ATLAS single- or multi-threaded? Easy to see: if you
> linked with libatlas.a it is single, if libptatlas.a it's multi.

I linked with atlas/libcblas.so.

> 4. Could you also please time dgemm?

Will try when I find time!

Benoit

> Thanks
> Franco
>
> On Tue, Aug 24, 2010 at 11:07 AM, Keir Mierle <mierle@xxxxxxxxx> wrote:
>>
>> A question for Benoit: Is this running the threaded versions of Eigen and ATLAS?
>> Keir
>>
>> On Tue, Aug 24, 2010 at 10:52 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>> wrote:
>>>
>>> I too have atlas 3.8.3, and am using gcc 4.4 on linux x86-64. So I
>>> can't really conclude anything, sorry.
>>> Benoit
>>>
>>> 2010/8/24 Francesco Callari <fgcallari@xxxxxxxxx>:
>>> > Hmmm, I think this is the info I can share:
>>> > ATLAS build configuration.
>>> > ====================
>>> > ATLAS v3.8.3
>>> > GCC 4.<redacted>
>>> > GLIBC 2.<redacted>
>>> > Configuration flags: 64-bit build using the chosen gcc for every compiler.
>>> > cc=${TOP}/bin/gcc
>>> > f77=${TOP}/bin/gfortran
>>> > mhz=<redacted>
>>> >
>>> > ./configure \
>>> >     -C xc ${cc} -C gc ${cc} -C ic ${cc} -C dm ${cc} -C sm ${cc} \
>>> >     -C dk ${cc} -C sk ${cc} \
>>> >     -C if ${f77} \
>>> >     -b 64 \
>>> >     -D c -DPentiumCPS=${mhz}
>>> >
>>> >
>>> >
>>> > On Tue, Aug 24, 2010 at 10:39 AM, Franco Callari <fgc@xxxxxxxxxx>
>>> > wrote:
>>> >>
>>> >>
>>> >> ---------- Forwarded message ----------
>>> >> From: Keir Mierle <mierle@xxxxxxxxx>
>>> >> Date: Tue, Aug 24, 2010 at 1:19 AM
>>> >> Subject: Fwd: SGEMM benchmark result against ATLAS
>>> >>
>>> >>
>>> >> Hey, care to forward any info about how you configured ATLAS?
>>> >>
>>> >> ---------- Forwarded message ----------
>>> >> From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>>> >> Date: Mon, Aug 23, 2010 at 8:45 PM
>>> >> Subject: SGEMM benchmark result against ATLAS
>>> >> To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
>>> >> Cc: Keir Mierle <mierle@xxxxxxxxx>, Gael Guennebaud
>>> >> <gael.guennebaud@xxxxxxxxx>
>>> >>
>>> >>
>>> >> Hi,
>>> >>
>>> >> Hearing from Keir that he saw untuned ATLAS outperform us by a 30% margin,
>>> >> which would be very unusual, I ran our benchBlasGemm a bit. By the way, I
>>> >> updated it to make it compile, which involved removing the eigen_..._normal
>>> >> path which didn't look useful (?), hope it's OK. Also, it was missing an
>>> >> extern "C" around the cblas #include.
>>> >>
>>> >> So I installed the most optimized ATLAS package that I could on Fedora,
>>> >> built with SSE3.
>>> >>
>>> >> I compiled our benchmark with:
>>> >>
>>> >> cd eigen/bench/
>>> >> g++ -O3 -msse3 -I.. -L /usr/lib64/atlas/ benchBlasGemm.cpp -o benchBlasGemm -lrt -lcblas
>>> >>
>>> >> And ran it on some 4096x4096 matrices:
>>> >>
>>> >> [bjacob@cahouette bench]$ ./benchBlasGemm 4096
>>> >> 4096 x 4096 x 4096
>>> >> cblas: 8.73982 (7.862 GFlops/s)
>>> >> eigen : 8.9491 (7.678 GFlops/s)
>>> >> [bjacob@cahouette bench]$ ./benchBlasGemm 4096
>>> >> 4096 x 4096 x 4096
>>> >> cblas: 8.51913 (8.066 GFlops/s)
>>> >> eigen : 8.42922 (8.152 GFlops/s)
>>> >>
>>> >> So _my_ results show Eigen3 and ATLAS running at roughly the same speed,
>>> >> albeit with great variability.
>>> >>
>>> >> This is still perplexing for 2 reasons:
>>> >>  - we used to beat ATLAS by a wide margin.
>>> >>  - the roughly 8 GFlops here are not too good. My CPU is a Core i7 at 1.66 GHz.
>>> >>    So with x4 (4 floats per SSE register) and x2 (pipelining of addps and
>>> >>    mulps) we should aim at 13.33 GFlops. So we are running here at only 60%
>>> >>    of the theoretical maximum; I think we used to do much better than that.
>>> >>
>>> >> So let me ask Gael and Keir:
>>> >> * Keir: what do you get on this benchmark? How did you get this result
>>> >> where ATLAS outperformed us by 30%?
>>> >> * Gael: suppose I want to get deeper into this, where do I start?
>>> >>
>>> >> Cheers,
>>> >> Benoit
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Francesco Callari <fgc@xxxxxxxxxx>
>>> >>
>>> >>             EC67 BEBE 62AC 8415 7591  2B12 A6CD D5EE D8CB D0ED
>>> >>
>>> >> Violence is the last refuge of the incompetent  (I. Asimov)
>>> >
>>> >
>>> >
>>> > --
>>> > Franco Callari <fgcallari@xxxxxxxxx>
>>> >
>>> >             EC67 BEBE 62AC 8415 7591  2B12 A6CD D5EE D8CB D0ED
>>> >
>>> > I am not bound to win, but I am bound to be true. I am not bound to
>>> > succeed,
>>> > but I am bound to live by the light that I have. (Abraham Lincoln)
>>> >
>>
>