Re: [eigen] Re: SGEMM benchmark result against ATLAS



2010/9/2 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
> here are my today results (relative efficiency compared to theoretical
> max peak performance)
>
> Intel(R) Xeon(R) CPU E5540  @ 2.53GHz (Core i7)
>
> float : 85%
> double: 85%

Hm, it's amazing how much better your numbers are. I am using GCC 4.4.
Could it be that GCC 4.5 generates much better code? I was thinking
that your product code was low-level enough that it didn't matter...

Benoit

>
>
> Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz (second version of core2)
>
> float: 88%
> double: 78%
>
>
> I used the exact same executables on both computers (compiled with gcc
> 4.5). I don't know why doubles are so slow on the latter, since I don't
> remember seeing such behavior before...
>
> GCC 4.3 produces slightly slower code (~83% of peak performance).
>
> gael.
>
> On Thu, Sep 2, 2010 at 2:34 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>> (Francesco -- I forgot to CC you in the email I just sent about DGEMM.
>> Just mentioning as you probably don't read every email in this
>> list...)
>>
>> I just checked in a debugger: ATLAS uses only 1 thread too (we
>> already knew that for Eigen).
>>
>> I am linking with -lf77blas.
>>
>> Benoit
>>
>> 2010/8/24 Francesco Callari <fgcallari@xxxxxxxxx>:
>>> Hi Benoit,
>>> a few questions:
>>> 1. Are you building your own ATLAS, or running a prebuilt one?
>>> 2. If building, could you please post the output of 'make time'? It's the
>>> last step in the usual build sequence and compares the speed ATLAS achieves
>>> on your machine with the speed achieved on the comparable machine it was
>>> configured for.
>>> 3. Are you running ATLAS single- or multi-threaded? Easy to see: if you
>>> linked with libatlas.a it is single, if libptatlas.a it's multi.
>>> 4. Could you also please time dgemm?
>>> Thanks
>>> Franco
>>>
>>> On Tue, Aug 24, 2010 at 11:07 AM, Keir Mierle <mierle@xxxxxxxxx> wrote:
>>>>
>>>> A question for Benoit: is this running the threaded versions of Eigen and ATLAS?
>>>> Keir
>>>>
>>>> On Tue, Aug 24, 2010 at 10:52 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> I too have atlas 3.8.3, and am using gcc 4.4 on linux x86-64. So I
>>>>> can't really conclude anything, sorry.
>>>>> Benoit
>>>>>
>>>>> 2010/8/24 Francesco Callari <fgcallari@xxxxxxxxx>:
>>>>> > Hmmm, I think this is the info I can share:
>>>>> > ATLAS build configuration.
>>>>> > ====================
>>>>> > ATLAS v3.8.3
>>>>> > GCC 4.<redacted>
>>>>> > GLIBC 2.<redacted>
>>>>> > Configuration flags: 64-bit build, using the chosen gcc for every
>>>>> > compiler.
>>>>> > cc=${TOP}/bin/gcc
>>>>> > f77=${TOP}/bin/gfortran
>>>>> > mhz=<redacted>
>>>>> >
>>>>> > ./configure \
>>>>> >     -C xc ${cc} -C gc ${cc} -C ic ${cc} -C dm ${cc} -C sm ${cc} \
>>>>> >     -C dk ${cc} -C sk ${cc} \
>>>>> >     -C if ${f77} \
>>>>> >     -b 64 \
>>>>> >     -D c -DPentiumCPS=${mhz}
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Tue, Aug 24, 2010 at 10:39 AM, Franco Callari <fgc@xxxxxxxxxx>
>>>>> > wrote:
>>>>> >>
>>>>> >>
>>>>> >> ---------- Forwarded message ----------
>>>>> >> From: Keir Mierle <mierle@xxxxxxxxx>
>>>>> >> Date: Tue, Aug 24, 2010 at 1:19 AM
>>>>> >> Subject: Fwd: SGEMM benchmark result against ATLAS
>>>>> >>
>>>>> >>
>>>>> >> Hey, care to forward any info about how you configured ATLAS?
>>>>> >>
>>>>> >> ---------- Forwarded message ----------
>>>>> >> From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>>>>> >> Date: Mon, Aug 23, 2010 at 8:45 PM
>>>>> >> Subject: SGEMM benchmark result against ATLAS
>>>>> >> To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
>>>>> >> Cc: Keir Mierle <mierle@xxxxxxxxx>, Gael Guennebaud
>>>>> >> <gael.guennebaud@xxxxxxxxx>
>>>>> >>
>>>>> >>
>>>>> >> Hi,
>>>>> >>
>>>>> >> Since Keir told me he saw untuned ATLAS outperform us by a 30%
>>>>> >> margin, which would be very unusual, I ran our benchBlasGemm a bit.
>>>>> >> By the way, I updated it to make it compile, which involved removing
>>>>> >> the eigen_..._normal path, which didn't look useful (?); I hope that's
>>>>> >> OK. Also, it was missing an extern "C" around the cblas #include.
>>>>> >>
>>>>> >> So I installed the most optimized ATLAS package I could find for
>>>>> >> Fedora, built with SSE3.
>>>>> >>
>>>>> >> I compiled our benchmark with:
>>>>> >>
>>>>> >> cd eigen/bench/
>>>>> >> g++ -O3 -msse3 -I.. -L /usr/lib64/atlas/ benchBlasGemm.cpp \
>>>>> >>     -o benchBlasGemm -lrt -lcblas
>>>>> >>
>>>>> >> And ran it on some 4096x4096 matrices:
>>>>> >>
>>>>> >> [bjacob@cahouette bench]$ ./benchBlasGemm 4096
>>>>> >> 4096 x 4096 x 4096
>>>>> >> cblas: 8.73982 (7.862 GFlops/s)
>>>>> >> eigen : 8.9491 (7.678 GFlops/s)
>>>>> >> [bjacob@cahouette bench]$ ./benchBlasGemm 4096
>>>>> >> 4096 x 4096 x 4096
>>>>> >> cblas: 8.51913 (8.066 GFlops/s)
>>>>> >> eigen : 8.42922 (8.152 GFlops/s)
>>>>> >>
>>>>> >> So _my_ results show Eigen3 and ATLAS running at roughly the same
>>>>> >> speed, albeit with great variability.
>>>>> >>
>>>>> >> This is still perplexing for 2 reasons:
>>>>> >>  - we used to beat ATLAS by a wide margin.
>>>>> >>  - the roughly 8 GFlops here are not very good. My CPU is a Core i7
>>>>> >>    at 1.66 GHz. Multiplying by 4 (4 floats per SSE register) and by 2
>>>>> >>    (pipelining of addps and mulps), we should aim at 13.33 GFlops. So
>>>>> >>    we are running here at only 60% of the theoretical maximum; I think
>>>>> >>    we used to do much better than that.
>>>>> >>
>>>>> >> So let me ask Gael and Keir:
>>>>> >> * Keir: what do you get on this benchmark? How did you get this result
>>>>> >> where ATLAS outperformed us by 30%?
>>>>> >> * Gael: suppose I want to get deeper into this, where do I start?
>>>>> >>
>>>>> >> Cheers,
>>>>> >> Benoit
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Francesco Callari <fgc@xxxxxxxxxx>
>>>>> >>
>>>>> >>             EC67 BEBE 62AC 8415 7591  2B12 A6CD D5EE D8CB D0ED
>>>>> >>
>>>>> >> Violence is the last refuge of the incompetent  (I. Asimov)
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Franco Callari <fgcallari@xxxxxxxxx>
>>>>> >
>>>>> >             EC67 BEBE 62AC 8415 7591  2B12 A6CD D5EE D8CB D0ED
>>>>> >
>>>>> > I am not bound to win, but I am bound to be true. I am not bound to
>>>>> > succeed,
>>>>> > but I am bound to live by the light that I have. (Abraham Lincoln)
>>>>> >
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>


