Re: [eigen] Re: SGEMM benchmark result against ATLAS
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Re: SGEMM benchmark result against ATLAS
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Thu, 2 Sep 2010 15:00:05 +0200
- Cc: Francesco Callari <fgcallari@xxxxxxxxx>
Here are my results from today (efficiency relative to the theoretical
peak performance):
Intel(R) Xeon(R) CPU E5540 @ 2.53GHz (Core i7 generation)
float:  85%
double: 85%
Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz (second-generation Core 2)
float:  88%
double: 78%
I used the exact same executables on both computers (compiled with gcc
4.5). I don't know why doubles are so slow on the latter, since I don't
remember seeing such behavior before...
GCC 4.3 produces slightly slower code (~83% of the peak performance).
gael.
On Thu, Sep 2, 2010 at 2:34 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> (Francesco: I forgot to CC you on the email I just sent about DGEMM.
> Just mentioning it, as you probably don't read every email on this
> list...)
>
> I just checked in a debugger: ATLAS, too, uses only 1 thread (we
> already knew that for Eigen).
>
> I am linking with -lf77blas.
>
> Benoit
>
> 2010/8/24 Francesco Callari <fgcallari@xxxxxxxxx>:
>> Hi Benoit,
>> a few questions:
>> 1. Are you building your own ATLAS, or running a prebuilt one?
>> 2. If building, could you please post the output of 'make time'? It's the
>> last step in the usual build sequence and compares the speed ATLAS achieves
>> on your machine with that of the comparable reference machine its
>> configuration was based on.
>> 3. Are you running ATLAS single- or multi-threaded? Easy to see: if you
>> linked with libatlas.a it is single-threaded; if with libptatlas.a, it is
>> multi-threaded.
>> 4. Could you also please time dgemm?
>> Thanks
>> Franco
>>
>> On Tue, Aug 24, 2010 at 11:07 AM, Keir Mierle <mierle@xxxxxxxxx> wrote:
>>>
>>> A question for Benoit: is this running the threaded versions of Eigen and ATLAS?
>>> Keir
>>>
>>> On Tue, Aug 24, 2010 at 10:52 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>>> wrote:
>>>>
>>>> I also have ATLAS 3.8.3, and am using gcc 4.4 on Linux x86-64, so I
>>>> can't really conclude anything, sorry.
>>>> Benoit
>>>>
>>>> 2010/8/24 Francesco Callari <fgcallari@xxxxxxxxx>:
>>>> > Hmmm, I think this is the info I can share:
>>>> > ATLAS build configuration.
>>>> > ====================
>>>> > ATLAS v3.8.3
>>>> > GCC 4.<redacted>
>>>> > GLIBC 2.<redacted>
>>>> > Configuration flags: 64-bit build, using the chosen gcc compiler for
>>>> > everything.
>>>> > cc=${TOP}/bin/gcc
>>>> > f77=${TOP}/bin/gfortran
>>>> > mhz=<redacted>
>>>> >
>>>> > ./configure \
>>>> > -C xc ${cc} -C gc ${cc} -C ic ${cc} -C dm ${cc} -C sm ${cc} \
>>>> > -C dk ${cc} -C sk ${cc} \
>>>> > -C if ${f77} \
>>>> > -b 64 \
>>>> > -D c -DPentiumCPS=${mhz}
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Aug 24, 2010 at 10:39 AM, Franco Callari <fgc@xxxxxxxxxx>
>>>> > wrote:
>>>> >>
>>>> >>
>>>> >> ---------- Forwarded message ----------
>>>> >> From: Keir Mierle <mierle@xxxxxxxxx>
>>>> >> Date: Tue, Aug 24, 2010 at 1:19 AM
>>>> >> Subject: Fwd: SGEMM benchmark result against ATLAS
>>>> >>
>>>> >>
>>>> >> Hey, care to forward any info about how you configured ATLAS?
>>>> >>
>>>> >> ---------- Forwarded message ----------
>>>> >> From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>>>> >> Date: Mon, Aug 23, 2010 at 8:45 PM
>>>> >> Subject: SGEMM benchmark result against ATLAS
>>>> >> To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
>>>> >> Cc: Keir Mierle <mierle@xxxxxxxxx>, Gael Guennebaud
>>>> >> <gael.guennebaud@xxxxxxxxx>
>>>> >>
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> Since Keir reported seeing untuned ATLAS outperform us by a 30%
>>>> >> margin, which would be very unusual, I ran our benchBlasGemm a bit.
>>>> >> By the way, I updated it to make it compile, which involved removing
>>>> >> the eigen_..._normal path, which didn't look useful (?); hope that's
>>>> >> OK. Also, it was missing an extern "C" around the cblas #include.
>>>> >>
>>>> >> So I installed the most optimized ATLAS package I could find for
>>>> >> Fedora, built with SSE3.
>>>> >>
>>>> >> I compiled our benchmark with:
>>>> >>
>>>> >> cd eigen/bench/
>>>> >> g++ -O3 -msse3 -I.. -L /usr/lib64/atlas/ benchBlasGemm.cpp -o benchBlasGemm -lrt -lcblas
>>>> >>
>>>> >> And ran it on some 4096x4096 matrices:
>>>> >>
>>>> >> [bjacob@cahouette bench]$ ./benchBlasGemm 4096
>>>> >> 4096 x 4096 x 4096
>>>> >> cblas: 8.73982 (7.862 GFlops/s)
>>>> >> eigen : 8.9491 (7.678 GFlops/s)
>>>> >> [bjacob@cahouette bench]$ ./benchBlasGemm 4096
>>>> >> 4096 x 4096 x 4096
>>>> >> cblas: 8.51913 (8.066 GFlops/s)
>>>> >> eigen : 8.42922 (8.152 GFlops/s)
>>>> >>
>>>> >> So _my_ results show Eigen3 and ATLAS running at roughly the same
>>>> >> speed, albeit with great variability.
>>>> >>
>>>> >> This is still perplexing for 2 reasons:
>>>> >> - we used to beat ATLAS by a wide margin.
>>>> >> - the roughly 8 GFlops here are not very good. My CPU is a Core i7 at
>>>> >>   1.66 GHz. So with x4 (4 floats per SSE register) and x2 (pipelining
>>>> >>   of addps and mulps), we should aim for 13.33 GFlops. We are
>>>> >>   therefore running at only 60% of the theoretical maximum; I think we
>>>> >>   used to do much better than that.
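The estimate above, written out as a worked equation. Here f is the clock, w = 4 is the number of floats per 128-bit SSE register, and the factor 2 assumes one addps and one mulps can issue per cycle; the ratios use the best (8.152) and worst (7.678) of the four printed runs. With exactly 1.66 GHz the product is 13.28; the 13.33 quoted corresponds to a clock of 5/3 ≈ 1.667 GHz.

```latex
\begin{aligned}
P_{\text{peak}} &= f \times w \times 2 = 1.66\ \text{GHz} \times 4 \times 2 \approx 13.3\ \text{GFLOP/s} \\
\frac{8.15}{13.3} &\approx 0.61, \qquad \frac{7.68}{13.3} \approx 0.58
\end{aligned}
```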
>>>> >>
>>>> >> So let me ask Gael and Keir:
>>>> >> * Keir: what do you get on this benchmark? How did you get this result
>>>> >> where ATLAS outperformed us by 30%?
>>>> >> * Gael: suppose I want to get deeper into this, where do I start?
>>>> >>
>>>> >> Cheers,
>>>> >> Benoit
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Francesco Callari <fgc@xxxxxxxxxx>
>>>> >>
>>>> >> EC67 BEBE 62AC 8415 7591 2B12 A6CD D5EE D8CB D0ED
>>>> >>
>>>> >> Violence is the last refuge of the incompetent (I. Asimov)
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Franco Callari <fgcallari@xxxxxxxxx>
>>>> >
>>>> > EC67 BEBE 62AC 8415 7591 2B12 A6CD D5EE D8CB D0ED
>>>> >
>>>> > I am not bound to win, but I am bound to be true. I am not bound to
>>>> > succeed,
>>>> > but I am bound to live by the light that I have. (Abraham Lincoln)
>>>> >
>>>
>>
>>
>>
>>
>
>
>