Re: [eigen] a branch for SMP (openmp) experimentations


To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: Re: [eigen] a branch for SMP (openmp) experimentations
From: Aron Ahmadia <aja2111@xxxxxxxxxxxx>
Date: Fri, 26 Feb 2010 23:17:54 +0300

Some nice bench results coming off the X5550 @ 2.67GHz

(single-precision)
[aron@kw2050]~/sandbox/eigen-smp/bench% g++ bench_gemm.cpp -DNDEBUG
-DHAVE_BLAS -I.. -O2 -fopenmp -lrt -lblas -o ./bench  && /usr/bin/time
-p ./bench
blas  cpu   0.133795s  	2.00632 GFLOPS 	(14.0114s)
blas  real  0.133813s  	2.00605 GFLOPS 	(13.3878s)
eigen cpu   0.0191605s  	14.0098 GFLOPS 	(1.92616s)
eigen real  0.0024013s  	111.787 GFLOPS 	(0.241387s)
real 13.79
user 16.08
sys 0.13

For whatever reason, the BLAS isn't built multi-threaded, but its
performance is pretty terrible even single-threaded.  If these numbers
are to be believed, Gael's multi-threaded multiply scales with 99.7%
efficiency on the X5550, averaging 2.6 of a possible 4 multiply-add
pairs per cycle per core in single precision.

(double-precision)
[aron@kw2050]~/sandbox/eigen-smp/bench% g++ bench_gemm.cpp -DNDEBUG
-DHAVE_BLAS -I.. -O2 -fopenmp -lrt -lblas -o ./bench  && /usr/bin/time
-p ./bench
Warning, your parallel product is crap! <I need to fix this>

blas  cpu   0.13462s  	1.99402 GFLOPS 	(14.0937s)
blas  real  0.134625s  	1.99395 GFLOPS 	(13.4901s)
eigen cpu   0.0363907s  	7.37649 GFLOPS 	(3.70925s)
eigen real  0.00455555s  	58.925 GFLOPS 	(0.465924s)
real 14.11
user 17.95
sys 0.11

Again, near-perfect scaling, and Eigen is averaging 1.4 of a possible 2
multiply-add pairs per cycle per core in double precision.

I'll look at this more later this week, and I'd like to more carefully
verify these numbers since they're pretty astonishing to me.  Gael,
I'm happy to give you an honorary A+ in my Parallel Computing
Paradigms course if these are legit.

A

On Fri, Feb 26, 2010 at 4:26 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
>
> Thank you for the link too :)
>
> And to entertain everybody following our adventures, here are the mandatory
> pictures:
>
> * single core: http://dl.dropbox.com/u/260133/matrix_matrix.pdf
> * quad cores: http://dl.dropbox.com/u/260133/matrix_matrix-smp.pdf
>
> gael
>
>
> On Fri, Feb 26, 2010 at 1:02 PM, Aron Ahmadia <aja2111@xxxxxxxxxxxx> wrote:
>>
>> Those are some good notes, thanks Frank.
>>
>> It's easy to get confused there because he's assuming a distributed
>> memory layout, but still, that might be a useful technique to try and
>> apply.
>>
>> A
>>
>> On Fri, Feb 26, 2010 at 2:57 PM, FMDSPAM <fmdspam@xxxxxxxxx> wrote:
>> > On 26.02.2010 11:28, Aron Ahmadia wrote:
>> >
>> > <snip>
>> >
>> > Okay, this might be a bit tricky, so forgive me if I'm
>> > over-complicating things, can we introduce another subdivision?:
>> >
>> >
>> >
>> > Forgive my shameless plug. I once found a short discussion on that
>> > topic here.
>> > Most of what he is discussing, and what you are doing, is beyond my
>> > skills, but possibly it helps.
>> >
>> > Frank.
>> >
>> >
>>
>>
>
>
