Re: [eigen] a branch for SMP (openmp) experimentations


To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: Re: [eigen] a branch for SMP (openmp) experimentations
From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
Date: Sat, 27 Feb 2010 15:48:41 +0100

hi,

nice results!

however, to estimate the efficiency with respect to the number of threads, you should run it with OMP_NUM_THREADS=1 to get a baseline, and not use the CPU time returned in the multithreaded case, which is meaningless there. Then I expect a ratio much lower than 99%!

basically, for a single-threaded run use the "cpu" time, otherwise use the "real" time.
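To make the suggested measurement concrete, here is a minimal sketch of the calculation: take the time of a run with OMP_NUM_THREADS=1 as the baseline and compare it against the "real" (wall-clock) time of the multithreaded run. The function names are hypothetical and are not part of Eigen or bench_gemm.cpp.

```cpp
#include <cassert>

// Hypothetical helpers for illustration; not part of Eigen or bench_gemm.cpp.

// Speedup of a multithreaded run over a single-threaded baseline.
// serial_seconds:        time of the run with OMP_NUM_THREADS=1
//                        (the "cpu" time is meaningful there).
// parallel_real_seconds: wall-clock ("real") time of the multithreaded run.
double speedup(double serial_seconds, double parallel_real_seconds) {
    assert(serial_seconds > 0.0 && parallel_real_seconds > 0.0);
    return serial_seconds / parallel_real_seconds;
}

// Parallel efficiency: speedup divided by the number of threads used.
// A value of 1.0 would mean perfect scaling.
double parallel_efficiency(double serial_seconds,
                           double parallel_real_seconds,
                           int threads) {
    assert(threads > 0);
    return speedup(serial_seconds, parallel_real_seconds) / threads;
}
```

With made-up inputs, a baseline of 8.0 s and a wall-clock time of 2.0 s on 4 threads gives a speedup of 4 and an efficiency of 1.0; the same times on 8 threads would give an efficiency of 0.5.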

gael

On Fri, Feb 26, 2010 at 9:17 PM, Aron Ahmadia <aja2111@xxxxxxxxxxxx> wrote:

Some nice bench results coming off the X5550 @ 2.67GHz

(single-precision)

[aron@kw2050]~/sandbox/eigen-smp/bench% g++ bench_gemm.cpp -DNDEBUG -DHAVE_BLAS -I.. -O2 -fopenmp -lrt -lblas -o ./bench && /usr/bin/time -p ./bench

blas cpu 0.133795s 2.00632 GFLOPS (14.0114s)

blas real 0.133813s 2.00605 GFLOPS (13.3878s)

eigen cpu 0.0191605s 14.0098 GFLOPS (1.92616s)

eigen real 0.0024013s 111.787 GFLOPS (0.241387s)

real 13.79

user 16.08

sys 0.13

For whatever reason, the BLAS isn't built multi-threaded, but its performance is pretty terrible even single-threaded. If these numbers are to be believed, Gael's multi-threaded multiply scales with 99.7% efficiency on the X5550, averaging 2.6/4 SIMD fused multiply-add operations per cycle in single precision.

(double-precision)

[aron@kw2050]~/sandbox/eigen-smp/bench% g++ bench_gemm.cpp -DNDEBUG -DHAVE_BLAS -I.. -O2 -fopenmp -lrt -lblas -o ./bench && /usr/bin/time -p ./bench

Warning, your parallel product is crap! <I need to fix this>

blas cpu 0.13462s 1.99402 GFLOPS (14.0937s)

blas real 0.134625s 1.99395 GFLOPS (13.4901s)

eigen cpu 0.0363907s 7.37649 GFLOPS (3.70925s)

eigen real 0.00455555s 58.925 GFLOPS (0.465924s)

real 14.11

user 17.95

sys 0.11

Again, near-perfect scaling, and eigen is averaging 1.4/2 SIMD fused multiply-add operations per cycle in double precision.
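The efficiency and per-cycle figures quoted for both precisions can be reproduced with a little arithmetic. The following is only a sketch under stated assumptions: the function names are hypothetical, the count of 8 cores/threads is inferred from the cpu/real ratio in the output rather than stated in the thread, and one multiply-add is counted as 2 flops. Keep in mind that cpu time divided by real time only shows how busy the threads were; it is not a speedup measured against a single-threaded run.

```cpp
#include <cassert>

// Hypothetical helpers for illustration; not part of Eigen or bench_gemm.cpp.

// Total CPU time over wall-clock time, divided by the thread count.
// Assumption: this is how the quoted "efficiency" was obtained, with 8 threads.
double cpu_to_real_ratio_per_thread(double cpu_seconds,
                                    double real_seconds,
                                    int threads) {
    assert(real_seconds > 0.0 && threads > 0);
    return cpu_seconds / real_seconds / threads;
}

// Multiply-add operations per cycle per core, counting one multiply-add
// as 2 floating-point operations.
// gflops:    aggregate rate reported by the benchmark ("real" line)
// cores:     number of cores sharing the work (assumed 8 here)
// clock_ghz: core clock in GHz (2.67 for the X5550)
double madds_per_cycle_per_core(double gflops, int cores, double clock_ghz) {
    assert(cores > 0 && clock_ghz > 0.0);
    return gflops / cores / clock_ghz / 2.0;
}
```

Feeding in the "eigen" lines from the output above with 8 cores at 2.67 GHz gives roughly 0.997 for the single-precision ratio, about 2.6 multiply-adds per cycle per core in single precision, and about 1.4 in double precision.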

I'll look at this more later this week, and I'd like to more carefully verify these numbers since they're pretty astonishing to me. Gael, I'm happy to give you an honorary A+ in my Parallel Computing Paradigms course if these are legit.

A

On Fri, Feb 26, 2010 at 4:26 PM, Gael Guennebaud

> Thank you for the link too :)
>
> And to entertain everybody following our adventures, here are the mandatory
> pictures:
>
> * single core: http://dl.dropbox.com/u/260133/matrix_matrix.pdf
> * quad cores: http://dl.dropbox.com/u/260133/matrix_matrix-smp..pdf
>
> gael
>

> On Fri, Feb 26, 2010 at 1:02 PM, Aron Ahmadia <aja2111@xxxxxxxxxxxx> wrote:
>>
>> Those are some good notes, thanks Frank.
>>
>> It's easy to get confused there because he's assuming a distributed
>> memory layout, but still, that might be a useful technique to try and
>> apply.
>>
>> A
>>

>> On Fri, Feb 26, 2010 at 2:57 PM, FMDSPAM <fmdspam@xxxxxxxxx> wrote:
>> > On 26.02.2010 11:28, Aron Ahmadia wrote:
>> >
>> > <snip>
>> >
>> > Okay, this might be a bit tricky, so forgive me if I'm
>> > over-complicating things, can we introduce another subdivision?:
>> >
>> > Forgive me my shameless plug. A short discussion on that topic I've
>> > found some day here .
>> > Most of what he is discussing, and what you are doing, is beyond my
>> > skills, but possibly it helps.
>> >
>> > Frank.
>> >
