Re: [eigen] a branch for SMP (openmp) experimentations
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] a branch for SMP (openmp) experimentations
- From: Aron Ahmadia <aja2111@xxxxxxxxxxxx>
- Date: Sat, 27 Feb 2010 18:51:23 +0300
Ahh, this makes much more sense; I was just trying to figure out what
I was doing wrong...
Unfortunately, this machine is occupied for the next 3 days, so I
can't get reliable numbers out of it until then :(
A
On Sat, Feb 27, 2010 at 5:48 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
>
> hi,
>
> nice results!
>
> however, in order to estimate the efficiency wrt the number of threads, you
> should run it with OMP_NUM_THREADS=1, and not use the CPU time returned in
> the multithreaded case, which is meaningless there. Then I expect a ratio
> much lower than 99%!
>
> basically, if single-threaded then use the "cpu" time, otherwise use the
> "real" time.
>
> gael
>
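For reference, here is a minimal sketch of that timing scheme (a hypothetical
illustration, not the actual bench_gemm.cpp source; run_gemm() and flops are
placeholders): "real" is wall-clock time, while "cpu" is the CPU time summed
over all threads, so cpu/real roughly counts the busy cores.

/* measure both wall-clock ("real") and per-process CPU ("cpu") time */
#include <time.h>   /* clock_gettime; link with -lrt on older glibc */

static double real_seconds(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_REALTIME, &ts);            /* wall-clock time */
  return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

static double cpu_seconds(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);  /* CPU time, all threads */
  return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

/* around the product, with flops = 2*m*n*k for a gemm C = A*B:
 *   double r0 = real_seconds(), c0 = cpu_seconds();
 *   run_gemm();                             // placeholder for the kernel
 *   double real = real_seconds() - r0, cpu = cpu_seconds() - c0;
 *   single-threaded run (OMP_NUM_THREADS=1): report flops/cpu
 *   multithreaded run:                       report flops/real
 */

The baseline then comes from an OMP_NUM_THREADS=1 run, and the parallel
efficiency is baseline_time / (nthreads * real_time) of the multithreaded run.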
> On Fri, Feb 26, 2010 at 9:17 PM, Aron Ahmadia <aja2111@xxxxxxxxxxxx> wrote:
>>
>> Some nice bench results coming off the X5550 @ 2.67GHz
>>
>> (single-precision)
>> [aron@kw2050]~/sandbox/eigen-smp/bench% g++ bench_gemm.cpp -DNDEBUG
>> -DHAVE_BLAS -I.. -O2 -fopenmp -lrt -lblas -o ./bench && /usr/bin/time
>> -p ./bench
>> blas cpu 0.133795s 2.00632 GFLOPS (14.0114s)
>> blas real 0.133813s 2.00605 GFLOPS (13.3878s)
>> eigen cpu 0.0191605s 14.0098 GFLOPS (1.92616s)
>> eigen real 0.0024013s 111.787 GFLOPS (0.241387s)
>> real 13.79
>> user 16.08
>> sys 0.13
>>
>> For whatever reason, the BLAS isn't built multi-threaded, but its
>> performance is pretty terrible even single-threaded. If these numbers
>> are to be believed, Gael's multi-threaded multiply scales with 99.7%
>> efficiency on the X5550, averaging 2.6/4 SIMD fused multiply-add
>> operations per cycle in single precision.
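(Assuming this box is a dual-socket X5550, i.e. 8 physical cores at 2.67 GHz,
and counting one fused multiply-add as 2 flops, those two figures seem to fall
out of the raw numbers as

  0.0191605 / (8 * 0.0024013)   ~= 0.997  cpu/real per core, i.e. the ratio
                                          Gael cautions about above
  111.787e9 / (8 * 2.67e9 * 2)  ~= 2.6    FMA-equivalents per core per cycle,
                                          vs. a peak of 4 with 4-wide
                                          single-precision SSE.)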
>>
>> (double-precision)
>> [aron@kw2050]~/sandbox/eigen-smp/bench% g++ bench_gemm.cpp -DNDEBUG
>> -DHAVE_BLAS -I.. -O2 -fopenmp -lrt -lblas -o ./bench && /usr/bin/time
>> -p ./bench
>> Warning, your parallel product is crap! <I need to fix this>
>>
>> blas cpu 0.13462s 1.99402 GFLOPS (14.0937s)
>> blas real 0.134625s 1.99395 GFLOPS (13.4901s)
>> eigen cpu 0.0363907s 7.37649 GFLOPS (3.70925s)
>> eigen real 0.00455555s 58.925 GFLOPS (0.465924s)
>> real 14.11
>> user 17.95
>> sys 0.11
>>
>> Again, near-perfect scaling, and eigen is averaging 1.4/2 SIMD fused
>> multiply-add operations per cycle in double precision.
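(With the same assumptions, 58.925e9 / (8 * 2.67e9 * 2) ~= 1.4 FMA-equivalents
per core per cycle, vs. a peak of 2 with 2-wide double-precision SSE.)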
>>
>> I'll look at this more later this week, and I'd like to more carefully
>> verify these numbers since they're pretty astonishing to me. Gael,
>> I'm happy to give you an honorary A+ in my Parallel Computing
>> Paradigms course if these are legit.
>>
>> A
>>
>> On Fri, Feb 26, 2010 at 4:26 PM, Gael Guennebaud
>> <gael.guennebaud@xxxxxxxxx> wrote:
>> >
>> > Thank you for the link too :)
>> >
>> > And to entertain everybody following our adventures, here are the
>> > mandatory
>> > pictures:
>> >
>> > * single core: http://dl.dropbox.com/u/260133/matrix_matrix.pdf
>> > * quad cores: http://dl.dropbox.com/u/260133/matrix_matrix-smp.pdf
>> >
>> > gael
>> >
>> >
>> > On Fri, Feb 26, 2010 at 1:02 PM, Aron Ahmadia <aja2111@xxxxxxxxxxxx>
>> > wrote:
>> >>
>> >> Those are some good notes, thanks Frank.
>> >>
>> >> It's easy to get confused there because he's assuming a distributed
>> >> memory layout, but still, that might be a useful technique to try and
>> >> apply.
>> >>
>> >> A
>> >>
>> >> On Fri, Feb 26, 2010 at 2:57 PM, FMDSPAM <fmdspam@xxxxxxxxx> wrote:
>> >> > On 26.02.2010 11:28, Aron Ahmadia wrote:
>> >> >
>> >> > <snip>
>> >> >
>> >> > Okay, this might be a bit tricky, so forgive me if I'm
>> >> > over-complicating things, but can we introduce another subdivision?
>> >> >
>> >> >
>> >> >
>> >> > Forgive my shameless plug. I found a short discussion on that topic a
>> >> > while ago, here. Most of what he is discussing, and what you are
>> >> > doing, is beyond my skills, but possibly it helps.
>> >> >
>> >> > Frank.
>> >> >
>> >> >
>> >>
>> >>
>> >
>> >
>>
>>
>
>