Re: [eigen] a branch for SMP (openmp) experimentations


*To*: eigen@xxxxxxxxxxxxxxxxxxx
*Subject*: Re: [eigen] a branch for SMP (openmp) experimentations
*From*: Aron Ahmadia <aja2111@xxxxxxxxxxxx>
*Date*: Sat, 27 Feb 2010 18:51:23 +0300

Ahh, this makes much more sense, I was just trying to figure out what
I was doing wrong...

Unfortunately, this machine is occupied for the next 3 days, so I
can't get reliable numbers out of it until then :(

A

On Sat, Feb 27, 2010 at 5:48 PM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
>
> hi,
>
> nice results!
>
> however, in order to estimate the efficiency wrt the number of threads, you
> should run it with OMP_NUM_THREADS=1, and not use the CPU time returned in
> the multithreaded case which is meaningless in this case. Then I expect a
> ratio much lower than 99% !
>
> basically, if mono threaded then use the "cpu" time, otherwise use the
> "real" time.
>
> gael
>
> On Fri, Feb 26, 2010 at 9:17 PM, Aron Ahmadia <aja2111@xxxxxxxxxxxx> wrote:
>>
>> Some nice bench results coming off the X5550 @ 2.67GHz
>>
>> (single-precision)
>> [aron@kw2050]~/sandbox/eigen-smp/bench% g++ bench_gemm.cpp -DNDEBUG
>> -DHAVE_BLAS -I.. -O2 -fopenmp -lrt -lblas -o ./bench && /usr/bin/time
>> -p ./bench
>> blas  cpu   0.133795s   2.00632 GFLOPS  (14.0114s)
>> blas  real  0.133813s   2.00605 GFLOPS  (13.3878s)
>> eigen cpu   0.0191605s  14.0098 GFLOPS  (1.92616s)
>> eigen real  0.0024013s  111.787 GFLOPS  (0.241387s)
>> real 13.79
>> user 16.08
>> sys 0.13
>>
>> For whatever reason, the BLAS isn't built multi-threaded, but its
>> performance is pretty terrible even single-threaded. If these numbers
>> are to be believed, Gael's multi-threaded multiply scales with 99.7%
>> efficiency on the X5550, averaging 2.6/4 SIMD fused multiply-add
>> operations per cycle in single precision.
>>
>> (double-precision)
>> [aron@kw2050]~/sandbox/eigen-smp/bench% g++ bench_gemm.cpp -DNDEBUG
>> -DHAVE_BLAS -I.. -O2 -fopenmp -lrt -lblas -o ./bench && /usr/bin/time
>> -p ./bench
>> Warning, your parallel product is crap!
>> <I need to fix this>
>>
>> blas  cpu   0.13462s     1.99402 GFLOPS  (14.0937s)
>> blas  real  0.134625s    1.99395 GFLOPS  (13.4901s)
>> eigen cpu   0.0363907s   7.37649 GFLOPS  (3.70925s)
>> eigen real  0.00455555s  58.925 GFLOPS   (0.465924s)
>> real 14.11
>> user 17.95
>> sys 0.11
>>
>> Again, near-perfect scaling, and eigen is averaging 1.4/2 SIMD fused
>> multiply-add operations per cycle in double precision.
>>
>> I'll look at this more later this week, and I'd like to more carefully
>> verify these numbers since they're pretty astonishing to me. Gael,
>> I'm happy to give you an honorary A+ in my Parallel Computing
>> Paradigms course if these are legit.
>>
>> A
>>
>> On Fri, Feb 26, 2010 at 4:26 PM, Gael Guennebaud
>> <gael.guennebaud@xxxxxxxxx> wrote:
>> >
>> > Thank you for the link too :)
>> >
>> > And to entertain everybody following our adventures, here are the
>> > mandatory pictures:
>> >
>> > * single core: http://dl.dropbox.com/u/260133/matrix_matrix.pdf
>> > * quad cores: http://dl.dropbox.com/u/260133/matrix_matrix-smp.pdf
>> >
>> > gael
>> >
>> >
>> > On Fri, Feb 26, 2010 at 1:02 PM, Aron Ahmadia <aja2111@xxxxxxxxxxxx>
>> > wrote:
>> >>
>> >> Those are some good notes, thanks Frank.
>> >>
>> >> It's easy to get confused there because he's assuming a distributed
>> >> memory layout, but still, that might be a useful technique to try and
>> >> apply.
>> >>
>> >> A
>> >>
>> >> On Fri, Feb 26, 2010 at 2:57 PM, FMDSPAM <fmdspam@xxxxxxxxx> wrote:
>> >> > On 26.02.2010 11:28, Aron Ahmadia wrote:
>> >> >
>> >> > <snip>
>> >> >
>> >> > Okay, this might be a bit tricky, so forgive me if I'm
>> >> > over-complicating things, can we introduce another subdivision?:
>> >> >
>> >> >
>> >> > Forgive me my shameless plug. A short discussion on that topic I've
>> >> > found some day here .
>> >> > Most of what he is discussing, and what you are doing, is beyond my
>> >> > skills, but possibly it helps.
>> >> >
>> >> > Frank.
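
The timing rule from Gael's message (take the "cpu" time from a run with OMP_NUM_THREADS=1 and the "real" time from the multi-threaded run) can be sketched as a small helper. This is only an illustration and not part of bench_gemm.cpp: the function names `speedup` and `parallel_efficiency` are hypothetical, and the thread count has to be supplied by the caller since the thread does not state it.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers, not taken from the Eigen bench code.
//
// t1_cpu:  "cpu" time of the run with OMP_NUM_THREADS=1, in seconds
// tn_real: "real" (wall-clock) time of the multi-threaded run, in seconds

// Speedup of the multi-threaded run relative to the single-threaded run.
inline double speedup(double t1_cpu, double tn_real) {
    assert(t1_cpu > 0.0 && tn_real > 0.0);
    return t1_cpu / tn_real;
}

// Parallel efficiency: speedup divided by the number of threads.
// 1.0 means perfect scaling; lower values mean parallel overhead.
inline double parallel_efficiency(double t1_cpu, double tn_real, int threads) {
    assert(threads > 0);
    return speedup(t1_cpu, tn_real) / threads;
}
```

For example, with made-up timings of 0.8 s single-threaded and 0.25 s on 4 threads, `speedup(0.8, 0.25)` gives 3.2 and `parallel_efficiency(0.8, 0.25, 4)` gives 0.8, i.e. 80%. Dividing the "cpu" time by the "real" time of the same multi-threaded run instead mostly measures how busy the threads were, which is why it tends to look close to 100%.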

