Re: [eigen] a branch for SMP (openmp) experimentations


Works like a charm - see attachment.

- Hauke

On Mon, Feb 22, 2010 at 11:28 AM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
>
> Hi,
>
> I have just created a fork there:
>
> http://bitbucket.org/ggael/eigen-smp
>
> to play with SMP support, and more precisely, with OpenMP.
>
> Currently only the general matrix-matrix product is parallelized. I've
> implemented a general 1D parallelizer to factor out the parallelization
> code. It is defined here:
>
> Eigen/src/Core/products/Parallelizer.h
>
> and used at the end of this file:
> Eigen/src/Core/products/GeneralMatrixMatrix.h
>
> In the bench/ folder there are two bench_gemm*.cpp files to try it and
> compare to BLAS.
>
> On my core2 duo, I've observed a speedup of around 1.9 for relatively small
> matrices.
>
> At work I have an "Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz" but it
> is currently too busy to do really meaningful experiments. Nevertheless, it
> seems that gotoblas, which is directly using pthread, reports more
> consistent speedups. So perhaps OpenMP is trying to do some overly smart
> scheduling and it might be useful to deal with pthread directly?
>
> For the lazy but interested reader, the interesting piece of code is here:
>
>   int threads = omp_get_num_procs();
>   int blockSize = size / threads;
>   #pragma omp parallel for schedule(static,1)
>   for(int i=0; i<threads; ++i)
>   {
>     int blockStart = i*blockSize;
>     // the last block also takes the remainder when size % threads != 0
>     int actualBlockSize = (i+1==threads) ? size - blockStart : blockSize;
>
>     func(blockStart, actualBlockSize);
>   }
>
> feel free to play with it and have fun!
>
> Gael.
>
> PS: to Aron, yesterday the main problem was that our benchtimer reported the
> total execution time, not the "real" one... the other was that my system was
> a bit busy because of daemons, and stuff like that...
>
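
For anyone who wants to try the block-splitting loop quoted above outside of Eigen, here is a minimal standalone sketch. The names run_blocks_1d and default_thread_count are made up for illustration and are not taken from Parallelizer.h. The sketch assumes that the last block should absorb the remainder when size is not a multiple of the thread count, and it falls back to a single thread when compiled without OpenMP.

```cpp
#include <algorithm>
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not Eigen's Parallelizer.h): split [0, size) into one
// contiguous block per thread and call func(blockStart, blockLength) on each.
template <typename Func>
void run_blocks_1d(int size, int threads, Func func)
{
  assert(size >= 0);
  // Never use more blocks than elements, and always at least one block.
  threads = std::max(1, std::min(threads, size));
  const int blockSize = size / threads;  // base block length
  #pragma omp parallel for schedule(static, 1)
  for (int i = 0; i < threads; ++i)
  {
    const int blockStart = i * blockSize;
    // Assumption: the last block takes whatever is left over.
    const int actualBlockSize =
        (i + 1 == threads) ? size - blockStart : blockSize;
    func(blockStart, actualBlockSize);
  }
}

// Processor count when built with OpenMP (e.g. -fopenmp), otherwise 1.
inline int default_thread_count()
{
#ifdef _OPENMP
  return omp_get_num_procs();
#else
  return 1;
#endif
}
```

A call such as run_blocks_1d(rows, default_thread_count(), f) then hands each thread one contiguous range of rows. The callable f must only write to its own range, since the blocks run concurrently.
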
without omp
-----------

cpu   0.275584s   7.79248 GFLOPS  (2.89015s)
real  0.2755s     7.79486 GFLOPS  (2.89s)

cpu   0.269135s   7.97921 GFLOPS  (2.72174s)
real  0.2705s     7.93894 GFLOPS  (2.99972s)

cpu   0.270066s   7.9517 GFLOPS   (2.71804s)
real  0.27s       7.95364 GFLOPS  (2.718s)

with omp
--------

cpu   0.15332s    14.0066 GFLOPS  (1.58204s)
real  0.153s      14.0358 GFLOPS  (1.582s)

cpu   0.150302s   14.2878 GFLOPS  (1.54616s)
real  0.1505s     14.269 GFLOPS   (1.546s)

cpu   0.149224s   14.391 GFLOPS   (1.63783s)
real  0.1495s     14.3644 GFLOPS  (1.638s)
