On Mon, Feb 22, 2010 at 11:43 AM, Hauke Heibel
<hauke.heibel@xxxxxxxxxxxxxx> wrote:
> Works like a charm - see attachment.
>
> - Hauke
>
> On Mon, Feb 22, 2010 at 11:28 AM, Gael Guennebaud
> <gael.guennebaud@xxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> I have just created a fork here:
>>
>> http://bitbucket.org/ggael/eigen-smp
>>
>> to play with SMP support, and more precisely, with OpenMP.
>>
>> Currently only the general matrix-matrix product is parallelized. I've
>> implemented a general 1D parallelizer to factor out the parallelization
>> code. It is defined here:
>>
>> Eigen/src/Core/products/Parallelizer.h
>>
>> and used at the end of this file:
>> Eigen/src/Core/products/GeneralMatrixMatrix.h
>>
>> In the bench/ folder there are two bench_gemm*.cpp files to try it and
>> compare to BLAS.
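>>
>> (To build them, something along the lines of
>>   g++ -O2 -fopenmp -I/path/to/eigen-smp bench/bench_gemm<...>.cpp
>> should do; the include path is just an example, and you need the usual
>> flags to link your BLAS for the comparison. Without -fopenmp the omp
>> pragmas are simply ignored and you get the sequential version.)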
>>
>> On my Core 2 Duo, I've observed a speedup of around 1.9 for relatively
>> small matrices.
>>
>> At work I have an "Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz" but it
>> is currently too busy to run really meaningful experiments. Nevertheless, it
>> seems that GotoBLAS, which uses pthreads directly, reports more consistent
>> speedups. So perhaps OpenMP is trying to do some overly smart scheduling,
>> and it might be useful to deal with pthreads directly?
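>>
>> For instance, the same one-block-per-core split could be written with raw
>> pthreads along these lines (an untested sketch with made-up names, not
>> what GotoBLAS or Eigen actually do; link with -lpthread):
>>
>> #include <pthread.h>
>> #include <algorithm>
>> #include <vector>
>>
>> typedef void (*Kernel)(int start, int length);
>>
>> // one descriptor per block of the 1D range [0,size)
>> struct BlockTask { Kernel func; int start; int length; };
>>
>> static void* runBlock(void* arg)
>> {
>>   BlockTask* task = static_cast<BlockTask*>(arg);
>>   task->func(task->start, task->length);
>>   return 0;
>> }
>>
>> void parallel_run_1d(Kernel func, int size, int threads)
>> {
>>   std::vector<pthread_t> workers(threads);
>>   std::vector<BlockTask> tasks(threads);
>>   int blockSize = (size+threads-1)/threads;
>>   int spawned = 0;
>>   for(int i=0; i<threads; ++i)
>>   {
>>     tasks[i].func   = func;
>>     tasks[i].start  = i*blockSize;
>>     tasks[i].length = std::min(blockSize, size - tasks[i].start);
>>     if(tasks[i].length <= 0)
>>       break; // fewer blocks than threads
>>     pthread_create(&workers[i], 0, runBlock, &tasks[i]);
>>     ++spawned;
>>   }
>>   // unlike an omp parallel for, there is no implicit barrier, so join
>>   for(int i=0; i<spawned; ++i)
>>     pthread_join(workers[i], 0);
>> }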
>>
>> For the lazy but interested reader, the interesting piece of code is here:
>>
>> int threads = omp_get_num_procs();
>> // round up, otherwise the last size % threads elements would never be
>> // processed when size is not a multiple of threads
>> int blockSize = (size+threads-1) / threads;
>> #pragma omp parallel for schedule(static,1)
>> for(int i=0; i<threads; ++i)
>> {
>>   int blockStart = i*blockSize;
>>   // the last block may be smaller than blockSize (or even empty)
>>   int actualBlockSize = std::min(blockSize, size - blockStart);
>>   if(actualBlockSize > 0)
>>     func(blockStart, actualBlockSize);
>> }
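>>
>> And to make it concrete, here is a tiny standalone toy using exactly that
>> pattern, with a trivial kernel in place of the product (compile with
>> -fopenmp):
>>
>> #include <algorithm>
>> #include <cstdio>
>> #include <omp.h>
>>
>> int main()
>> {
>>   const int size = 1000;
>>   double data[size];
>>   for(int k=0; k<size; ++k) data[k] = 1.0;
>>
>>   int threads = omp_get_num_procs();
>>   int blockSize = (size+threads-1)/threads;
>>   #pragma omp parallel for schedule(static,1)
>>   for(int i=0; i<threads; ++i)
>>   {
>>     int blockStart = i*blockSize;
>>     int actualBlockSize = std::min(blockSize, size - blockStart);
>>     // the "func" of the snippet above: here, just scale one block in place
>>     for(int k=blockStart; k<blockStart+actualBlockSize; ++k)
>>       data[k] *= 2.0;
>>   }
>>   std::printf("data[0]=%g data[%d]=%g\n", data[0], size-1, data[size-1]);
>>   return 0;
>> }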
>>
>> Feel free to play with it and have fun!
>>
>> Gael.
>>
>> PS: to Aron, yesterday the main problem was that our BenchTimer reported
>> the total execution time, not the "real" one... the other issue was that my
>> system was a bit busy because of daemons and stuff like that...
>>
>