Re: [eigen] a branch for SMP (openmp) experimentations
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] a branch for SMP (openmp) experimentations
- From: Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx>
- Date: Mon, 22 Feb 2010 11:43:33 +0100
Works like a charm - see attachment.
On Mon, Feb 22, 2010 at 11:28 AM, Gael Guennebaud
> I have just created a fork there:
> to play with SMP support, and more precisely, with OpenMP.
> Currently only the general matrix-matrix product is parallelized. I've
> implemented a general 1D parallelizer to factor the parallelization code. It
> is defined there:
> and used at the end of this file:
> In the bench/ folder there are two bench_gemm*.cpp files to try it and
> compare to BLAS.
> On my core2 duo, I've observed a speed up around 1.9 for relatively small
> At work I have an "Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz" but it
> is currently too busy to do really meaningful experiments. Nevertheless, it
> seems that gotoblas, which is directly using pthread, reports more
> consistent speedups. So perhaps OpenMP is trying to do some too smart
> scheduling and it might be useful to directly deal with pthread?
> For the lazy but interested reader, the interesting piece of code is there:
> int threads = omp_get_num_procs();
> int blockSize = size / threads;
> #pragma omp parallel for schedule(static,1)
> for(int i=0; i<threads; ++i)
> {
>   int blockStart = i*blockSize;
>   int actualBlockSize = std::min(blockSize, size - blockStart);
>   func(blockStart, actualBlockSize);
> }
> Feel free to play with it and have fun!
> PS: to Aron, yesterday the main problem was that our benchtimer reported the
> total execution time, not the "real" one... the other was that my system was
> a bit busy because of daemons, and stuff like that...
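The 1D parallelizer quoted above can be sketched as a small standalone function. This is only an illustration under assumptions, not the code from the fork: the names `run_parallel_1d` and `covered_entries` are hypothetical, and the block size is rounded up here, because with the plain integer division of the quoted snippet a size of 10 on 3 threads yields the blocks [0,3), [3,6), [6,9) and leaves the last entry to be handled elsewhere.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not Eigen's actual API): splits [0, size) into one
// contiguous block per thread and calls func(blockStart, blockSize) on each.
template <typename Func>
void run_parallel_1d(int size, const Func& func)
{
  assert(size >= 0);
#ifdef _OPENMP
  int threads = omp_get_num_procs();
#else
  int threads = 1;  // serial fallback when compiled without OpenMP
#endif
  // Assumption of this sketch: round up, so that the blocks cover the whole
  // range even when size is not a multiple of the thread count.
  int blockSize = (size + threads - 1) / threads;

  #pragma omp parallel for schedule(static, 1)
  for (int i = 0; i < threads; ++i)
  {
    int blockStart = i * blockSize;
    int actualBlockSize = std::min(blockSize, size - blockStart);
    if (actualBlockSize > 0)  // trailing threads may have nothing to do
      func(blockStart, actualBlockSize);
  }
}

// Usage example: every block marks its own entries, then the marks are
// summed. The blocks are disjoint, so the threads never write the same entry.
int covered_entries(int size)
{
  std::vector<int> hits(size, 0);
  run_parallel_1d(size, [&](int start, int len) {
    for (int k = start; k < start + len; ++k) hits[k] += 1;
  });
  int total = 0;
  for (int h : hits) total += h;
  return total;
}
```

Built with OpenMP enabled (for example `-fopenmp` with GCC), each block runs on its own thread; without it the pragma is ignored and the single block covers the whole range.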
cpu 0.275584s 7.79248 GFLOPS (2.89015s)
real 0.2755s 7.79486 GFLOPS (2.89s)
cpu 0.269135s 7.97921 GFLOPS (2.72174s)
real 0.2705s 7.93894 GFLOPS (2.99972s)
cpu 0.270066s 7.9517 GFLOPS (2.71804s)
real 0.27s 7.95364 GFLOPS (2.718s)
cpu 0.15332s 14.0066 GFLOPS (1.58204s)
real 0.153s 14.0358 GFLOPS (1.582s)
cpu 0.150302s 14.2878 GFLOPS (1.54616s)
real 0.1505s 14.269 GFLOPS (1.546s)
cpu 0.149224s 14.391 GFLOPS (1.63783s)
real 0.1495s 14.3644 GFLOPS (1.638s)
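The difference between the "cpu" and "real" lines above, and the GFLOPS column, can be illustrated with a minimal sketch. It is not Eigen's BenchTimer: `Timing`, `time_call` and `gemm_gflops` are hypothetical names, and the flop count of 2n^3 for an n x n product is an assumption (the matrix size is not stated in the thread, although the reported figures are consistent with n = 1024). Note also that `std::clock` is platform dependent: on POSIX systems it sums CPU time over all threads, so for a parallel run it can exceed the wall-clock time.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// Hypothetical result type: CPU time and wall-clock ("real") time in seconds.
struct Timing
{
  double cpu_seconds;
  double real_seconds;
};

// Runs func once and measures it with both clocks.
template <typename Func>
Timing time_call(const Func& func)
{
  std::clock_t c0 = std::clock();
  auto w0 = std::chrono::steady_clock::now();
  func();
  auto w1 = std::chrono::steady_clock::now();
  std::clock_t c1 = std::clock();

  Timing t;
  t.cpu_seconds = double(c1 - c0) / CLOCKS_PER_SEC;
  t.real_seconds = std::chrono::duration<double>(w1 - w0).count();
  return t;
}

// GFLOPS of an n x n times n x n product, assuming 2*n^3 floating point
// operations (one multiply and one add per inner-product term).
double gemm_gflops(int n, double seconds)
{
  assert(seconds > 0.0);
  return 2.0 * n * n * n / seconds * 1e-9;
}
```

Under these assumptions, `gemm_gflops(1024, 0.2755)` gives about 7.79, in line with the first "real" line of the output above. When judging a parallel speedup, the wall-clock value is the one to pass in.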