Re: [eigen] a branch for SMP (openmp) experimentations



ok, so adding a barrier *before* the packing fixed the issue, even when the packing into B' is distributed across the threads:

for(k=0; k<nb_k; ++k)
{
   #pragma omp barrier   // make sure nobody is still reading the previous B'
   pack B_k
   #pragma omp barrier   // make sure B' is complete before anyone uses it

   // now it's safe :)
}
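
For reference, here is a small self-contained toy showing that pattern (nothing Eigen-specific: the panel size and the fake pack/consume work are just placeholders). Each thread writes its share of the shared B' between the two barriers, and only reads it once the second barrier has been passed:

#include <omp.h>
#include <vector>
#include <cstdio>

int main()
{
  const int nb_k = 4;                      // number of horizontal panels B_k
  const int panel_size = 1024;             // size of one packed panel B'
  std::vector<double> blockB(panel_size);  // B', shared by all threads

  #pragma omp parallel
  {
    const int tid = omp_get_thread_num();
    const int nthreads = omp_get_num_threads();

    for (int k = 0; k < nb_k; ++k)
    {
      // barrier #1: nobody may still be reading the previous B'
      #pragma omp barrier

      // distributed packing: each thread writes its own chunk of B'
      for (int i = tid; i < panel_size; i += nthreads)
        blockB[i] = k * panel_size + i;

      // barrier #2: B' must be complete before anyone uses it
      #pragma omp barrier

      // now it's safe: every thread reads the whole shared panel
      // (stand-in for the gebp kernel on that thread's slice of A and C)
      double sum = 0;
      for (int i = 0; i < panel_size; ++i)
        sum += blockB[i];
      std::printf("thread %d, panel %d, checksum %g\n", tid, k, sum);
    }
  }
  return 0;
}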

So I've committed this new version, and the performance for matrices of size 2048 using 4 cores is as follows (GFlops):

previous strategy: 57.7
new strategy: 62.2
new strategy using GOTO's low-level routines (for the packings and the gebp kernel): 67.4
new strategy without the two barriers (the results are then incorrect): 64.7
new strategy without the two barriers and using GOTO's routines: 68.

gael.



On Fri, Feb 26, 2010 at 11:58 AM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:


On Fri, Feb 26, 2010 at 10:44 AM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
There is also something very strange: if I change the code so that all threads pack exactly the same B_k into the same shared B' and keep the barrier, then I still don't get a correct result... (if each thread has its own B', then it's fine)

arf, I'm too used to GPU computing, where all the threads of a warp follow the same execution path. Here I realized that even though all threads have to do exactly the same amount of work, they can be totally de-synchronized: the barrier occurs with threads working on different horizontal panels B_k of B! To be more precise, the outermost loop looks like this:

for(k=0; k<nb_k; ++k)
{
   pack_b(k);

   #pragma omp barrier

   // here some threads have k=0 while others have k=1...
}

I guess that means packing B is faster than creating a thread, so the first barrier occurs before all the threads have even been launched! So we really have to be careful about how we synchronize the threads.
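
To make the failure mode concrete, here is a small toy (again just placeholders, not the Eigen code) with only the single barrier after the packing. A fast thread that finishes consuming panel k immediately loops around and starts overwriting the shared B' with panel k+1, while a slower thread may still be reading panel k, hence the wrong results:

#include <omp.h>
#include <vector>
#include <cstdio>

int main()
{
  const int nb_k = 4;
  const int panel_size = 1024;
  std::vector<double> blockB(panel_size);  // shared B'

  #pragma omp parallel
  {
    const int tid = omp_get_thread_num();
    const int nthreads = omp_get_num_threads();

    for (int k = 0; k < nb_k; ++k)
    {
      // distributed packing of panel B_k into the shared buffer;
      // WITHOUT a barrier right before this write, it can clobber data
      // another thread is still reading from the previous iteration
      for (int i = tid; i < panel_size; i += nthreads)
        blockB[i] = k * panel_size + i;

      #pragma omp barrier  // only guarantees B' is complete before reading

      // consume B'; a thread that finishes early goes straight back to
      // the packing of k+1 above, so threads end up at different k
      double expected = 0, sum = 0;
      for (int i = 0; i < panel_size; ++i)
      {
        expected += k * panel_size + i;
        sum += blockB[i];
      }
      if (sum != expected)
        std::printf("thread %d, panel %d: corrupted B'\n", tid, k);
    }
  }
  return 0;
}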

gael


