Re: [eigen] FLENS C++ expression template Library has excellent documentation



On Sat, Apr 18, 2009 at 10:53 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2009/4/18 Rohit Garg <rpg.314@xxxxxxxxx>:
>>> So, in which area does Intel MKL still have a long-term lead? I would
>>> say parallelization. We haven't started that yet and it is probably a
>>> very, very tough one. It's what I have in mind when I say that a
>>> BLAS/LAPACK wrapper is still welcome.
>>
>> Why do you think parallelization is very difficult? Do you mean
>> parallelization infrastructure? AFAICS, using OpenMP will be cool. Let
>> the compiler handle all the dirty business, etc. This is something I want
>> to explore (time availability is of course important!) so I would like
>> some heads-up.
>
> I'm very ignorant of these matters. If parallelization can be done in
> a generic way, then yes I understand that it's just a matter of
> OpenMP-ifying for loops. Although even so, there remain issues to
> sort: what if the user application doesn't want Eigen to launch more
> than N threads, because it is already launching a lot of threads of
> its own? OpenMP 2 didn't seem to help much in that situation; maybe 3
> is better.
>

I was thinking of something like -DEIGEN_PARALLEL=4 (i.e. opt-in
parallelization) set at compile time, to launch that many threads at run
time. BLAS 1 should be trivial to parallelize, and BLAS 2 shouldn't be
too difficult either.

> But can efficient parallelization really be done in a generic way? It
> seems to me that different algorithms may require different strategies
> for parallelization. For example, look at matrix product.
> Parallelizing it will probably mean doing the product by blocks, each
> thread computing a different block. But there are many different ways
> of splitting the product into blocks. Which one is most efficient? One
> wants to minimize thread interdependency, but also to minimize memory
> accesses... conflicting goals! So it's not obvious at all which block
> product strategy to take. With other algorithms, another issue will be
> load balancing. All this makes me doubt that efficient parallelization
> can be obtained just by OpenMP-ifying some for loops!

BLAS 3 is indeed tricky. Since most CPUs have both private and shared
caches (i.e. L1 is private per core and L2/L3 is shared), we don't want
too much inter-core cache coherency traffic either. We're going to need
some serious profiling to get the load balancing right.
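Just to make the block strategy concrete, here is a rough sketch (plain
tiled C++, not Eigen's actual product kernel) with one OpenMP task per
block row of C; the block size and the schedule(dynamic) choice are
guesses that the profiling would have to tune:

#include <algorithm>

// Tiled C += A * B on row-major n x n matrices. Each thread owns a
// distinct block row of C, so there are no write conflicts;
// schedule(dynamic) is a stab at load balancing.
void blockedGemm(const double* A, const double* B, double* C, int n, int bs = 64)
{
    #pragma omp parallel for schedule(dynamic)
    for (int ii = 0; ii < n; ii += bs)
        for (int kk = 0; kk < n; kk += bs)
            for (int jj = 0; jj < n; jj += bs)
                // Micro-kernel on one (ii,kk,jj) tile, sized to stay in cache.
                for (int i = ii; i < std::min(ii + bs, n); ++i)
                    for (int k = kk; k < std::min(kk + bs, n); ++k)
                    {
                        const double a = A[i * n + k];
                        for (int j = jj; j < std::min(jj + bs, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}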

> So as I see it the work for parallelizing algorithms splits into 2 phases,
>
> 1) change the algorithm itself to make it well parallelizable
> 2) implement using e.g. OpenMP (or that could also be another thing if
> e.g. we want to leverage GPUs)

GPUs too? Seriously? Then you have to go the OpenCL route, and then
parallelism is even more fun. GPUs want nice sizes (multiples of 16/32),
and what about multi-GPU parallelism? And you seriously want zero copies
across the PCI bus.
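As a tiny illustration of the "nice sizes" point, something like this
hypothetical helper would round a dimension up to a multiple of the GPU's
preferred width and zero-pad before uploading:

#include <cstddef>
#include <vector>

// Round a dimension up to a multiple of the preferred width.
inline int roundUp(int n, int multiple)
{
    return ((n + multiple - 1) / multiple) * multiple;
}

// Copy a row-major rows x cols matrix into a buffer whose row length is a
// multiple of 'multiple', zero-padding the extra columns.
std::vector<float> padColumns(const float* src, int rows, int cols, int multiple = 32)
{
    const int paddedCols = roundUp(cols, multiple);
    std::vector<float> dst(static_cast<std::size_t>(rows) * paddedCols, 0.0f);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            dst[static_cast<std::size_t>(r) * paddedCols + c] =
                src[static_cast<std::size_t>(r) * cols + c];
    return dst;
}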

> A good canditate for parallelization would be Block Cholesky (LLt) or
> Block LU (with partial pivoting). (both of these are on my todo but
> i'm lagging, so feel free to not wait for me ;) ). More generally, any
> algorithm working per-blocks.

Actually I am interested in block Cholesky too, but I am lagging a bit
as well... :(
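For concreteness, here is a rough sketch of a right-looking blocked LLT
written against Eigen's dense API; the blockedCholesky name, the block
size, and the overall structure are just my guesses, not an existing Eigen
routine. Step 3 is where per-block threads would naturally help:

#include <algorithm>
#include <Eigen/Dense>

// Rough sketch of a right-looking blocked LLT. A is symmetric positive
// definite; only its lower triangle is referenced, and on return the
// lower triangle holds L.
void blockedCholesky(Eigen::MatrixXd& A, int b = 64)
{
    const int n = static_cast<int>(A.rows());
    for (int k = 0; k < n; k += b)
    {
        const int kb = std::min(b, n - k);

        // 1) Factor the diagonal block: A_kk = L_kk * L_kk^T
        Eigen::LLT<Eigen::MatrixXd> llt(A.block(k, k, kb, kb));
        Eigen::MatrixXd Lkk = llt.matrixL();
        A.block(k, k, kb, kb) = Lkk;

        const int rem = n - k - kb;
        if (rem == 0)
            continue;

        // 2) Panel solve: A_ik <- A_ik * L_kk^{-T}
        Eigen::MatrixXd panelT = Lkk.triangularView<Eigen::Lower>()
                                    .solve(A.block(k + kb, k, rem, kb).transpose());
        A.block(k + kb, k, rem, kb) = panelT.transpose();

        // 3) Trailing update: A_22 <- A_22 - panel * panel^T
        //    (independent per block column, so this is where threads help)
        A.block(k + kb, k + kb, rem, rem)
            .selfadjointView<Eigen::Lower>()
            .rankUpdate(A.block(k + kb, k, rem, kb), -1.0);
    }
}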

> That's typical of scientific applications. You're not the only one to
> think like that. Then at the other end of the spectrum there are many
> people who consider compilation times to be very important for
> productivity, and there are also large free software projects *cough*
> KDE *cough* that take hours to compile already and where developers
> have to recompile the whole thing often...

Ahh... is there a compile flag which lets GCC take its own sweet time
to compile?

Cheers,
-- 
Rohit Garg

http://rpg-314.blogspot.com/

Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay


