Re: [eigen] FLENS C++ expression template Library has excellent documentation
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] FLENS C++ expression template Library has excellent documentation
- From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- Date: Sat, 18 Apr 2009 10:00:29 -0400
2009/4/18 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
> On Sat, Apr 18, 2009 at 9:24 AM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
>>>> I was thinking of something like -DEIGEN_PARALLEL=4 (i.e. opt-in
>>>> parallelization) at compile time to launch that many threads at
>>>> run time. BLAS 1 should be trivial to parallelize and BLAS 2
>>>> shouldn't be too difficult either.
>>> Unfortunately no, the problem is that we need to be able to control
>>> that per expression, and we have to come up with a good API for
>>> that. Since Eigen is a pure template library we cannot have global
>>> state for that.
>> I see. Since the compiler is generating a lot of code for us, we need
>> to parallelize the = assignment operator, as it is the one that does
>> all the actual calculations. I don't want to have global state either.
>> Maybe an eigen_parallel macro which just calls omp_set_num_threads(),
>> so all the functions which are parallel are automatically parallelized
>> without the user bothering about it. And it can be done per expression.
>> Since changing it per expression is kind of a corner case, maybe this
>> is a good starting point, at least for discussing what kind of API we
>> want.
> Yes, we have to study in more detail the possibilities offered by
> OpenMP and how it behaves when nesting parallel loops.
I agree; I think that a very important thing to check is whether the
OpenMP state (e.g. the value set by omp_set_num_threads()) is
thread-local or not.
If it is, then I agree that macros as Rohit suggests may work.
Otherwise, even though Eigen is a pure template library, there is
perhaps something we can try to let it have its own thread-local state
(that should of course remain optional, only affect those who use
parallelization). The idea would be to store the state (e.g. number of
threads to use) in a thread-local variable, and perhaps provide macros
to let the user application declare that thread-local variable...
Short-term we could rely on compiler extensions for thread-local
storage; long-term this is part of C++0x.
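
To make the thread-local idea concrete, here is a minimal sketch of what
such optional state could look like. Everything in it is hypothetical and
not Eigen API: the namespace sketch, thread_count(), ScopedThreadCount and
threads_for() are names invented for illustration, and it assumes the
thread_local keyword as standardized in C++11 rather than a compiler
extension. It only shows how a header-only library can keep a per-thread
"number of threads" setting and override it around one expression; it
does not launch any threads itself.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {  // hypothetical names throughout; not Eigen API

// Per-thread setting: how many threads a parallel assignment may use.
// A function-local thread_local works in a header-only library because
// no separate definition in a .cpp file is needed.
inline int& thread_count() {
    thread_local int count = 1;  // assumed default: run serially
    return count;
}

// Scoped override: sets the calling thread's count for the lifetime of
// the guard and restores the previous value afterwards, so the setting
// can be changed around a single expression.
class ScopedThreadCount {
public:
    explicit ScopedThreadCount(int n) : saved_(thread_count()) {
        assert(n >= 1);
        thread_count() = n;
    }
    ~ScopedThreadCount() { thread_count() = saved_; }
    ScopedThreadCount(const ScopedThreadCount&) = delete;
    ScopedThreadCount& operator=(const ScopedThreadCount&) = delete;

private:
    int saved_;
};

// What an assignment kernel might consult before splitting its loop:
// never ask for more threads than there are coefficients to process.
inline std::size_t threads_for(std::size_t size) {
    const std::size_t wanted = static_cast<std::size_t>(thread_count());
    if (size == 0) return 1;
    return wanted < size ? wanted : size;
}

}  // namespace sketch
```

A caller would write "sketch::ScopedThreadCount guard(4);" in the scope
of the expression to parallelize. Other threads of the application keep
their own value, which is the property we would want from such a state.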
>>> Actually we have already experimented with OpenMP, have a look at the
>>> file disabled/EvalOMP.h
>> Curious, why was it disabled?
> mainly because of the above API issue
And also we didn't get a significant performance improvement outside
of BLAS level 1 ops. The problem is that for level>=2,
cache-friendliness becomes important so we tend to bypass the generic
xpr operator= and have specialized loops that are less obvious to