Re: [eigen] FLENS C++ expression template Library has excellent documentation



2009/4/18 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
> On Sat, Apr 18, 2009 at 9:24 AM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
>>>> I was thinking of something like -DEIGEN_PARALLEL=4 (i.e. opt-in
>>>> parallelization) at compile time, to launch that many threads at
>>>> run time. BLAS 1 should be trivial to parallelize and BLAS 2
>>>> shouldn't be too difficult either.
>>>
>>> unfortunately no, the problem is that we need to be able to control
>>> that per expression, and we have to come up with a good API for
>>> that. Since Eigen is a pure template library we cannot have global
>>> state for that.
>>
>> I see. Since the compiler is generating a lot of code for us, we need
>> to parallelize the = assignment operator as it is the one that does
>> all the actual calculations. I don't want to have global state either.
>> Maybe an eigen_parallel macro which just does omp_set_procs(). So all
>> the functions which are parallel are automatically parallelized without
>> the user bothering about it. And it can be done per expression. Since
>> changing it per expression is kind of a corner case, maybe this is a
>> good starting point, at least for discussing what kind of API we want.
>
> yes, we have to study in more detail the possibilities offered by
> OpenMP and how it behaves when nesting parallel loops

I agree; I think that a very important thing to check is whether the
OpenMP state (e.g. omp_set_procs) is thread-local or not.

If it is, then I agree that macros as Rohit suggests may work.
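
To make the macro idea concrete, here is a minimal sketch, assuming the
thread-count setting really is local to the calling thread (which is the
point that still has to be checked). The thread writes omp_set_procs; the
standard OpenMP routine for this is omp_set_num_threads, which is what the
sketch calls. ScopedNumThreads and EIGEN_PARALLEL_SCOPE are hypothetical
names, not existing Eigen API, and the stand-in functions only exist so the
example builds without OpenMP enabled.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#else
// Stand-ins so the sketch builds without OpenMP (hypothetical fallback).
static int g_fake_threads = 1;
inline void omp_set_num_threads(int n) { g_fake_threads = n; }
inline int omp_get_max_threads() { return g_fake_threads; }
#endif

// Hypothetical RAII guard: requests `n` threads for the enclosing scope
// and restores the previous setting when the scope ends.
class ScopedNumThreads {
public:
    explicit ScopedNumThreads(int n) : previous_(omp_get_max_threads()) {
        assert(n > 0);
        omp_set_num_threads(n);
    }
    ~ScopedNumThreads() { omp_set_num_threads(previous_); }
    ScopedNumThreads(const ScopedNumThreads&) = delete;
    ScopedNumThreads& operator=(const ScopedNumThreads&) = delete;

private:
    int previous_;
};

// Hypothetical macro in the spirit of the proposal quoted above.
#define EIGEN_PARALLEL_SCOPE(n) ScopedNumThreads eigen_parallel_scope_guard_(n)

// Returns the thread count visible inside a scope that requested `n`.
inline int threads_inside_scope(int n) {
    EIGEN_PARALLEL_SCOPE(n);
    return omp_get_max_threads();
}
```

The guard restores the old value on scope exit, so an expression wrapped
this way would not leak its setting into the rest of the calling thread.
Whether it leaks into other threads depends on the OpenMP behaviour in
question.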

Otherwise, even though Eigen is a pure template library, there is
perhaps something we can try to let it have its own thread-local state
(that should of course remain optional, and only affect those who use
parallelization). The idea would be to store the state (e.g. number of
threads to use) in a thread-local variable, and perhaps provide macros
to let the user application declare that thread-local variable.
Short-term, we could rely on compiler extensions for thread-local
storage; long-term, this is part of C++0x.
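
A rough sketch of what that could look like, using the thread_local keyword
from C++0x (C++11); with an older compiler one would substitute an extension
such as __thread. All names here (MYLIB_DEFINE_THREAD_STATE,
mylib_num_threads, set_num_threads, num_threads) are hypothetical and only
illustrate the idea; none of them is Eigen API.

```cpp
#include <cassert>
#include <thread>

// Hypothetical macro: the application expands this once, in a single
// translation unit, to define the per-thread setting.
#define MYLIB_DEFINE_THREAD_STATE() \
    thread_local int mylib_num_threads = 1

// What the header-only library would contain: a declaration only.
extern thread_local int mylib_num_threads;

inline void set_num_threads(int n) {
    assert(n > 0);
    mylib_num_threads = n;  // affects the calling thread only
}

inline int num_threads() { return mylib_num_threads; }

// Application side: provide the definition.
MYLIB_DEFINE_THREAD_STATE();

// Sets the count from a freshly started thread and returns what that
// thread reads back; the caller's own value is left untouched.
inline int value_seen_by_new_thread_after_set(int n) {
    int seen = 0;
    std::thread worker([&seen, n] {
        set_num_threads(n);
        seen = num_threads();
    });
    worker.join();
    return seen;
}
```

Because each thread gets its own copy, one thread can ask for 8 threads
while another keeps its own setting, which a true global could not offer.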

>>> actually we have already experimented with OpenMP, have a look at the
>>> file disabled/EvalOMP.h
>>
>> Curious, why was it disabled?
>
> mainly because of the above API issue

And also we didn't get a significant performance improvement outside
of BLAS level 1 ops. The problem is that for level >= 2,
cache-friendliness becomes important, so we tend to bypass the generic
xpr operator= and have specialized loops that are less obvious to
parallelize.
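
For illustration, this is roughly why the level 1 case is the easy one. The
sketch below is a plain axpy loop, not the code from disabled/EvalOMP.h, and
the function name and signature are my own choice. Every element is
independent, so a single OpenMP pragma is enough; without OpenMP enabled the
pragma is skipped and the loop runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = a*x + y. Each iteration touches a distinct element, so the loop
// can be split across threads without any synchronization.
inline void axpy(double a, const std::vector<double>& x,
                 std::vector<double>& y) {
    assert(x.size() == y.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    const double* xp = x.data();
    double* yp = y.data();
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        yp[i] += a * xp[i];
    }
}
```

A blocked matrix product has no such one-line equivalent, since the loop
structure is chosen for cache reuse rather than for independent iterations.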

Benoit


