Re: [eigen] nesting

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]


On Thu, Feb 4, 2010 at 8:03 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2010/2/4 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>>
>> arf indeed, anytime an expression is evaluated at nesting time, a matrix
>> gets stored by value and potentially it can be copied multiple times... I
>> think this is one more argument in favor of a top-down evaluation mechanism.
>> In short, the idea behind this is:
>>
>> 1 - remove all evaluation at nesting time, and simply build the complete
>> expression tree
>> 2 - at evaluation time, you send your complete expression to a template
>> evaluator which will recursively analyze your expression from the top,
>> reorder some subexpression, and evaluate some sub expressions when needed.
>
> As you say below, the main concern with this approach is compilation
> times. With this approach, when we see
>
>  A * B * C * D
>
> we really construct a deeply nested product of product of...
>
> By contrast, with the current approach, since we evaluate right away
> before continuing constructing the expression tree. So all we are
> dealing with is products of matrices. Thus, this can limit a lot the
> number of different template evaluations that we make. I regard this
> as a quite nifty optimization and would like to retain it!
>
> The idea that you propose would indeed be necessary if it were
> difficult to know where to introduce temporaries before the whole
> expression is known. But is that the case?

yes I think there are many cases where temporaries or useless
evaluations could be avoided, again:

R = A*B + C*D;

or

(A*B).block()

or

(A*B + C).block()

or

R = s*(A*B +C*D)
(2 temporaries initialized to 0+ 2 calls to gemm + one scalar product)
=>
R = s*A*B;
R+=s*C*D;
(only 2 calls to gemm)

> When we have an expression
> like:
>
>  A + B*C
>
> then it seems to me that the logic is pretty simple: either B*C is an
> outerproduct, in which case we should have neither EvalBeforeNesting
> nor EvalBeforeAssigning, or it is NOT an outer product, in which case
> we should have the 2 bits.

that's more complicated:
- large outer products must keep EvalBeforeNesting for performance reasons
- removing the EvalBeforeNesting bit here has a very very low impact
on the performances.


Our real problem is the following:

A*B + A1 + A2 + A3 + A4 + A5

creates 5 temporary matrices, the result of A*B is copied 4 times....

gael

>
> Am I missing something? I am especially afraid of being missing
> something about the blas_traits and how you implemented that stuff ---
> you know better than me.
>
> Benoit
>
>
>
>> Such an analyzer/evaluator would look like the current ei_blas_traits...
>> Some examples of what could be done with such an approach:
>>
>> (A + B).block() => (A.block() + B.block())
>>
>> E.noalias() += A*B + C*D;
>>
>> =>
>>
>> E.noalias() += A*B;
>> E.noalias() += C*D;
>>
>> This also offers more parallelization opportunities.
>>
>> Sounds good, but of course I'm really scared about compilation times... This
>> is why I did not talk that much about that idea so far.
>>
>> gael.
>>
>>
>> On Thu, Feb 4, 2010 at 2:35 PM, Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx>
>> wrote:
>>>
>>> Hi,
>>>
>>> While looking into the "performance degradation" issue from the forum
>>> I found out that it is due to temporaries - as Benoit already guessed.
>>>
>>> I am a little bit afraid, that what I once proposed, namely copying
>>> expressions by value, is now backfiring. The reason is that initially
>>> I assumed expressions to be tiny little objects with close to no copy
>>> costs. The issue is related to those expressions holding temporaries.
>>> Copying them (e.g. a product expression) means copying all the data
>>> including the temporary and that will happen as many times as we nest
>>> expressions.
>>>
>>> The only solution I can think about at the moment is the
>>> specialization of ei_nested for those types and to go back to nesting
>>> by reference for these heavy weight guys.
>>>
>>> - Hauke
>>
>>
>>
>
>
>



Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/