Re: [eigen] nesting

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]


Oh, I see. You're really thinking far ahead of where I'm thinking. All
what you're saying makes sense. Good to have you thinking about that!!

Benoit

2010/2/4 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
> On Thu, Feb 4, 2010 at 8:03 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>> 2010/2/4 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>>>
>>> arf indeed, anytime an expression is evaluated at nesting time, a matrix
>>> gets stored by value and potentially it can be copied multiple times... I
>>> think this is one more argument in favor of a top-down evaluation mechanism.
>>> In short, the idea behind this is:
>>>
>>> 1 - remove all evaluation at nesting time, and simply build the complete
>>> expression tree
>>> 2 - at evaluation time, you send your complete expression to a template
>>> evaluator which will recursively analyze your expression from the top,
>>> reorder some subexpression, and evaluate some sub expressions when needed.
>>
>> As you say below, the main concern with this approach is compilation
>> times. With this approach, when we see
>>
>>  A * B * C * D
>>
>> we really construct a deeply nested product of product of...
>>
>> By contrast, with the current approach, since we evaluate right away
>> before continuing constructing the expression tree. So all we are
>> dealing with is products of matrices. Thus, this can limit a lot the
>> number of different template evaluations that we make. I regard this
>> as a quite nifty optimization and would like to retain it!
>>
>> The idea that you propose would indeed be necessary if it were
>> difficult to know where to introduce temporaries before the whole
>> expression is known. But is that the case?
>
> yes I think there are many cases where temporaries or useless
> evaluations could be avoided, again:
>
> R = A*B + C*D;
>
> or
>
> (A*B).block()
>
> or
>
> (A*B + C).block()
>
> or
>
> R = s*(A*B +C*D)
> (2 temporaries initialized to 0+ 2 calls to gemm + one scalar product)
> =>
> R = s*A*B;
> R+=s*C*D;
> (only 2 calls to gemm)
>
>> When we have an expression
>> like:
>>
>>  A + B*C
>>
>> then it seems to me that the logic is pretty simple: either B*C is an
>> outerproduct, in which case we should have neither EvalBeforeNesting
>> nor EvalBeforeAssigning, or it is NOT an outer product, in which case
>> we should have the 2 bits.
>
> that's more complicated:
> - large outer products must keep EvalBeforeNesting for performance reasons
> - removing the EvalBeforeNesting bit here has a very very low impact
> on the performances.
>
>
> Our real problem is the following:
>
> A*B + A1 + A2 + A3 + A4 + A5
>
> creates 5 temporary matrices, the result of A*B is copied 4 times....
>
> gael
>
>>
>> Am I missing something? I am especially afraid of being missing
>> something about the blas_traits and how you implemented that stuff ---
>> you know better than me.
>>
>> Benoit
>>
>>
>>
>>> Such an analyzer/evaluator would look like the current ei_blas_traits....
>>> Some examples of what could be done with such an approach:
>>>
>>> (A + B).block() => (A.block() + B.block())
>>>
>>> E.noalias() += A*B + C*D;
>>>
>>> =>
>>>
>>> E.noalias() += A*B;
>>> E.noalias() += C*D;
>>>
>>> This also offers more parallelization opportunities.
>>>
>>> Sounds good, but of course I'm really scared about compilation times... This
>>> is why I did not talk that much about that idea so far.
>>>
>>> gael.
>>>
>>>
>>> On Thu, Feb 4, 2010 at 2:35 PM, Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> While looking into the "performance degradation" issue from the forum
>>>> I found out that it is due to temporaries - as Benoit already guessed.
>>>>
>>>> I am a little bit afraid, that what I once proposed, namely copying
>>>> expressions by value, is now backfiring. The reason is that initially
>>>> I assumed expressions to be tiny little objects with close to no copy
>>>> costs. The issue is related to those expressions holding temporaries.
>>>> Copying them (e.g. a product expression) means copying all the data
>>>> including the temporary and that will happen as many times as we nest
>>>> expressions.
>>>>
>>>> The only solution I can think about at the moment is the
>>>> specialization of ei_nested for those types and to go back to nesting
>>>> by reference for these heavy weight guys.
>>>>
>>>> - Hauke
>>>
>>>
>>>
>>
>>
>>
>
>
>



Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/