[ Thread Index |
Date Index
| More lists.tuxfamily.org/eigen Archives
]
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] nesting
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Thu, 4 Feb 2010 23:05:20 +0100
- Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :from:date:message-id:subject:to:content-type :content-transfer-encoding; bh=Eyd/oDZ5EUu4u2gVftoMB8md8/axQMambdKZNjisMHU=; b=gE9DzIA3UcWpPZKiqejxnAzji1F2ADJY1Exxz0MlkkTmoIsTlcMF5pcu7J3EUj/b1E HFrzDNbannrYNVXd0ZwjA4uig5//ny6maZuuLkgjNR5fwp/WowppgDiUSIeI2voSI4uJ JWe/bri+3LhUAAHlWOpm6rrpu+7zmF1Ffn3tI=
- Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type:content-transfer-encoding; b=WtYmKzVBXCSrMvzzUVY80os8rb1KWCircBalx5qdkb8wOfN1ndcQfYyfMJbgvPoq40 jMuJgC+SFLb7TXcIboCjU59aajydb4QF7unMDbBgir/wD5bU1/JSMlOzzDchdDDxqQtz blleOsMn/1X5I++tuWKktFva97+HW7rS3dgvU=
On Thu, Feb 4, 2010 at 8:03 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2010/2/4 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>>
>> arf indeed, anytime an expression is evaluated at nesting time, a matrix
>> gets stored by value and potentially it can be copied multiple times... I
>> think this is one more argument in favor of a top-down evaluation mechanism.
>> In short, the idea behind this is:
>>
>> 1 - remove all evaluation at nesting time, and simply build the complete
>> expression tree
>> 2 - at evaluation time, you send your complete expression to a template
>> evaluator which will recursively analyze your expression from the top,
>> reorder some subexpression, and evaluate some sub expressions when needed.
>
> As you say below, the main concern with this approach is compilation
> times. With this approach, when we see
>
> A * B * C * D
>
> we really construct a deeply nested product of product of...
>
> By contrast, with the current approach, since we evaluate right away
> before continuing constructing the expression tree. So all we are
> dealing with is products of matrices. Thus, this can limit a lot the
> number of different template evaluations that we make. I regard this
> as a quite nifty optimization and would like to retain it!
>
> The idea that you propose would indeed be necessary if it were
> difficult to know where to introduce temporaries before the whole
> expression is known. But is that the case?
yes I think there are many cases where temporaries or useless
evaluations could be avoided, again:
R = A*B + C*D;
or
(A*B).block()
or
(A*B + C).block()
or
R = s*(A*B +C*D)
(2 temporaries initialized to 0+ 2 calls to gemm + one scalar product)
=>
R = s*A*B;
R+=s*C*D;
(only 2 calls to gemm)
> When we have an expression
> like:
>
> A + B*C
>
> then it seems to me that the logic is pretty simple: either B*C is an
> outerproduct, in which case we should have neither EvalBeforeNesting
> nor EvalBeforeAssigning, or it is NOT an outer product, in which case
> we should have the 2 bits.
that's more complicated:
- large outer products must keep EvalBeforeNesting for performance reasons
- removing the EvalBeforeNesting bit here has a very very low impact
on the performances.
Our real problem is the following:
A*B + A1 + A2 + A3 + A4 + A5
creates 5 temporary matrices, the result of A*B is copied 4 times....
gael
>
> Am I missing something? I am especially afraid of being missing
> something about the blas_traits and how you implemented that stuff ---
> you know better than me.
>
> Benoit
>
>
>
>> Such an analyzer/evaluator would look like the current ei_blas_traits...
>> Some examples of what could be done with such an approach:
>>
>> (A + B).block() => (A.block() + B.block())
>>
>> E.noalias() += A*B + C*D;
>>
>> =>
>>
>> E.noalias() += A*B;
>> E.noalias() += C*D;
>>
>> This also offers more parallelization opportunities.
>>
>> Sounds good, but of course I'm really scared about compilation times... This
>> is why I did not talk that much about that idea so far.
>>
>> gael.
>>
>>
>> On Thu, Feb 4, 2010 at 2:35 PM, Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx>
>> wrote:
>>>
>>> Hi,
>>>
>>> While looking into the "performance degradation" issue from the forum
>>> I found out that it is due to temporaries - as Benoit already guessed.
>>>
>>> I am a little bit afraid, that what I once proposed, namely copying
>>> expressions by value, is now backfiring. The reason is that initially
>>> I assumed expressions to be tiny little objects with close to no copy
>>> costs. The issue is related to those expressions holding temporaries.
>>> Copying them (e.g. a product expression) means copying all the data
>>> including the temporary and that will happen as many times as we nest
>>> expressions.
>>>
>>> The only solution I can think about at the moment is the
>>> specialization of ei_nested for those types and to go back to nesting
>>> by reference for these heavy weight guys.
>>>
>>> - Hauke
>>
>>
>>
>
>
>