- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] nesting
- From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- Date: Thu, 4 Feb 2010 17:15:12 -0500
2010/2/4 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
> yes I think there are many cases where temporaries or useless
> evaluations could be avoided, again:
>
> R = A*B + C*D;
And, sorry, I really did reply too fast. In that example, whether or
not temporaries are needed depends on whether the expressions
R, A, B, C, D alias each other, right? And that is something we won't
even try to detect. So the current behavior seems to me to be the only
possible one... In other words, if you remove the temporaries there,
what happens with
A = A*B + B*A ???
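
To make the aliasing concern concrete, here is a minimal sketch in plain C++. It assumes a hypothetical 2x2 `Mat2` type and two illustrative helpers, `with_temporaries` and `without_temporaries`; none of these are Eigen's actual classes or evaluation code. It only shows why writing coefficients straight into a destination that is also an operand can go wrong.

```cpp
#include <array>

// Hypothetical 2x2 matrix type, for illustration only (not Eigen's API).
using Mat2 = std::array<std::array<double, 2>, 2>;

// Plain matrix product returned in a fresh object (i.e. a temporary).
Mat2 product(const Mat2& x, const Mat2& y) {
    Mat2 r{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                r[i][j] += x[i][k] * y[k][j];
    return r;
}

// Safe evaluation of a*b + b*a: both products are fully evaluated into
// temporaries before anything is written to the result.
Mat2 with_temporaries(const Mat2& a, const Mat2& b) {
    const Mat2 ab = product(a, b);
    const Mat2 ba = product(b, a);
    Mat2 r{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            r[i][j] = ab[i][j] + ba[i][j];
    return r;
}

// Naive temporary-free evaluation of dst = a*b + b*a: each coefficient is
// computed and stored directly. This is only correct when dst does not
// alias a or b, because later coefficients re-read entries of a and b.
void without_temporaries(Mat2& dst, const Mat2& a, const Mat2& b) {
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j) {
            double s = 0.0;
            for (int k = 0; k < 2; ++k)
                s += a[i][k] * b[k][j] + b[i][k] * a[k][j];
            dst[i][j] = s;
        }
}
```

Calling `without_temporaries(r, a, b)` with a separate `r` agrees with `with_temporaries(a, b)`, whereas `without_temporaries(a, a, b)` overwrites entries of `a` that later coefficients still need.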
Benoit
>
> or
>
> (A*B).block()
>
> or
>
> (A*B + C).block()
>
> or
>
> R = s*(A*B +C*D)
> (2 temporaries initialized to 0, plus 2 calls to gemm, plus one multiplication by the scalar s)
> =>
> R = s*A*B;
> R+=s*C*D;
> (only 2 calls to gemm)
>
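
The rewrite of R = s*(A*B + C*D) into two accumulating products can be sketched as follows. This is only an illustration under stated assumptions: `Mat2` and the `gemm` function below are hypothetical stand-ins with a BLAS-like meaning (dst = alpha*x*y + beta*dst), not Eigen's internal product kernels, and the destination is assumed not to alias any operand.

```cpp
#include <array>

// Hypothetical 2x2 matrix type, for illustration only (not Eigen's API).
using Mat2 = std::array<std::array<double, 2>, 2>;

// Hypothetical gemm-like kernel: dst = alpha * x * y + beta * dst.
// dst is assumed not to alias x or y.
void gemm(Mat2& dst, double alpha, const Mat2& x, const Mat2& y, double beta) {
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j) {
            double acc = 0.0;
            for (int k = 0; k < 2; ++k)
                acc += x[i][k] * y[k][j];
            dst[i][j] = alpha * acc + beta * dst[i][j];
        }
}

// Evaluation with two zero-initialized temporaries, two gemm calls,
// and a final pass that adds the temporaries and scales by s.
Mat2 scaled_sum_with_temporaries(double s, const Mat2& a, const Mat2& b,
                                 const Mat2& c, const Mat2& d) {
    Mat2 t1{}, t2{};
    gemm(t1, 1.0, a, b, 0.0);
    gemm(t2, 1.0, c, d, 0.0);
    Mat2 r{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            r[i][j] = s * (t1[i][j] + t2[i][j]);
    return r;
}

// Rewritten form: R = s*A*B; R += s*C*D; two gemm calls, no temporaries.
Mat2 scaled_sum_two_gemm(double s, const Mat2& a, const Mat2& b,
                         const Mat2& c, const Mat2& d) {
    Mat2 r{};
    gemm(r, s, a, b, 0.0);  // r  = s * a * b
    gemm(r, s, c, d, 1.0);  // r += s * c * d
    return r;
}
```

Both functions compute the same matrix; the second folds the scalar into the gemm calls and accumulates directly into the result.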
>> When we have an expression
>> like:
>>
>> A + B*C
>>
>> then it seems to me that the logic is pretty simple: either B*C is an
>> outerproduct, in which case we should have neither EvalBeforeNesting
>> nor EvalBeforeAssigning, or it is NOT an outer product, in which case
>> we should have the 2 bits.
>
> that's more complicated:
> - large outer products must keep EvalBeforeNesting for performance reasons
> - removing the EvalBeforeNesting bit here has only a very small impact
> on performance.
>
>
> Our real problem is the following:
>
> A*B + A1 + A2 + A3 + A4 + A5
>
> creates 5 temporary matrices: the result of A*B is copied 4 times...
>
> gael
>
>>
>> Am I missing something? I am especially afraid of missing
>> something about the blas_traits and how you implemented that stuff;
>> you know it better than I do.
>>
>> Benoit
>>
>>
>>
>>> Such an analyzer/evaluator would look like the current ei_blas_traits....
>>> Some examples of what could be done with such an approach:
>>>
>>> (A + B).block() => (A.block() + B.block())
>>>
>>> E.noalias() += A*B + C*D;
>>>
>>> =>
>>>
>>> E.noalias() += A*B;
>>> E.noalias() += C*D;
>>>
>>> This also offers more parallelization opportunities.
>>>
>>> Sounds good, but of course I'm really worried about compilation times... This
>>> is why I have not talked much about that idea so far.
>>>
>>> gael.
>>>
>>>
>>> On Thu, Feb 4, 2010 at 2:35 PM, Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> While looking into the "performance degradation" issue from the forum
>>>> I found out that it is due to temporaries - as Benoit already guessed.
>>>>
>>>> I am a little bit afraid that what I once proposed, namely copying
>>>> expressions by value, is now backfiring. The reason is that initially
>>>> I assumed expressions to be tiny objects with close to no copy
>>>> cost. The issue is with those expressions that hold temporaries.
>>>> Copying them (e.g. a product expression) means copying all the data,
>>>> including the temporary, and that will happen as many times as we nest
>>>> expressions.
>>>>
>>>> The only solution I can think of at the moment is to specialize
>>>> ei_nested for those types and to go back to nesting
>>>> by reference for these heavyweight guys.
>>>>
>>>> - Hauke
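
The cost of nesting by value when an expression owns a temporary can be sketched as below. All names here (`ProductExpr`, `SumByValue`, `SumByRef`, `g_product_copies`) are hypothetical and only model the idea; they are not Eigen's expression classes or its `ei_nested` mechanism. The sketch counts how often the temporary is copied when a product-like node is nested three levels deep.

```cpp
#include <cstddef>
#include <vector>

// Counts copies of the heavy node below (illustrative global counter).
int g_product_copies = 0;

// Hypothetical expression node that owns an evaluated temporary,
// standing in for a product expression such as A*B.
struct ProductExpr {
    std::vector<double> temp;  // the evaluated result held by the expression
    explicit ProductExpr(std::size_t n) : temp(n, 0.0) {}
    ProductExpr(const ProductExpr& other) : temp(other.temp) { ++g_product_copies; }
};

// Sum-like node that stores its left operand by value.
template <typename Lhs>
struct SumByValue {
    Lhs lhs;
    explicit SumByValue(const Lhs& l) : lhs(l) {}
};

// Sum-like node that stores its left operand by reference.
template <typename Lhs>
struct SumByRef {
    const Lhs& lhs;
    explicit SumByRef(const Lhs& l) : lhs(l) {}
};

// Builds ((p + .) + .) + . with by-value nesting and returns the copy count.
int copies_nesting_by_value() {
    g_product_copies = 0;
    ProductExpr p(16);
    SumByValue<ProductExpr> s1(p);    // copies p's temporary
    SumByValue<decltype(s1)> s2(s1);  // copies s1, and the temporary again
    SumByValue<decltype(s2)> s3(s2);  // and once more
    (void)s3;
    return g_product_copies;
}

// Same nesting with by-reference storage: the temporary is never copied.
int copies_nesting_by_reference() {
    g_product_copies = 0;
    ProductExpr p(16);
    SumByRef<ProductExpr> s1(p);
    SumByRef<decltype(s1)> s2(s1);
    SumByRef<decltype(s2)> s3(s2);
    (void)s3;
    return g_product_copies;
}
```

In this model the by-value chain copies the temporary once per nesting level, while the by-reference chain never copies it. The by-reference version relies on every nested object outliving the expression that refers to it.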