- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] nesting
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Thu, 4 Feb 2010 23:23:47 +0100
On Thu, Feb 4, 2010 at 11:17 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> On Thu, Feb 4, 2010 at 11:11 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>> Just one thing that I don't follow:
>>
>> 2010/2/4 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>>> Our real problem is the following:
>>>
>>> A*B + A1 + A2 + A3 + A4 + A5
>>>
>>> creates 5 temporary matrices, and the result of A*B is copied 4 times.
>>
>> Why is it so? After A*B has evaluated into a temporary matrix, isn't
>> it the same as
>>
>> tmp + A1 + A2 + A3 + A4 + A5
>>
>> ? That doesn't evaluate at every step ...!
>
> This is the problem that Hauke is talking about: since we nest by
> value, tmp is stored inside the expression tmp + A1, so sizeof(tmp+A1)
> is big, and so is sizeof((tmp+A1)+A2), etc. I did not check, but I think
> that is the way it works.
>
> on the other hand if you write:
>
> (A*B).eval() + A1 + A2 + A3 + A4 + A5
>
> then it is fine because the temporary is explicitly created on
> the stack and stored by reference by the binary expressions...
I checked:
r = x*x +y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y;
=> 0.22sec
r = (x*x).eval()
+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y+y;
=> 0.063sec
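
The difference between the two timings can be illustrated without Eigen. The following is a minimal sketch, not Eigen's implementation: `Temp`, `Plain`, `SumByValue`, `SumByRef` and the copy counter are all hypothetical names made up for the illustration. It builds ((tmp + y) + y) + y twice and counts how often the heavy temporary gets copied under each nesting policy.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the evaluated product: it owns its coefficients
// and counts every copy made of it.
static int g_copies = 0;

struct Temp {
    std::vector<double> data;
    Temp(std::size_t n, double v) : data(n, v) {}
    Temp(const Temp& other) : data(other.data) { ++g_copies; }
    double coeff(std::size_t i) const { return data[i]; }
};

typedef std::vector<double> Plain;  // stand-in for a plain matrix such as A1

// Sum node that stores its left operand by value (copying it).
template <typename Lhs>
struct SumByValue {
    Lhs lhs;
    const Plain& rhs;
    SumByValue(const Lhs& l, const Plain& r) : lhs(l), rhs(r) {}
    double coeff(std::size_t i) const { return lhs.coeff(i) + rhs[i]; }
};

// Sum node that stores its left operand by reference (no copy).
template <typename Lhs>
struct SumByRef {
    const Lhs& lhs;
    const Plain& rhs;
    SumByRef(const Lhs& l, const Plain& r) : lhs(l), rhs(r) {}
    double coeff(std::size_t i) const { return lhs.coeff(i) + rhs[i]; }
};

struct Outcome {
    int copies;     // how many times the Temp was copied
    double coeff0;  // first coefficient of the final expression
};

// Builds ((tmp + y) + y) + y with by-value nesting.
Outcome nest_by_value() {
    Temp tmp(4, 1.0);
    Plain y(4, 2.0);
    g_copies = 0;
    SumByValue<Temp> s1(tmp, y);
    SumByValue<SumByValue<Temp> > s2(s1, y);
    SumByValue<SumByValue<SumByValue<Temp> > > s3(s2, y);
    Outcome out = { g_copies, s3.coeff(0) };
    return out;
}

// Builds the same expression with by-reference nesting.
Outcome nest_by_reference() {
    Temp tmp(4, 1.0);
    Plain y(4, 2.0);
    g_copies = 0;
    SumByRef<Temp> s1(tmp, y);
    SumByRef<SumByRef<Temp> > s2(s1, y);
    SumByRef<SumByRef<SumByRef<Temp> > > s3(s2, y);
    Outcome out = { g_copies, s3.coeff(0) };
    return out;
}
```

In this sketch each additional by-value nesting level copies the temporary once more, while the by-reference variant never copies it. Both variants compute the same coefficients. Every operand here is a named local, so the stored references stay valid for as long as the expression is used.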
gael.
>
> gael
>
>
>> Benoit
>>
>>>
>>> gael
>>>
>>>>
>>>> Am I missing something? I am especially afraid of missing
>>>> something about the blas_traits and how you implemented that stuff ---
>>>> you know better than me.
>>>>
>>>> Benoit
>>>>
>>>>
>>>>
>>>>> Such an analyzer/evaluator would look like the current ei_blas_traits...
>>>>> Some examples of what could be done with such an approach:
>>>>>
>>>>> (A + B).block() => (A.block() + B.block())
>>>>>
>>>>> E.noalias() += A*B + C*D;
>>>>>
>>>>> =>
>>>>>
>>>>> E.noalias() += A*B;
>>>>> E.noalias() += C*D;
>>>>>
>>>>> This also offers more parallelization opportunities.
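
As a rough illustration of the second rewrite, here is a sketch in plain C++ that does not use Eigen. `Mat`, `add_product`, `accumulate_unsplit` and `accumulate_split` are hypothetical names chosen for the example. It compares evaluating both products into temporaries against accumulating each product directly into E, under the assumption (the one `noalias()` expresses) that E aliases none of the operands.

```cpp
#include <cstddef>
#include <vector>

// Minimal square matrix, row-major; a hypothetical stand-in for an Eigen matrix.
struct Mat {
    std::size_t n;
    std::vector<double> a;
    Mat(std::size_t size, double value) : n(size), a(size * size, value) {}
    double& operator()(std::size_t i, std::size_t j) { return a[i * n + j]; }
    double operator()(std::size_t i, std::size_t j) const { return a[i * n + j]; }
};

// E += A*B, accumulated in place. E must not alias A or B.
void add_product(Mat& E, const Mat& A, const Mat& B) {
    for (std::size_t i = 0; i < E.n; ++i)
        for (std::size_t j = 0; j < E.n; ++j)
            for (std::size_t k = 0; k < E.n; ++k)
                E(i, j) += A(i, k) * B(k, j);
}

// Unsplit form: evaluate both products into temporaries, then add their sum.
Mat accumulate_unsplit(Mat E, const Mat& A, const Mat& B,
                       const Mat& C, const Mat& D) {
    Mat t1(E.n, 0.0), t2(E.n, 0.0);
    add_product(t1, A, B);
    add_product(t2, C, D);
    for (std::size_t i = 0; i < E.a.size(); ++i)
        E.a[i] += t1.a[i] + t2.a[i];
    return E;
}

// Split form: E += A*B; E += C*D; with no product temporaries.
Mat accumulate_split(Mat E, const Mat& A, const Mat& B,
                     const Mat& C, const Mat& D) {
    add_product(E, A, B);
    add_product(E, C, D);
    return E;
}
```

Both functions produce the same matrix for inputs where the arithmetic is exact. With general floating-point data the results may differ in the last bits, because the additions happen in a different order.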
>>>>>
>>>>> Sounds good, but of course I'm really scared about compilation times... This
>>>>> is why I did not talk that much about that idea so far.
>>>>>
>>>>> gael.
>>>>>
>>>>>
>>>>> On Thu, Feb 4, 2010 at 2:35 PM, Hauke Heibel <hauke.heibel@xxxxxxxxxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> While looking into the "performance degradation" issue from the forum
>>>>>> I found out that it is due to temporaries - as Benoit already guessed.
>>>>>>
>>>>>> I am a little bit afraid that what I once proposed, namely copying
>>>>>> expressions by value, is now backfiring. The reason is that initially
>>>>>> I assumed expressions to be tiny little objects with close to no copy
>>>>>> costs. The issue is related to those expressions holding temporaries.
>>>>>> Copying them (e.g. a product expression) means copying all the data
>>>>>> including the temporary and that will happen as many times as we nest
>>>>>> expressions.
>>>>>>
>>>>>> The only solution I can think of at the moment is to specialize
>>>>>> ei_nested for those types and go back to nesting by reference for
>>>>>> these heavyweight expressions.
>>>>>>
>>>>>> - Hauke
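
The idea of specializing the nesting trait can be sketched as follows. This is not the definition of ei_nested: `nested_sketch`, `CwiseSum` and `ProductWithTemp` are hypothetical names, and the real trait takes more into account. The sketch only shows a default of nesting by value with a specialization that switches a temporary-holding type back to nesting by const reference.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical lightweight expression: cheap to copy.
struct CwiseSum {};

// Hypothetical heavy expression: it owns an evaluated temporary.
struct ProductWithTemp {
    std::vector<double> tmp;
};

// Default policy: nest expressions by value.
template <typename T>
struct nested_sketch {
    typedef T type;
};

// Specialization: nest the heavy expression by const reference instead.
template <>
struct nested_sketch<ProductWithTemp> {
    typedef const ProductWithTemp& type;
};

// A binary node would then declare its operand through the trait.
template <typename Lhs>
struct SumNode {
    typename nested_sketch<Lhs>::type lhs;
    explicit SumNode(const Lhs& l) : lhs(l) {}
};
```

With this, `SumNode<CwiseSum>` holds a copy of its operand, while `SumNode<ProductWithTemp>` holds only a reference. The caller must therefore keep the product expression alive for as long as the node is used.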