Re: [eigen] Nesting by reference or by value?
On Wed, Nov 18, 2009 at 6:49 PM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
OK, gcc 4.2 has the same issue here.
On Wed, Nov 18, 2009 at 6:45 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
2009/11/18 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>That wouldn't be the first time that g++ 4.3 is stupid, right?
> Hi,
>
> I've just played a bit with Hauke's nesting refactoring fork
> (https://bitbucket.org/hauke/nesting-refactoring/).
>
> Let me recall that expressions are currently nested by reference, which
> forces the use of NestByValue whenever a function has to return a nested
> expression. See for instance adjoint(), which returns
> Transpose<NestByValue<CwiseUnaryOp<ei_scalar_conjugate<Scalar>, Derived> >
>>. As you can see, this is pretty annoying. In Hauke's fork, lightweight
> expressions (i.e., all but Matrix) are automatically nested by value, so
> there is no need for the NestByValue workaround.
>
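To make the two nesting policies concrete, here is a minimal sketch of the idea. It is a toy and not Eigen's actual code: the namespace `sketch`, the types `Vec4` and `Sum`, the trait `Nested` and the helper `sum3` are all hypothetical names made up for illustration. The assumption is simply that heavy objects owning their data are held by const reference, while lightweight expression nodes are copied into their parent.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch {  // hypothetical namespace, not part of Eigen

// Toy fixed-size vector; stands in for a "heavy" object such as Matrix.
struct Vec4 {
    std::array<float, 4> data;

    Vec4() : data{} {}
    Vec4(float x, float y, float z, float w) : data{{x, y, z, w}} {}

    float operator[](std::size_t i) const {
        assert(i < 4);
        return data[i];
    }

    // Assigning from an expression evaluates it coefficient by coefficient.
    template <typename Expr>
    Vec4& operator=(const Expr& e) {
        for (std::size_t i = 0; i < 4; ++i) data[i] = e[i];
        return *this;
    }
};

// Hypothetical trait deciding how an operand is stored inside an expression.
// Default: lightweight expressions are copied (nested by value).
template <typename T>
struct Nested { using type = T; };

// Objects that own their data are nested by const reference.
template <>
struct Nested<Vec4> { using type = const Vec4&; };

// Lazy coefficient-wise sum of two operands.
template <typename Lhs, typename Rhs>
struct Sum {
    typename Nested<Lhs>::type lhs;
    typename Nested<Rhs>::type rhs;
    float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// Found by argument-dependent lookup for types in this namespace only.
template <typename Lhs, typename Rhs>
Sum<Lhs, Rhs> operator+(const Lhs& l, const Rhs& r) {
    return Sum<Lhs, Rhs>{l, r};
}

// A function returning a nested expression. The temporary (a + b) is copied
// into the outer Sum, so the returned object holds no reference to it.
template <typename A, typename B, typename C>
Sum<Sum<A, B>, C> sum3(const A& a, const B& b, const C& c) {
    return (a + b) + c;
}

}  // namespace sketch
```

With this policy, `c = a + b + c + d;` and `e = sketch::sum3(a, b, d);` both evaluate in a single loop over the four coefficients. If `Nested` returned a const reference for every type, the object returned by `sum3` would refer to a destroyed temporary, which is the situation a NestByValue-style wrapper works around.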
> So now the question is: what about performance? Well, I tried a very
> simple example:
>
> Vector4f a, b, c, d;
> c = a+b+c+d;
>
> and here are the respective assembly codes generated by g++ 4.3.3
It would be interesting to see g++ 4.4.
Benoit
gcc 4.4 generates the same good code in both cases:
movaps 112(%rsp), %xmm0
addps 96(%rsp), %xmm0
addps 80(%rsp), %xmm0
addps 64(%rsp), %xmm0
movaps %xmm0, 80(%rsp)
OK, actually I forgot rule #1 of benchmarking gcc: never put your critical code in the main function; put it in a separate, non-inlined function. So now, for the same computation, gcc 4.3 and 4.4 generate good code in both cases. gcc 4.2 still generates the same poor code as above.
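As an illustration of that rule, the sketch below isolates the computation in its own non-inlined function, so the compiler cannot fold it into whatever surrounds the call site. The macro `SKETCH_NOINLINE` and the function `sum4_kernel` are hypothetical names, and plain float arrays stand in for Vector4f so that the example does not depend on Eigen.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical portability macro: ask the compiler not to inline a function.
#if defined(__GNUC__)
#  define SKETCH_NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#  define SKETCH_NOINLINE __declspec(noinline)
#else
#  define SKETCH_NOINLINE
#endif

// The code under test lives here rather than in main(), so the generated
// assembly for this function reflects only the computation itself.
// 'out' may alias one of the inputs, mirroring c = a + b + c + d.
SKETCH_NOINLINE void sum4_kernel(const float* a, const float* b,
                                 const float* c, const float* d,
                                 float* out) {
    assert(a && b && c && d && out);
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i] + c[i] + d[i];
    }
}
```

Compiling with something like `g++ -O2 -S` and reading the body of `sum4_kernel` in the resulting assembly then gives a listing that is easy to compare across compiler versions.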
Then I tried the same computation with VectorXf instead of Vector4f. Here both gcc 4.2 and 4.3 generate better code for the inner vectorized loop when nesting by value, and I observed a significant speedup. However, gcc 4.4 generates the same code in both cases.
Then I added some scalar multiples and sub-matrix operations, and there is no real winner, especially with gcc 4.4, which consistently generates similar code. So in the end, I consider this change safe as far as performance is concerned.
Now it would be interesting to benchmark MSVC as well, since this compiler seems to have more difficulty with Eigen's code, but this is something I cannot do.
gael.