Re: [eigen] Nesting by reference or by value?


If you still have the test code you used, it would be great if you could send it to me. I'll have time on the weekend to run the tests and check the ASM code.

- Hauke

On Wed, Nov 18, 2009 at 7:35 PM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:

On Wed, Nov 18, 2009 at 6:49 PM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:

On Wed, Nov 18, 2009 at 6:45 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
2009/11/18 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
> Hi,
> I've just played a bit with Hauke's nesting refactoring fork
> (
> Let me recall that expressions are currently nested by reference, which
> forces the use of NestByValue whenever a function has to return a nested
> expression. See for instance adjoint(), which returns
> Transpose<NestByValue<CwiseUnaryOp<ei_scalar_conjugate<Scalar>, Derived> > >.
> As you can see, this is pretty annoying. In Hauke's fork, lightweight
> expressions (i.e., all but Matrix) are automatically nested by value, so
> there is no need for the NestByValue workaround.
> So now the question is: what about performance? Well, I tried a very
> simple example:
> Vector4f a, b, c, d;
> c = a+b+c+d;
> and here is the respective assembly code generated by g++ 4.3.3

That wouldn't be the first time that g++ 4.3 is stupid, right?

It would be interesting to see g++ 4.4.


OK, gcc 4.2 has the same issue here.

gcc 4.4 generates the same good code in both cases:

    movaps    112(%rsp), %xmm0
    addps    96(%rsp), %xmm0
    addps    80(%rsp), %xmm0
    addps    64(%rsp), %xmm0
    movaps    %xmm0, 80(%rsp)

OK, I actually forgot rule #1 of benchmarking with gcc: never put your critical code in the main function; put it in a separate, non-inlined function instead. With that fixed, for the same computation, gcc 4.3 and 4.4 generate good code in both cases. gcc 4.2 still generates the same poor code as above.
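The benchmarking rule above can be sketched as follows. This is only an illustration, not the test code used in this thread: `Vec4` is a hypothetical stand-in for Vector4f so that the snippet has no Eigen dependency, and `BENCH_NOINLINE`, `sum4` and `run_sum4_example` are names invented for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for Eigen's Vector4f, so the sketch has no dependencies.
struct Vec4 {
    float v[4];
};

inline Vec4 operator+(const Vec4& a, const Vec4& b) {
    Vec4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Compiler-specific "do not inline" markers (GCC/Clang attribute, MSVC declspec).
#if defined(__GNUC__)
#  define BENCH_NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#  define BENCH_NOINLINE __declspec(noinline)
#else
#  define BENCH_NOINLINE
#endif

// Kernel under test. Keeping it out of main() and non-inlined means its
// assembly can be read in isolation, and the compiler cannot specialize it
// using values that happen to be known in the caller.
BENCH_NOINLINE void sum4(const Vec4& a, const Vec4& b, Vec4& c, const Vec4& d) {
    c = a + b + c + d;
}

// Hypothetical driver: call it from main() or from a timing loop.
inline float run_sum4_example() {
    Vec4 a = {{1.f, 2.f, 3.f, 4.f}};
    Vec4 b = {{10.f, 20.f, 30.f, 40.f}};
    Vec4 c = {{100.f, 200.f, 300.f, 400.f}};
    Vec4 d = {{1000.f, 2000.f, 3000.f, 4000.f}};
    sum4(a, b, c, d);
    assert(c.v[0] == 1111.f);  // 1 + 10 + 100 + 1000
    return c.v[3];             // 4 + 40 + 400 + 4000
}
```

Compiling with something like `g++ -O2 -S` and looking only at the body of `sum4` then gives the code that matters for the comparison.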

Then I tried the same computation with VectorXf instead of Vector4f. Here both gcc 4.2 and 4.3 generate better code for the inner vectorized loop when nesting by value, and I observed a significant speedup. gcc 4.4, however, generates the same code in both cases.

Then I added some scalar multiples and sub-matrix operations, and there is no real winner, especially with gcc 4.4, which consistently generates similar code. So in the end, I consider this change safe with respect to performance.
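For readers who want to see what "nesting by value" means in practice, here is a minimal expression-template sketch. It is not Eigen's code and not the code from Hauke's fork: `Vec`, `Nested`, `Sum`, `sum3` and `run_nesting_example` are hypothetical names, and the trait only mimics the idea that the heavy storage type is held by reference while lightweight expressions are held by value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain storage type, standing in for Matrix/VectorXf.
struct Vec {
    std::vector<float> data;
    explicit Vec(std::size_t n, float value = 0.f) : data(n, value) {}
    float coeff(std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning an expression evaluates it coefficient by coefficient.
    template <typename Expr>
    Vec& operator=(const Expr& e) {
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = e.coeff(i);
        return *this;
    }
};

// Nesting policy (assumption for this sketch): lightweight expressions are
// stored by value, the heavy Vec is stored by const reference.
template <typename T> struct Nested { typedef T type; };
template <> struct Nested<Vec> { typedef const Vec& type; };

// Lazy sum of two operands; nothing is computed until coeff() is called.
template <typename L, typename R>
struct Sum {
    typename Nested<L>::type lhs;
    typename Nested<R>::type rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    float coeff(std::size_t i) const { return lhs.coeff(i) + rhs.coeff(i); }
    std::size_t size() const { return lhs.size(); }
};

inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) {
    return Sum<Vec, Vec>(a, b);
}

template <typename L, typename R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& a, const Vec& b) {
    return Sum<Sum<L, R>, Vec>(a, b);
}

// Returning a nested expression is safe here: the temporary (a + b) is
// copied into the outer Sum. If Sum stored every operand by reference, the
// result would refer to a temporary that is destroyed on return.
inline Sum<Sum<Vec, Vec>, Vec> sum3(const Vec& a, const Vec& b, const Vec& c) {
    return a + b + c;
}

// Hypothetical driver exercising both the inline and the returned expression.
inline float run_nesting_example() {
    Vec a(4, 1.f), b(4, 2.f), c(4, 3.f), d(4, 4.f);
    c = a + b + c + d;   // each coefficient of c becomes 1 + 2 + 3 + 4
    Vec r(4);
    r = sum3(a, b, c);   // each coefficient becomes 1 + 2 + 10
    return r.coeff(0);
}
```

The cost being measured in this thread is the copy of those small expression objects; the results above suggest the compilers tested mostly optimize it away.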

It would now be interesting to benchmark MSVC as well, since this compiler seems to have more difficulty with Eigen's code, but that is something I cannot do.

