Let me recall that currently expressions are nested by reference, which forces the use of NestByValue whenever a function has to return a nested expression. See for instance adjoint(), which returns Transpose<NestByValue<CwiseUnaryOp<ei_scalar_conjugate<Scalar>, Derived> > >. As you can see, this is pretty annoying. In Hauke's fork, lightweight expressions (i.e., all but Matrix) are automatically nested by value, so there is no need for the NestByValue workaround.
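To make the nesting question concrete, here is a minimal sketch of the idea in a toy expression-template library. It is not Eigen code: the namespace sketch and the names Vec4, Nested, Sum and pairwise_sum are all hypothetical, chosen only for illustration. The heavy object (Vec4, standing in for Matrix) is nested by const reference, while lightweight expressions are nested by value, which is what lets a function return a nested expression without any wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace sketch {  // hypothetical toy library, not Eigen

struct Vec4 {
    float data[4];
    Vec4() : data{0.f, 0.f, 0.f, 0.f} {}
    Vec4(float a, float b, float c, float d) : data{a, b, c, d} {}

    float operator[](std::size_t i) const {
        assert(i < 4);
        return data[i];
    }

    // Evaluate any expression coefficient by coefficient.
    template <typename Expr>
    Vec4& operator=(const Expr& e) {
        for (std::size_t i = 0; i < 4; ++i) data[i] = e[i];
        return *this;
    }
};

// Nesting policy: lightweight expressions are stored by value...
template <typename T> struct Nested { using type = T; };
// ...while the heavy object is stored by const reference to avoid a copy.
template <> struct Nested<Vec4> { using type = const Vec4&; };

static_assert(std::is_reference<Nested<Vec4>::type>::value,
              "Vec4 is nested by reference");

template <typename L, typename R>
struct Sum {
    typename Nested<L>::type lhs;
    typename Nested<R>::type rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// Found through argument-dependent lookup for types of this namespace only.
template <typename L, typename R>
Sum<L, R> operator+(const L& l, const R& r) { return Sum<L, R>(l, r); }

// Returns a nested expression. This is safe because the two inner Sum
// temporaries are copied into the outer Sum. If Sum operands were nested
// by reference instead, the returned object would refer to temporaries
// that are destroyed at the end of the return statement.
inline Sum<Sum<Vec4, Vec4>, Sum<Vec4, Vec4> >
pairwise_sum(const Vec4& a, const Vec4& b, const Vec4& c, const Vec4& d) {
    return (a + b) + (c + d);
}

}  // namespace sketch
```

With these definitions, r = sketch::pairwise_sum(a, b, c, d) evaluates the whole sum in a single loop inside Vec4::operator=. The price of this scheme is that every level of nesting copies its sub-expressions, and the compiler has to optimize those copies away.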
So now the question is: what about performance? Well, I tried a very simple example:
Vector4f a, b, c, d; c = a+b+c+d;
and here are the respective assembly listings generated by g++ 4.3.3 (-O2 -DNDEBUG):
So clearly, gcc has a lot of difficulty optimizing this simple code. In both cases we can see a lot of useless copies from the stack to the stack, but unfortunately the situation with nesting by value is much, much worse.