On Wed, Jul 8, 2009 at 7:14 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
Hi,
just an email to keep the list up to date about the recent progress I made on the matrix-matrix and matrix-vector products. So far, only a limited set of expressions (only block, transpose, and operator+=) was handled efficiently by our fast product kernels, for instance:
C = A*B;
C = A.transpose() * B.block(....);
C.col(i) += A * B.row(j).transpose();
This was not bad, but it does not cover all the possibilities offered by the xGEMM BLAS routine. So I recently extended the matrix product logic to catch a large variety of expressions which can be mapped to a single xGEMM call (or something equivalent to xGEMM). All these changes have been done in this branch: http://bitbucket.org/bjacob/eigen2_special_matrices/ to simplify future merging. Here are some examples of expressions which each compile to a single function call to our own xGEMM-like routine, without any temporary:
m3 -= (s1 * m1 * m2).lazy();
m3 = (m1.adjoint() * (s1*m2).conjugate()).lazy();
m3 -= ((-m1*s2) * s1*m2.adjoint()).lazy();
v3 += ((-m1.adjoint() * s2) * (s1 * v1.adjoint())).lazy();
legend:
m = matrix
v = vector
s = scalar
In all these expressions we automatically extract a single scalar multiple of the complete expression, and tag the lhs and rhs to know whether we have to take the complex conjugate of their coefficients, bypassing all the nested expressions.
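To make that concrete, here is a rough sketch of how I understand the mapping, taking one of the examples above. The function gemm_equivalent and its hand-written loop are purely illustrative (not Eigen's internal code); they just spell out the xGEMM canonical form C ?= alpha * op(A) * op(B) that the expression is reduced to:

#include <Eigen/Core>
#include <complex>
using namespace Eigen;

// Illustration only: what  m3 -= ((-m1*s2) * s1*m2.adjoint()).lazy();
// boils down to once the scalar factor and the conjugate/transpose flags
// have been extracted from the nested expressions.
void gemm_equivalent(MatrixXcf& m3, const MatrixXcf& m1, const MatrixXcf& m2,
                     std::complex<float> s1, std::complex<float> s2)
{
  std::complex<float> alpha = (-s2) * s1;      // all scalar factors of the expression
  for (int i = 0; i < m3.rows(); ++i)          // single kernel-like accumulation pass
    for (int j = 0; j < m3.cols(); ++j)
    {
      std::complex<float> acc(0.f, 0.f);
      for (int k = 0; k < m1.cols(); ++k)
        acc += m1(i, k) * std::conj(m2(j, k)); // rhs is flagged "adjoint", no temporary built
      m3(i, j) -= alpha * acc;                 // the accumulation kind comes from the -=
    }
}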
There are still a few limitations though:
The first one is:
m3 = (s * (m1 * m2)).lazy();
which is evaluated as "m3 = s * (m1 * m2).eval()", i.e., it requires 2 evaluations and a temporary.
This is because the rhs matrix product is automatically evaluated by the scalar product expression. Therefore it is not possible to catch such an expression this way:
template<typename Lhs, typename Rhs>
MatrixBase::operator=(const CwiseUnaryOp<ei_scalar_multiple<Scalar>, Product<Lhs,Rhs> >& xpr);
it is too late!
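To make the cost concrete, here is what that fallback amounts to, written out by hand (just an illustration of the two passes, assuming plain MatrixXf operands):

#include <Eigen/Core>
using namespace Eigen;

void two_pass_fallback(MatrixXf& m3, const MatrixXf& m1, const MatrixXf& m2, float s)
{
  // what "m3 = (s * (m1 * m2)).lazy();" currently boils down to:
  MatrixXf tmp = m1 * m2;  // first evaluation: the inner product goes into a temporary
  m3 = s * tmp;            // second evaluation: a separate pass scales it into m3
}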
So basically the user will have to make sure he/she omits the parentheses:
m3 = (s * m1 * m2);
Likewise:
m3 = m3 - m1 * m2;
has to be written using -=:
m3 -= (m1 * m2).lazy();
(moreover, we cannot detect at compile time that the two "m3" are indeed the same matrix)
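For what it's worth, here is how I understand that restriction: the two statements below build exactly the same expression type, so nothing at compile time distinguishes the aliasing case from the safe one (m4 being just another matrix of the right size):

m3 = m3 - m1 * m2;   // the destination aliases the m3 inside the rhs...
m4 = m3 - m1 * m2;   // ...this one does not, yet the rhs types are identical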
The second limitation is in expressions such as:
m3 = ((m1*s1).transpose() * m2).lazy();
which is evaluated as:
m3 = (((m1*s1).transpose()).eval() * m2).lazy();
This is because the analysis of the product factors stops at the first Transpose(), and therefore it only sees an expression which has to be evaluated before we can perform the product.
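Assuming I read the mechanism correctly, an easy workaround is to keep the scalar factor outside of the transpose, since (m1*s1).transpose() is the same as s1 * m1.transpose(); written that way, the analysis only has to look through a plain Transpose and no temporary should be needed:

m3 = (s1 * m1.transpose() * m2).lazy();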
Another thing is that the product order is not respected. For instance:
m3 -= ((s1 * m1).conjugate() * (-m2*s2)).lazy();
is transformed into:
m3 += (( (-1) * ei_conj(s1) * (-1) * s2 ) * (m1.conjugate() * m2)).lazy();
(one -1 coming from folding the -= into a +=, the other from the -m2). However I don't think that's a big deal, because the floating point operations in the matrix product are already performed in an unspecified order anyway.
That's it.
gael.