On Wed, Jul 8, 2009 at 7:14 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
Hi,
just an email to keep the list up to date about the recent progress I made on the matrix-matrix and matrix-vector products. So far, only a limited set of expressions (only block, transpose, and operator+=) was handled efficiently by our fast product kernels, for instance:
C = A*B;
C = A.transpose() * B.block(....);
C.col(i) += A * B.row(j).transpose();
This was not bad, but it does not cover all the possibilities offered by the xGEMM BLAS routine. So I recently extended the matrix product logic to catch a large variety of expressions which can be mapped to a single xGEMM call (or something equivalent to xGEMM). All these changes have been done in this branch: http://bitbucket.org/bjacob/eigen2_special_matrices/ to simplify future merging. Here are some examples of expressions which each compile to a single function call to our own xGEMM-like routine, without any temporary:
m3 -= (s1 * m1 * m2).lazy();
m3 = (m1.adjoint() * (s1*m2).conjugate()).lazy();
m3 -= ((-m1*s2) * s1*m2.adjoint()).lazy();
v3 += ((-m1.adjoint() * s2) * (s1 * v1.adjoint())).lazy();
legend:
m = matrix
v = vector
s = scalar
In all these expressions we automatically extract a single scalar multiple of the complete expression, and tag the lhs and rhs to know whether we have to take the complex conjugate of their coefficients, bypassing all the nested expressions.
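To make that concrete, here is a rough sketch of how I understand the mapping, taking one of the examples above. The function gemm_equivalent and its hand-written loop are purely illustrative (not Eigen's internal code); they just spell out the xGEMM canonical form C ?= alpha * op(A) * op(B) that the expression is reduced to:

#include <Eigen/Core>
#include <complex>
using namespace Eigen;

// Illustration only: what  m3 -= ((-m1*s2) * s1*m2.adjoint()).lazy();
// boils down to once the scalar factor and the conjugate/transpose flags
// have been extracted from the nested expressions.
void gemm_equivalent(MatrixXcf& m3, const MatrixXcf& m1, const MatrixXcf& m2,
                     std::complex<float> s1, std::complex<float> s2)
{
  std::complex<float> alpha = (-s2) * s1;      // all scalar factors of the expression
  for (int i = 0; i < m3.rows(); ++i)          // single kernel-like accumulation pass
    for (int j = 0; j < m3.cols(); ++j)
    {
      std::complex<float> acc(0.f, 0.f);
      for (int k = 0; k < m1.cols(); ++k)
        acc += m1(i, k) * std::conj(m2(j, k)); // rhs is flagged "adjoint", no temporary built
      m3(i, j) -= alpha * acc;                 // the accumulation kind comes from the -=
    }
}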
There are still a few limitations though:
The first one is:
m3 = (s * (m1 * m2)).lazy();
which is evaluated as "m3 = s * (m1 * m2).eval()", i.e., it requires 2 evaluations and a temporary.
This is because the rhs matrix product is automatically evaluated by the scalar product expression. Therefore it is not possible to catch such an expression this way:
template<typename Lhs, typename Rhs>
MatrixBase::operator=(const CwiseUnaryOp<ei_scalar_multiple<Scalar>, Product<Lhs,Rhs> >& xpr);
it is too late!
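To make the cost concrete, here is what that fallback amounts to, written out by hand (just an illustration of the two passes, assuming plain MatrixXf operands):

#include <Eigen/Core>
using namespace Eigen;

void two_pass_fallback(MatrixXf& m3, const MatrixXf& m1, const MatrixXf& m2, float s)
{
  // what "m3 = (s * (m1 * m2)).lazy();" currently boils down to:
  MatrixXf tmp = m1 * m2;  // first evaluation: the inner product goes into a temporary
  m3 = s * tmp;            // second evaluation: a separate pass scales it into m3
}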
So basically the user will have to make sure he/she omits the parentheses:
m3 = (s * m1 * m2);
Likewise:
m3 = m3 - m1 * m2;
has to be written using -=:
m3 -= (m1 * m2).lazy();
(moreover, we cannot detect at compile time that the two "m3" are indeed the same matrix)
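For what it's worth, here is how I understand that restriction: the two statements below build exactly the same expression type, so nothing at compile time distinguishes the aliasing case from the safe one (m4 being just another matrix of the right size):

m3 = m3 - m1 * m2;   // the destination aliases the m3 inside the rhs...
m4 = m3 - m1 * m2;   // ...this one does not, yet the rhs types are identical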
The second limitation is in expressions such as:
m3 = ((m1*s1).transpose() * m2).lazy();
which is evaluated as:
m3 = (((m1*s1).transpose()).eval() * m2).lazy();
This is because the analysis of the product factors stops at the first Transpose(), and therefore it only sees an expression which has to be evaluated before we can perform the product.
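Assuming I read the mechanism correctly, an easy workaround is to keep the scalar factor outside of the transpose, since (m1*s1).transpose() is the same as s1 * m1.transpose(); written that way, the analysis only has to look through a plain Transpose and no temporary should be needed:

m3 = (s1 * m1.transpose() * m2).lazy();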
Another thing is that the product order is not respected. For instance:
m3 -= ((s1 * m1).conjugate() * (-m2*s2)).lazy();
is transformed into:
m3 += (( (-1) * ei_conj(s1) * (-1) * s2 ) * (m1.conjugate() * m2)).lazy();
(one -1 coming from folding the -= into a +=, the other from the -m2). However I don't think that's a big deal, because the floating point operations in the matrix product are already performed in an unspecified order anyway.
That's it.
gael.