[eigen] recent improvements in the products |
[ Thread Index |
Date Index
| More lists.tuxfamily.org/eigen Archives
]
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: [eigen] recent improvements in the products
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Wed, 8 Jul 2009 19:14:59 +0200
- Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:date:message-id:subject :from:to:content-type; bh=y3GVoQ7Ig38nCAk7QoghHX2QM/D4fF0lZHo0zQrMgjc=; b=cc0AaQN45nzTzMTYGOG857rZrGBaA4qGauRZukBuNcMLTvtQNtjikV2UdnnKOtYlZ9 u4H3EkswDoi2VC/2WSM9Vz95zZ6BCak75JGyHmj7C11Rcu4+hu54LofQQCvodIV+Xica QKLz8rditQGe5BhXTAXwTDf4eJeLwWR9MOST4=
- Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; b=YZaSjptxYeo1P7fMeVQigCtrnD498fzyVQb/cgocP2m7iUNjirEd99PTT8h7y0w4Ox 0XBsdNP34lOEJ2/4hISfxYuTMMwTv8iXT1COcEr3FjBb94PeJacsvUPXjDbB8Q1GlZlB KeoUzV+rKuABgaIU/mTO0m3wm8LPPItpu+be8=
Hi,
just an email to keep the list uptodate about the recent progress I made in the matrix-matrix and matrix-vector products. So far only a limited set of expressions (only block, transpose, and operator+=) was efficiently handled by our fast product kernels, for instance:
C = A*B;
C = A.transpose() * B.block(....);
C.col(i) += A * B.row(j).transpose();
This was not bad, but this does not cover all the possibilities offered by the xGEMM BLAS routine. So I recently extended the matrix product logic to be able to catch a large variety of expressions which can be mapped to a single xGEMM call (or something equivalent to xGEMM). All these changes has been done in this branch: http://bitbucket.org/bjacob/eigen2_special_matrices/ to simplify future merging. So here are some examples of expressions which each compiles to a single function call to our own xGEMM-like routine without any temporary:
m3 -= (s1 * m1 * m2).lazy();
m3 = (m1.adjoint() * (s1*m2).conjugate()).lazy
m3 -= ((-m1*s2) * s1*m2.adjoint()).lazy()
v3 += ((-m1.adjoint() * s2) * (s1 * v1.adjoint())).lazy()
legend:
m = matrix
v = vector
s = scalar
In all these expressions we automatically extract a single scalar multiple of the complete _expression_ and tag the rhs and lhs to know if we have to take the complex conjugate of their coefficients bypassing all the nested expressions.
There are still a few limitations though:
The first one is:
m3 = (s * (m1 * m2)).lazy();
which is evaluated as: "m3 = s * (m1 * m2).eval()" that requires 2 evaluations and a temporary.
This is because the rhs matrix product is automatically evaluated by the scalar product _expression_. Therefore it is not possible to catch such an _expression_ this way:
template<typename Lhs, typename Rhs>
MatrixBase::operator=(const CwiseUnaryOp<ei_scalar_multiple<Scalar>, Product<Lhs,Rhs> >& xpr);
it is too late!
So basically the user will to make sure he/her ommits the parentheseses:
m3 = (s * m1 * m2);
Likewise:
m3 = m3 - m1 * m2;
have to be written using -=:
m3 -= (m1 * m2).lazy();
(moreover we cannot detect at compile that the two "m3' are indeed the same matrices)
The second limitation is in expressions such as:
m3 = ((m1*s1).transpose() * m2).lazy();
which is evaluated as:
m3 = (((m1*s1).transpose()).eval() * m2).lazy();
This is because the analyze of the product factors stops at the first Transpose(), and therefore it only sees an _expression_ which have to be evaluated before we can perform the product.
Another thing is that the product order is not respected. For instance:
m3 -= ((s1 * m1).conjugate() * (-m2*s2)).lazy();
is transformed into:
m3 += (( (-1) * ei_conj(s1) * (-1) * s2 ) * (m1.conjugate() * m2)).lazy()
However I don't think that's big deal, because whatsoever the floating point operations in the matrix product are already performed in an unspecified order.
That's it.
gael.