I really cannot reproduce this: on my system, all the variants using Eigen3
are faster than the best I can get out of Eigen2 (I used double).

My quick experiments also showed that

  A.block<size1, size2>(r, c).noalias() -= B * C;

is indeed the best you can do, and that both the "noalias" and the statically
sized block are useful. It is in particular faster than

  A.block<size1, size2>(r, c) -= B.lazyProduct(C);

which uses an expression-based product algorithm (tailored for very small
products).

gael.

-----------------------------------------

#include <iostream>
#include <Eigen/Dense>
#include <bench/BenchTimer.h>

using namespace Eigen;

typedef double Scalar;
typedef Matrix<Scalar,Dynamic,Dynamic, RowMajor> Mat;

// Subtract a 9x3 * 3x9 product from the fixed-size 9x9 block of A at (i,j).
EIGEN_DONT_INLINE
void foo1(Scalar* dat1, Scalar* dat2, Mat& A, int i, int j)
{
  Block<Mat,9,9>(A,i,j).noalias() -=
      (Map< Matrix<Scalar,9,3,RowMajor> >(dat1) *
       Map< Matrix<Scalar,3,9,RowMajor> >(dat2));
}

int main (int argc, char** argv)
{
  Matrix<Scalar,27,1> data1, data2;
  data1.setRandom();
  data2.setRandom();
  Mat A(100,100);
  A.setZero();  // avoid timing arithmetic on uninitialized values

  BenchTimer t1;
  int tries = 10;
  int rep = 10000;
  BENCH(t1, tries, rep, foo1(data1.data(), data2.data(), A, 2,3););
  std::cerr << t1.best() << "s\n";
  return (0);
}

On Mon, Jan 9, 2012 at 2:40 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2012/1/9 Sameer Agarwal <sameeragarwal@xxxxxxxxxx>:
>> Hi Guys,
>> We are in the process of a significant code migration from eigen2 to
>> eigen3. The code uses Eigen::Map to map chunks of memory into RowMajor
>> matrices and operates on them. The primary operation is of the form
>>
>> A.block(r, c, size1, size2) -= B * C;
>>
>> A is a mapped matrix.
>> C is a mapped matrix.
>> B is an actual Eigen matrix.
>>
>> All matrices are RowMajor. For the example being considered, size1 =
>> size2 = 9. B is 9x3, and C is 3x9.
>> C and B are statically sized.
>>
>> Moving from eigen2 to eigen3 has resulted in a 30% performance
>> regression. Has something changed significantly in the way Eigen3
>> handles mapped matrices, or about the structure of matrix-matrix
>> multiplication in Eigen3 that would cause this?
>>
>> The compiler flags are all the same between our use of eigen2 and
>> eigen3. Profiling indicates that much of the time is being spent
>> inside Eigen::internal::gebp_kernel::operator.
>>
>> I understand that this is not sufficient information to reproduce this
>> problem, so I am going to try and create a minimal case which can
>> reproduce this performance regression. In the meanwhile any insight
>> into this would be useful. Also is it possible to statically size
>> blocks like matrices?
>
> Yes, as explained on http://eigen.tuxfamily.org/dox/TutorialBlockOperations.html
> (also see Jitse's email, using that syntax).
>
> I agree with Jitse's suggestion of playing with .noalias() and with
> EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD, especially given your very
> special size where one of the two dimensions only is greater than the
> default threshold, it's very tempting to suspect that's the cause of
> your regression.
> Regarding noalias(), see this page:
> http://eigen.tuxfamily.org/dox/TopicWritingEfficientProductExpression.html
> Cheers,
> Benoit
>
>>
>> Thank you,
>> Sameer

