Re: [eigen] Blas performance on mapped matrices


I really cannot reproduce this: on my system, all the variants using Eigen3
are faster than the best I can get out of Eigen2 (I used double).

My quick experiments also showed that:

A.block<size1, size2>(r, c).noalias() -= B * C;

is indeed the best you can do, and both noalias() and the statically sized
block help. In particular, it is faster than:

A.block<size1, size2>(r, c) -= B.lazyProduct(C);

which uses an expression-based product algorithm (tailored for very
small products).
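
To make the role of noalias() concrete, here is a rough plain C++ sketch of
the two evaluation strategies. This is only an illustration of the semantics,
not Eigen's actual code: the function names are made up, the matrices are
assumed row-major, and "lda" is the assumed distance in entries between two
consecutive rows of the parent matrix. By default a product is assumed to
alias its destination, so it is first evaluated into a temporary; with
noalias() the result can be accumulated straight into the destination.

```cpp
#include <array>
#include <cstddef>

// Illustration only (hypothetical helpers, not Eigen internals).
// `a` points at the top-left entry of the M x N destination block,
// `b` is a contiguous M x K matrix, `c` a contiguous K x N matrix.

// Aliasing-safe form: evaluate B*C into a temporary, then subtract it.
template <std::size_t M, std::size_t K, std::size_t N>
void subtractProductWithTemp(double* a, std::size_t lda,
                             const double* b, const double* c) {
  std::array<double, M * N> tmp{};  // zero-initialised temporary
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t k = 0; k < K; ++k)
      for (std::size_t j = 0; j < N; ++j)
        tmp[i * N + j] += b[i * K + k] * c[k * N + j];
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j)
      a[i * lda + j] -= tmp[i * N + j];
}

// No-alias form: accumulate directly into the destination block.
// Only valid when the destination does not overlap b or c.
template <std::size_t M, std::size_t K, std::size_t N>
void subtractProductNoAlias(double* a, std::size_t lda,
                            const double* b, const double* c) {
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t k = 0; k < K; ++k)
      for (std::size_t j = 0; j < N; ++j)
        a[i * lda + j] -= b[i * K + k] * c[k * N + j];
}
```

For the 9x9 block at (2,3) of a 100x100 row-major matrix stored in "A", the
call would look like subtractProductNoAlias<9,3,9>(A + 2*100 + 3, 100, B, C).
The second form skips the temporary and the extra pass over it, which matters
when the product is this small.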



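#include <iostream>
#include <Eigen/Dense>
#include <bench/BenchTimer.h>
using namespace Eigen;

typedef double Scalar;
typedef Matrix<Scalar,Dynamic,Dynamic, RowMajor> Mat;

// A(i..i+8, j..j+8) -= B * C, with B (9x3) and C (3x9) mapped from raw memory.
EIGEN_DONT_INLINE void foo1(Scalar* dat1, Scalar* dat2, Mat& A, int i, int j)
{
  Block<Mat,9,9>(A,i,j).noalias() -=
      (Map< Matrix<Scalar,9,3,RowMajor> >(dat1) * Map< Matrix<Scalar,3,9,RowMajor> >(dat2));
}

int main (int argc, char** argv)
{
  // Backing storage for the mapped 9x3 and 3x9 operands (27 entries each).
  Matrix<Scalar,27,1> data1, data2;
  data1.setRandom();
  data2.setRandom();

  Mat A(100,100);
  A.setZero();

  BenchTimer t1;
  int tries = 10;
  int rep = 10000;

  BENCH(t1, tries, rep, foo1(data1.data(), data2.data(), A, 2,3););
  // Best time over all tries (the timer accessor used here is an assumption).
  std::cerr << t1.best() << "s\n";

  return (0);
}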
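
For readers less familiar with Map and fixed-size blocks, the addressing
behind the benchmark above can be sketched in plain C++ as follows. The
RowMajorView type and the two helper functions are hypothetical names used
only for this illustration (they are not Eigen types), and the layout is
assumed to be row-major throughout. The point is that the sizes are template
parameters, so every loop bound over such a view is known at compile time.

```cpp
#include <cstddef>

// Hypothetical helper, not an Eigen type: a fixed-size row-major view over
// memory owned by someone else.
template <std::size_t Rows, std::size_t Cols>
struct RowMajorView {
  double* data;             // top-left entry of the viewed region
  std::size_t outerStride;  // distance in entries between consecutive rows

  double& operator()(std::size_t i, std::size_t j) const {
    return data[i * outerStride + j];
  }
};

// View a contiguous buffer as a Rows x Cols matrix
// (similar in spirit to mapping dat1 as a 9x3 matrix).
template <std::size_t Rows, std::size_t Cols>
RowMajorView<Rows, Cols> mapBuffer(double* buffer) {
  return RowMajorView<Rows, Cols>{buffer, Cols};
}

// View the Rows x Cols block whose top-left corner is (r, c) inside a
// row-major matrix that has `parentCols` columns.
template <std::size_t Rows, std::size_t Cols>
RowMajorView<Rows, Cols> fixedBlock(double* parent, std::size_t parentCols,
                                    std::size_t r, std::size_t c) {
  return RowMajorView<Rows, Cols>{parent + r * parentCols + c, parentCols};
}
```

With these, fixedBlock<9,9>(A, 100, 2, 3) addresses the same entries as the
9x9 block at (2,3) of the 100x100 matrix in the benchmark. Entry (i,j) of the
block lives at offset (2+i)*100 + (3+j) in the parent storage.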

On Mon, Jan 9, 2012 at 2:40 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2012/1/9 Sameer Agarwal <sameeragarwal@xxxxxxxxxx>:
>> Hi Guys,
>> We are in the process of a significant code migration from eigen2 to
>> eigen3. The code uses Eigen::Map to map chunks of memory into RowMajor
>> matrices and operates on them. The primary operation is of the form
>> A.block(r, c, size1, size2) -= B * C;
>> A is a mapped matrix.
>> C is a mapped matrix.
>> B is an actual Eigen matrix.
>> All matrices are RowMajor. For the example being considered, size1 =
>> size2 = 9. B is 9x3, and C is 3x9.
>> C and B are statically sized.
>> Moving from eigen2 to eigen3 has resulted in a 30% performance
>> regression. Has something changed significantly in the way Eigen3
>> handles mapped matrices, or about the structure of matrix-matrix
>> multiplication in Eigen3 that would cause this?
>> The compiler flags are all the same between our use of eigen2 and
>> eigen3. Profiling indicates that much of the time is being spent
>> inside Eigen::internal::gebp_kernel::operator().
>> I understand that this is not sufficient information to reproduce this
>> problem, so I am going to try and create a minimal case which can
>> reproduce this performance regression. In the meanwhile any insight
>> into this would be useful.  Also is it possible to statically size
>> blocks like matrices?
> Yes, as explained on
> (also see Jitse's email, using that syntax).
> I agree with Jitse's suggestion of playing with .noalias() and with
> special size where one of the two dimensions only is greater than the
> default threshold, it's very tempting to suspect that's the cause of
> your regression.
> Regarding noalias(),  see this page:
> Cheers,
> Benoit
>> Thank you,
>> Sameer
