Re: [eigen] Blas performance on mapped matrices



I really cannot reproduce this: on my system, all the variants using Eigen3
are faster than the best I can get out of Eigen2 (I used double).

My quick experiments also show that:

A.block<size1, size2>(r, c).noalias() -= B * C;

is indeed the best you can do, and both the noalias() and the statically
sized block help. In particular, it is faster than:

A.block<size1, size2>(r, c) -= B.lazyProduct(C);

which uses an expression-based product algorithm (tailored for very
small products).
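
For concreteness, with mapped operands as in Sameer's case, the variants
compared above look roughly like this (a sketch only; the 9x3 / 3x9 sizes
are taken from the original mail and the names are illustrative):

#include <Eigen/Dense>
using namespace Eigen;

void variants_sketch(double* dat1, double* dat2,
                     Matrix<double,Dynamic,Dynamic,RowMajor>& A, int r, int c)
{
  Map<Matrix<double,9,3,RowMajor> > B(dat1);  // mapped 9x3 operand
  Map<Matrix<double,3,9,RowMajor> > C(dat2);  // mapped 3x9 operand

  // dynamic-size block, default product path (a temporary may be used to avoid aliasing):
  A.block(r, c, 9, 9) -= B * C;

  // fixed-size block + noalias(): the fastest variant in my experiments:
  A.block<9,9>(r, c).noalias() -= B * C;

  // fixed-size block + lazyProduct() (coefficient-based, for tiny products):
  A.block<9,9>(r, c) -= B.lazyProduct(C);
}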

gael.

-----------------------------------------

#include <iostream>
#include <Eigen/Dense>
#include <bench/BenchTimer.h>
using namespace Eigen;

typedef double Scalar;
typedef Matrix<Scalar,Dynamic,Dynamic, RowMajor> Mat;

// Fixed-size 9x9 block of A updated with the product of two mapped matrices.
EIGEN_DONT_INLINE void foo1(Scalar* dat1, Scalar* dat2, Mat& A, int i, int j)
{
  Block<Mat,9,9>(A,i,j).noalias() -= Map<Matrix<Scalar,9,3,RowMajor> >(dat1)
                                     * Map<Matrix<Scalar,3,9,RowMajor> >(dat2);
}

int main (int argc, char** argv)
{
  Matrix<Scalar,27,1> data1, data2;
  data1.setRandom();
  data2.setRandom();

  Mat A(100,100);
  A.setRandom(); // initialize A so the benchmark does not read uninitialized memory

  BenchTimer t1;
  int tries = 10;
  int rep = 10000;


  BENCH(t1, tries, rep, foo1(data1.data(), data2.data(), A, 2, 3));
  std::cerr << t1.best() << "s\n";


  return (0);
}
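
A second kernel could be added to the same harness to time the lazyProduct
variant side by side (a sketch; foo2 is not part of the benchmark above):

EIGEN_DONT_INLINE void foo2(Scalar* dat1, Scalar* dat2, Mat& A, int i, int j)
{
  // same operands as foo1, but routed through the coefficient-based product:
  Block<Mat,9,9>(A,i,j) -= Map<Matrix<Scalar,9,3,RowMajor> >(dat1)
                             .lazyProduct(Map<Matrix<Scalar,3,9,RowMajor> >(dat2));
}

// and in main():
//   BenchTimer t2;
//   BENCH(t2, tries, rep, foo2(data1.data(), data2.data(), A, 2, 3));
//   std::cerr << t2.best() << "s\n";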


On Mon, Jan 9, 2012 at 2:40 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2012/1/9 Sameer Agarwal <sameeragarwal@xxxxxxxxxx>:
>> Hi Guys,
>> We are in the process of a significant code migration from eigen2 to
>> eigen3. The code uses Eigen::Map to map chunks of memory into RowMajor
>> matrices and operates on them. The primary operation is of the form
>>
>> A.block(r, c, size1, size2) -= B * C;
>>
>> A is a mapped matrix.
>> C is a mapped matrix.
>> B is an actual Eigen matrix.
>>
>> All matrices are RowMajor. For the example being considered, size1 =
>> size2 = 9. B is 9x3, and C is 3x9.
>> C and B are statically sized.
>>
>> Moving from eigen2 to eigen3 has resulted in a 30% performance
>> regression. Has something changed significantly in the way Eigen3
>> handles mapped matrices, or about the structure of matrix-matrix
>> multiplication in Eigen3 that would cause this?
>>
>> The compiler flags are all the same between our use of eigen2 and
>> eigen3. Profiling indicates that much of the time is being spent
>> inside Eigen::internal::gebp_kernel::operator().
>>
>> I understand that this is not sufficient information to reproduce the
>> problem, so I am going to try to create a minimal case that reproduces
>> this performance regression. In the meanwhile, any insight into this
>> would be useful. Also, is it possible to statically size blocks, the
>> way matrices can be?
>
> Yes, as explained on http://eigen.tuxfamily.org/dox/TutorialBlockOperations.html
> (also see Jitse's email, which uses that syntax).
>
> I agree with Jitse's suggestion of playing with .noalias() and with
> EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD. Given your rather special sizes,
> where only one of the dimensions exceeds the default threshold, it is
> very tempting to suspect that as the cause of your regression.
> Regarding noalias(), see this page:
> http://eigen.tuxfamily.org/dox/TopicWritingEfficientProductExpression.html
> Cheers,
> Benoit
>
>>
>> Thank you,
>> Sameer
>>
>>
>
>
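
For reference on the threshold mentioned above: EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD
can be overridden at compile time, before any Eigen header is included
(the value below is only an illustration, not a recommendation):

// must come before the first Eigen include; 8 is just an example value
#define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 8
#include <Eigen/Dense>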


