Re: [eigen] cache-friendly matrix inverse

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]

To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [eigen] cache-friendly matrix inverse
From: Jitse Niesen <jitse@xxxxxxxxxxxxxxxxx>
Date: Thu, 14 May 2009 14:29:40 +0100 (BST)

On Thu, 14 May 2009, Benoit Jacob wrote:

So, we agree that multiplying by the inverse will be faster for small
enough sizes, and that LU solving will be faster for large enough
sizes, and the only question is what the crossover point is.

The number of floating point operations is exactly the same for bothstrategies. But after thinking it over, I have to admit that that's notthe whole story. The index arithmetic is more difficult when doing an LUsolve. You also have to undo the permutation. And then you indeed havehardware issues, as you explain:

I don't have any idea about that to be honest, and since we're talking
about small sizes here it'll even depend on e.g. whether the size of a
column (assuming column-major storage) is a multiple of the SIMD
vector size, etc... so one can't give a very general answer. About
testing with Eigen, well since we're talking about small sizes
unrolling will be very important, currently the matrix-vector product
is unrolled but not the triangular solver so i don't think this will
produce useful numbers.

I tested anyway. Indeed the inverse is massively faster for smallmatrices. But it's also massively faster for small dynamic-size matrices(there is no loop unrolling in this case, is there?). And it's stillclearly faster for 100 x 100 matrices.

One important caveat: I did not test for accuracy. Using the LUdecomposition is supposed to be more stable.

I can see two possibilities. Either the perceived wisdom in numericalanalysis is wrong. Or there is a lot of scope to optimize LU solve.

The timings are below. I attach the code. I compiled with GCC 4.3.3 with-O2 -DNDEBUG -msse2 -msse3 -mssse3 on an Intel Core2 computer.


Matrix3f   , 50000000 solves. inverse: 0.020 s   LU: 7.010 s   PLU: 3.950 s
Matrix3d   , 50000000 solves. inverse: 0.060 s   LU: 7.500 s   PLU: 5.650 s
MatrixXf(3), 50000000 solves. inverse: 1.370 s   LU: 15.580 s   PLU: 6.320 s
MatrixXd(3), 50000000 solves. inverse: 1.500 s   LU: 16.130 s   PLU: 6.570 s
Matrix4f   , 30000000 solves. inverse: 0.070 s   LU: 5.910 s   PLU: 4.720 s
Matrix4d   , 30000000 solves. inverse: 0.140 s   LU: 6.240 s   PLU: 4.310 s
MatrixXf(4), 30000000 solves. inverse: 1.090 s   LU: 10.900 s   PLU: 5.450 s
MatrixXd(4), 30000000 solves. inverse: 0.980 s   LU: 10.980 s   PLU: 5.420 s
MatrixXf(100), 200000 solves. inverse: 0.810 s   LU: 2.360 s   PLU: 2.070 s
MatrixXd(100), 200000 solves. inverse: 1.750 s   LU: 3.370 s   PLU: 2.970 s

One final note. PartialLU<Matrix3f>::solve() produces the followingwarning (twice):


..../Eigen/src/Core/SolveTriangular.h: In member function void Eigen::MatrixBase<Derived>::solveTriangularInPlace(const Eigen::MatrixBase<OtherDerived>&) const [with OtherDerived = Eigen::Matrix<float, 3, 1, 0, 3, 1>, Derived = Eigen::Flagged<Eigen::Matrix<float, 3, 3, 0, 3, 3>, 1024u, 0u>]:
..../Eigen/src/Core/SolveTriangular.h:189: warning: array subscript is above array bounds

I did not investigate further.

Cheers,
Jitse

#include <boost/timer.hpp>
#include <Eigen/Array>
#include <Eigen/Core>
#include <Eigen/LU>
#include <string>

// import most common Eigen types 
USING_PART_OF_NAMESPACE_EIGEN

template <typename TM, typename TV>
void benchmark(int repeats, const char* name, int size = 0)
{
  TM A;
  TV b, x;
  if (size == 0) {
    // fixed-size matrices
    A = TM::Random();
    b = TV::Random();
  } else {
    // dynamic-size matrices
    A = TM::Random(size,size);
    b = TV::Random(size);
  }

  TM Ainv = A.inverse();
  Eigen::LU<TM> lu(A);
  Eigen::PartialLU<TM> plu(A);

  float result;
  boost::timer tm;

  tm.restart();
  for (int i=0; i<repeats; i++)
    x = (Ainv*b).lazy();
  result = tm.elapsed();
  printf("%s, %d solves. inverse: %.3f s", name, repeats, result);
  
  tm.restart();
  for (int i=0; i<repeats; i++)
    lu.solve(b, &x);
  result = tm.elapsed();
  printf("   LU: %.3f s", result);

  tm.restart();
  for (int i=0; i<repeats; i++)
    plu.solve(b, &x);
  result = tm.elapsed();
  printf("   PLU: %.3f s\n", result);

}

int main(int, char *[])
{
  benchmark<Matrix3f, Vector3f>(50000000, "Matrix3f   ");
  benchmark<Matrix3d, Vector3d>(50000000, "Matrix3d   ");
  benchmark<MatrixXf, VectorXf>(50000000, "MatrixXf(3)", 3);
  benchmark<MatrixXd, VectorXd>(50000000, "MatrixXd(3)", 3);
  benchmark<Matrix4f, Vector4f>(30000000, "Matrix4f   ");
  benchmark<Matrix4d, Vector4d>(30000000, "Matrix4d   ");
  benchmark<MatrixXf, VectorXf>(30000000, "MatrixXf(4)", 4);
  benchmark<MatrixXd, VectorXd>(30000000, "MatrixXd(4)", 4);
  benchmark<MatrixXf, VectorXf>(200000, "MatrixXf(100)", 100);
  benchmark<MatrixXd, VectorXd>(200000, "MatrixXd(100)", 100);
}

Follow-Ups:
- Re: [eigen] cache-friendly matrix inverse
  - From: Benoit Jacob

References:
- [eigen] cache-friendly matrix inverse
  - From: Benoit Jacob
- Re: [eigen] cache-friendly matrix inverse
  - From: Robert Lupton the Good
- Re: [eigen] cache-friendly matrix inverse
  - From: Christian Mayer
- Re: [eigen] cache-friendly matrix inverse
  - From: Robert Lupton the Good
- Re: [eigen] cache-friendly matrix inverse
  - From: Christian Mayer
- Re: [eigen] cache-friendly matrix inverse
  - From: Benoit Jacob
- Re: [eigen] cache-friendly matrix inverse
  - From: Christian Mayer
- Re: [eigen] cache-friendly matrix inverse
  - From: Benoit Jacob
- Re: [eigen] cache-friendly matrix inverse
  - From: Benoit Jacob

Messages sorted by: [ date | thread ]
Prev by Date: Re: [eigen] Migration to mercurial : final tests
Next by Date: Re: [eigen] cache-friendly matrix inverse
Previous by thread: Re: [eigen] cache-friendly matrix inverse
Next by thread: Re: [eigen] cache-friendly matrix inverse

Mail converted by MHonArc 2.6.19+

http://listengine.tuxfamily.org/