[eigen] Slow matrix-matrix multiply



Hi Guys,

We use Eigen fairly heavily in Ceres, in particular for small dense block operations. One of the key algorithms in Ceres is its Schur Eliminator, which implements the Schur complement trick for bundle adjustment problems, so the performance of the eliminator is crucial to the overall bundle adjustment performance in Ceres.
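For readers unfamiliar with the trick, here is a sketch of the algebra, assuming the block notation used in the Ceres documentation: camera parameters $\Delta y$, point parameters $\Delta z$, and a point block $C$ that is block diagonal with small blocks. The normal equations are partitioned as

$$
\begin{bmatrix} B & E \\ E^\top & C \end{bmatrix}
\begin{bmatrix} \Delta y \\ \Delta z \end{bmatrix}
=
\begin{bmatrix} v \\ w \end{bmatrix}.
$$

The second row gives $\Delta z = C^{-1}(w - E^\top \Delta y)$. Substituting this into the first row eliminates the points and leaves the reduced camera system

$$
\left(B - E C^{-1} E^\top\right) \Delta y = v - E C^{-1} w .
$$

Because $C^{-1}$ is block diagonal, $E C^{-1} E^\top$ is assembled from many tiny dense products of $E$-blocks and $C^{-1}$-blocks. This plausibly explains why the cost of small fixed-size matrix-matrix multiplies shows up so directly in the eliminator.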


We replaced one of the more frequently called Eigen expressions with a simple triple-loop GEMM implementation (with some template sizing tricks), and it instantly gives us a >10% speedup. Doing the same to two other GEMM expressions gives us an overall 30% speedup. The matrices involved are fairly small: in our benchmark they are 6x3, 3x3 and 3x6, with sizes fixed at compile time.
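The patch itself is the authoritative version. As a rough, hypothetical sketch of what a "triple-loop GEMM with template sizing tricks" can look like, the function below computes C += A * B for dense row-major arrays. The names `SmallGemmAccumulate` and `kDynamic` are made up for illustration (Eigen uses `Eigen::Dynamic` for the same idea). Each dimension is either a compile-time constant, which gives the compiler a fixed trip count it can unroll, or `kDynamic`, in which case the runtime argument is used.

```cpp
#include <cassert>

// Hypothetical sentinel meaning "this size is only known at run time".
constexpr int kDynamic = -1;

// C += A * B for dense, row-major matrices.
// A is rows x inner, B is inner x cols, C is rows x cols.
// When a template argument is not kDynamic it is a compile-time constant,
// so the corresponding loop has a fixed trip count the compiler may unroll;
// the matching runtime argument is then ignored.
template <int kRowA, int kColA, int kColB>
void SmallGemmAccumulate(const double* A, const double* B, double* C,
                         int num_row_a = kRowA, int num_col_a = kColA,
                         int num_col_b = kColB) {
  const int rows = (kRowA != kDynamic) ? kRowA : num_row_a;
  const int inner = (kColA != kDynamic) ? kColA : num_col_a;
  const int cols = (kColB != kDynamic) ? kColB : num_col_b;
  // Fails if a dynamic dimension was left at its default of kDynamic.
  assert(rows >= 0 && inner >= 0 && cols >= 0);

  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      double sum = 0.0;
      for (int k = 0; k < inner; ++k) {
        sum += A[r * inner + k] * B[k * cols + c];
      }
      C[r * cols + c] += sum;
    }
  }
}
```

For example, a 6x3 block times a 3x3 block with all sizes fixed would be `SmallGemmAccumulate<6, 3, 3>(a, b, c)`, where `a`, `b` and `c` are hypothetical pointers to row-major storage. Passing `kDynamic` for a dimension keeps one routine usable for variable block sizes, at the cost of losing the compile-time trip count for that dimension.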


Reproducing this requires a bit of work:


You are going to need the latest Ceres Solver version from


git clone https://ceres-solver.googlesource.com/


and then the patch (https://ceres-solver-review.googlesource.com/#/c/2851/)


git fetch https://ceres-solver.googlesource.com/ceres-solver refs/changes/51/2851/1 && git format-patch -1 --stdout FETCH_HEAD


With the patch applied and a directory called build-release inside the main source directory, you should be able to reproduce the execution log at the bottom of this email. The numbers there were measured on my MacBook Pro with Eigen 3.1.2 and Clang.


You will notice that the overall solver time goes from 1.96 seconds to 1.84 seconds. In particular, the time spent in the "Eliminate" step goes from 0.20 seconds to 0.17 seconds. I have kept the patch simple and short to illustrate the issue, but eliminating Eigen in two other places gives us an improvement of over 30%. This is a big deal for us.
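Working the relative savings out from the two execution logs (Eliminate delta and total solver time):

$$
\frac{0.19949 - 0.17413}{0.19949} \approx 12.7\%\ \text{(Eliminate)}, \qquad
\frac{1.958 - 1.842}{1.958} \approx 5.9\%\ \text{(Total)}.
$$

This suggests the ">10%" figure for the single substitution refers to the Eliminate step, while the end-to-end saving from that one change is about 6%.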


We are curious to understand the reason for this, and whether Eigen can be updated to handle this case or whether we need our own custom routines for these operations.


Sameer



======Execution log


[minimal] build-release: cmake ../

[minimal] build-release: make bundle_adjuster

[minimal] build-release: ./bin/bundle_adjuster --input=../data/problem-16-22106-pre.txt -linear_solver dense_schur -ordering user -logtostderr -v=3

<snip>


  1: f: 1.980525e+05 d: 3.99e+06 g: 5.34e+06 h: 2.40e+03 rho: 9.60e-01 mu: 3.00e+04 li:  1 it: 3.80e-01 tt: 5.26e-01

I0401 17:54:10.712560 2032845184 wall_time.cc:74]


SchurComplementSolver::Solve

                                  Delta   Cumulative

                     Setup :    0.00003      0.00003

                 Eliminate :    0.19949      0.19952

              ReducedSolve :    0.00030      0.19982

            BackSubstitute :    0.03203      0.23186

                     Total :    0.00001      0.23186

<snip>


Ceres Solver Report

-------------------


<snip>


Time (in seconds):

Preprocessor                            0.033


 Residual Evaluations                  0.051

 Jacobian Evaluations                  0.579

 Linear Solver                         1.173

Minimizer                               1.892


Postprocessor                           0.002

Total                                   1.958


=================================================================



[minimal] build-release: cmake ../ -DCUSTOM_GEMM=ON

[minimal] build-release: make bundle_adjuster

[minimal] build-release: ./bin/bundle_adjuster --input=../data/problem-16-22106-pre.txt -linear_solver dense_schur -ordering user -logtostderr -v=3

<snip>


  1: f: 1.980525e+05 d: 3.99e+06 g: 5.34e+06 h: 2.40e+03 rho: 9.60e-01 mu: 3.00e+04 li:  1 it: 3.61e-01 tt: 5.08e-01

I0401 17:55:15.289700 2032845184 wall_time.cc:74]


SchurComplementSolver::Solve

                                  Delta   Cumulative

                     Setup :    0.00003      0.00003

                 Eliminate :    0.17413      0.17416

              ReducedSolve :    0.00030      0.17446

            BackSubstitute :    0.03074      0.20521

                     Total :    0.00001      0.20521


<snip>


Ceres Solver Report

-------------------

<snip>


Time (in seconds):

Preprocessor                            0.036


 Residual Evaluations                  0.050

 Jacobian Evaluations                  0.562

 Linear Solver                         1.073

Minimizer                               1.773


Postprocessor                           0.002

Total                                   1.842



