[eigen] Slow matrix-matrix multiply
Hi Guys,
We use Eigen fairly heavily in Ceres, in particular for small dense block operations. One of the key algorithms in Ceres is its Schur Eliminator, which implements the Schur complement trick for bundle adjustment problems, so the performance of the eliminator is crucial to overall bundle adjustment performance in Ceres.
We replaced one of the more frequently called Eigen expressions with a simple three-loop GEMM implementation (with some template sizing tricks), and it instantly gives us a >10% speedup. Doing the same to two other GEMM expressions gives us an overall 30% speedup. The matrices involved are fairly small; in our benchmark they are of sizes 6x3, 3x3, and 3x6, and are sized at compile time.
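To make the comparison concrete, here is a minimal sketch of what a three-loop GEMM with compile-time sizes can look like. It is an illustration only, not the code from the patch: the name `SmallGemm`, the row-major layout, and the raw-pointer interface are assumptions made for this example.

```cpp
// Hypothetical helper (not taken from Ceres): C = A * B for small dense
// blocks whose dimensions are template parameters, so the compiler knows
// every loop bound and is free to unroll the loops completely.
//
// Assumed layout: A is M x K, B is K x N, C is M x N, all row-major and
// stored contiguously. C must not alias A or B.
template <int M, int K, int N>
inline void SmallGemm(const double* A, const double* B, double* C) {
  static_assert(M > 0 && K > 0 && N > 0, "block sizes must be positive");
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      double sum = 0.0;
      for (int k = 0; k < K; ++k) {
        sum += A[i * K + k] * B[k * N + j];
      }
      C[i * N + j] = sum;
    }
  }
}
```

With the benchmark sizes mentioned above, a 6x3 block times a 3x3 block would be written as `SmallGemm<6, 3, 3>(a, b, c)`, where `a`, `b`, and `c` point to 18, 9, and 18 doubles respectively.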
Reproducing this requires a bit of work:
You are going to need the latest Ceres Solver version from
git clone https://ceres-solver.googlesource.com/
and then the patch (https://ceres-solver-review.googlesource.com/#/c/2851/)
git fetch https://ceres-solver.googlesource.com/ceres-solver refs/changes/51/2851/1 && git format-patch -1 --stdout FETCH_HEAD
With this, if you have a directory called build-release inside the main source directory, you should be able to replicate the execution log at the bottom of this email (the numbers are from my MacBook Pro with Eigen 3.1.2 + clang).
You will notice that the overall solver time goes from 1.96 seconds to 1.84 seconds. In particular, the time spent in the “Eliminate” step goes from 0.2 seconds to 0.17 seconds. I have kept the patch simple and short to illustrate the issue, but eliminating Eigen in two other places gives us an improvement of over 30%. This is a big deal for us.
We are curious to understand the reason for this, and see if Eigen can be updated to handle this, or if we need to have our own custom routines for doing these operations.
Sameer
======Execution log
[minimal] build-release: cmake ../
[minimal] build-release: make bundle_adjuster
[minimal] build-release: ./bin/bundle_adjuster --input=../data/problem-16-22106-pre.txt -linear_solver dense_schur -ordering user -logtostderr -v=3
<snip>
1: f: 1.980525e+05 d: 3.99e+06 g: 5.34e+06 h: 2.40e+03 rho: 9.60e-01 mu: 3.00e+04 li: 1 it: 3.80e-01 tt: 5.26e-01
I0401 17:54:10.712560 2032845184 wall_time.cc:74]
SchurComplementSolver::Solve
                   Delta   Cumulative
Setup          : 0.00003      0.00003
Eliminate      : 0.19949      0.19952
ReducedSolve   : 0.00030      0.19982
BackSubstitute : 0.03203      0.23186
Total          : 0.00001      0.23186
<snip>
Ceres Solver Report
-------------------
<snip>
Time (in seconds):
Preprocessor 0.033
Residual Evaluations 0.051
Jacobian Evaluations 0.579
Linear Solver 1.173
Minimizer 1.892
Postprocessor 0.002
Total 1.958
=================================================================
[minimal] build-release: cmake ../ -DCUSTOM_GEMM=ON
[minimal] build-release: make bundle_adjuster
[minimal] build-release: ./bin/bundle_adjuster --input=../data/problem-16-22106-pre.txt -linear_solver dense_schur -ordering user -logtostderr -v=3
<snip>
1: f: 1.980525e+05 d: 3.99e+06 g: 5.34e+06 h: 2.40e+03 rho: 9.60e-01 mu: 3.00e+04 li: 1 it: 3.61e-01 tt: 5.08e-01
I0401 17:55:15.289700 2032845184 wall_time.cc:74]
SchurComplementSolver::Solve
                   Delta   Cumulative
Setup          : 0.00003      0.00003
Eliminate      : 0.17413      0.17416
ReducedSolve   : 0.00030      0.17446
BackSubstitute : 0.03074      0.20521
Total          : 0.00001      0.20521
<snip>
Ceres Solver Report
-------------------
<snip>
Time (in seconds):
Preprocessor 0.036
Residual Evaluations 0.050
Jacobian Evaluations 0.562
Linear Solver 1.073
Minimizer 1.773
Postprocessor 0.002
Total 1.842