Hi Guys,
Here are some more numbers from my MacBook (i7, 2.3 GHz). The machine was not completely quiet, but the numbers all show a steady trend, so I think they can be trusted.
First, some explanation. "old" refers to commit 222ca20 in the Ceres tree; "new" refers to HEAD. We made a number of changes between the two versions. Some of them concern the way we use Eigen, manage memory, etc.; the others have to do with optionally using new BLAS routines instead of Eigen. The suffix eigen/blas indicates whether Eigen or our custom BLAS routines are used for the small block gemm and gemv operations in the Schur eliminator.
I tested both Clang 4.2 and GCC 4.2.1 with problems from the UW BAL dataset. I am only reporting the time spent in the Ceres SPARSE_SCHUR linear solver.
The first thing to note is that for both compilers there is a substantial improvement in performance from old-eigen to new-eigen. Clang does seem to be generally a bit worse than GCC, though.
With GCC there does not seem to be much difference between Eigen and the custom BLAS routines, except for two problems. With Clang (despite the improved inlining flags), performance improves pretty consistently when switching to the custom routines.
Problem 1. 16 cameras 22106 points
        old-eigen  new-eigen  new-blas
gcc     2.1        1.0        1.0
clang   2.1        1.0        1.0
Problem 2. 49 cameras 7776 points
        old-eigen  new-eigen  new-blas
gcc     5.0        2.6        2.6
clang   5.0        2.6        2.5
Problem 3. 245 cameras 198739 points
        old-eigen  new-eigen  new-blas
gcc     47         31         31.5
clang   50.5       32.3       30.2
Problem 4. 257 cameras 65132 points
        old-eigen  new-eigen  new-blas
gcc     15         8.2        8.3
clang   15         8.3        7.6
Problem 5. 356 cameras 226730 points
        old-eigen  new-eigen  new-blas
gcc     54         36         36
clang   56         37         34
Problem 6. 744 cameras 543562 points
        old-eigen  new-eigen  new-blas
gcc     199        155        151
clang   210        156        145
Problem 7. 1031 cameras 110968 points
        old-eigen  new-eigen  new-blas
gcc     57         42         43
clang   57         42         40
What all of the problems above have in common is that the matrices in question are statically sized. Another problem we are interested in involves semi-statically sized matrices, and there the performance improvements are much more dramatic.
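To make the static versus semi-static distinction concrete, here is a minimal C++ sketch of a small block gemm in which each dimension is either fixed at compile time or supplied at run time. It is not the actual Ceres or Eigen code: the names SmallGemmAdd and kDynamic, the row-major layout, and the reading of "semi-static" as "some dimensions known at compile time, others only at run time" are all assumptions made for illustration.

```cpp
#include <cassert>

// Sentinel meaning "this dimension is only known at run time".
// (Hypothetical name used only in this sketch.)
constexpr int kDynamic = -1;

// Computes C += A * B for small dense row-major blocks, where
// A is num_row_a x num_col_a, B is num_col_a x num_col_b, and
// C is num_row_a x num_col_b.
//
// Each template argument is either a fixed size or kDynamic. When a
// size is fixed, the loop bound is a compile-time constant, which lets
// the compiler unroll that loop; when it is kDynamic, the run-time
// argument is used instead.
template <int kRowA, int kColA, int kColB>
inline void SmallGemmAdd(const double* A, int num_row_a, int num_col_a,
                         const double* B, int num_col_b, double* C) {
  const int rows = (kRowA != kDynamic) ? kRowA : num_row_a;
  const int inner = (kColA != kDynamic) ? kColA : num_col_a;
  const int cols = (kColB != kDynamic) ? kColB : num_col_b;

  // The run-time sizes must agree with any compile-time sizes.
  assert(rows == num_row_a && inner == num_col_a && cols == num_col_b);

  for (int r = 0; r < rows; ++r) {
    for (int c = 0; c < cols; ++c) {
      double sum = 0.0;
      for (int k = 0; k < inner; ++k) {
        sum += A[r * inner + k] * B[k * cols + c];
      }
      C[r * cols + c] += sum;
    }
  }
}
```

Under these assumptions, SmallGemmAdd<2, 3, 2>(A, 2, 3, B, 2, C) would be the fully static case, while SmallGemmAdd<2, kDynamic, 2>(A, 2, 3, B, 2, C) would be a semi-static one in which only the inner dimension is left to run time.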
Problem 8. 8 cameras, 1 shared calibration, 2190 points
        old-eigen  new-eigen  new-blas
gcc     0.17       0.15       0.04
clang   0.16       0.13       0.03
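To put these timings in relative terms, the speedup is simply the ratio of the two times. The helper below is a small illustrative sketch (the name Speedup is hypothetical); the figures in the comments use only the old-eigen and new-blas entries for this problem.

```cpp
#include <cassert>

// Returns how many times faster "after" is than "before".
// Both arguments are timings in the same unit.
inline double Speedup(double before, double after) {
  assert(before > 0.0 && after > 0.0);
  return before / after;
}

// gcc:   Speedup(0.17, 0.04) is 4.25
// clang: Speedup(0.16, 0.03) is about 5.33
```

For comparison, the same ratio for Problem 1 (2.1 to 1.0) is 2.1 for both compilers.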
In summary, for fixed-size matrices, Clang and Eigen have some talking to do. For dynamic/semi-static matrices it seems (based on one example) that the custom routines beat Eigen with both GCC and Clang.
Sameer