[eigen] Two performance regressions from Eigen2 to Eigen3 with bisected changes
// Copyright 2012 Google Inc. All Rights Reserved.
// Author: keir@xxxxxxxxxx (Keir Mierle)
//
// Compile with GCC 4.4.3-4ubuntu5:
//
// g++ -DV2 -O2 small_matrix_products.cc -o small_matrix_products_v2 -I src/eigen-2.0.17
// g++ -DV3 -O2 small_matrix_products.cc -o small_matrix_products_v3 -I src/eigen-3.0.4
// g++ -DV3 -O2 small_matrix_products.cc -o small_matrix_products_v3 -I wrk/eigen
//
// Then try a few combinations:
//
// for x in 2 3; do time "./small_matrix_products_v$x" 2 9 10000000; done
// for x in 2 3; do time "./small_matrix_products_v$x" 2 3 10000000; done
//
// On my desktop, a 4-core Xeon X5550, Eigen 3 gets beaten all over the
// playground by Eigen 2 in this benchmark for every small N and M tried; only
// from 256x256 upward do the two become comparable:
//
//   N    M   Iterations   V2 (sec)   V3 (sec)
//   2    3     10000000      0.306      1.293
//   2    6     10000000      0.363      1.384
//   2    9     10000000      0.424      1.489
//   2   16     10000000      0.564      1.808
//   2   32     10000000      0.901      2.492
//   3    2     10000000      0.289      1.341
//   6    2     10000000      0.424      1.366
//   9    2     10000000      0.490      1.490
//  16    2     10000000      0.751      1.734
//  32    2     10000000      1.283      2.316
//  16   16     10000000      3.322      5.051
//  32   32     10000000     12.480     16.167
// 256  256         5000      0.656      0.638   First time eigen3's faster! Without
// 512  512         5000      2.672      2.706   Similar times again.
//
// In our case we can't use fixed-size matrices because the sizes aren't known
// at compile time. The ones of particular interest to us are 2x3, 2x9, 9x9,
// and their transposes.
//
// I also accidentally made the table above without the ".lazy()" and
// ".noalias()". In that case, what I found interesting was that for the 256x256
// case, eigen2 was 2x faster. With noalias and lazy, the times are comparable.
#include <cstdlib>
#include <iostream>
#include "Eigen/Core"
using namespace Eigen;
using namespace std;
const int num_packed_coeffs = 1024*1024;
double packed_matrices[num_packed_coeffs];
// y += A * x;
// y is N x 1, A is N x M, x is M x 1
void BenchSmallProduct(int n, int m, int num_iterations) {
  // Our matrices are all packed inside other arrays, so alignment is random.
  Map<Matrix<double, Dynamic, Dynamic> > A(packed_matrices + 105, n, m);
  Map<Matrix<double, Dynamic, 1> > y(packed_matrices + 105 + n * m, n, 1);
  Map<Matrix<double, Dynamic, 1> > x(packed_matrices + 105 + n * m + n, m, 1);
  for (int i = 0; i < num_iterations; i++) {
#if EIGEN_WORLD_VERSION == 2
    // Eigen 2 has no noalias(); lazy() is its way of skipping the temporary.
    // y += A * x;
    y += (A * x).lazy();
#elif EIGEN_WORLD_VERSION == 3
    y.noalias() += A * x;
#else
#error "Something weird happened."
#endif
  }
  // Final print to prevent this from getting optimized out.
  cout << "Result: " << y.transpose() << endl;
  cout << "World version: " << EIGEN_WORLD_VERSION << endl;
}
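int main(int argc, char **argv) {
  // Guard against missing arguments before touching argv[1..3].
  if (argc != 4) {
    cerr << "Usage: " << argv[0] << " N M num_iterations" << endl;
    return 1;
  }
  int n = atoi(argv[1]);
  int m = atoi(argv[2]);
  int num_iterations = atoi(argv[3]);
  // Fill the backing store with deterministic values in [0, 2).
  for (int i = 0; i < num_packed_coeffs; i++) {
    packed_matrices[i] = 2.0 * i / num_packed_coeffs;
  }
  BenchSmallProduct(n, m, num_iterations);
  return 0;
}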
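
For sanity-checking the "Result:" line, or as an Eigen-free baseline for the small sizes above, the same y += A * x update can be written as a plain loop. This is only a sketch and not part of the original benchmark: the name NaiveGemvAccumulate is made up for illustration, and it assumes A is stored column-major (Eigen's default) in a packed array, as in the Map<> setup above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not from the original post: accumulates y += A * x.
// A is n x m, stored column-major (assumed, matching Eigen's default layout),
// x has m entries, y has n entries.
void NaiveGemvAccumulate(const double* A, const double* x, double* y,
                         int n, int m) {
  assert(n >= 0 && m >= 0);
  for (int col = 0; col < m; ++col) {
    const double xc = x[col];
    // Column `col` of A occupies n consecutive doubles.
    const double* a_col = A + static_cast<std::ptrdiff_t>(col) * n;
    for (int row = 0; row < n; ++row) {
      y[row] += a_col[row] * xc;
    }
  }
}
```

Calling it with the same pointers and offsets passed to the three Map<> objects would give a reference timing and result that do not depend on the Eigen version.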
Attachment:
small_matrix_products.sh
Description: Bourne shell script