[eigen] Two performance regressions from Eigen2 to Eigen3 with bisected changes |
[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]
// Copyright 2012 Google Inc. All Rights Reserved. // Author: keir@xxxxxxxxxx (Keir Mierle) // // Compile with GCC 4.4.3-4ubuntu5: // // g++ -DV2 -O2 small_matrix_products.cc -o small_matrix_products_v2 -I src/eigen-2.0.17 // g++ -DV3 -O2 small_matrix_products.cc -o small_matrix_products_v3 -I src/eigen-3.0.4 // g++ -DV3 -O2 small_matrix_products.cc -o small_matrix_products_v3 -I wrk/eigen // // Then try a few combinations: // // for x in 2 3; do; time "./small_matrix_products_v$x" 2 9 10000000; done // for x in 2 3; do; time "./small_matrix_products_v$x" 2 3 10000000; done // for x in 2 3; do; time "./small_matrix_products_v$x" 2 3 10000000; done // // On my desktop, which is a 4-core Xeon X5550, Eigen 3 gets beaten all over the // playground by Eigen 2 in this benchmark, for all tried values of N and M: // // N M Iterations V2 (sec) V3 (sec) // 2 3 10000000 0.306 1.293 // 2 6 10000000 0.363 1.384 // 2 9 10000000 0.424 1.489 // 2 16 10000000 0.564 1.808 // 2 32 10000000 0.901 2.492 // 3 2 10000000 0.289 1.341 // 6 2 10000000 0.424 1.366 // 9 2 10000000 0.490 1.490 // 16 2 10000000 0.751 1.734 // 32 2 10000000 1.283 2.316 // 16 16 10000000 3.322 5.051 // 32 32 10000000 12.480 16.167 // 256 256 5000 0.656 0.638 First time eigen3's faster! Without // 512 512 5000 2.672 2.706 Similar times again. // // In our case we can't use fixed size matrices because the size aren't known at // compile time. The ones of particular interest to use are 2x3, 2x9, 9x9, and // their transposes. // // I also accidentally made the table above without the ".lazy()" and // ".noalias()". In that case, what I found interesting was that for the 256x256 // case, eigen2 was 2x faster. With noalias and lazy, the times are comparable. #include <cstdlib> #include <iostream> #include "Eigen/Core" using namespace Eigen; using namespace std; const int num_packed_coeffs = 1024*1024; double packed_matrices[num_packed_coeffs]; // y += A * x; // y is N x 1, A is N x M, x is M x 1 void BenchSmallProduct(int n, int m, int num_iterations) { // Our matrices are all packed inside other arrays, so alignment is random. Map<Matrix<double, Dynamic, Dynamic> > A(packed_matrices + 105, n, m); Map<Matrix<double, Dynamic, 1> > y(packed_matrices + 105 + n * m, n, 1); Map<Matrix<double, Dynamic, 1> > x(packed_matrices + 105 + n * m + n, m, 1); for (int i = 0; i < num_iterations; i++) { #if EIGEN_WORLD_VERSION == 2 //y += A * x; //y += (A * x).lazy(); y.noalias() += A * x; #elif EIGEN_WORLD_VERSION == 3 y.noalias() += A * x; #else #error "Something weird happened." #endif } // Final print to prevent this from getting optimized out. cout << "Result: " << y.transpose() << endl; cout << "World version: " << EIGEN_WORLD_VERSION << endl; } int main(int argc, char **argv) { int n = atoi(argv[1]); int m = atoi(argv[2]); int num_iterations = atoi(argv[3]); for (int i = 0; i < num_packed_coeffs; i++) { packed_matrices[i] = 2.0 * i / num_packed_coeffs; } BenchSmallProduct(n, m, num_iterations); return 0; }
Attachment:
small_matrix_products.sh
Description: Bourne shell script
Mail converted by MHonArc 2.6.19+ | http://listengine.tuxfamily.org/ |