[eigen] Eigen2 --> Eigen3 perf regression patch.



I've found another performance regression going from Eigen2 to Eigen3. It looks similar to the one I posted in the forums, which was resolved by changing the ei_product_type_selector mapping, so I tried the same kind of fix here.


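#include <Eigen/Core>
using namespace Eigen;

const int DIMS = 25;
typedef Matrix<double,Dynamic,2> QMatrix;
QMatrix Q = QMatrix::Random(DIMS,2);
Vector2d v = Vector2d::Random();        // fixed-size 2-vector: no size argument
VectorXd r = VectorXd::Random(DIMS);

// Then loop this:
#ifdef EIGEN3
    r.noalias() = Q * v;    // Eigen3: evaluate the product directly into r
#else
    r = (Q * v).lazy();     // Eigen2 equivalent
#endif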

I tried the above, and also a variant that stores Q as a Matrix<double,2,Dynamic> and multiplies by its transpose, so that the DIMS x 2 operand is effectively row-major. DIMS was 25.


For the untransposed, column-major variant:
 - a 'v' suffix means Eigen's vectorization was enabled, i.e. EIGEN_DONT_VECTORIZE was not defined
 - timings are the best of 5 consecutive runs.
 - all tests were built as 64-bit binaries with a fair number of optimization flags; varying these changed the absolute numbers (particularly for the unvectorized variants) but not the trends.

EigenBench2 on GCC: (-12.7643) 0.555743s
EigenBench3 on GCC: (-12.7643) 1.1912s
EigenBench3 on GCC: (-12.7643) 1.25178s (patched)

EigenBench2 on MSC: (-12.7643) 1.22194s
EigenBench3 on MSC: (-12.7643) 1.46516s
EigenBench3 on MSC: (-12.7643) 1.35213s (patched)

EigenBench2v on GCC: (-12.7643) 0.563602s
EigenBench3v on GCC: (-12.7643) 1.10728s
EigenBench3v on GCC: (-12.7643) 0.600393s (patched)

EigenBench2v on MSC: (-12.7643) 0.919594s
EigenBench3v on MSC: (-12.7643) 1.21339s
EigenBench3v on MSC: (-12.7643) 1.00615s (patched)

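Without Eigen's vectorization, the patch doesn't help much. On GCC, Eigen3 stays about 2x slower than Eigen2, and the patched build is even slightly slower than the unpatched one (1.25s vs 1.19s). On MSC, the gap shrinks from about 20% to about 11%. With vectorization, the patched Eigen3 comes close to Eigen2: 0.60s vs 0.56s on GCC and 1.01s vs 0.92s on MSC.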


For the transposed, row-major variant:

EigenBench2 on GCC: t(-12.7643) 0.619455s
EigenBench3 on GCC: t(-12.7643) 1.01131s
EigenBench3 on GCC: t(-12.7643) 1.05591s (patched)

EigenBench2 on MSC: t(-12.7643) 1.22824s
EigenBench3 on MSC: t(-12.7643) 1.97307s
EigenBench3 on MSC: t(-12.7643) 1.25479s (patched)

EigenBench2v on GCC: t(-12.7643) 0.701048s
EigenBench3v on GCC: t(-12.7643) 2.49451s
EigenBench3v on GCC: t(-12.7643) 0.617562s (patched)

EigenBench2v on MSC: t(-12.7643) 0.686412s
EigenBench3v on MSC: t(-12.7643) 2.1448s
EigenBench3v on MSC: t(-12.7643) 1.07653s (patched)


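This shows the same trends, but the initial regression is larger: with vectorization, unpatched Eigen3 was about 3.6x (GCC) and 3.1x (MSC) slower than Eigen2. After the patch, vectorized GCC is actually faster than Eigen2 (0.62s vs 0.70s) and unvectorized MSC nearly matches it (1.25s vs 1.23s). Vectorized MSC is still about 1.6x slower (1.08s vs 0.69s).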


Attached:
 - testEig.cpp: a short test case demonstrating the slowdown. In addition to NDEBUG, define EIGEN2 or EIGEN3 to match the version you're including; this selects between lazy() and noalias(). Defining EIGEN_DONT_VECTORIZE and/or TRANSPOSED selects the corresponding variants.
 - eigen_rev2571.patch: a one-word patch :-)


--eamon@xxxxxxxxxxxx - Tel#:+31-6-15142163



