[eigen] Eigen2 --> Eigen3 perf regression patch.
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: [eigen] Eigen2 --> Eigen3 perf regression patch.
- From: Eamon Nerbonne <emn13@xxxxxxxxxxxx>
- Date: Tue, 2 Mar 2010 12:18:12 +0100
I've got another performance regression from eigen2 --> eigen3; it looks similar to the one I posted in the forums which was resolved by changing the ei_product_type_selector mapping, so that's what I tried here too.
typedef Matrix<double,Dynamic,2> QMatrix;
QMatrix Q = QMatrix::Random(DIMS,2);
Vector2d v = Vector2d::Random(); // fixed-size 2-vector, so no size argument
VectorXd r = VectorXd::Random(DIMS);
// Then loop this:
#if EIGEN3
r.noalias() = Q * v;
#else
r = (Q * v).lazy();
#endif
I tried the above and a variant with a transposed Matrix<double,2,Dynamic> (i.e. a row-major matrix). DIMS was 25.
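To make the two layouts concrete, here is a minimal sketch in plain C++ of what the product computes for a DIMS x 2 matrix in each storage order. It is only an illustration of the memory access pattern, not Eigen's implementation; the function names and the flat std::vector storage are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: column-major storage, element (i, j) at q[j * rows + i].
// r = Q * v is a sum of two scaled columns, each read contiguously.
std::vector<double> product_col_major(const std::vector<double>& q,
                                      std::size_t rows,
                                      double v0, double v1) {
    assert(q.size() == rows * 2);
    std::vector<double> r(rows);
    for (std::size_t i = 0; i < rows; ++i)
        r[i] = q[i] * v0 + q[rows + i] * v1;
    return r;
}

// Hypothetical helper: row-major storage, element (i, j) at q[i * 2 + j].
// r = Q * v is one two-element dot product per row.
std::vector<double> product_row_major(const std::vector<double>& q,
                                      std::size_t rows,
                                      double v0, double v1) {
    assert(q.size() == rows * 2);
    std::vector<double> r(rows);
    for (std::size_t i = 0; i < rows; ++i)
        r[i] = q[2 * i] * v0 + q[2 * i + 1] * v1;
    return r;
}
```

Both functions return the same vector for the same logical matrix; only the order in which memory is read differs.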
For the untransposed, column-major variant:
- a 'v' suffix indicates that vectorization was on, i.e. EIGEN_DONT_VECTORIZE was not defined
- timings are best of 5 consecutive runs.
- all tests were done on 64-bit with quite a few optimization options; fiddling with these changed the numbers (particularly for the unvectorized variants) but not the trends.
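For reference, the "best of 5 consecutive runs" measurement described in the notes above could be done roughly as follows. This is a sketch, not the code from testEig.cpp; the name best_of and the use of std::chrono::steady_clock are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper: run `body` `runs` times and return the shortest
// wall-clock duration in seconds. Taking the minimum filters out runs
// that were disturbed by other activity on the machine.
template <typename F>
double best_of(int runs, F&& body) {
    assert(runs > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int k = 0; k < runs; ++k) {
        const auto start = std::chrono::steady_clock::now();
        body();
        const auto stop = std::chrono::steady_clock::now();
        const std::chrono::duration<double> elapsed = stop - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

A call such as best_of(5, [&] { /* benchmark loop */ }) would then yield one figure comparable to those listed here.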
EigenBench2 on GCC: (-12.7643) 0.555743s
EigenBench3 on GCC: (-12.7643) 1.1912s
EigenBench3 on GCC: (-12.7643) 1.25178s (patched)
EigenBench2 on MSC: (-12.7643) 1.22194s
EigenBench3 on MSC: (-12.7643) 1.46516s
EigenBench3 on MSC: (-12.7643) 1.35213s (patched)
EigenBench2v on GCC: (-12.7643) 0.563602s
EigenBench3v on GCC: (-12.7643) 1.10728s
EigenBench3v on GCC: (-12.7643) 0.600393s (patched)
EigenBench2v on MSC: (-12.7643) 0.919594s
EigenBench3v on MSC: (-12.7643) 1.21339s
EigenBench3v on MSC: (-12.7643) 1.00615s (patched)
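Without Eigen's vectorization, the patch makes little difference and performance remains fairly poor (on GCC, Eigen3 still takes more than twice as long as Eigen2); with vectorization, the patched Eigen3 is fairly close to Eigen2 on both compilers.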
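To put numbers on this, the timings can be reduced to a slowdown factor relative to Eigen2. The helper below is just the obvious division; the name slowdown is made up for this illustration.

```cpp
#include <cassert>

// Hypothetical helper: ratio of an Eigen3 timing to the Eigen2 baseline.
// 1.0 means parity; 2.0 means Eigen3 took twice as long.
double slowdown(double eigen3_seconds, double eigen2_seconds) {
    assert(eigen2_seconds > 0.0);
    return eigen3_seconds / eigen2_seconds;
}
```

With the vectorized GCC timings above, slowdown(1.10728, 0.563602) is roughly 1.96 before the patch and slowdown(0.600393, 0.563602) is roughly 1.07 after it; for the unvectorized GCC build, slowdown(1.25178, 0.555743) is still roughly 2.25 with the patch applied.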
For the transposed, row-major variant:
EigenBench2 on GCC: t(-12.7643) 0.619455s
EigenBench3 on GCC: t(-12.7643) 1.01131s
EigenBench3 on GCC: t(-12.7643) 1.05591s (patched)
EigenBench2 on MSC: t(-12.7643) 1.22824s
EigenBench3 on MSC: t(-12.7643) 1.97307s
EigenBench3 on MSC: t(-12.7643) 1.25479s (patched)
EigenBench2v on GCC: t(-12.7643) 0.701048s
EigenBench3v on GCC: t(-12.7643) 2.49451s
EigenBench3v on GCC: t(-12.7643) 0.617562s (patched)
EigenBench2v on MSC: t(-12.7643) 0.686412s
EigenBench3v on MSC: t(-12.7643) 2.1448s
EigenBench3v on MSC: t(-12.7643) 1.07653s (patched)
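This exhibits basically the same trends: with vectorization the patch recovers most of the regression (on GCC, from 2.49451s down to 0.617562s, against 0.701048s for Eigen2), while the unvectorized GCC build is not helped by it.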
Attached:
- testEig.cpp: a short test case demonstrating the slowdown. In addition to NDEBUG you should define EIGEN2 or EIGEN3 corresponding to the version you're including to select between noalias and lazy. Defining EIGEN_DONT_VECTORIZE and/or TRANSPOSED selects the appropriate variants.
- eigen_rev2571.patch: one word patch :-)
--eamon@xxxxxxxxxxxx - Tel#:+31-6-15142163