[eigen] Transform class performance and inconsistencies
Hi,
I was curious about the performance of the Transform class in the
Geometry module for different cases and did some benchmarking. While
doing that, I noticed some things that I certainly would not have
expected. What I wanted to find out was the performance of a Transform *
Vector and Transform * Transform operation for both float and double as
well as aligned and unaligned.
First case:
typedef Transform<float, 3, Isometry, AutoAlign> Trans;
typedef Matrix<float, 3, 1, AutoAlign> Vec;
Vec v = Vec::Zero(); Trans t = Trans::Identity();
loop 100000000 times (calling a function in a separate compilation unit):
Vec res = t * v;
will result in 760 kcycles. The same code, but with
typedef Transform<float, 3, Isometry, DontAlign> Trans;
typedef Matrix<float, 3, 1, DontAlign> Vec;
takes 540 kcycles. I have no idea why this would happen, since Vector3f
is not an alignable type, right?
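
A rough sketch of how such a timing loop might be set up. The names Vec3, apply and time_loop are made up for illustration, the body of apply is only a stand-in for t * v, and the sketch reports wall-clock nanoseconds rather than the cycle counts quoted above:

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for the vector type under test.
struct Vec3 { float x, y, z; };

// Hypothetical stand-in for the operation under test (e.g. t * v).
// In a real benchmark this would live in a separate .cpp file so the
// compiler cannot inline it or hoist it out of the loop; noinline is
// used here only to approximate that within a single file.
#if defined(__GNUC__)
__attribute__((noinline))
#endif
Vec3 apply(const Vec3& v, float s) {
    return Vec3{v.x * s, v.y * s, v.z * s};
}

// Calls apply() `iterations` times and returns the elapsed nanoseconds.
// The accumulated value is written to *sink so the work stays observable.
long long time_loop(std::size_t iterations, float* sink) {
    Vec3 v{1.0f, 2.0f, 3.0f};
    float acc = 0.0f;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        Vec3 r = apply(v, 1.0f);
        acc += r.x;
    }
    auto stop = std::chrono::steady_clock::now();
    *sink = acc;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Swapping the body of apply for the real product and switching the typedefs between AutoAlign and DontAlign would give the aligned and unaligned variants.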
Second case:
Same as first case, but now I use
typedef Matrix<float, 4, 1, AutoAlign> Vec;
and
typedef Matrix<float, 4, 1, DontAlign> Vec;
and that gets me 850 for the aligned and 560 kcycles for the unaligned
case. Here I would have expected to get a performance boost since I
thought Vector4f is vectorizable.
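
For reference, a small sketch of the size rule I am assuming here: a fixed-size type is a candidate for 128-bit SSE packets only when its byte size is a multiple of 16. The struct names and the helper fills_sse_packets are hypothetical and only mirror the storage of the corresponding vector types; they are not Eigen types.

```cpp
#include <cstddef>

// Assumption: only sizes that are a multiple of 16 bytes can be covered
// completely by 128-bit packets.
constexpr bool fills_sse_packets(std::size_t bytes) {
    return bytes % 16 == 0;
}

struct Float3  { float  d[3]; };  // 12 bytes, like a 3-float vector
struct Float4  { float  d[4]; };  // 16 bytes, like a 4-float vector
struct Double3 { double d[3]; };  // 24 bytes
struct Double4 { double d[4]; };  // 32 bytes

static_assert(!fills_sse_packets(sizeof(Float3)),  "3 floats leave a partial packet");
static_assert( fills_sse_packets(sizeof(Float4)),  "4 floats fill one packet");
static_assert(!fills_sse_packets(sizeof(Double3)), "3 doubles leave a partial packet");
static_assert( fills_sse_packets(sizeof(Double4)), "4 doubles fill two packets");
```

Under that assumption the 3-float vector of the first case should not benefit from alignment at all, while the 4-float vector of this case should.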
Third case:
Again using
typedef Matrix<float, 4, 1, AutoAlign> Vec;
typedef Matrix<float, 4, 1, DontAlign> Vec;
but now I use
Vec res = t.matrix() * v;
This results in 410 kcycles for the aligned and 1080 for the unaligned
case. This is more what I would have expected, and I guess the
performance penalty comes from the fact that the transform is tagged as
an Isometry, which means it doesn't have to perform the full matrix
product, but only the affine part in the unaligned case. The aligned
case makes up for it using vectorization. Note: the performance gain can
be seen for both float and double (I leave out the numbers to not add to
the confusion).
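
To make the "affine part only" idea concrete, here is a plain C++ sketch (no Eigen; the names Vec4, Mat4, full_product and affine_product are made up for illustration). It assumes the last row of the matrix is (0, 0, 0, 1) and the vector has w == 1, in which case the shortcut gives the same result as the full product with 9 instead of 16 multiplications:

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major: m[row][col]

// Full 4x4 matrix-vector product: 16 multiplications.
Vec4 full_product(const Mat4& m, const Vec4& v) {
    Vec4 r{};  // zero-initialized
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i][j] * v[j];
    return r;
}

// Affine shortcut: assumes the last row of m is (0, 0, 0, 1) and v[3] == 1,
// so only the upper 3x3 block and the translation column are used.
Vec4 affine_product(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 3; ++i) {
        r[i] = m[i][3];  // start from the translation
        for (int j = 0; j < 3; ++j)
            r[i] += m[i][j] * v[j];
    }
    r[3] = 1.0f;
    return r;
}
```

The shortcut does less arithmetic, but it works on 3-element rows, which do not map onto 4-wide packets as directly as the full product does.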
Fourth case:
This time performing a transform * transform product:
typedef Transform<float, 3, Isometry, AutoAlign> Trans;
Trans t1, t2;
Trans res = t1 * t2;
gives 1850 for aligned and 1730 for unaligned.
When I do
Trans res = t1.matrix() * t2.matrix();
it's 1150 for aligned and 3620 for unaligned. So a speedup in the
aligned case. Notably, though, this speedup is not visible for
double.
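
One way to compare the two variants is to count operations. As a sketch, assume the last row of each transform matrix is (0, 0, 0, 1), and write R_i for the 3x3 linear part and t_i for the translation (notation introduced here, not taken from Eigen):

```latex
\begin{pmatrix} R_1 & t_1 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} R_2 & t_2 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} R_1 R_2 & R_1 t_2 + t_1 \\ 0 & 1 \end{pmatrix}
```

The right-hand side needs 27 + 9 = 36 scalar multiplications, against 64 for a general 4x4 product. Whether Eigen exploits this structure in exactly this way is not something the numbers above establish.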
Fifth case:
Switching to projective transform
typedef Transform<float, 3, Projective, AutoAlign> Trans;
typedef Matrix<float, 3, 1, AutoAlign> Vec;
Vec v = Vec::Zero(); Trans t = Trans::Identity();
Vec res = t * v;
actually results in a compile error, which is very unexpected. To get it
to compile,
Vec res = (t * v.homogeneous()).head<3>();
is required. When Transform is set to Projective, the cases with the
obvious alignment are faster than the unaligned ones (again not for
transform * transform in the double case).
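
For completeness, a plain C++ sketch of what applying a projective transform to a 3-vector involves (no Eigen; all names are made up for illustration): lift the vector to homogeneous coordinates with w = 1, multiply by the 4x4 matrix, then divide by the resulting w. Note that head<3>() as used above only drops w without dividing, which matches this only when w comes out as 1; as far as I know, Eigen's Geometry module provides hnormalized() for the divide.

```cpp
#include <array>

using Vec3 = std::array<float, 3>;
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major: m[row][col]

// Append w = 1 to obtain homogeneous coordinates.
Vec4 to_homogeneous(const Vec3& v) {
    return Vec4{{v[0], v[1], v[2], 1.0f}};
}

// Full 4x4 matrix-vector product.
Vec4 multiply(const Mat4& m, const Vec4& v) {
    Vec4 r{};  // zero-initialized
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i][j] * v[j];
    return r;
}

// Perspective divide; assumes h[3] != 0.
Vec3 from_homogeneous(const Vec4& h) {
    return Vec3{{h[0] / h[3], h[1] / h[3], h[2] / h[3]}};
}

// Apply a projective 4x4 transform to a 3-vector.
Vec3 project(const Mat4& m, const Vec3& v) {
    return from_homogeneous(multiply(m, to_homogeneous(v)));
}
```

With an identity matrix w stays 1, so the divide is a no-op and the result agrees with simply taking the first three components.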
A lot of this I did not expect based on the documentation... Maybe
someone can enlighten me?
Test environment: GCC 4.6.3, latest Eigen head, Core i7 CPU. Tests
compiled with -O3.
cheers,
Jakob