[eigen] Transform class performance and inconsistencies

Hi,

I was curious about the performance of the Transform class in the Geometry module and did some benchmarking. While doing that, I noticed some things that I certainly would not have expected. What I wanted to measure was the cost of the Transform * Vector and Transform * Transform operations, for both float and double, with aligned as well as unaligned types.
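For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a timing loop. It is not the harness used for the numbers in this mail: `time_loop` is a hypothetical helper, it reports wall-clock nanoseconds rather than cycles, and it assumes the operation under test returns a float that can be written to a volatile sink so the compiler cannot drop the loop.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not from the original benchmark): calls `op`
// `iterations` times and returns the elapsed wall-clock time in nanoseconds.
template <typename Op>
std::int64_t time_loop(std::int64_t iterations, Op op) {
    // Writing to a volatile variable keeps the optimizer from removing the calls.
    volatile float sink = 0.0f;
    const auto start = std::chrono::steady_clock::now();
    for (std::int64_t i = 0; i < iterations; ++i) {
        sink = op();
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

A call could look like `time_loop(100000000, [&] { return apply(t, v); })`, where `apply` stands for a function defined in a separate compilation unit that returns one coefficient of `t * v`.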
First case:

```
typedef Transform<float, 3, Isometry, AutoAlign> Trans;
typedef Matrix<float, 3, 1, AutoAlign> Vec;
Vec v = Vec::Zero(); Trans t = Trans::Identity();
// loop 100000000 times (using a function call in a separate compilation unit):
Vec res = t * v;
```

will result in 760 kcycles. The same code, but with

```
typedef Transform<float, 3, Isometry, DontAlign> Trans;
typedef Matrix<float, 3, 1, DontAlign> Vec;
```

takes 540 kcycles. I have no idea why this would happen, since Vector3f is not an alignable type, right?
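To spell out the size argument behind that question: as far as I understand the documentation, a fixed-size type is only treated as vectorizable (and therefore aligned) when its size is a multiple of 16 bytes. The sketch below just checks that arithmetic. The helper `fills_whole_packets` and the 16-byte SSE packet size are my assumptions for illustration, not Eigen API.

```cpp
#include <cstddef>

// Assumed packet size: 16 bytes, i.e. one SSE register.
constexpr std::size_t kPacketBytes = 16;

// Hypothetical helper: true if N scalars occupy a whole number of packets.
template <typename Scalar, std::size_t N>
constexpr bool fills_whole_packets() {
    return (sizeof(Scalar) * N) % kPacketBytes == 0;
}

// Assuming 4-byte float and 8-byte double:
static_assert(!fills_whole_packets<float, 3>(), "3 floats are 12 bytes");
static_assert(fills_whole_packets<float, 4>(), "4 floats are 16 bytes");
static_assert(!fills_whole_packets<double, 3>(), "3 doubles are 24 bytes");
static_assert(fills_whole_packets<double, 4>(), "4 doubles are 32 bytes");
```

By this count a 3-vector should not change between AutoAlign and DontAlign, while the Transform itself (which stores a 4x4 matrix in the cases above, if I read the source correctly) would.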
Second case:

Same as first case, but now I use

```
typedef Matrix<float, 4, 1, AutoAlign> Vec;
```

and

```
typedef Matrix<float, 4, 1, DontAlign> Vec;
```

and that gets me 850 kcycles for the aligned and 560 kcycles for the unaligned case. Here I would have expected a performance boost, since I thought Vector4f is vectorizable.
Third case:

Again using

```
typedef Matrix<float, 4, 1, AutoAlign> Vec;
typedef Matrix<float, 4, 1, DontAlign> Vec;
```

but now I use

```
Vec res = t.matrix() * v;
```

This results in 410 for the aligned and 1080 for the unaligned case. This is more what I would have expected. I guess the performance penalty in the unaligned case comes from the fact that the transform is tagged as an Isometry, which means that t * v doesn't have to perform the full matrix product but only the affine part, whereas t.matrix() * v does the full product. The aligned case makes up for it using vectorization. Note: the performance gain can be seen for both float and double (I leave out the numbers to not add to the confusion).
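To make that guess concrete, here is a plain C++ sketch (no Eigen) of the two ways to apply an affine transform stored as a 4x4 matrix. The names `full_product` and `affine_apply` are made up for illustration and this is not Eigen's actual implementation. It only shows that the affine shortcut needs 9 multiplications where the full homogeneous product needs 16, assuming the last row is (0, 0, 0, 1).

```cpp
#include <array>

using Vec3 = std::array<float, 3>;
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // indexed as m[row][col]

// Full homogeneous product: 4x4 matrix times 4-vector, 16 multiplications.
Vec4 full_product(const Mat4& m, const Vec4& v) {
    Vec4 r{};  // zero-initialized accumulator
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i][j] * v[j];
    return r;
}

// Affine shortcut: upper-left 3x3 block times v, plus the translation column.
// Only valid when the last row of m is (0, 0, 0, 1); 9 multiplications.
Vec3 affine_apply(const Mat4& m, const Vec3& v) {
    Vec3 r{};
    for (int i = 0; i < 3; ++i) {
        r[i] = m[i][3];  // start from the translation part
        for (int j = 0; j < 3; ++j)
            r[i] += m[i][j] * v[j];
    }
    return r;
}
```

For a matrix with last row (0, 0, 0, 1), `affine_apply(m, v)` gives the first three coefficients of `full_product(m, {v, 1})`. The full product is the one that maps directly onto 4-wide packets.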
Fourth case:

This time performing a transform * transform product:

```
typedef Transform<float, 3, Isometry, AutoAlign> Trans;
Trans t1, t2;
Trans res = t1 * t2;
```

gives 1850 for aligned and 1730 for unaligned. When I do

```
Trans res = t1.matrix() * t2.matrix();
```

it's 1150 for aligned and 3620 for unaligned. So the aligned case gets faster, while the unaligned case gets slower. Notably, though, this speedup is not visible for double.
Fifth case:

Switching to a projective transform:

```
typedef Transform<float, 3, Projective, AutoAlign> Trans;
typedef Matrix<float, 3, 1, AutoAlign> Vec;
Vec v = Vec::Zero(); Trans t = Trans::Identity();
Vec res = t * v;
```

actually results in a compile error. This is very unexpected. To get it to compile,
```
Vec res = (t * v.homogeneous()).head<3>();
```

is required. When the Transform is set to Projective, the aligned cases are faster than the unaligned ones (again, not for transform * transform in the double case).
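As a side note on the workaround: taking head<3>() simply drops the fourth coefficient, which only matches the projected point when that coefficient is 1 (true for the identity transform used here). For a general projective transform one would divide by it, which is what Eigen's hnormalized() does. Below is a plain C++ sketch of the full round trip. All function names are hypothetical and the row-major 4x4 layout is just an assumption for the example.

```cpp
#include <array>

using Vec3 = std::array<float, 3>;
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // indexed as m[row][col]

// Append w = 1 to lift a 3D point into homogeneous coordinates.
Vec4 to_homogeneous(const Vec3& v) {
    Vec4 h = {{v[0], v[1], v[2], 1.0f}};
    return h;
}

// Plain 4x4 matrix times 4-vector product.
Vec4 multiply(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i][j] * v[j];
    return r;
}

// Perspective divide: scale by 1/w and drop w. Assumes w != 0.
Vec3 from_homogeneous(const Vec4& h) {
    Vec3 r = {{h[0] / h[3], h[1] / h[3], h[2] / h[3]}};
    return r;
}

// Apply a projective transform to a 3D point.
Vec3 project(const Mat4& m, const Vec3& v) {
    return from_homogeneous(multiply(m, to_homogeneous(v)));
}
```

With an identity matrix whose last diagonal entry is set to 2, `project` maps (2, 4, 6) to (1, 2, 3), whereas simply dropping the last coefficient would return (2, 4, 6).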
A lot of this I did not expect based on the documentation... Maybe someone can enlighten me? Test environment: GCC 4.6.3, latest Eigen head, Core i7 CPU. Tests compiled with -O3.
cheers,

Jakob
