Re: [eigen] Transform class performance and inconsistencies


- *To*: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
- *Subject*: Re: [eigen] Transform class performance and inconsistencies
- *From*: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- *Date*: Fri, 14 Dec 2012 20:30:14 +0100

Hi,

It's indeed not expected for vectorized code to be slower than non-vectorized code.

Do you have a compilable test file? The only way to be sure about what's going on here is to look at the generated assembly. Even though Eigen strives to produce simple code for the compiler, sometimes weird things happen.

cheers,

gael

On Fri, Dec 14, 2012 at 4:46 PM, Jakob Schwendner <jakob.schwendner@xxxxxxx> wrote:

Hi,

I was curious about the performance of the Transform class in the Geometry module for different cases and did some benchmarking. While doing that, I noticed some things that I certainly would not have expected. What I wanted to find out was the performance of the Transform * Vector and Transform * Transform operations, for both float and double, aligned as well as unaligned.

First case:

typedef Transform<float, 3, Isometry, AutoAlign> Trans;

typedef Matrix<float, 3, 1, AutoAlign> Vec;

Vec v = Vec::Zero(); Trans t = Trans::Identity();

Looping 100000000 times (using a function call in a separate compilation unit) over:

Vec res = t * v;

results in 760 kcycles. The same code, but with

typedef Transform<float, 3, Isometry, DontAlign> Trans;

typedef Matrix<float, 3, 1, DontAlign> Vec;

takes 540 kcycles. I have no idea why this would happen, since Vector3f is not an alignable type, right?

Second case:

Same as first case, but now I use

typedef Matrix<float, 4, 1, AutoAlign> Vec;

and

typedef Matrix<float, 4, 1, DontAlign> Vec;

and that gets me 850 kcycles for the aligned and 560 kcycles for the unaligned case. Here I would have expected a performance boost, since I thought Vector4f is vectorizable.

Third case:

Again using

typedef Matrix<float, 4, 1, AutoAlign> Vec;

typedef Matrix<float, 4, 1, DontAlign> Vec;

but now I use

Vec res = t.matrix() * v;

This results in 410 kcycles for the aligned and 1080 kcycles for the unaligned case. This is more what I would have expected. I guess the performance penalty in the unaligned case comes from the fact that the transform is tagged as an Isometry, so t * v only has to compute the affine part, while t.matrix() * v performs the full matrix product. The aligned case makes up for it using vectorization. Note: the performance gain can be seen for both float and double (I leave out the numbers so as not to add to the confusion).

Fourth case:

This time performing a transform * transform product:

typedef Transform<float, 3, Isometry, AutoAlign> Trans;

Trans t1, t2;

Trans res = t1 * t2;

gives 1850 for aligned and 1730 for unaligned.

When I do

Trans res = t1.matrix() * t2.matrix();

it's 1150 for aligned and 3620 for unaligned. So performance improves in the aligned case. Notably, though, this improvement is not visible for double.

Fifth case:

Switching to projective transform

typedef Transform<float, 3, Projective, AutoAlign> Trans;

typedef Matrix<float, 3, 1, AutoAlign> Vec;

Vec v = Vec::Zero(); Trans t = Trans::Identity();

Vec res = t * v;

actually results in a compile error, which is very unexpected. To get it to compile,

Vec res = (t * v.homogeneous()).head<3>();

is required. When Transform is set to Projective, the cases with the obvious alignment are faster than the unaligned ones (again, not for transform * transform in the double case).

A lot of this I did not expect based on the documentation... Maybe someone can enlighten me?

Test environment: GCC 4.6.3, latest Eigen head, Core i7 CPU. Tests compiled with -O3.

cheers,

Jakob

