Re: [eigen] Transform class performance and inconsistencies
- To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
- Subject: Re: [eigen] Transform class performance and inconsistencies
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Mon, 17 Dec 2012 11:35:57 +0100
OK, thank you. I implemented my own benchmark too, and I get the following results ('u' stands for unaligned):
Float:
Vec3 /Iso3 : 0.000645647
Vec3u/Iso3u: 0.000645586
Vec4 /Iso3 : 0.00126092
Vec4u/Iso3u: 0.000688231
Vec4 /Mat4 : 0.000391356
Vec4u/Mat4u: 0.000981292
Double:
Vec3 /Iso3 : 0.000649383
Vec3u/Iso3u: 0.000649382
Vec4 /Iso3 : 0.00126092
Vec4u/Iso3u: 0.000689219
Vec4 /Mat4 : 0.000475642
Vec4u/Mat4u: 0.000979767
These are the best times reported by BenchTimer (in seconds), obtained with gcc 4.6 and -O2 -DNDEBUG.
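For readers without Eigen's bench/BenchTimer.h at hand, the measurement scheme can be sketched in plain C++. This is only an approximation of what BENCH(t, 100, 100000, ...) followed by t.best() reports: run several tries of many repetitions each and keep the fastest try. The helper name best_of and the use of std::chrono::steady_clock are assumptions made for this illustration, not BenchTimer's actual implementation.

```cpp
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper (not part of Eigen): time `tries` rounds of `reps`
// calls to f() and return the duration of the fastest round, in seconds.
template <typename F>
double best_of(int tries, int reps, F&& f)
{
  assert(tries > 0 && reps > 0);
  using clock = std::chrono::steady_clock;
  double best = std::numeric_limits<double>::infinity();
  for (int t = 0; t < tries; ++t) {
    const auto start = clock::now();
    for (int r = 0; r < reps; ++r)
      f();
    const std::chrono::duration<double> elapsed = clock::now() - start;
    if (elapsed.count() < best)
      best = elapsed.count();  // keep the fastest round
  }
  return best;
}
```

Under this scheme each reported number is the total time of one round, so dividing by the repetition count gives a rough per-call cost.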
The only strange result is the "Vec4 /Iso3" case, which is unexpectedly almost twice as slow as "Vec4u/Iso3u". Looking at the assembly, the only difference between the two versions is that in the "Vec4 /Iso3" case the result is first written into a temporary (one coefficient at a time), which is then copied to the true result location using 2 movaps. Clearly, the compiler should be able to remove this extra copy, as it does in the "Vec4u/Iso3u" case, but here it does not, and I do not know why.
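To make the two store patterns concrete, here is a minimal standalone sketch in plain C++. It is not Eigen's code and not the generated assembly; the types Vec4f and Mat4f and both function names are hypothetical, chosen only to show "write straight into the result" next to "fill a temporary coefficient by coefficient, then copy it as a block". A compiler may well optimize both to the same code in a toy like this, so it illustrates the shape of the problem rather than reproducing the slowdown.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for an aligned 4-vector and a column-major 4x4 matrix.
struct alignas(16) Vec4f { float c[4]; };
struct alignas(16) Mat4f { float m[16]; };  // element (i,j) is m[i + 4*j]

// One coefficient of the matrix-vector product.
inline float coeff(const Mat4f& a, const Vec4f& v, int i)
{
  return a.m[i] * v.c[0] + a.m[i + 4] * v.c[1]
       + a.m[i + 8] * v.c[2] + a.m[i + 12] * v.c[3];
}

// Pattern 1: each coefficient is stored directly in the destination.
// Assumes res does not alias v.
void product_direct(Vec4f& res, const Mat4f& a, const Vec4f& v)
{
  for (int i = 0; i < 4; ++i)
    res.c[i] = coeff(a, v, i);
}

// Pattern 2: coefficients go to a temporary first, which is then copied
// to the destination as one aligned block (the extra copy described above).
void product_via_temporary(Vec4f& res, const Mat4f& a, const Vec4f& v)
{
  Vec4f tmp;
  for (int i = 0; i < 4; ++i)
    tmp.c[i] = coeff(a, v, i);
  std::memcpy(&res, &tmp, sizeof(Vec4f));
}
```

Both functions compute the same result; the second only adds the intermediate store and block copy that one would hope the optimizer removes.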
cheers,
gael
#include <bench/BenchTimer.h>
#include <iostream>
#include <string>
#include <Eigen/Geometry>
using namespace Eigen;
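// Kept out of line so that the product appears as a separate function in the assembly.
template<typename R, typename T, typename V>
EIGEN_DONT_INLINE void kernel(R& res, const T& t, const V& v)
{
  EIGEN_ASM_COMMENT("LOOKHERE"); // marker to locate the product in the generated assembly
  res = t * v;
}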
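// R: result type, T: transform or matrix type, V: input vector type (defaults to R).
template<typename R, typename T, typename V=R> void bench(const std::string& msg)
{
  R res;
  T A;
  V v;
  v.setRandom();
  A.setIdentity();
  BenchTimer t;
  BENCH(t, 100, 100000, kernel(res,A,v)); // 100 tries of 100000 calls each
  std::cout << msg << ": " << t.best() << "\n";
}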
template<typename S> void benchall()
{
  typedef Matrix<S, 3, 1> Vec3;
  typedef Matrix<S, 3, 1, DontAlign> Vec3u;
  typedef Matrix<S, 4, 1> Vec4;
  typedef Matrix<S, 4, 1, DontAlign> Vec4u;
  typedef Transform<S, 3, Isometry> Iso3;
  typedef Transform<S, 3, Isometry, DontAlign> Iso3u;
  typedef Matrix<S, 4, 4> Mat4;
  typedef Matrix<S, 4, 4, DontAlign> Mat4u;
  // bench<Vec3,Iso3>  ("Vec3 /Iso3 ");
  // bench<Vec3u,Iso3u>("Vec3u/Iso3u");
  bench<Vec4,Iso3>  ("Vec4 /Iso3 ");
  bench<Vec4u,Iso3u>("Vec4u/Iso3u");
  // bench<Vec4,Mat4>  ("Vec4 /Mat4 ");
  // bench<Vec4u,Mat4u>("Vec4u/Mat4u");
}
int main()
{
  std::cout << "Float:\n";
  benchall<float>();
  std::cout << "\nDouble:\n";
  // benchall<double>();
  return 0;
}