Re: [eigen] Rigid transformations in eigen: use of dual quaternions
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Rigid transformations in eigen: use of dual quaternions
- From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- Date: Sat, 12 Sep 2009 19:57:04 -0400
2009/9/12 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
> On Sat, Sep 12, 2009 at 10:34 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>> 2009/9/12 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>>>
>>> hi,
>>>
>>> I don't have much right now to address all the issues raised in that
>>> thread, but at least:
>>>
>>> - here we are speaking about 4 scalars only, and so there is no
>>> advantage in using expression template (wrt performance). So here
>>> returning by value is fine.
>>
>> really? for simple enough expressions, when there are enough
>> registers, OK, but for example, without vectorization, there won't be
>> enough registers on x86 to do s1*q1+s2*q2 efficiently, right?
>
> yes, I tried for you and it's even a bit faster.
OK, I tried too on x86, and the results are comparable: ETs are never
slower, but the speed difference is completely negligible. Same with
gcc 4.4.
Attached file: b.cpp
Bench results:
=== 19:44:29 ~$ g++ b.cpp -o b -O2 -DNDEBUG -I eigen2 -DSCALAR=float && ./b
without ET: 1.87037 sec
with ET: 1.84362 sec
=== 19:45:00 ~$ g++ b.cpp -o b -O2 -DNDEBUG -I eigen2 -DSCALAR=double && ./b
without ET: 1.83466 sec
with ET: 1.83337 sec
Kudos to your intuition for assembly!
Just out of curiosity, I experimented to find the size at which
expression templates start to bring a benefit.
=== 19:50:37 ~$ g++ b.cpp -o b -O2 -DNDEBUG -I eigen2 -DSCALAR=float -DVECTORSIZE=6 && ./b
without ET: 2.35337 sec
with ET: 2.30511 sec
=== 19:51:51 ~$ g++ b.cpp -o b -O2 -DNDEBUG -I eigen2 -DSCALAR=float -DVECTORSIZE=7 && ./b
without ET: 3.38604 sec
with ET: 2.55124 sec
=== 19:51:59 ~$ g++ b.cpp -o b -O2 -DNDEBUG -I eigen2 -DSCALAR=double -DVECTORSIZE=6 && ./b
without ET: 2.3045 sec
with ET: 2.30502 sec
=== 19:52:20 ~$ g++ b.cpp -o b -O2 -DNDEBUG -I eigen2 -DSCALAR=double -DVECTORSIZE=7 && ./b
without ET: 3.36907 sec
with ET: 2.61382 sec
As you can see, on x86 without vectorization, for this particular
expression s1*v1+s2*v2 and for both float and double, the ETs start
giving a large benefit at sizes >=7 (about 3.4 sec without ET vs.
about 2.6 sec with ET) and bring no benefit at sizes <=6.
Benoit
#include <iostream>
#include <Eigen/Eigen>
#include <bench/BenchTimer.h>

using namespace std;
using namespace Eigen;

// SCALAR must be given on the command line (-DSCALAR=float or -DSCALAR=double).
typedef SCALAR S;

#ifndef VECTORSIZE
#define VECTORSIZE 4
#endif

typedef Matrix<S,VECTORSIZE,1> V;

// Returning V by value forces each intermediate result into a plain vector.
inline V sum(const V& x, const V& y)
{
  return x+y;
}

inline V prod(const V& x, const S& s)
{
  return x*s;
}

// EIGEN_DONT_INLINE keeps each test as a separate function; the ASM comments
// mark the region of interest in the generated assembly.
EIGEN_DONT_INLINE void test_without_ET(const V& v1, const S& s1, const V& v2, const S& s2, V& result)
{
  EIGEN_ASM_COMMENT("begin without ET");
  result = sum(prod(v1,s1),prod(v2,s2));
  EIGEN_ASM_COMMENT("end without ET");
}

EIGEN_DONT_INLINE void test_with_ET(const V& v1, const S& s1, const V& v2, const S& s2, V& result)
{
  EIGEN_ASM_COMMENT("begin with ET");
  result = v1*s1 + v2*s2;
  EIGEN_ASM_COMMENT("end with ET");
}

int main()
{
  S s1 = ei_random<S>();
  S s2 = ei_random<S>();
  V v1 = V::Random();
  V v2 = V::Random();
  V result;

  // Time 100 million evaluations of the by-value variant.
  BenchTimer t_without_ET;
  t_without_ET.start();
  for(int i = 0; i < 100000000; i++)
  {
    test_without_ET(v1,s1,v2,s2,result);
  }
  t_without_ET.stop();
  cout << "without ET: " << t_without_ET.value() << " sec" << endl;

  // Time the same number of evaluations of the expression-template variant.
  BenchTimer t_with_ET;
  t_with_ET.start();
  for(int i = 0; i < 100000000; i++)
  {
    test_with_ET(v1,s1,v2,s2,result);
  }
  t_with_ET.stop();
  cout << "with ET: " << t_with_ET.value() << " sec" << endl;
}
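
For readers who want to see what the two variants in b.cpp differ by, here is a minimal sketch that does not depend on Eigen. Everything in it (Vec, Scaled, ScaledSum, assign) is a hypothetical stand-in invented for illustration; it is not how Eigen implements its expressions, only the general idea. The by-value helpers build a full temporary vector per operation, whereas the expression nodes only remember their operands and are evaluated coefficient by coefficient in a single loop.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-size vector standing in for Matrix<S,N,1>.
template <typename S, std::size_t N>
struct Vec {
    std::array<S, N> d;
    S operator[](std::size_t i) const { return d[i]; }
};

// Without ET: each helper returns a complete vector by value.
template <typename S, std::size_t N>
Vec<S, N> prod(const Vec<S, N>& x, S s) {
    Vec<S, N> r;
    for (std::size_t i = 0; i < N; ++i) r.d[i] = x[i] * s;
    return r;
}

template <typename S, std::size_t N>
Vec<S, N> sum(const Vec<S, N>& x, const Vec<S, N>& y) {
    Vec<S, N> r;
    for (std::size_t i = 0; i < N; ++i) r.d[i] = x[i] + y[i];
    return r;
}

// With ET: a node that remembers "vector times scalar" without computing it.
template <typename S, std::size_t N>
struct Scaled {
    const Vec<S, N>& v;  // must outlive the expression
    S s;
    S operator[](std::size_t i) const { return v[i] * s; }
};

// A node for the sum of two scaled vectors, also computed only on demand.
template <typename S, std::size_t N>
struct ScaledSum {
    Scaled<S, N> l;
    Scaled<S, N> r;
    S operator[](std::size_t i) const { return l[i] + r[i]; }
};

template <typename S, std::size_t N>
Scaled<S, N> operator*(const Vec<S, N>& v, S s) { return {v, s}; }

template <typename S, std::size_t N>
ScaledSum<S, N> operator+(const Scaled<S, N>& a, const Scaled<S, N>& b) {
    return {a, b};
}

// The whole expression is evaluated here, in one loop, with no
// intermediate vectors.
template <typename S, std::size_t N, typename E>
void assign(Vec<S, N>& out, const E& e) {
    for (std::size_t i = 0; i < N; ++i) out.d[i] = e[i];
}
```

With this sketch, sum(prod(v1,s1),prod(v2,s2)) and assign(result, v1*s1 + v2*s2) compute the same coefficients; only the number of intermediate vectors differs. Whether that difference shows up in the timings depends on the vector size and the compiler, as the numbers above suggest.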