Re: [eigen] Rigid transformations in eigen: use of dual quaternions



2009/9/12 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
> On Sat, Sep 12, 2009 at 10:34 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>> 2009/9/12 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:
>>>
>>> hi,
>>>
>>> I don't have much time right now to address all the issues raised in
>>> this thread, but at least:
>>>
>>> - here we are speaking about 4 scalars only, and so there is no
>>> advantage in using expression templates (wrt performance). So here
>>> returning by value is fine.
>>
>> really? for simple enough expressions, when there are enough
>> registers, OK, but for example, without vectorization, there won't be
>> enough registers on x86 to do s1*q1+s2*q2 efficiently, right?
>
> yes, I tried for you and it's even a bit faster.

OK, I tried too, on x86, and got comparable results: ETs are never
slower, but the speed difference is completely negligible. The same
holds with gcc 4.4.
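To make Gael's point concrete, here is a minimal sketch of a 4-scalar type whose operators simply return by value, assuming a hypothetical plain `Quat` struct (not Eigen's `Quaternion` class). The expression s1*q1+s2*q2 then creates a few small temporaries of four floats each; the benchmark results that follow measure whether an ET-based version would beat this.

```cpp
// Hypothetical 4-scalar type standing in for a quaternion; this is NOT
// Eigen's Quaternion class, just a plain struct for illustration.
struct Quat {
  float w, x, y, z;
};

// Scalar * quaternion, returned by value as a fresh Quat.
inline Quat operator*(float s, const Quat& q) {
  return Quat{s * q.w, s * q.x, s * q.y, s * q.z};
}

// Component-wise sum, also returned by value.
inline Quat operator+(const Quat& a, const Quat& b) {
  return Quat{a.w + b.w, a.x + b.x, a.y + b.y, a.z + b.z};
}

// The expression from the thread, s1*q1 + s2*q2, evaluated eagerly:
// each operator materializes a small temporary Quat (four floats).
inline Quat blend(float s1, const Quat& q1, float s2, const Quat& q2) {
  return s1 * q1 + s2 * q2;
}
```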

Attached file: b.cpp

Benchmark results (default VECTORSIZE of 4):

=== 19:44:29 ~$ g++ b.cpp -o b -O2 -DNDEBUG -I eigen2 -DSCALAR=float && ./b
without ET: 1.87037 sec
with ET:    1.84362 sec
=== 19:45:00 ~$ g++ b.cpp -o b -O2 -DNDEBUG -I eigen2 -DSCALAR=double && ./b
without ET: 1.83466 sec
with ET:    1.83337 sec

Kudos for your intuition about the generated assembly!

Just out of curiosity, I experimented to find the vector size at which
expression templates start to pay off.

=== 19:50:37 ~$ g++ b.cpp -o b -O2 -DNDEBUG -I eigen2 -DSCALAR=float -DVECTORSIZE=6 && ./b
without ET: 2.35337 sec
with ET:    2.30511 sec
=== 19:51:51 ~$ g++ b.cpp -o b -O2 -DNDEBUG -I eigen2 -DSCALAR=float -DVECTORSIZE=7 && ./b
without ET: 3.38604 sec
with ET:    2.55124 sec
=== 19:51:59 ~$ g++ b.cpp -o b -O2 -DNDEBUG -I eigen2 -DSCALAR=double -DVECTORSIZE=6 && ./b
without ET: 2.3045 sec
with ET:    2.30502 sec
=== 19:52:20 ~$ g++ b.cpp -o b -O2 -DNDEBUG -I eigen2 -DSCALAR=double -DVECTORSIZE=7 && ./b
without ET: 3.36907 sec
with ET:    2.61382 sec

As you can see, on x86, for that particular expression s1*v1+s2*v2,
with both float and double and without vectorization, ETs bring no
significant benefit at sizes <=6 but start giving a large one at sizes
>=7: at size 7 the ET version takes about 22-25% less time.
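For readers unfamiliar with the two variants in b.cpp: sum(prod(v1,s1),prod(v2,s2)) forces each intermediate into a temporary vector, whereas v1*s1 + v2*s2 lets the library build an expression object and evaluate it in a single pass. The following is a deliberately minimal sketch of that idea, assuming hypothetical `Vec`, `Scaled` and `SumOfScaled` types; Eigen's real expression classes are far more general, and this toy makes no claim about why the threshold lands at size 7.

```cpp
#include <cstddef>

// Hypothetical fixed-size vector (a toy, not Eigen's Matrix).
template <std::size_t N>
struct Vec {
  float data[N];
  float operator[](std::size_t i) const { return data[i]; }

  // Assigning from any expression evaluates it in one fused loop.
  template <class E>
  Vec& operator=(const E& expr) {
    for (std::size_t i = 0; i < N; ++i) data[i] = expr[i];
    return *this;
  }
};

// Lazy node for v*s: stores a reference and the scalar, computes nothing yet.
template <std::size_t N>
struct Scaled {
  const Vec<N>& v;
  float s;
  float operator[](std::size_t i) const { return v[i] * s; }
};

// Lazy node for (v1*s1) + (v2*s2).
template <std::size_t N>
struct SumOfScaled {
  Scaled<N> lhs, rhs;
  float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

template <std::size_t N>
Scaled<N> operator*(const Vec<N>& v, float s) { return Scaled<N>{v, s}; }

template <std::size_t N>
SumOfScaled<N> operator+(const Scaled<N>& a, const Scaled<N>& b) {
  return SumOfScaled<N>{a, b};
}

// "Without ET", mirroring sum(prod(v1,s1),prod(v2,s2)) in b.cpp:
// every step materializes a full temporary Vec.
template <std::size_t N>
Vec<N> prod_by_value(const Vec<N>& v, float s) {
  Vec<N> r;
  for (std::size_t i = 0; i < N; ++i) r.data[i] = v[i] * s;
  return r;
}

template <std::size_t N>
Vec<N> sum_by_value(const Vec<N>& x, const Vec<N>& y) {
  Vec<N> r;
  for (std::size_t i = 0; i < N; ++i) r.data[i] = x[i] + y[i];
  return r;
}

template <std::size_t N>
void without_et(const Vec<N>& v1, float s1, const Vec<N>& v2, float s2, Vec<N>& result) {
  result = sum_by_value(prod_by_value(v1, s1), prod_by_value(v2, s2));
}

// "With ET": the right-hand side is a SumOfScaled<N> node, and the
// assignment computes v1[i]*s1 + v2[i]*s2 per element with no temporary Vec.
template <std::size_t N>
void with_et(const Vec<N>& v1, float s1, const Vec<N>& v2, float s2, Vec<N>& result) {
  result = v1 * s1 + v2 * s2;
}
```

Both functions compute the same result; they differ only in whether intermediates are stored. The measurements above suggest that this storage is essentially free up to size 6 on this setup and costly from size 7 on.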

Benoit
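#include <Eigen/Eigen>
#include <bench/BenchTimer.h>  // BenchTimer, from the Eigen source tree
#include <iostream>

using namespace std;
using namespace Eigen;

// Scalar type and vector size are chosen on the command line,
// e.g. -DSCALAR=double -DVECTORSIZE=7.
#ifndef SCALAR
#define SCALAR float
#endif
typedef SCALAR S;

#ifndef VECTORSIZE
#define VECTORSIZE 4
#endif

typedef Matrix<S,VECTORSIZE,1> V;

// Helpers that return plain vectors by value, forcing each intermediate
// result to be evaluated into a temporary V (no expression templates).
inline V sum(const V& x, const V& y)
{
  return x+y;
}

inline V prod(const V& x, const S& s)
{
  return x*s;
}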

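// Both test functions are kept out of line so the benchmark loop cannot
// hoist or elide the work; the asm comments make the generated code easy
// to locate (e.g. with g++ -S).
EIGEN_DONT_INLINE void test_without_ET(const V& v1, const S& s1, const V& v2, const S& s2, V& result)
{
  EIGEN_ASM_COMMENT("begin without ET");
  result = sum(prod(v1,s1),prod(v2,s2));
  EIGEN_ASM_COMMENT("end without ET");
}

// Same computation written directly, so Eigen builds a single expression
// and evaluates it in one pass into result.
EIGEN_DONT_INLINE void test_with_ET(const V& v1, const S& s1, const V& v2, const S& s2, V& result)
{
  EIGEN_ASM_COMMENT("begin with ET");
  result = v1*s1 + v2*s2;
  EIGEN_ASM_COMMENT("end with ET");
}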

int main()
{
  S s1 = ei_random<S>();
  S s2 = ei_random<S>();
  V v1 = V::Random();
  V v2 = V::Random();
  V result;

  // Time 10^8 calls of each variant on the same random inputs.
  BenchTimer t_without_ET;
  t_without_ET.start();
  for(int i = 0; i < 100000000; i++)
  {
    test_without_ET(v1,s1,v2,s2,result);
  }
  t_without_ET.stop();
  cout << "without ET: " << t_without_ET.value() << " sec" << endl;

  BenchTimer t_with_ET;
  t_with_ET.start();
  for(int i = 0; i < 100000000; i++)
  {
    test_with_ET(v1,s1,v2,s2,result);
  }
  t_with_ET.stop();
  cout << "with ET:    " << t_with_ET.value() << " sec" << endl;
}
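BenchTimer comes from the bench/ directory of the Eigen sources (hence -I eigen2 above) and may not be available when building against an installed Eigen. A minimal sketch of a replacement, assuming C++11 std::chrono; the name time_loop is hypothetical:

```cpp
#include <chrono>

// Hypothetical stand-in for BenchTimer: runs `body` `iterations` times and
// returns the elapsed wall-clock time in seconds, measured with steady_clock.
template <class F>
double time_loop(long iterations, F&& body) {
  const auto start = std::chrono::steady_clock::now();
  for (long i = 0; i < iterations; ++i) body();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

In main() one could then write, e.g., cout << "with ET: " << time_loop(100000000, [&] { test_with_ET(v1,s1,v2,s2,result); }) << " sec" << endl; keeping EIGEN_DONT_INLINE on the test functions still matters, since it stops the compiler from inlining the call and hoisting or discarding the loop-invariant work.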

