Re: [eigen] On tvmet performance



I noticed that in b.cpp the matrix m had inf values, which could have biased the result.

I changed the factor 0.05 to 0.005, which solved this problem. The results are the same.

Benoit

On Wed, 29 Aug 2007, Benoit Jacob wrote:

Hi Gael,

You forgot to define NDEBUG when compiling. This is extremely important for performance, and can alone be enough to account for the slowness that you noticed. So your benchmarks don't show anything!

Defining NDEBUG turns off asserts. Eigen and tvmet both contain tons of asserts. For instance, when you access a coordinate of a vector by index, as in "vector[index]", an assert checks that the index falls within the allowed range. This is very slow, but useful for debugging.

With GCC, you are expected to know that and define NDEBUG yourself. Microsoft customers aren't assumed to be that smart, so MSVC automatically defines NDEBUG when you select "Release" mode.

I went further and made my own benchmark showing tvmet running 25% faster than Eigen1.

TVMET part:
command-line: g++ -I/home/kde4/kde/include/ a.cpp -O3 -DNDEBUG -o a
source code:

#include<iostream>
#include<tvmet/Matrix.h>
#include<tvmet/Vector.h>

using namespace std;
using namespace tvmet;

int main(int argc, char *argv[])
{
       Matrix<double,3,3> I;
       I = 1,0,0,
           0,1,0,
           0,0,1;
       Matrix<double,3,3> m;
       m = 1,2,3,
           4,5,6,
           7,8,9;
       for(int a = 0; a < 100000000; a++)
       {
               m = I + 0.05 * (m + m * m);
       }
       cout << m << endl;
       return 0;
}

Eigen1 part:
command line: g++ -I/home/kde4/kde/include/ b.cpp -O3 -DNDEBUG -o b
source code:
#include<iostream>
#include<eigen/matrix.h>

using namespace std;
using namespace Eigen;

int main(int argc, char *argv[])
{
       Matrix3d I;
       I.loadIdentity();
       Matrix3d m;
       m.loadRandom();
       for(int a = 0; a < 100000000; a++)
       {
               m = I + 0.05 * (m + m * m);
       }
       cout << m << endl;
       return 0;
}

These programs were run on my Core 1 Duo 1.66 GHz in "performance" mode, i.e. the CPU was locked to its maximal frequency.

Result:
TVMET: 6.1 seconds
Eigen1: 8.1 seconds.

So TVMET runs approximately 25% faster than Eigen1.

Cheers
Benoit

PS. You use operator* between vectors to do an element-wise multiplication. I have not implemented this in Eigen1, and I am removing it from tvmet for Eigen2, because it really doesn't correspond to anything meaningful from the point of view of mathematics. Of course I keep the dot product and the cross product, but those are a different thing, and I wouldn't call either of them "operator*".


On Wed, 29 Aug 2007, Gael Guennebaud wrote:

Hi all,

I've seen that you are going to use expression templates for fixed-size vectors via the Tiny Vector library (tvmet).
This puzzled me a bit, because I've never seen any performance issue with my own vector classes (a classic implementation) compared to hand-coded expressions.
So I ran (again) some basic comparisons between my own implementation, tvmet, and hand-coded expressions.
After playing a bit with Vector3f and Matrix4/Vector4 arithmetic expressions, my conclusion is that the tvmet implementation is ALWAYS at least slightly slower than mine, and sometimes much, much slower (10x).
So I'm not sure that using expression templates is still such a good idea for small vectors/matrices, since current compilers seem to do a very good job here.
Moreover, I think that with code based on tvmet it will be difficult to enable SSE optimizations...
Have you already compared the performance of Eigen1 and tvmet?


To be precise, let me show you the code of my (stupid) experiments:

   Vector3f aux, a, b, c, d;

   // vector code:
   for (uint k=0 ; k<10000000 ; ++k)
   {
       a += 1e-9f * ( (a+b)*(c+d) + (a+c)*(b+d)*(c+b) * (a-c)*(b-d)*(c-b)
            + (a*b)+(c*d) + (a*a-c)*(b+d*c)*(c*c-b) * (a*c)*(b*d)+(c*b) );
       b -= 1e-9f * a;
       c += 1e-9f * b;
       d -= 1e-9f * c;
       aux += a;
   }

  // hand coded code:
  #define OP(_X) \
       a[_X] += 1e-9 * ( (a[_X]+b[_X])*(c[_X]+d[_X]) \
            + (a[_X]+c[_X])*(b[_X]+d[_X])*(c[_X]+b[_X]) \
              * (a[_X]-c[_X])*(b[_X]-d[_X])*(c[_X]-b[_X]) \
            + (a[_X]*b[_X])+(c[_X]*d[_X]) \
            + (a[_X]*a[_X]-c[_X])*(b[_X]+d[_X]*c[_X])*(c[_X]*c[_X]-b[_X]) \
              * (a[_X]*c[_X])*(b[_X]*d[_X])+(c[_X]*b[_X]) ); \
       b[_X] -= 1e-9 * a[_X]; \
       c[_X] += 1e-9 * b[_X]; \
       d[_X] -= 1e-9 * c[_X]; \
       aux[_X] += a[_X];

  for (uint k=0 ; k<10000000 ; ++k)
  {
       OP(0);
       OP(1);
       OP(2);
  }

Compiler: g++ (GCC) 4.1.2, compiled with -O3
CPU: Intel(R) Core(TM)2 CPU T7200 (2.00 GHz)

Results:
- hand coded:       0.579s
- my vector class:  0.502s
- tvmet:            6.772s !!

Note that if I comment out the second line of the first (long) expression, then tvmet achieves much closer performance (0.37s vs 0.35s).
Actually, with tvmet and the long expression, the ASM code contains some calls to memcpy... a GCC issue?




Another example (Matrix*Vector):

 Vector4f acc, a[3], b[3];
 Matrix4f m0[3], m1[3];
 for (uint k=0 ; k<50000000 ; ++k)
 {
    acc += m1[k&0x3] * ((m0[k&0x3] * a[k&0x3]) * b[k&0x3]);
 }

Results:
- basic vector/matrix implementation: 1.24s
- tvmet:                              3.17s (the ASM looks OK)



A last one (Matrix*Matrix):

 Vector4f acc, a[3];
 Matrix4f m0[3], m1[3];
 for (uint k=0 ; k<50000000 ; ++k)
 {
    acc += (m1[k&0x3] * m0[k&0x3]) * a[k&0x3];
 }

Results:
- basic vector/matrix implementation: 2.56s
- tvmet:                              2.85s



By the way, by "classic/basic implementation" I mean something like:

struct Vector3f
{
    float x, y, z;

    inline Vector3f operator + (const Vector3f& v) const
    {
        Vector3f aux;
        aux.x = x + v.x;
        aux.y = y + v.y;
        aux.z = z + v.z;
        return aux;
    }
};


Gael.





