Re: [eigen] On tvmet performance



Gael Guennebaud wrote:

Hi all,

I've seen that you are going to use expression templates for fixed-size vectors via the Tiny Vector library (tvmet). This puzzled me a bit because I've never seen any performance issue with my own vector classes (a classic implementation) compared to hand-coded expressions. So I ran (again) some basic comparisons between my own implementation, tvmet, and hand-coded expressions. After playing a bit with Vector3f and Matrix4/Vector4 arithmetic expressions, my conclusion is that the tvmet implementation is ALWAYS at least slightly slower than mine and sometimes much, much slower (10x). So I'm not sure that using expression templates is such a good idea for small vectors/matrices, since current compilers seem to do a very good job here. Moreover, I think that with code based on tvmet it will be difficult to enable SSE optimizations...
Have you already compared the performance of Eigen1 and tvmet?
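(For clarity, the core idea of expression templates is that operator+ returns a lightweight proxy instead of a full vector, and the whole expression gets evaluated in a single loop at assignment time. A minimal sketch, much simplified and of course not tvmet's actual code:)

// Minimal expression-template sketch (simplified, not tvmet's actual code).
// operator+ returns a lightweight proxy; the whole expression is evaluated
// element by element inside operator=, so no temporary vectors are created.
struct Vec3
{
    float v[3];

    float operator[] (int i) const { return v[i]; }

    template <class Expr>
    Vec3& operator = (const Expr& e)    // evaluation happens here
    {
        for (int i=0 ; i<3 ; ++i) v[i] = e[i];
        return *this;
    }
};

template <class L, class R>
struct Sum                              // proxy object representing "l + r"
{
    const L& l; const R& r;
    Sum(const L& l_, const R& r_) : l(l_), r(r_) {}
    float operator[] (int i) const { return l[i] + r[i]; }
};

template <class L, class R>
Sum<L,R> operator + (const L& l, const R& r) { return Sum<L,R>(l, r); }

// usage:  Vec3 a, b, c, r;   r = a + b + c;   // single loop, no temporaries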


To be precise, let me show you the code of my (stupid) experiments:

    Vector3f aux, a, b, c, d;

    // vector code:
    for (uint k=0 ; k<10000000 ; ++k)
    {
        a += 1e-9f * ( (a+b)*(c+d) + (a+c)*(b+d)*(c+b) * (a-c)*(b-d)*(c-b)
             + (a*b)+(c*d) + (a*a-c)*(b+d*c)*(c*c-b) * (a*c)*(b*d)+(c*b) );
        b -= 1e-9f * a;
        c += 1e-9f * b;
        d -= 1e-9f * c;
        aux += a;
    }

   // hand coded code:
   for (uint k=0 ; k<10000000 ; ++k)
   {
#define OP(_X) \
        a[_X] += 1e-9 * ( (a[_X]+b[_X])*(c[_X]+d[_X]) + (a[_X]+c[_X])*(b[_X]+d[_X])*(c[_X]+b[_X]) * (a[_X]-c[_X])*(b[_X]-d[_X])*(c[_X]-b[_X]) \
             + (a[_X]*b[_X])+(c[_X]*d[_X]) + (a[_X]*a[_X]-c[_X])*(b[_X]+d[_X]*c[_X])*(c[_X]*c[_X]-b[_X]) * (a[_X]*c[_X])*(b[_X]*d[_X])+(c[_X]*b[_X]) ); \
        b[_X] -= 1e-9 * a[_X]; \
        c[_X] += 1e-9 * b[_X]; \
        d[_X] -= 1e-9 * c[_X]; \
        aux[_X] += a[_X];
        OP(0);
        OP(1);
        OP(2);
   }

Compiler: g++ (GCC) 4.1.2, compiled with -O3
CPU: Intel(R) Core(TM)2 CPU T7200 (2.00 GHz)

Results:
 - hand coded:      0.579s
 - my vector class: 0.502s
 - tvmet:           6.772s !!

Note that if I comment out the second line of the first (long) expression, then tvmet achieves much closer performance (0.37s vs 0.35s). Actually, with tvmet and the long expression, the ASM code contains some calls to memcpy... a gcc issue?




Another example (Matrix*Vector):

  Vector4f acc, a[4], b[4];   // arrays of 4 so that the k&0x3 index stays in bounds
  Matrix4f m0[4], m1[4];
  for (uint k=0 ; k<50000000 ; ++k)
  {
     acc += m1[k&0x3] * ((m0[k&0x3] * a[k&0x3]) * b[k&0x3]);
  }

Results:
 - basic vector/matrix implementation: 1.24s
 - tvmet:                              3.17s (the ASM looks OK)
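Here, by "basic vector/matrix implementation" I mean nothing more clever than the direct per-coefficient product, roughly something like this (a sketch only; the member names and the column-major layout are just for illustration):

// Sketch of a plain (non expression-template) matrix * vector product;
// the m[col][row] layout and the member names are only for illustration.
struct Vector4f { float v[4]; };

struct Matrix4f
{
    float m[4][4];   // m[col][row]

    inline Vector4f operator * (const Vector4f& x) const
    {
        Vector4f r;
        for (int i=0 ; i<4 ; ++i)
            r.v[i] = m[0][i]*x.v[0] + m[1][i]*x.v[1]
                   + m[2][i]*x.v[2] + m[3][i]*x.v[3];
        return r;
    }
};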



One last example (Matrix*Matrix):

  Vector4f acc, a[4];         // arrays of 4 so that the k&0x3 index stays in bounds
  Matrix4f m0[4], m1[4];
  for (uint k=0 ; k<50000000 ; ++k)
  {
     acc += (m1[k&0x3] * m0[k&0x3]) * a[k&0x3];
  }

Results:
 - basic vector/matrix implementation: 2.56s
 - tvmet:                              2.85s
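And the basic 4x4 matrix product is just the textbook loop, something like this (sketch, same assumed m[col][row] layout as above):

// Plain 4x4 matrix * matrix product (sketch only).
struct Matrix4f { float m[4][4]; };   // same assumed layout as in the sketch above

inline Matrix4f operator * (const Matrix4f& a, const Matrix4f& b)
{
    Matrix4f r;
    for (int j=0 ; j<4 ; ++j)         // column of the result
        for (int i=0 ; i<4 ; ++i)     // row of the result
            r.m[j][i] = a.m[0][i]*b.m[j][0] + a.m[1][i]*b.m[j][1]
                      + a.m[2][i]*b.m[j][2] + a.m[3][i]*b.m[j][3];
    return r;
}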



By the way, by "classic/basic implementation" I mean something like:

class Vector3f
{
public:
    float x, y, z;

    inline Vector3f operator + (const Vector3f& v) const
    {
        Vector3f aux;
        aux.x = x + v.x;
        aux.y = y + v.y;
        aux.z = z + v.z;
        return aux;
    }
};
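The other operations used in the benchmarks (component-wise * and -, scalar * and +=) follow exactly the same naive pattern, along these lines (just a sketch):

// Sketch of the remaining operations, in the same per-component style.
inline Vector3f operator * (const Vector3f& u, const Vector3f& v)   // component-wise
{
    Vector3f aux;
    aux.x = u.x * v.x;
    aux.y = u.y * v.y;
    aux.z = u.z * v.z;
    return aux;
}
// operator - is analogous

inline Vector3f operator * (float s, const Vector3f& v)             // scalar * vector
{
    Vector3f aux;
    aux.x = s * v.x;
    aux.y = s * v.y;
    aux.z = s * v.z;
    return aux;
}

inline Vector3f& operator += (Vector3f& u, const Vector3f& v)
{
    u.x += v.x;
    u.y += v.y;
    u.z += v.z;
    return u;
}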


Gael.


Very interesting. I would like to try the same on Windows with Visual C++ 2005. Could you please email me your code? I will post the results here.

I see that you code your Vector classes using x, y, z. Do you have any experience with whether it would be slower to implement a vector/matrix class using
float data[3]; to hold the elements, and then do something like this:

inline Vector3f operator + (const Vector3f& v) const
{
    Vector3f aux;
    aux.data[0] = data[0] + v.data[0];
    aux.data[1] = data[1] + v.data[1];
    aux.data[2] = data[2] + v.data[2];
    return aux;
}

? Normally I would assume that the compiler optimizes away the extra address arithmetic, but who knows what really happens...

I am just asking because I saw someone implementing a Matrix4x4 using float m11, m12, ..., m44 instead of an array for internal storage. Maybe that is a tiny bit faster?
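To make the two alternatives concrete, I mean something like this (just a sketch):

// The two internal storage layouts in question (sketch only).
struct Matrix4x4_Array
{
    float m[16];                          // one flat array
    float& at(int row, int col) { return m[col*4 + row]; }
};

struct Matrix4x4_Named
{
    float m11, m12, m13, m14;             // one named member per coefficient
    float m21, m22, m23, m24;
    float m31, m32, m33, m34;
    float m41, m42, m43, m44;
};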



