[eigen] On tvmet performance
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: [eigen] On tvmet performance
- From: "Gael Guennebaud" <gael.guennebaud@xxxxxxxxx>
- Date: Wed, 29 Aug 2007 01:43:24 +0200
Hi all,
I've seen that you are going to use expression templates for fixed-size vectors via the Tiny Vector Matrix library (tvmet).
This puzzled me a bit, because I've never seen any performance issue with my own vector classes (a classic implementation) compared to hand-coded expressions.
So I ran (again) some basic comparisons between my own implementation, tvmet, and hand-coded expressions.
After playing a bit with Vector3f and Matrix4/Vector4 arithmetic expressions, my conclusion is that the tvmet implementation is ALWAYS at least slightly slower than mine, and sometimes much, much slower (10x).
So I'm not sure that using expression templates is such a good idea for small vectors/matrices, since current compilers seem to do a very good job here.
Moreover, I think that with code based on tvmet it will be difficult to enable SSE optimizations...
Have you already compared the performance of Eigen1 and tvmet?
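To illustrate the SSE point: with a plain array-backed class it is straightforward to route an operator through SSE intrinsics by hand. A rough sketch, assuming x86 and g++ (this Vector4f is an illustration only, not the class used in the benchmarks below):

```cpp
#include <xmmintrin.h> // SSE intrinsics (x86)

// Illustrative 4-float vector whose operator+ maps to a single addps.
// Retrofitting this into a deeply nested expression-template design
// is much more intrusive, which is the concern raised above.
struct Vector4f
{
    float v[4];

    Vector4f operator + (const Vector4f& o) const
    {
        Vector4f r;
        _mm_storeu_ps(r.v, _mm_add_ps(_mm_loadu_ps(v), _mm_loadu_ps(o.v)));
        return r;
    }
};
```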
To be precise, let me show you the code of my (stupid) experiments:
Vector3f aux, a, b, c, d;
// vector code:
for (uint k=0 ; k<10000000 ; ++k)
{
a += 1e-9f * ( (a+b)*(c+d) + (a+c)*(b+d)*(c+b) * (a-c)*(b-d)*(c-b)
+ (a*b)+(c*d) + (a*a-c)*(b+d*c)*(c*c-b) * (a*c)*(b*d)+(c*b) );
b -= 1e-9f * a;
c += 1e-9f * b;
d -= 1e-9f * c;
aux += a;
}
// hand coded code:
for (uint k=0 ; k<10000000 ; ++k)
{
#define OP(_X) a[_X] += 1e-9 * ( (a[_X]+b[_X])*(c[_X]+d[_X]) + (a[_X]+c[_X])*(b[_X]+d[_X])*(c[_X]+b[_X]) * (a[_X]-c[_X])*(b[_X]-d[_X])*(c[_X]-b[_X]) \
+ (a[_X]*b[_X])+(c[_X]*d[_X]) + (a[_X]*a[_X]-c[_X])*(b[_X]+d[_X]*c[_X])*(c[_X]*c[_X]-b[_X]) * (a[_X]*c[_X])*(b[_X]*d[_X])+(c[_X]*b[_X]) ); \
b[_X] -= 1e-9 * a[_X]; c[_X] += 1e-9 * b[_X]; d[_X] -= 1e-9 * c[_X]; aux[_X] += a[_X];
OP(0);
OP(1);
OP(2);
}
Compiler: g++ (GCC) 4.1.2, compiled with -O3
CPU: Intel(R) Core(TM)2 CPU T7200 (2.00 GHz)
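The timings below were taken with a simple CPU-clock wrapper along these lines (the actual harness is not shown here, so this is only a guess at its shape; the name time_seconds is made up):

```cpp
#include <ctime>

// Minimal CPU-time measurement around a benchmark loop.
// Sketch only -- the real harness behind the numbers below isn't shown.
template <typename F>
double time_seconds(F f)
{
    std::clock_t t0 = std::clock();
    f();
    return double(std::clock() - t0) / CLOCKS_PER_SEC;
}
```

Used as, e.g., double t = time_seconds(run_vector_bench);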
Results:
- hand coded: 0.579s
- my vector class: 0.502s
- tvmet: 6.772s!!
Note that if I comment out the second line of the first (long) expression, then tvmet achieves much closer performance (0.37s vs 0.35s).
Actually, with tvmet and the long expression, the ASM code contains some calls to memcpy... a gcc issue?
Another example (Matrix*Vector):
Vector4f acc, a[4], b[4];
Matrix4f m0[4], m1[4];
for (uint k=0 ; k<50000000 ; ++k)
{
acc += m1[k&0x3] * ((m0[k&0x3] * a[k&0x3]) * b[k&0x3]);
}
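For reference, the "basic" Matrix4f * Vector4f product benchmarked here can be sketched as a plain eager loop (column-major storage assumed; the actual class used is not shown):

```cpp
// Plain, eager 4x4 matrix * vector product -- a sketch of the "basic
// vector/matrix implementation", not the actual benchmarked class.
struct Vector4f { float v[4]; };

struct Matrix4f
{
    float m[16]; // column-major: element (r, c) is m[c*4 + r]

    Vector4f operator * (const Vector4f& x) const
    {
        Vector4f r = {{0.f, 0.f, 0.f, 0.f}};
        for (int c = 0; c < 4; ++c)
            for (int i = 0; i < 4; ++i)
                r.v[i] += m[c*4 + i] * x.v[c];
        return r;
    }
};
```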
Results:
- basic vector/matrix implementation: 1.24s
- tvmet: 3.17s (the ASM looks OK)
A last one (Matrix*Matrix):
Vector4f acc, a[4];
Matrix4f m0[4], m1[4];
for (uint k=0 ; k<50000000 ; ++k)
{
acc += (m1[k&0x3] * m0[k&0x3]) * a[k&0x3];
}
Results:
- basic vector/matrix implementation: 2.56s
- tvmet: 2.85s
By the way, by "classic/basic implementation" I mean something like:
class Vector3f
{
public:
    float x, y, z;

    inline Vector3f operator + (const Vector3f& v) const
    {
        Vector3f aux;
        aux.x = x + v.x;
        aux.y = y + v.y;
        aux.z = z + v.z;
        return aux;
    }
};
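whereas the expression-template approach tvmet takes looks roughly like this (a toy Add node only, not tvmet's actual code): operator+ returns a lightweight node instead of a result, and assignment evaluates the whole tree component by component, so d = a + b + c compiles to a single loop with no temporaries.

```cpp
// Toy expression-template sketch (not tvmet's actual code).
struct Vec3;

template <typename L, typename R>
struct Add
{
    const L& l; const R& r;
    Add(const L& l_, const R& r_) : l(l_), r(r_) {}
    float operator [] (int i) const { return l[i] + r[i]; }
};

struct Vec3
{
    float v[3];
    Vec3() { v[0] = v[1] = v[2] = 0.f; }
    Vec3(float x, float y, float z) { v[0] = x; v[1] = y; v[2] = z; }

    float operator [] (int i) const { return v[i]; }

    // Assigning from any expression evaluates the tree in one loop.
    template <typename E>
    Vec3& operator = (const E& e)
    {
        for (int i = 0; i < 3; ++i) v[i] = e[i];
        return *this;
    }
};

Add<Vec3, Vec3> operator + (const Vec3& a, const Vec3& b)
{
    return Add<Vec3, Vec3>(a, b);
}

template <typename L, typename R>
Add<Add<L, R>, Vec3> operator + (const Add<L, R>& a, const Vec3& b)
{
    return Add<Add<L, R>, Vec3>(a, b);
}
```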
Gael.