[eigen] need help optimizing with __restrict__ and such


Hi List,

First of all: doing m = m*m now works as expected in eigen2, because operator* 
now performs immediate evaluation, see explanation in a previous mail. The 
old behavior is now available under the new name lazyMul().
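
To illustrate why immediate evaluation matters for m = m*m, here is a minimal sketch. It assumes a hypothetical row-major 3x3 type (Mat3) and two hypothetical helpers (mulInto, mulEval); none of these are eigen2's actual classes or functions. Writing the product coefficient by coefficient straight into the destination goes wrong when the destination is also an operand, whereas evaluating into a temporary first does not.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical row-major 3x3 matrix; NOT eigen2's real type.
using Mat3 = std::array<double, 9>;

// Lazy-style product: each coefficient is written straight into dst.
// If dst is the same object as a or b, later coefficients are computed
// from values that have already been overwritten.
inline void mulInto(Mat3& dst, const Mat3& a, const Mat3& b)
{
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j) {
            double s = 0.0;
            for (std::size_t k = 0; k < 3; ++k)
                s += a[3 * i + k] * b[3 * k + j];
            dst[3 * i + j] = s;
        }
}

// Immediate evaluation: the product goes into a temporary first,
// so assigning the result back to an operand is safe.
inline Mat3 mulEval(const Mat3& a, const Mat3& b)
{
    Mat3 tmp{};
    mulInto(tmp, a, b);
    return tmp;
}

// Small demonstration of both behaviours.
inline void demoAliasing()
{
    Mat3 m = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    const Mat3 expected = {30, 36, 42, 66, 81, 96, 102, 126, 150};

    Mat3 aliased = m;
    mulInto(aliased, aliased, aliased);   // destination is also an operand
    assert(aliased != expected);          // result is corrupted

    m = mulEval(m, m);                    // temporary protects the operands
    assert(m == expected);
}
```

The temporary costs one extra copy of nine doubles, which is the price paid for making m = m*m safe by default.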

I've started taking care of performance. I reused my old benchmark with
fixed-size 3x3 matrices and adapted it to eigen2 (see attached file b.cpp),
then compared it with the TVMET equivalent (see attached file a.cpp). The
compilation command line is included as a comment at the top of each file.

Result: on my machine, ./a runs in 6 seconds while ./b runs in 24 seconds.

So Eigen2 currently is 4x slower than tvmet.

Wait, don't leave! come back!

As far as I can see, there are 4 "keys" to optimization here (not yet talking
about handwriting assembly code).

0) g++ -O3 -DNDEBUG
1) Loop unrolling
2) Function inlining
3) The "restrict" keyword

For 1), I've hand-unrolled the case of fixed-size 3x3 matrices, just for the 
needs of my benchmark.
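
As an alternative to unrolling by hand, fixed-size loops can in principle be unrolled at compile time with recursive templates. The following is only a sketch of that general technique; DotUnroller and dot3 are hypothetical names, and this is not the code used in the benchmark or in eigen2.

```cpp
#include <cassert>

// Hypothetical helper: DotUnroller<N>::run expands at compile time into
// N multiply-adds, with no runtime loop counter or branch.
template<int N>
struct DotUnroller {
    static inline double run(const double* a, const double* b)
    {
        return DotUnroller<N - 1>::run(a, b) + a[N - 1] * b[N - 1];
    }
};

// Recursion stops at a single product.
template<>
struct DotUnroller<1> {
    static inline double run(const double* a, const double* b)
    {
        return a[0] * b[0];
    }
};

// Dot product of two 3-vectors, equivalent to
// a[0]*b[0] + a[1]*b[1] + a[2]*b[2] once the templates are expanded.
inline double dot3(const double* a, const double* b)
{
    return DotUnroller<3>::run(a, b);
}

// Small self-check of the unrolled dot product.
inline void demoUnroll()
{
    const double a[3] = {1.0, 2.0, 3.0};
    const double b[3] = {4.0, 5.0, 6.0};
    assert(dot3(a, b) == 32.0);
}
```

Whether the compiler actually flattens the recursion depends on inlining, which is why unrolling and inlining go hand in hand.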

About 2): Once loops are unrolled, GCC is very good at inlining functions. In 
my benchmark there was only one function left that wasn't inlined: eval(). I 
solved that by marking it with __attribute__((always_inline)).

Before somebody complains about nonstandard keywords: this is of course not an
issue, since we use the nonstandard stuff only on compilers that support it.

Excerpt from Util.h:

#ifdef __GNUC__
# define EI_ALWAYS_INLINE __attribute__((always_inline))
# define EI_RESTRICT      __restrict__
#else
# define EI_ALWAYS_INLINE
# define EI_RESTRICT
#endif

This is of course only a draft; the defines can be made more fine-grained
later using cmake platform checks (I have already written a
CheckRestrictKeyword.cmake).
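
A minimal sketch of how such a macro might be applied follows. The EI_ALWAYS_INLINE definition is copied from the excerpt above; the Scaled3 struct and its coeff()/evalInto() members are hypothetical stand-ins for illustration and are not eigen2's real expression classes or its eval().

```cpp
#include <cassert>

#ifdef __GNUC__
# define EI_ALWAYS_INLINE __attribute__((always_inline))
#else
# define EI_ALWAYS_INLINE
#endif

// Hypothetical stand-in for a small expression object; NOT eigen2 code.
struct Scaled3 {
    const double* src;
    double factor;

    // Forced inline on GCC-compatible compilers, ordinary inline elsewhere.
    EI_ALWAYS_INLINE inline double coeff(int i) const
    {
        return factor * src[i];
    }

    // Evaluates the whole expression into dst (3 coefficients).
    EI_ALWAYS_INLINE inline void evalInto(double* dst) const
    {
        dst[0] = coeff(0);
        dst[1] = coeff(1);
        dst[2] = coeff(2);
    }
};

// Small self-check: scaling {1, 2, 3} by 2.
inline void demoInline()
{
    const double v[3] = {1.0, 2.0, 3.0};
    double out[3] = {0.0, 0.0, 0.0};
    Scaled3 e = {v, 2.0};
    e.evalInto(out);
    assert(out[0] == 2.0 && out[1] == 4.0 && out[2] == 6.0);
}
```

On compilers where the macro expands to nothing, the code still compiles and behaves the same; only the inlining guarantee is lost.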

Which takes us to 3): the restrict keyword. This is the only explanation I
can see for why eigen2 is 4x slower than tvmet.

The problem is that I have very little experience with that keyword. I read
http://www.cellperformance.com/mike_acton/2006/05/demystifying_the_restrict_keyw.html

I tried to add restrict at many places of eigen2 but never got more than a +3% 
performance increase -- and sometimes got worse performance.

I don't fully understand restrict, so if someone is comfortable with it, I'd
welcome some help. The code is in SVN; feel free to add/remove EI_RESTRICT
wherever you think it is appropriate.
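
For readers equally new to the keyword, here is a minimal sketch of what a restrict qualifier promises. The EI_RESTRICT definition mirrors the Util.h excerpt; axpy9 is a hypothetical function, not part of eigen2. Without the qualifier, the compiler must assume that a store to dst might modify a or b and reload them after every store; with it, the caller promises the arrays do not overlap, so loads can be kept in registers or reordered.

```cpp
#include <cassert>

#ifdef __GNUC__
# define EI_RESTRICT __restrict__
#else
# define EI_RESTRICT
#endif

// Hypothetical kernel: dst[i] = a[i] + s * b[i] over 9 coefficients (3x3).
// The qualifiers are a promise by the CALLER that dst, a and b do not
// overlap; calling this with overlapping arrays is undefined behaviour.
inline void axpy9(double* EI_RESTRICT dst,
                  const double* EI_RESTRICT a,
                  const double* EI_RESTRICT b,
                  double s)
{
    for (int i = 0; i < 9; ++i)
        dst[i] = a[i] + s * b[i];
}

// Small self-check with three distinct, non-overlapping arrays.
inline void demoRestrict()
{
    double a[9], b[9], dst[9];
    for (int i = 0; i < 9; ++i) {
        a[i] = 1.0;
        b[i] = 2.0;
        dst[i] = 0.0;
    }
    axpy9(dst, a, b, 0.5);
    for (int i = 0; i < 9; ++i)
        assert(dst[i] == 2.0);
}
```

Note that the qualifier only helps where the compiler cannot already prove the absence of aliasing, which may be one reason why sprinkling it around yields small or even negative gains.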

This restrict-stuff seems to be the only difference that might explain why 
tvmet performs well and eigen2 doesn't.

Cheers,
Benoit

Attached file a.cpp:

// g++ -O3 -I /home/gaston/tvmet-1.7.1/include/ -DNDEBUG a.cpp -o a

#include<iostream>
#include<tvmet/Matrix.h>
#include<tvmet/Vector.h>

using namespace std;
using namespace tvmet;

int main(int argc, char *argv[])
{
	Matrix<double,3,3> I;
	I = 1,0,0,
	    0,1,0,
	    0,0,1;
	Matrix<double,3,3> m, n;
	m = 1,2,3,
	    4,5,6,
	    7,8,9;
	for(int a = 0; a < 100000000; a++)
	{
		n = m*m;
		alias(m) = I + 0.05 * (m + n);
	}
	cout << m << endl;
	return 0;
}

Attached file b.cpp:

// g++ -O3 -I /home/gaston/cuisine/branches/work/eigen2/src/ -DNDEBUG b.cpp -o b

#include<iostream>
#include<All>

using namespace std;

int main(int argc, char *argv[])
{
	EiMatrix3d I;
	EiMatrix3d m;
	for(int i = 0; i < 3; i++) for(int j = 0; j < 3; j++)
	{
		I(i,j) = (i==j);
		m(i,j) = (i+3*j);
	}
	for(int a = 0; a < 100000000; a++)
	{
		m = I + 0.00005 * (m + m*m);
	}
	cout << m << endl;
	return 0;
}
