[eigen] need help optimizing with __restrict__ and such


Hi List,

First of all: doing m = m*m now works as expected in eigen2, because operator* 
now performs immediate evaluation, see explanation in a previous mail. The 
old behavior is now available under the new name lazyMul().
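
To make the aliasing issue behind m = m*m concrete, here is a minimal sketch that does not use Eigen at all. Mat3, mulInto and mulEvaluated are hypothetical names chosen for illustration, and the row-major array layout is an assumption. Writing the product straight into the left-hand side overwrites coefficients that are still needed as inputs, whereas evaluating into a temporary first does not:

```cpp
#include <array>

// Hypothetical stand-in for a fixed-size 3x3 matrix (row-major); not an Eigen type.
using Mat3 = std::array<double, 9>;

// Writes a*b coefficient by coefficient straight into dst.
// If dst is the same object as a or b, inputs get overwritten while still needed.
inline void mulInto(Mat3& dst, const Mat3& a, const Mat3& b)
{
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            double s = 0.0;
            for (int k = 0; k < 3; ++k)
                s += a[3*i + k] * b[3*k + j];
            dst[3*i + j] = s;
        }
}

// Evaluates the product into a temporary first, so the result is safe
// to assign back to one of the operands.
inline Mat3 mulEvaluated(const Mat3& a, const Mat3& b)
{
    Mat3 tmp{};
    mulInto(tmp, a, b);
    return tmp;
}
```

In this sketch, m = mulEvaluated(m, m) gives the expected product, while mulInto(m, m, m) can silently produce a different matrix.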

I've started taking care of performance. I reused my old benchmark with 
fixed-size 3x3 matrices and adapted it to eigen2 (see attached file b.cpp), 
then compared it with the TVMET equivalent (see attached file a.cpp). The 
compilation command line is included as a comment inside each file.

Result: on my machine, ./a runs in 6 seconds while ./b runs in 24 seconds.

So Eigen2 currently is 4x slower than tvmet.

Wait, don't leave! come back!

As far as I can see, beyond the basic compiler flags in 0), there are 3 "keys" 
to optimization here (not yet talking about handwriting assembly code).

0) g++ -O3 -DNDEBUG
1) Loop unrolling
2) Function inlining
3) The "restrict" keyword

For 1), I've hand-unrolled the case of fixed-size 3x3 matrices, just for the 
needs of my benchmark.
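
As a rough illustration of what hand-unrolling a fixed-size 3x3 product looks like, here is a sketch on a plain array. It is not the actual eigen2 code: Mat3, mulLoop and mulUnrolled are hypothetical names, and the row-major layout is an assumption.

```cpp
#include <array>

// Hypothetical row-major 3x3 storage; not an Eigen type.
using Mat3 = std::array<double, 9>;

// Generic version: three nested loops that the compiler may or may not unroll.
inline Mat3 mulLoop(const Mat3& a, const Mat3& b)
{
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r[3*i + j] += a[3*i + k] * b[3*k + j];
    return r;
}

// Hand-unrolled version: every coefficient is written out explicitly,
// so no loop counters or branches remain.
inline Mat3 mulUnrolled(const Mat3& a, const Mat3& b)
{
    Mat3 r;
    r[0] = a[0]*b[0] + a[1]*b[3] + a[2]*b[6];
    r[1] = a[0]*b[1] + a[1]*b[4] + a[2]*b[7];
    r[2] = a[0]*b[2] + a[1]*b[5] + a[2]*b[8];
    r[3] = a[3]*b[0] + a[4]*b[3] + a[5]*b[6];
    r[4] = a[3]*b[1] + a[4]*b[4] + a[5]*b[7];
    r[5] = a[3]*b[2] + a[4]*b[5] + a[5]*b[8];
    r[6] = a[6]*b[0] + a[7]*b[3] + a[8]*b[6];
    r[7] = a[6]*b[1] + a[7]*b[4] + a[8]*b[7];
    r[8] = a[6]*b[2] + a[7]*b[5] + a[8]*b[8];
    return r;
}
```

Both functions compute the same product; the unrolled one only removes the loop overhead and gives the inliner straight-line code to work with.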

About 2): Once loops are unrolled, GCC is very good at inlining functions. In 
my benchmark there was only one function left that wasn't inlined: eval(). I 
solved that by marking it with __attribute__((always_inline)).

Before somebody complains about nonstandard keywords: this is of course not an 
issue, we use the nonstandard stuff only on those compilers that support it.
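
For readers unfamiliar with the attribute, here is a small sketch of forcing inlining on GCC while staying portable elsewhere. SKETCH_ALWAYS_INLINE, Vec3, add and sum are hypothetical names used only for this illustration; they are not taken from eigen2.

```cpp
// Hypothetical macro for this sketch: expands to the GCC attribute where
// available and to plain "inline" on other compilers.
#if defined(__GNUC__)
# define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#else
# define SKETCH_ALWAYS_INLINE inline
#endif

struct Vec3 { double x, y, z; };

// Small helpers of the kind that should never cost a function call.
SKETCH_ALWAYS_INLINE Vec3 add(const Vec3& a, const Vec3& b)
{
    return Vec3{a.x + b.x, a.y + b.y, a.z + b.z};
}

SKETCH_ALWAYS_INLINE double sum(const Vec3& v)
{
    return v.x + v.y + v.z;
}
```

On GCC the attribute asks the compiler to inline the function even where its own heuristics would decline; on other compilers the sketch falls back to an ordinary inline hint.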

Excerpt from Util.h:

#ifdef __GNUC__
# define EI_ALWAYS_INLINE __attribute__((always_inline))
# define EI_RESTRICT      __restrict__
#else
# define EI_ALWAYS_INLINE
# define EI_RESTRICT
#endif

This is of course only a draft, it will be possible to fine-grain those 
defines in a better way using cmake platform checks (I already have written a 

Which takes us to 3): the restrict keyword. This is the only explanation I can 
see for eigen2 being 4x slower than tvmet.

The problem is that I'm very inexperienced with that keyword. I read

I tried adding restrict in many places in eigen2 but never got more than a +3% 
performance increase -- and sometimes got worse performance.

I don't fully understand restrict, so if someone is comfortable with it, I'd 
welcome some help. The code is in SVN; feel free to add or remove EI_RESTRICT 
wherever you think it is appropriate.
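
As a starting point for discussion, here is a minimal sketch of what the restrict qualifier promises to the compiler. SKETCH_RESTRICT, addScaled and addScaledRestrict are hypothetical names, not eigen2 code, and whether the qualifier changes the generated code at all depends on the compiler and the surrounding code.

```cpp
#include <cstddef>

// Hypothetical macro for this sketch: GCC spelling where available, empty elsewhere.
#if defined(__GNUC__)
# define SKETCH_RESTRICT __restrict__
#else
# define SKETCH_RESTRICT
#endif

// Without restrict, the compiler must assume dst may overlap a or b,
// so every store to dst may force a[i] and b[i] to be reloaded from memory.
inline void addScaled(double* dst, const double* a, const double* b,
                      double s, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] + s * b[i];
}

// With restrict, the caller promises that dst does not overlap a or b.
// The compiler may then keep values in registers and reorder loads and stores.
// Breaking that promise is undefined behavior.
inline void addScaledRestrict(double* SKETCH_RESTRICT dst,
                              const double* SKETCH_RESTRICT a,
                              const double* SKETCH_RESTRICT b,
                              double s, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] + s * b[i];
}
```

The two functions give identical results when the arrays do not overlap; the qualifier only widens what the optimizer is allowed to assume. It must not be placed on pointers that may legitimately alias, as in the m = m*m case.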

This restrict-stuff seems to be the only difference that might explain why 
tvmet performs well and eigen2 doesn't.

// g++ -O3 -I /home/gaston/tvmet-1.7.1/include/ -DNDEBUG a.cpp -o a

#include <iostream>
#include <tvmet/Matrix.h>

using namespace std;
using namespace tvmet;

int main(int argc, char *argv[])
{
	Matrix<double,3,3> I;
	I = 1,0,0,
	Matrix<double,3,3> m, n;
	m = 1,2,3,
	for(int a = 0; a < 100000000; a++)
	{
		n = m*m;
		alias(m) = I + 0.05 * (m + n);
	}
	cout << m << endl;
	return 0;
}

// g++ -O3 -I /home/gaston/cuisine/branches/work/eigen2/src/ -DNDEBUG b.cpp -o b

#include <iostream>

using namespace std;

int main(int argc, char *argv[])
{
	EiMatrix3d I;
	EiMatrix3d m;
	for(int i = 0; i < 3; i++) for(int j = 0; j < 3; j++)
	{
		I(i,j) = (i==j);
		m(i,j) = (i+3*j);
	}
	for(int a = 0; a < 100000000; a++)
	{
		m = I + 0.00005 * (m + m*m);
	}
	cout << m << endl;
	return 0;
}
