[eigen] optimization question



I'm not sure this is the right place to ask a user question; tell me if it isn't.

I'm trying to get the best performance out of a simple Eigen example, and I'm not sure I'm getting the most out of it:

#define N 32768
Matrix<float,N,1> u;
Matrix<float,N,1> v;
Matrix<float,N,1> w;

for(int k=0; k <NLOOP; ++k) 
   u = v.array() * w.array();

This is compiled with gcc with the SSE2 flag enabled.

Compared to a simple for loop over aligned arrays, I get a speed-up of around 17% with Eigen ;)
But is it possible to give some hints at compile time to go further, for example loop unrolling, SSE3 or SSE4, or anything else?
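
For reference, a minimal sketch of the kind of plain loop over aligned arrays meant as the baseline here. The exact baseline code is not part of this mail, so the function name mul_plain, the 16-byte alignment and the use of gcc's __restrict__ extension are assumptions made for illustration only:

```cpp
#include <cstddef>

// Same problem size as in the Eigen example.
constexpr std::size_t N = 32768;

// 16-byte alignment so the compiler may use aligned SSE loads and stores (movaps).
alignas(16) static float u[N];
alignas(16) static float v[N];
alignas(16) static float w[N];

// Hypothetical baseline: element-wise product dst[i] = a[i] * b[i].
// __restrict__ (a gcc extension) promises the arrays do not overlap,
// which makes auto-vectorization easier for the compiler.
void mul_plain(float* __restrict__ dst,
               const float* __restrict__ a,
               const float* __restrict__ b,
               std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] * b[i];
}
```

Calling mul_plain(u, v, w, N) inside the same NLOOP loop gives the non-Eigen side of the comparison.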

The assembly generated for the product is:

 # 86 "..\eigen\main.cpp" 1
	#it begins here!
 # 0 "" 2
	xorl	%eax, %eax
	.p2align 4,,10
L3:
	movaps	(%esi,%eax,4), %xmm0
	mulps	(%ebx,%eax,4), %xmm0
	movaps	%xmm0, (%edx,%eax,4)
	addl	$4, %eax
	cmpl	$32768, %eax
	jne	L3
 # 88 "..\eigen\main.cpp" 1
	#it ends here!

I wonder whether it could be more efficient if it used more than one xmm register, or prefetching?

With my best regards for this great work,
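
To make the question concrete, here is a sketch of what a two-register, prefetching version of the loop might look like when written by hand with SSE intrinsics. This is not what Eigen generates; the function name mul_unrolled2 and the prefetch distance of 64 floats are arbitrary assumptions, and whether it is actually faster would have to be measured:

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE: _mm_load_ps, _mm_mul_ps, _mm_store_ps, _mm_prefetch

// Hypothetical hand-unrolled product: two xmm registers per iteration
// plus a software prefetch. Assumed preconditions: n is a multiple of 8
// and all three pointers are 16-byte aligned.
void mul_unrolled2(float* dst, const float* a, const float* b, std::size_t n)
{
    assert(n % 8 == 0);

    // Prefetch distance in floats (64 floats = 256 bytes); an arbitrary guess.
    const std::size_t ahead = 64;

    for (std::size_t i = 0; i < n; i += 8) {
        // Only prefetch addresses that are still inside the arrays.
        if (i + ahead < n) {
            _mm_prefetch(reinterpret_cast<const char*>(a + i + ahead), _MM_HINT_T0);
            _mm_prefetch(reinterpret_cast<const char*>(b + i + ahead), _MM_HINT_T0);
        }

        // Two independent load/multiply/store chains, one per register.
        __m128 x0 = _mm_load_ps(a + i);
        __m128 x1 = _mm_load_ps(a + i + 4);
        x0 = _mm_mul_ps(x0, _mm_load_ps(b + i));
        x1 = _mm_mul_ps(x1, _mm_load_ps(b + i + 4));
        _mm_store_ps(dst + i, x0);
        _mm_store_ps(dst + i + 4, x1);
    }
}
```

The loop body corresponds to two copies of the movaps/mulps/movaps sequence above, using xmm0 and xmm1 instead of xmm0 alone.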

michel pacilli
