[eigen] optimization question
Not sure if this is the right place to ask a user question... tell me if it is not.
Well, I am trying to get the best out of Eigen on a simple example, and I'm not sure I am getting the most out of it:
    #define N 32768
    for(int k=0; k <NLOOP; ++k)
        u = v.array() * w.array();

compiled with gcc and the sse2 flag.

Well, compared to a simple for loop over aligned arrays, I get around a 17% speed-up with Eigen ;)
But is it possible to give some hints at compile time to go further, with unrolling, SSE3, SSE4, or other things?

The asm of the product is:

    # 86 "..\eigen\main.cpp" 1
    #it begins here!
    # 0 "" 2
    /NO_APP
        xorl %eax, %eax
        .p2align 4,,10
    L3:
        movaps (%esi,%eax,4), %xmm0
        mulps (%ebx,%eax,4), %xmm0
        movaps %xmm0, (%edx,%eax,4)
        addl $4, %eax
        cmpl $32768, %eax
        jne L3
    /APP
    # 88 "..\eigen\main.cpp" 1
    #it ends here!

I wonder if it could be more efficient with more than just one xmm register, or with prefetching?

With my best regards for this great work,
michel pacilli
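
To make the "more than one xmm register" idea concrete, here is a minimal sketch of the same element-wise product written by hand with SSE intrinsics and unrolled by two, so that each iteration uses two independent register chains. This is not how Eigen implements the product; the function names `mul_scalar` and `mul_unrolled2` are hypothetical, and the code assumes an x86 target with SSE, 16-byte aligned pointers, and a length that is a multiple of 8. Whether it is any faster than the single-register loop above has to be measured, since with arrays of this size the loop may well be limited by memory traffic rather than by the multiply.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics: _mm_load_ps, _mm_mul_ps, _mm_store_ps

// Reference version: plain scalar loop, one float per iteration.
void mul_scalar(const float* v, const float* w, float* u, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        u[i] = v[i] * w[i];
}

// Hypothetical hand-unrolled version: 8 floats per iteration, using two
// independent load/multiply/store chains instead of a single xmm register.
// Assumptions: v, w and u are 16-byte aligned, and n is a multiple of 8.
void mul_unrolled2(const float* v, const float* w, float* u, std::size_t n) {
    assert(n % 8 == 0);
    for (std::size_t i = 0; i < n; i += 8) {
        __m128 a0 = _mm_load_ps(v + i);      // aligned load, floats i .. i+3
        __m128 a1 = _mm_load_ps(v + i + 4);  // aligned load, floats i+4 .. i+7
        __m128 b0 = _mm_load_ps(w + i);
        __m128 b1 = _mm_load_ps(w + i + 4);
        _mm_store_ps(u + i,     _mm_mul_ps(a0, b0));
        _mm_store_ps(u + i + 4, _mm_mul_ps(a1, b1));
    }
}
```

Timing `mul_unrolled2` against both the Eigen expression and the plain loop on the same aligned buffers would show whether the extra registers help for N = 32768.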