[eigen] optimization question
Hi,

I'm not sure this is the right place to ask a user question; let me know if it isn't.

I'm trying to get the best performance out of a simple Eigen example, and I'm not sure I'm getting the most out of it:

    #define N 32768
    Matrix<float,N,1> u;
    Matrix<float,N,1> v;
    Matrix<float,N,1> w;
    for(int k=0; k <NLOOP; ++k)
        u = v.array() * w.array();

This is compiled with gcc and the sse2 flag.

Compared to a simple for loop over aligned arrays, I get around a 17% speed-up with Eigen ;)

But is it possible to give some hints at compile time to go further, with unrolling, SSE3/SSE4, or other things?

The asm of the product is:

    # 86 "..\eigen\main.cpp" 1
        #it begins here!
    # 0 "" 2
    /NO_APP
        xorl %eax, %eax
        .p2align 4,,10
    L3:
        movaps (%esi,%eax,4), %xmm0
        mulps (%ebx,%eax,4), %xmm0
        movaps %xmm0, (%edx,%eax,4)
        addl $4, %eax
        cmpl $32768, %eax
        jne L3
    /APP
    # 88 "..\eigen\main.cpp" 1
        #it ends here!

I wonder whether it could be more efficient with more than just one xmm register, or with prefetching?

With my best regards for this great work,
michel pacilli
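
For concreteness, the "more than one xmm register" variant asked about above could be sketched by hand with SSE intrinsics as follows. This is only an illustration, not what Eigen generates: the function name mul_unrolled2 is hypothetical, and the sketch assumes that n is a multiple of 8 and that all three pointers are 16-byte aligned. Where SSE is not available it falls back to a plain scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>  // _mm_load_ps, _mm_mul_ps, _mm_store_ps
#endif

// Hypothetical helper (not part of Eigen): computes u[i] = v[i] * w[i].
// Assumptions: n is a multiple of 8; u, v and w are 16-byte aligned.
void mul_unrolled2(float* u, const float* v, const float* w, std::size_t n)
{
    assert(n % 8 == 0);
#if defined(__SSE__)
    // Unrolled by two: each iteration handles 8 floats using two
    // independent xmm registers instead of one.
    for (std::size_t i = 0; i < n; i += 8) {
        __m128 a0 = _mm_load_ps(v + i);       // aligned load, like movaps
        __m128 a1 = _mm_load_ps(v + i + 4);
        a0 = _mm_mul_ps(a0, _mm_load_ps(w + i));      // like mulps
        a1 = _mm_mul_ps(a1, _mm_load_ps(w + i + 4));
        _mm_store_ps(u + i, a0);              // aligned store, like movaps
        _mm_store_ps(u + i + 4, a1);
    }
#else
    // Portable fallback when SSE intrinsics are not available.
    for (std::size_t i = 0; i < n; ++i)
        u[i] = v[i] * w[i];
#endif
}
```

Whether this actually beats the single-register loop has to be measured. With three arrays of 32768 floats (128 KiB each), the loop may well be limited by memory traffic rather than by the multiply itself, in which case extra registers would change little.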