Re: [eigen] optimization question


- *To*: eigen@xxxxxxxxxxxxxxxxxxx
- *Subject*: Re: [eigen] optimization question
- *From*: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- *Date*: Mon, 3 Oct 2011 21:59:06 -0400

2011/10/3 Michel <michel.pacilli@xxxxxxx>:
> Hi,
>
> not sure if it's the good place to ask user question... tell me if so.
>
> Well I try to get the best of eigen simple example, and I'm not sure that I
> get the most :
>
> #define N 32768
>
> Matrix<float,N,1> u;

Are you really sure that you want this? For such a large size, it is
almost always a better idea to use a MatrixXf u(N).

> Matrix<float,N,1> v;
> Matrix<float,N,1> w;
>
> for(int k=0; k <NLOOP; ++k)
>   u = v.array() * w.array();
>
> compile with gcc and sse2 flag
>
> Well, compare to a simple for loop and aligned array, I've got around 17%
> speed up with eigen ;)
> but, is it possible to give at compile time some hints to go further, with
> unrolling, sse3,4? or other things?

I don't think that newer sse versions bring anything useful here.
Actually, fwiw, sse1 would already be enough for this particular use case!

> the asm of product is:
>
> # 86 "..\eigen\main.cpp" 1
> #it begins here!
> # 0 "" 2
> /NO_APP
> xorl %eax, %eax
> .p2align 4,,10
> L3:
> movaps (%esi,%eax,4), %xmm0
> mulps (%ebx,%eax,4), %xmm0
> movaps %xmm0, (%edx,%eax,4)
> addl $4, %eax
> cmpl $32768, %eax
> jne L3
> /APP
> # 88 "..\eigen\main.cpp" 1
> #it ends here!
>
> I wonder if it could be more efficient with more than just one xmm reg, or
> prefetch ?

I can only see 1 xmm register here, and given the very simple and
predictable access pattern, there shouldn't be any reason to use
explicit prefetch instructions.

The only further optimization that I would consider here, would be
partial unrolling of this loop, try doing 2 or 4 iterations at a time.

Benoit
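As a rough illustration of the partial unrolling suggested above, here is a minimal sketch in plain C++ that does not use Eigen at all. The function names (`multiply_simple`, `multiply_unrolled4`, `unrolled_matches_simple`, `self_check`) are hypothetical and chosen only for this example; this is not how Eigen implements its coefficient-wise product. The loop body handles 4 elements per iteration, and a short tail loop covers lengths that are not a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference version: one element per loop iteration.
void multiply_simple(const float* v, const float* w, float* u, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        u[i] = v[i] * w[i];
    }
}

// Partially unrolled version: 4 elements per loop iteration, so the
// counter increment and the compare/branch are paid once per 4 products.
void multiply_unrolled4(const float* v, const float* w, float* u, std::size_t n) {
    const std::size_t n4 = n - (n % 4);  // largest multiple of 4 that is <= n
    std::size_t i = 0;
    for (; i < n4; i += 4) {
        u[i]     = v[i]     * w[i];
        u[i + 1] = v[i + 1] * w[i + 1];
        u[i + 2] = v[i + 2] * w[i + 2];
        u[i + 3] = v[i + 3] * w[i + 3];
    }
    // Tail loop for the remaining 0 to 3 elements.
    for (; i < n; ++i) {
        u[i] = v[i] * w[i];
    }
}

// Returns true when both versions produce identical output for length n.
bool unrolled_matches_simple(std::size_t n) {
    std::vector<float> v(n), w(n), a(n), b(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = 0.5f * static_cast<float>(i);
        w[i] = 1.0f + static_cast<float>(i % 7);
    }
    multiply_simple(v.data(), w.data(), a.data(), n);
    multiply_unrolled4(v.data(), w.data(), b.data(), n);
    return a == b;
}

// Small self-check covering the size from the thread (32768) and a
// length that is not a multiple of 4.
void self_check() {
    assert(unrolled_matches_simple(32768));
    assert(unrolled_matches_simple(7));
}
```

Whether this is actually faster than the simple loop depends on the compiler and the machine. gcc can often perform this kind of unrolling by itself (for example with `-funroll-loops`), so it is worth comparing the generated assembly and timing both versions before keeping a hand-unrolled loop.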

**Follow-Ups**:
- **Re: [eigen] optimization question** *From:* Gael Guennebaud

**References**:
- **AW: [eigen] New release?** *From:* Schmidt, Michael
- **[eigen] optimization question** *From:* Michel
