Re: [eigen] Extending Eigen with AVX

Rohit,
Excellent! This is a worthwhile endeavor. I saw considerable gains at my company when we started to use AVX.

Just a couple of random thoughts (I'm afraid that's all I have time for right now), from my experience converting SSE-optimized code to AVX-optimized code.

Lesson learned: Aligned moves are much faster (again)

On older CPUs, unaligned 128-bit moves were much slower than aligned moves. There was a speed advantage to using aligned moves wherever possible, even if it meant extra conditionals.
Nehalem made unaligned moves just as fast. The speed penalty was eliminated, making unaligned moves the best game in town (full speed, without any need for alignment checks or data manipulation to enforce boundaries).
Sandy Bridge brought 256-bit SIMD. Unfortunately, it also brought back the unaligned penalty for moves (for 256-bit words).
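To make the alignment point concrete, here is a rough sketch of the usual "peel until aligned" bookkeeping, in plain C++ with no intrinsics. The names is_aligned and scalar_prologue_len are made up for illustration and are not part of Eigen; the sketch also assumes the pointer is at least float-aligned.

```cpp
#include <cstddef>
#include <cstdint>

// Width of one AVX register in bytes (256 bits).
constexpr std::size_t kAvxBytes = 32;

// True if p sits on a `bytes`-byte boundary (bytes must be a power of two).
inline bool is_aligned(const void* p, std::size_t bytes) {
    return (reinterpret_cast<std::uintptr_t>(p) & (bytes - 1)) == 0;
}

// Hypothetical helper: number of leading floats to process with scalar (or
// unaligned) code before p reaches the next 32-byte boundary, capped at n.
// Assumes p is a multiple of sizeof(float).
inline std::size_t scalar_prologue_len(const float* p, std::size_t n) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    const std::size_t misalign = addr & (kAvxBytes - 1);  // 0..31 bytes past a boundary
    if (misalign == 0) return 0;                          // already aligned
    const std::size_t floats = (kAvxBytes - misalign) / sizeof(float);
    return floats < n ? floats : n;
}
```

A loop would handle the first scalar_prologue_len(p, n) elements one at a time, then switch to aligned 256-bit loads for the bulk, then finish any tail. Whether the extra branch pays off depends on the CPU generation, as described above.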

The 256-bit SIMD instructions usually act like their 128-bit counterparts applied independently to the low and high 128-bit halves

See http://bit.ly/Z1nXH1 ("What I really want is VHADDPS/_mm256_hadd_ps to act like HADDPS/_mm_hadd_ps, only with 256 bit words. Unfortunately, it acts like two calls to HADDPS acting independently on the low and high words.")
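A scalar model may make the lane behaviour easier to see. This is only a sketch in plain C++ that mimics the element ordering of the horizontal-add instructions; it does not call the real intrinsics. The names hadd128_model, hadd256_model and hadd256_wished are invented here, and the "wished" variant is a hypothetical instruction that does not exist.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Vec8 = std::array<float, 8>;

// Scalar model of HADDPS / _mm_hadd_ps: pairwise sums of a, then of b.
inline Vec4 hadd128_model(const Vec4& a, const Vec4& b) {
    return Vec4{a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

// Scalar model of VHADDPS / _mm256_hadd_ps: the 128-bit operation above,
// applied separately to the low half (elements 0..3) and the high half (4..7).
inline Vec8 hadd256_model(const Vec8& a, const Vec8& b) {
    Vec8 r{};
    for (int half = 0; half < 2; ++half) {
        const int o = half * 4;  // offset of this 128-bit half
        r[o + 0] = a[o + 0] + a[o + 1];
        r[o + 1] = a[o + 2] + a[o + 3];
        r[o + 2] = b[o + 0] + b[o + 1];
        r[o + 3] = b[o + 2] + b[o + 3];
    }
    return r;
}

// Hypothetical "true 256-bit" horizontal add (NOT a real instruction):
// all pairwise sums of a first, then all pairwise sums of b.
inline Vec8 hadd256_wished(const Vec8& a, const Vec8& b) {
    return Vec8{a[0] + a[1], a[2] + a[3], a[4] + a[5], a[6] + a[7],
                b[0] + b[1], b[2] + b[3], b[4] + b[5], b[6] + b[7]};
}
```

With a = {1..8}, the real ordering interleaves sums from a and b within each half, so code ported from SSE generally needs an extra cross-lane shuffle or permute to get the "wished" layout.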

Stack boundary and alignment must be 32 bytes for AVX vs 16 for SSE (with GCC: -mpreferred-stack-boundary=5 and maybe -mstackrealign or -mincoming-stack-boundary)
Make sure your OS is new enough to support AVX (on Linux: grep avx /proc/cpuinfo)
We saw speedups on the order of 10% to 60% in porting from SSE4 to AVX. Basically, the more cycles your code spends in SSE* instructions, the higher your gains will be.
IIRC, Intel's roadmap has their SIMD width continuing to double occasionally in the future. Try not to assume 256 bits as a max size.
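On the last point, one way to avoid hard-coding 256 bits is to make the register width a compile-time parameter and derive lane counts from it. The following is a minimal sketch under that assumption; SimdTraits and blocked_sum are hypothetical names, not Eigen's actual packet traits, and the inner loop is a scalar stand-in for what would be a single vector add.

```cpp
#include <cstddef>

// Hypothetical traits type: SimdBytes is the register width in bytes
// (16 for SSE, 32 for AVX, larger for future widths).
template <std::size_t SimdBytes>
struct SimdTraits {
    static_assert(SimdBytes % sizeof(float) == 0,
                  "register width must hold a whole number of floats");
    static constexpr std::size_t bytes = SimdBytes;
    static constexpr std::size_t float_lanes = SimdBytes / sizeof(float);
};

// Sums n floats in blocks of float_lanes, keeping one accumulator per lane.
// Nothing in the loop structure mentions a specific width.
template <std::size_t SimdBytes>
float blocked_sum(const float* x, std::size_t n) {
    constexpr std::size_t lanes = SimdTraits<SimdBytes>::float_lanes;
    float acc[lanes] = {};
    std::size_t i = 0;
    for (; i + lanes <= n; i += lanes)           // full blocks
        for (std::size_t l = 0; l < lanes; ++l)  // stand-in for one vector add
            acc[l] += x[i + l];
    float total = 0.0f;
    for (std::size_t l = 0; l < lanes; ++l)      // horizontal reduction
        total += acc[l];
    for (; i < n; ++i)                           // scalar tail
        total += x[i];
    return total;
}
```

Calling blocked_sum<16>, blocked_sum<32> or blocked_sum<64> changes only the block size, so moving to a wider register would mean changing one template argument rather than rewriting loops.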

-- Mark

On 03/02/2013 07:09 PM, Rohit Garg wrote: