Re: [eigen] Extending Eigen with AVX

Rohit,
Excellent! This is a worthwhile endeavor. I saw considerable gains at my company when we started to use AVX.

Just a couple of random thoughts (I'm afraid that's all I have time for right now) from my experience converting SSE-optimized code to AVX-optimized code.

  • Lesson learned: aligned moves are much faster (again)
    • On older CPUs, unaligned 128-bit moves were much slower than aligned moves. There was a speed advantage to using aligned moves wherever possible, even if it meant extra conditionals.
    • Nehalem made unaligned moves just as fast. The speed penalty was eliminated, making unaligned moves the best game in town (full speed, without any need for alignment checks or data manipulation to enforce boundaries).
    • Sandy Bridge brought 256-bit SIMD. Unfortunately, it also brought back the unaligned penalty for moves (for 256-bit words).
  • The 256-bit SIMD instructions usually act like their 128-bit counterparts, but some of them work on each 128-bit half separately rather than across the full 256 bits
    • See http://bit.ly/Z1nXH1 ("What I really want is VHADDPS/_mm256_hadd_ps to act like HADDPS/_mm_hadd_ps, only with 256 bit words. Unfortunately, it acts like two calls to HADDPS acting independently on the low and high words.")
  • Stack boundary and alignment must be 32 bytes, vs. 16 for SSE (-mpreferred-stack-boundary=5, and maybe -mstackrealign or -mincoming-stack-boundary)
  • Make sure your OS is new enough to support AVX, since it has to save and restore the wider registers (on Linux: grep avx /proc/cpuinfo)
  • We saw speedups on the order of 10% to 60% in porting from SSE4 to AVX. Basically, the greater the number of cycles being used by SSE*, the higher your gains will be.
  • IIRC, Intel's roadmap has their SIMD width continuing to double occasionally in the future. Try not to assume 256 bits as a max size.
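
To make the alignment and VHADDPS points concrete, here is a minimal sketch of a horizontal sum over 8 floats. It assumes GCC or Clang on x86 (for the target attribute and __builtin_cpu_supports); hsum8 and hsum8_avx are hypothetical names for illustration, not Eigen functions. Because _mm256_hadd_ps works on each 128-bit half separately, the two halves have to be combined explicitly at the end.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

// Hypothetical helper: AVX horizontal sum of 8 floats at a 32-byte-aligned address.
// The target attribute lets this compile without passing -mavx globally.
__attribute__((target("avx")))
static float hsum8_avx(const float* p) {
    __m256 v = _mm256_load_ps(p);             // aligned 256-bit load
    __m256 h = _mm256_hadd_ps(v, v);          // pairwise sums, computed separately per 128-bit half
    h = _mm256_hadd_ps(h, h);                 // each half now holds the sum of its own 4 floats
    __m128 lo = _mm256_castps256_ps128(h);    // sum of p[0..3]
    __m128 hi = _mm256_extractf128_ps(h, 1);  // sum of p[4..7]
    return _mm_cvtss_f32(_mm_add_ss(lo, hi)); // combine the two halves explicitly
}
#endif

// Hypothetical entry point: uses AVX when the run-time check passes,
// otherwise falls back to a plain scalar loop.
float hsum8(const float* p) {
    // The aligned load above requires a 32-byte-aligned pointer.
    assert(reinterpret_cast<std::uintptr_t>(p) % 32 == 0);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx")) return hsum8_avx(p);
#endif
    float s = 0.0f;
    for (int i = 0; i < 8; ++i) s += p[i];
    return s;
}
```

Calling it on a buffer declared as alignas(32) float data[8] satisfies the alignment precondition. The scalar loop is written for exactly 8 elements only to mirror the AVX path; code that should survive a wider SIMD generation would take the width as a parameter instead.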
-- Mark

On 03/02/2013 07:09 PM, Rohit Garg wrote:
Hi all,

I have been feeling the need for including AVX instructions in Eigen
for some time now. It's now too late to make 3.2 beta, but it can go
into 3.3.

I can work on the PacketMath functions for AVX if there are other
developers more familiar with Eigen's internals who would be OK with
helping me integrate this. Sadly, I do not know as much about Eigen's
internals as I should. Since AVX does not have integer operations, I
suggest that we begin with double and complex double operations first.
After that, we can look at float operations.



