Rohit,
Excellent! This is a worthwhile endeavor. I saw considerable
gains at my company when we started using AVX.
Just a couple of random thoughts (I'm afraid that's all I have
time for right now) from my experience converting SSE-optimized
code to AVX-optimized code:
- Lesson learned: aligned moves are much faster (again).
  - On older CPUs, unaligned 128-bit moves were much slower
    than aligned moves. There was a speed advantage to using
    aligned moves wherever possible, even if it meant extra
    conditionals.
  - Nehalem made unaligned moves just as fast as aligned ones. The
    speed penalty was eliminated, making unaligned moves the best
    game in town (full speed, without any need for alignment checks
    or data manipulation to enforce boundaries).
  - Sandy Bridge brought 256-bit SIMD. Unfortunately, it also
    brought back the unaligned penalty for moves (for 256-bit
    words).
- The 256-bit SIMD instructions usually act like their 128-bit
  counterparts applied separately to the low and high 128-bit
  halves, not like a single operation across all 256 bits.
  - See http://bit.ly/Z1nXH1 ("What I really want is
    VHADDPS / _mm256_hadd_ps to act like HADDPS / _mm_hadd_ps,
    only with 256 bit words. Unfortunately, it acts like two
    calls to HADDPS acting independently on the
    low and high words.")
- Stack boundary and alignment must be 32 bytes for AVX vs. 16 for
  SSE (GCC: -mpreferred-stack-boundary=5, i.e. 2^5 = 32 bytes, and
  maybe -mstackrealign or -mincoming-stack-boundary).
- Make sure your OS is new enough to support AVX (on Linux:
  grep avx /proc/cpuinfo).
- We saw speedups on the order of 10% to 60% in porting from
  SSE4 to AVX. Basically, the larger the share of cycles spent
  in SSE* code, the higher your gains will be.
- IIRC, Intel's roadmap has SIMD width continuing to double
  periodically. Try not to assume 256 bits as the maximum size.
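On the alignment points above (aligned vs. unaligned moves, and the 32-byte boundary), here is a minimal C++ sketch, assuming a C++17 compiler (for std::aligned_alloc) on x86. The helper names is_aligned32, alloc_doubles32 and sum are hypothetical and not from Eigen. It allocates 32-byte-aligned storage and branches once on the pointer's alignment to choose _mm256_load_pd or _mm256_loadu_pd; when compiled without AVX (no __AVX__ defined), it falls back to a scalar loop so it still builds. No timing claims are implied.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#ifdef __AVX__
#include <immintrin.h>
#endif

// True if p is on a 32-byte boundary, which an aligned 256-bit load
// (_mm256_load_pd) requires; using it on a misaligned address faults.
inline bool is_aligned32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}

// Heap buffer of n doubles on a 32-byte boundary. std::aligned_alloc
// wants a byte count that is a multiple of the alignment, so round up.
// Release with std::free.
inline double* alloc_doubles32(std::size_t n) {
    std::size_t bytes = (n * sizeof(double) + 31) / 32 * 32;
    return static_cast<double*>(std::aligned_alloc(32, bytes));
}

// Sum n doubles. With AVX, take the aligned-load loop when the pointer
// permits it and the unaligned-load loop otherwise (one check per call);
// without AVX, everything goes through the scalar loop at the end.
inline double sum(const double* p, std::size_t n) {
    std::size_t i = 0;
    double total = 0.0;
#ifdef __AVX__
    __m256d acc = _mm256_setzero_pd();
    if (is_aligned32(p)) {
        for (; i + 4 <= n; i += 4)
            acc = _mm256_add_pd(acc, _mm256_load_pd(p + i));
    } else {
        for (; i + 4 <= n; i += 4)
            acc = _mm256_add_pd(acc, _mm256_loadu_pd(p + i));
    }
    alignas(32) double lanes[4];
    _mm256_store_pd(lanes, acc);  // aligned store into a 32-byte-aligned local
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    for (; i < n; ++i) total += p[i];  // tail, or the whole array without AVX
    return total;
}
```

Checking alignment once per call, rather than per element, keeps the "extra conditional" out of the inner loop.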
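To make the VHADDPS point concrete, here is a scalar C++ model of the documented lane behavior (no intrinsics, so it runs anywhere). The function names hadd128, hadd256_lanes and hadd256_wide are hypothetical: the first models HADDPS, the second models VHADDPS as two independent 128-bit halves, and the third is the "only with 256 bit words" behavior the quoted post wishes for.

```cpp
#include <array>

using F4 = std::array<float, 4>;
using F8 = std::array<float, 8>;

// Scalar model of HADDPS / _mm_hadd_ps(a, b):
// adjacent pair sums of a, then adjacent pair sums of b.
F4 hadd128(const F4& a, const F4& b) {
    return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

// Scalar model of VHADDPS / _mm256_hadd_ps(a, b): two independent
// hadd128 calls, one on the low 128-bit halves, one on the high halves.
F8 hadd256_lanes(const F8& a, const F8& b) {
    F4 lo = hadd128({a[0], a[1], a[2], a[3]}, {b[0], b[1], b[2], b[3]});
    F4 hi = hadd128({a[4], a[5], a[6], a[7]}, {b[4], b[5], b[6], b[7]});
    return {lo[0], lo[1], lo[2], lo[3], hi[0], hi[1], hi[2], hi[3]};
}

// What a "true" 256-bit horizontal add would produce:
// all four pair sums of a, then all four pair sums of b.
F8 hadd256_wide(const F8& a, const F8& b) {
    F8 r{};
    for (int k = 0; k < 4; ++k) {
        r[k] = a[2 * k] + a[2 * k + 1];
        r[4 + k] = b[2 * k] + b[2 * k + 1];
    }
    return r;
}
```

With a = 0..7 and b = 10..17, hadd256_lanes gives {1, 5, 21, 25, 9, 13, 29, 33}, while hadd256_wide gives {1, 5, 9, 13, 21, 25, 29, 33}, so code ported from SSE that expects the second layout needs extra shuffling to fix up the result.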
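For the /proc/cpuinfo check, a small C++ sketch, assuming Linux on x86, where each "flags" line lists the CPU features the kernel reports. The names has_cpu_flag and read_file are hypothetical. It matches whole words, since a plain grep avx also matches entries such as avx2 or avx512f.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// True if `flag` appears as a whole word on a "flags" line of text
// in /proc/cpuinfo format.
bool has_cpu_flag(const std::string& cpuinfo, const std::string& flag) {
    std::istringstream lines(cpuinfo);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.compare(0, 5, "flags") != 0) continue;  // only "flags" lines
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::istringstream words(line.substr(colon + 1));
        std::string w;
        while (words >> w)
            if (w == flag) return true;  // exact token, so "avx2" != "avx"
    }
    return false;
}

// Whole file as a string; empty if it cannot be opened or is empty.
std::string read_file(const char* path) {
    std::ifstream in(path);
    std::ostringstream ss;
    ss << in.rdbuf();
    return ss.str();
}
```

has_cpu_flag(read_file("/proc/cpuinfo"), "avx") returns true on a machine whose kernel reports AVX; if the file cannot be read, read_file returns an empty string and the check returns false.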
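On not assuming 256 bits as the maximum: one option is to make the register width a compile-time parameter instead of baking it into loops. This C++ sketch is hypothetical (the names Simd and saxpy_blocked, and the choice of a saxpy-style loop, are for illustration only). The inner lane loop stands in for what would be a single vector instruction, and the scalar tail handles lengths that are not a multiple of the width.

```cpp
#include <cstddef>

// Lanes per register for a given register width in bytes
// (16 = SSE, 32 = AVX); the width is a parameter, not a constant
// sprinkled through the loops.
template <std::size_t RegisterBytes>
struct Simd {
    static constexpr std::size_t lanes = RegisterBytes / sizeof(float);
};

// y += a * x, processed one register-width block at a time.
template <std::size_t RegisterBytes>
void saxpy_blocked(float a, const float* x, float* y, std::size_t n) {
    constexpr std::size_t L = Simd<RegisterBytes>::lanes;
    std::size_t i = 0;
    for (; i + L <= n; i += L)               // full blocks
        for (std::size_t k = 0; k < L; ++k)  // stands in for one vector op
            y[i + k] += a * x[i + k];
    for (; i < n; ++i)                       // tail shorter than one block
        y[i] += a * x[i];
}
```

saxpy_blocked<16>, <32> and <64> (SSE, AVX and a 512-bit width) compute the same result; only the block size changes.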
-- Mark
On 03/02/2013 07:09 PM, Rohit Garg wrote:
Hi all,
I have been feeling the need to include AVX instructions in Eigen
for some time now. It's now too late to make the 3.2 beta, but it
can go into 3.3.
I can work on the PacketMath functions for AVX if other developers
more familiar with Eigen's internals would be OK with helping me
integrate this. Sadly, I do not know as much about Eigen's
internals as I should. Since AVX has no 256-bit integer operations, I
suggest that we begin with double and complex double operations.
After that, we can look at float operations.
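As a rough picture of what a double-precision AVX packet involves: one 256-bit register holds four doubles (or two complex doubles). The C++ sketch below is hypothetical and is not Eigen's PacketMath API; the names Packet4d, ploadu4, pstoreu4, padd4 and pmul4 are made up for illustration. It uses the real AVX intrinsics _mm256_loadu_pd, _mm256_storeu_pd, _mm256_add_pd and _mm256_mul_pd when __AVX__ is defined, and a plain array otherwise, so it compiles on any target.

```cpp
#ifdef __AVX__
#include <immintrin.h>
#endif

// Hypothetical 4 x double packet. With AVX it wraps one 256-bit
// __m256d register; otherwise a plain array stands in for it.
struct Packet4d {
#ifdef __AVX__
    __m256d v;
#else
    double v[4];
#endif
};

// Unaligned load of 4 doubles.
inline Packet4d ploadu4(const double* p) {
    Packet4d r;
#ifdef __AVX__
    r.v = _mm256_loadu_pd(p);
#else
    for (int k = 0; k < 4; ++k) r.v[k] = p[k];
#endif
    return r;
}

// Unaligned store of 4 doubles.
inline void pstoreu4(double* p, const Packet4d& a) {
#ifdef __AVX__
    _mm256_storeu_pd(p, a.v);
#else
    for (int k = 0; k < 4; ++k) p[k] = a.v[k];
#endif
}

// Lane-wise a + b.
inline Packet4d padd4(const Packet4d& a, const Packet4d& b) {
    Packet4d r;
#ifdef __AVX__
    r.v = _mm256_add_pd(a.v, b.v);
#else
    for (int k = 0; k < 4; ++k) r.v[k] = a.v[k] + b.v[k];
#endif
    return r;
}

// Lane-wise a * b.
inline Packet4d pmul4(const Packet4d& a, const Packet4d& b) {
    Packet4d r;
#ifdef __AVX__
    r.v = _mm256_mul_pd(a.v, b.v);
#else
    for (int k = 0; k < 4; ++k) r.v[k] = a.v[k] * b.v[k];
#endif
    return r;
}
```

A real port would also need aligned loads and stores, broadcasts, reductions and the complex-double variants, which this sketch leaves out.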