Re: [eigen] Eigen AVX support - first steps


On Mon, Apr 15, 2013 at 11:18 AM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:

1. what does loaddup do?

It is needed for compatibility with complexes: it loads PacketSize/2
scalars and copies them into a packet where each scalar is duplicated,
e.g., for Packet8f:

A,B,C,D -> A,A,B,B,C,C,D,D.
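
To make the duplication pattern concrete, here is a minimal scalar sketch.
It is only a reference model of the behaviour described above, not Eigen's
intrinsic-based code; the name ploaddup_model and the use of std::array as a
stand-in for a packet are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar reference model of the duplication pattern (hypothetical helper,
// not Eigen's implementation). Reads N/2 scalars starting at `from` and
// writes each of them twice, side by side, into an N-wide "packet".
template <typename Scalar, std::size_t N>
std::array<Scalar, N> ploaddup_model(const Scalar* from) {
  static_assert(N % 2 == 0, "packet size must be even");
  assert(from != nullptr);
  std::array<Scalar, N> packet{};
  for (std::size_t i = 0; i < N; ++i) {
    packet[i] = from[i / 2];  // output slots 2k and 2k+1 both take input k
  }
  return packet;
}
```

With N = 8 and input A,B,C,D this yields A,A,B,B,C,C,D,D, matching the
Packet8f case above.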

2. what is the purpose of palign_impl?

That's the trickiest one. It is only used to optimize matrix-vector
products on unaligned matrices. It takes 2 packets that represent a
contiguous memory array, and returns a packet starting at the position
Offset, e.g., for Packet4d:

{A0,B0,C0,D0} ; {A1,B1,C1,D1}

if Offset==0 => {A0,B0,C0,D0}
if Offset==1 => {B0,C0,D0,A1}
if Offset==2 => {C0,D0,A1,B1}
if Offset==3 => {D0,A1,B1,C1}

For Packet8f .... well I have to think about it as considering all
possibilities might be overkill. We can easily discard this
optimization for PacketSize>4.
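
The Offset table above can be written as a short scalar sketch. This is a
reference model of the described behaviour only, not the actual palign_impl
specialization; palign_model and the std::array packet stand-in are
hypothetical names chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar reference model (hypothetical helper, not Eigen's palign_impl).
// Treats `first` followed by `second` as one contiguous run of 2*N scalars
// and returns the N scalars that start at position Offset.
template <std::size_t Offset, typename Scalar, std::size_t N>
std::array<Scalar, N> palign_model(const std::array<Scalar, N>& first,
                                   const std::array<Scalar, N>& second) {
  static_assert(Offset < N, "Offset must be smaller than the packet size");
  std::array<Scalar, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t k = i + Offset;  // index into the concatenated run
    result[i] = (k < N) ? first[k] : second[k - N];
  }
  return result;
}
```

Because the model is generic in N, it also describes what a Packet8f
version would have to compute for each of its 8 offsets, which is the case
analysis the paragraph above suggests might be overkill to implement.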

3. How are we defining the __m256d type?

+ I am assuming this type will be wrapped up in a union with an array of doubles.
+ Where is that definition going to go? In the PacketMathDouble.h or in some other
+ file? The name of the array will decide what we use in the pfirst function. GCC
+ does not appear to have wrapped the __m256d type as a union.

hm... is it our job to define __m256d ?? Isn't it defined in the
AVX intrinsics header files??

It is. The question arises from the associated union member m128d_f64 array for SSE. GCC 4.7 apparently does not define an analogous union member m256d_f64 array. This affects what needs to be done for the pfirst functions, which use this member. Is the m128d_f64 member defined in Eigen, or is it part of the GCC definitions?
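
One way to sidestep the union-member question is to extract the first
element with intrinsics only, so nothing depends on how a given compiler
lays out __m256d. The sketch below is an assumption about how that could
look, not the code in the eigen-avx repository; Packet4d_sketch,
load4_sketch and pfirst_sketch are hypothetical names, and the non-AVX
branch exists only so the example builds when AVX is not enabled.

```cpp
#include <cassert>

#if defined(__AVX__)
#include <immintrin.h>

typedef __m256d Packet4d_sketch;

// Unaligned load of four doubles into a 256-bit register.
inline Packet4d_sketch load4_sketch(const double* p) {
  return _mm256_loadu_pd(p);
}

// Reinterpret the low 128 bits as __m128d, then read its lowest double.
// No union member such as m256d_f64 is needed.
inline double pfirst_sketch(const Packet4d_sketch& a) {
  return _mm_cvtsd_f64(_mm256_castpd256_pd128(a));
}

#else

// Plain-array fallback, used only when the compiler is not targeting AVX.
struct Packet4d_sketch { double v[4]; };

inline Packet4d_sketch load4_sketch(const double* p) {
  Packet4d_sketch r;
  for (int i = 0; i < 4; ++i) r.v[i] = p[i];
  return r;
}

inline double pfirst_sketch(const Packet4d_sketch& a) { return a.v[0]; }

#endif
```

Building with AVX enabled (for example -mavx on GCC) selects the intrinsic
path; either path returns the element at the lowest address of the loaded
data.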


On Mon, Apr 15, 2013 at 5:06 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> Hi Rohit,
> thank you for the hard work.
> On Sun, Apr 14, 2013 at 6:47 AM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
>> I have pushed some code to the eigen-avx repository at bitbucket.
>> a) All the additions reside in the AVX folder, along with altivec, neon and
>> SSE. I have split up the single and double precision code into two files as
>> one file was getting too big.
> I'm not sure decoupling float and double really helps readability
> since in 99% of the cases they should be extremely similar, but ok.
>> b) The integer code has been removed as AVX does not have int support. Once
>> real numbers are done, we can move on to complex number support.
> indeed, that's for AVX2, which should be available in upcoming CPUs.
>> c) I had a few questions about some of the intrinsic functions, I have
>> written them in the questions.txt file in the AVX folder.
> I'll answer them in a second email.
>> d) So far, I have just migrated the intrinsic functions from SSE over to
>> AVX. All my changes are so far limited to the AVX folder in the arch folder.
>> I have not run any tests and this code is not hooked up to the rest of the
>> eigen code base as yet. The reduction functions have been tested separately,
>> so they should be fine.
> Look at the Eigen/Core header file. Before testing for SSE, if __AVX__
> is defined then we should define a EIGEN_VECTORIZE_AVX token that will
> be used later to include your files instead of the ones in SSE. Then,
> in CMakeLists.txt, you can add an option to enable AVX in unit tests,
> and start with the packet_math unit tests.
> I guess we will also have to move the alignment requirement to the
> packet_traits instead of the somewhat hardcoded 16 bytes. For initial
> testing though, you can make sure that pload and pstore also work on
> 16-byte aligned data.
>> e) I have made no attempt for micro-optimization so far. Once this works we
>> can move to optimization.
> sure!
> gael
>> f) Code review welcome. :)
>> Cheers,
>> --
>> Rohit Garg
>> Graduate Student
>> Applied and Engineering Physics
>> Cornell University

Rohit Garg

Graduate Student
Applied and Engineering Physics
Cornell University
