Re: [eigen] Eigen AVX support - first steps
• To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
• Subject: Re: [eigen] Eigen AVX support - first steps
• From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
• Date: Mon, 15 Apr 2013 17:18:07 +0200

questions.txt:

It is needed for compatibility with complexes: it loads PacketSize/2
scalars and copies them into a packet where each scalar is duplicated,
e.g., for Packet8f:

A,B,C,D -> A,A,B,B,C,C,D,D.
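A scalar sketch of this duplication may help when checking the AVX version against a reference. The name `load_dup` and the use of `std::array` as a stand-in for a packet are assumptions made for illustration only; this is not Eigen's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of a "load and duplicate" operation:
// reads N/2 scalars from `from` and stores each one twice in an
// N-wide packet, so A,B,C,D becomes A,A,B,B,C,C,D,D for N == 8.
template <typename T, std::size_t N>
std::array<T, N> load_dup(const T* from) {
    static_assert(N % 2 == 0, "packet size must be even");
    assert(from != nullptr);
    std::array<T, N> packet{};
    for (std::size_t i = 0; i < N / 2; ++i) {
        packet[2 * i]     = from[i];  // first copy
        packet[2 * i + 1] = from[i];  // duplicate
    }
    return packet;
}
```

Calling `load_dup<float, 8>(ptr)` models the Packet8f case above and reads only the first four floats at `ptr`.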

2. what is the purpose of palign_impl?

That's the trickiest one. It is only used to optimize matrix-vector
products on unaligned matrices. It takes 2 packets that represent a
contiguous memory array, and returns a packet starting at the position
Offset, e.g., for Packet4d:

Inputs:
{A0,B0,C0,D0} ; {A1,B1,C1,D1}

if Offset==0 => {A0,B0,C0,D0}
if Offset==1 => {B0,C0,D0,A1}
if Offset==2 => {C0,D0,A1,B1}
if Offset==3 => {D0,A1,B1,C1}

For Packet8f... well, I have to think about it, as considering all
possibilities might be overkill. We can easily drop this
optimization for PacketSize>4.
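The offset semantics can be modelled in scalar code, which also gives a reference to test a vectorized version against. The helper name `align_packets` and the `std::array` packet stand-in are assumptions for illustration; the real palign_impl takes the offset as a compile-time parameter and works on SIMD registers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of the alignment operation: treats `first`
// followed by `second` as one contiguous run of 2*N scalars and returns
// the N scalars starting at position `offset`.
template <typename T, std::size_t N>
std::array<T, N> align_packets(const std::array<T, N>& first,
                               const std::array<T, N>& second,
                               std::size_t offset) {
    assert(offset < N);
    std::array<T, N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        const std::size_t pos = offset + i;
        // Positions below N come from the first packet, the rest from the second.
        result[i] = (pos < N) ? first[pos] : second[pos - N];
    }
    return result;
}
```

With N == 4 and offset 1, this yields {B0,C0,D0,A1}, matching the Packet4d case listed above.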

3. How are we defining the __m256d type?

+ I am assuming this type will be wrapped up in a union with an array
+ of doubles.
+ Where is that definition going to go? In the PacketMathDouble.h or
+ in some other file? The name of the array will decide what we use in
+ the pfirst function. GCC does not appear to have wrapped the __m256d
+ type as a union.

hm... is it our job to define __m256d ?? Isn't it defined by  in the

Cheers,
Gael.

On Mon, Apr 15, 2013 at 5:06 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> Hi Rohit,
>
> thank you for the hard work.
>
>
> On Sun, Apr 14, 2013 at 6:47 AM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
>> I have pushed some code to the eigen-avx repository at bitbucket.
>>
>> a) All the additions reside in the AVX folder, along with altivec, neon and
>> SSE. I have split up the single and double precision code into two files as
>> one file was getting too big.
>
> I'm not sure decoupling float and double really helps readability
> since in 99% of the cases they should be extremely similar, but ok.
>
>> b) The integer code has been removed as AVX does not have int support. Once
>> real numbers are done, we can move on to complex number support.
>
> indeed, that's for AVX2, which should be available in upcoming CPUs.
>
>> c) I had a few questions about some of the intrinsic functions; I have
>> written them in the questions.txt file in the AVX folder.
>
> I'll answer them in a second email.
>
>> d) So far, I have just migrated the intrinsic functions from SSE over to
>> AVX. All my changes are so far limited to the AVX folder in the arch folder.
>> I have not run any tests and this code is not hooked up to the rest of the
>> eigen code base as yet. The reduction functions have been tested separately,
>> so they should be fine.
>
> Look at the Eigen/Core header file. Before testing for SSE, if __AVX__
> is defined then we should define an EIGEN_VECTORIZE_AVX token that will
> be used later to include your files instead of the ones in SSE. Then,
> in CMakeLists.txt, you can add an option to enable AVX in unit tests.
>
> I guess we will also have to move the alignment requirement to the
> packet_traits instead of the somewhat hardcoded 16 bytes. For initial
> testing though, you can make sure that pload and pstore also work on
> 16-byte aligned data.
>
>> e) I have made no attempt for micro-optimization so far. Once this works we
>> can move to optimization.
>
> sure!
>
> gael
>
>> f) Code review welcome. :)
>>
>> Cheers,
>>
>> --
>> Rohit Garg
>>
>> http://rpg-314.blogspot.com/
>>