Re: [eigen] Eigen AVX support - first steps
- To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
- Subject: Re: [eigen] Eigen AVX support - first steps
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Mon, 15 Apr 2013 17:18:07 +0200
questions.txt:
1. what does loaddup do?
It is needed for compatibility with complexes: it loads PacketSize/2
scalars and copies them into a packet where each scalar is duplicated,
e.g., for Packet8f:
A,B,C,D -> A,A,B,B,C,C,D,D.
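
To make the lane layout concrete, here is a minimal scalar sketch of the duplication described above. It is only a reference model, not the AVX implementation: the name ploaddup_model and the use of std::array as a stand-in for a packet are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar reference model of a "load and duplicate" operation.
// Reads N/2 scalars from src and writes each one twice, so that
// {A,B,C,D} becomes {A,A,B,B,C,C,D,D} when N == 8.
// ploaddup_model is a hypothetical name used for illustration only.
template <typename Scalar, std::size_t N>
std::array<Scalar, N> ploaddup_model(const Scalar* src) {
    static_assert(N % 2 == 0, "packet size must be even");
    assert(src != nullptr);
    std::array<Scalar, N> packet{};
    for (std::size_t i = 0; i < N; ++i) {
        packet[i] = src[i / 2];  // lanes 2k and 2k+1 both receive src[k]
    }
    return packet;
}
```

Calling ploaddup_model<float, 8> on a buffer holding 1, 2, 3, 4 yields 1, 1, 2, 2, 3, 3, 4, 4, which an intrinsic-based version could be checked against in the packet_math tests.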
2. what is the purpose of palign_impl?
That's the trickiest one. It is only used to optimize matrix-vector
products on unaligned matrices. It takes 2 packets that represent a
contiguous memory array, and returns a packet starting at the position
Offset, e.g., for Packet4d:
Inputs:
{A0,B0,C0,D0} ; {A1,B1,C1,D1}
if Offset==0 => {A0,B0,C0,D0}
if Offset==1 => {B0,C0,D0,A1}
if Offset==2 => {C0,D0,A1,B1}
if Offset==3 => {D0,A1,B1,C1}
For Packet8f... well, I have to think about it, as considering all
possibilities might be overkill. We can easily drop this
optimization for PacketSize>4.
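
The semantics can be written down as a scalar reference model: treat the two packets as one contiguous array of 2*PacketSize scalars and extract PacketSize of them starting at Offset. This is a sketch under assumptions: palign_model is a hypothetical name, std::array stands in for a packet, and the result is returned by value, which may differ from the signature Eigen's own helper uses.

```cpp
#include <array>
#include <cstddef>

// Scalar reference model of the "align" operation: view first|second as one
// contiguous array and return the N elements that start at index Offset.
// palign_model is a hypothetical name used for illustration only.
template <std::size_t Offset, typename Scalar, std::size_t N>
std::array<Scalar, N> palign_model(const std::array<Scalar, N>& first,
                                   const std::array<Scalar, N>& second) {
    static_assert(Offset < N, "Offset must be smaller than the packet size");
    std::array<Scalar, N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        const std::size_t j = i + Offset;  // index into the concatenation
        result[i] = (j < N) ? first[j] : second[j - N];
    }
    return result;
}
```

With first = {A0,B0,C0,D0} and second = {A1,B1,C1,D1}, palign_model<1>(first, second) gives {B0,C0,D0,A1}. Because the model is generic in N, it could also serve as the expected result when testing a Packet8f version, should one be written.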
3. How are we defining the __m256d type?
+ I am assuming this type will be wrapped up in a union with an array
+ of doubles.
+ Where is that definition going to go? In the PacketMathDouble.h or
+ in some other file? The name of the array will decide what we use
+ in the pfirst function. GCC does not appear to have wrapped the
+ __m256d type as a union.
Hm... is it our job to define __m256d? Isn't it defined in the
AVX intrinsics header files?
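
As a rough sketch of how the first element could be read without any union: with GCC, Clang and MSVC the type comes from <immintrin.h>, and the low lane can be extracted with cast and convert intrinsics. The names pfirst_sketch and first_of_four are hypothetical, and the non-AVX branch exists only so the snippet still compiles when AVX is not enabled.

```cpp
#if defined(__AVX__)
#include <immintrin.h>  // declares __m256d and the _mm256_* intrinsics

// Return the lowest lane of a 4-double AVX packet.
// The cast to __m128d keeps the low 128 bits and generates no instruction.
inline double pfirst_sketch(const __m256d& a) {
    return _mm_cvtsd_f64(_mm256_castpd256_pd128(a));
}

// Build a packet with a in the lowest lane, then read that lane back.
inline double first_of_four(double a, double b, double c, double d) {
    return pfirst_sketch(_mm256_setr_pd(a, b, c, d));
}
#else
// Fallback used when the compiler is not targeting AVX (assumption: this
// branch is here for portability of the example only).
inline double first_of_four(double a, double, double, double) {
    return a;
}
#endif
```

When compiled with AVX enabled (for example -mavx with GCC or Clang), first_of_four(1.5, 2.5, 3.5, 4.5) goes through the intrinsic path and returns 1.5.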
Cheers,
Gael.
On Mon, Apr 15, 2013 at 5:06 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> Hi Rohit,
>
> thank you for the hard work.
>
>
> On Sun, Apr 14, 2013 at 6:47 AM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
>> I have pushed some code to the eigen-avx repository at bitbucket.
>>
>> a) All the additions reside in the AVX folder, along with altivec, neon and
>> SSE. I have split up the single and double precision code into two files as
>> one file was getting too big.
>
> I'm not sure decoupling float and double really helps readability
> since in 99% of the cases they should be extremely similar, but ok.
>
>> b) The integer code has been removed as AVX does not have int support. Once
>> real numbers are done, we can move on to complex number support.
>
> Indeed, integer support comes with AVX2, which should be available in
> upcoming CPUs.
>
>> c) I had a few questions about some of the intrinsic functions; I have
>> written them in the questions.txt file in the AVX folder.
>
> I'll answer them in a second email.
>
>> d) So far, I have just migrated the intrinsic functions from SSE over to
>> AVX. All my changes are so far limited to the AVX folder in the arch folder.
>> I have not run any tests and this code is not hooked up to the rest of the
>> eigen code base as yet. The reduction functions have been tested separately,
>> so they should be fine.
>
> Look at the Eigen/Core header file. Before testing for SSE, if __AVX__
> is defined, then we should define an EIGEN_VECTORIZE_AVX token that will
> be used later to include your files instead of the ones in SSE. Then,
> in CMakeLists.txt, you can add an option to enable AVX in unit tests,
> and start with the packet_math unit tests.
>
> I guess we will also have to move the alignment requirement to the
> packet_traits instead of the somewhat hardcoded 16 bytes. For initial
> testing though, you can make sure that pload and pstore also work on
> 16-byte-aligned data.
>
>> e) I have made no attempt for micro-optimization so far. Once this works we
>> can move to optimization.
>
> sure!
>
> gael
>
>> f) Code review welcome. :)
>>
>> Cheers,
>>
>> --
>> Rohit Garg
>>
>> http://rpg-314.blogspot.com/
>>
>> Graduate Student
>> Applied and Engineering Physics
>> Cornell University