Re: [eigen] AVX/LRB and SSE with SoA support

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]

To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: Re: [eigen] AVX/LRB and SSE with SoA support
From: keller <christoph-keller@xxxxxx>
Date: Sun, 08 Aug 2010 12:32:50 +0200

Yes, there are no shift operators yet, so this can be done easily (SSEhas these too). I also think it makes sense to have them.


But I will not do it unless the patch is accepted (maybe with changes).

On 08/08/2010 09:19 AM, Rohit Garg wrote:

PONY QUESTION: could you also add bit shift operations?

On Fri, Aug 6, 2010 at 10:41 PM, keller<christoph-keller@xxxxxx>  wrote:

Hello,

i changed the code further and added a few things to Array, ArrayBase and
Functors.h.

The bitwise operations are quite important in my opinion, so they are
operators (|,&,^,|=,&=,^=) in ArrayBase for me. If you do not like it
(because of possible mistakes), one can also use the bitwiseOr etc.
functions. The operators may then be made avaiable with sth. like
USE_BITWISE_ARRAY_OPERATORS. I do not know how likely these mistakes are.

The added Functions in Array like soam are not avaiable with Map. Therefore
frS in rayb.cpp cannnot be a Map yet. This may not make a big difference,
but its not optimal. What is the best way to change that?

I am not sure how to improve the bitmask() and allNegative() functions in
Array, to make them more general, in a good Eigen-consistent way. There may
not even be an equivalent to _mm_movemask_ps in NEON and AltiVec.

It is pretty amazing how close the speed of Eigencode is to inline asm and
its far easier to read and write.

I can make a patch using Mercurial and write some test functions, if needed.

Greetings,
Christoph

On 07/23/2010 01:19 PM, Gael Guennebaud wrote:

Hi,

I've attached a modified version of your code for Eigen3 as well as
the corresponding ASM generated with gcc 4.5.


On Fri, Jul 23, 2010 at 6:45 AM, keller<christoph-keller@xxxxxx>    wrote:

Hello,

I implemented a function where a sphere cuts a Ray Bundle (Most problems
can
be demonstrated there). It may not be optimally implemented, but i think
it
works as a simple example.

I did not yet put the data into blocks, as a few other problems occurred.

I used eigen2 for this.
-using VS2008 and the Intel compiler it compiles and runs correctly.
-g++ prints out errors with cwise()

Yes this is because when you call a template function on a template
type you have to use the template keyword, e.g.:

mat.template block<rows,cols>(i,j)

The resulting code does not use SSE-Instructions that often. I do not
know
if this is because i used eigen2 in the wrong way. The compiler(VS2008)
has
SSE-Instructions enabled.

Also the _mm_or_ps is avaiable through ei_por in Eigen3, but not via a
normal interface function.

This is something that would make sense to have upstream in the Array
API. The question is shall we use operator|  (i.e., A | B) or a very
explicit A.bitwiseOr(B) function. Since operator| is not defined for
non integral type I would go for the explicit bitwiseOr.

_mm_movemask_ps is not avaiable at all.

This one is more difficult to expose in a useful and portable way.
Perhaps something like:

ei_pallpositive(v) / ei_pallnegative(v)

for the low level functions, and perhaps in second time we could have
shortcuts for:

(A>    0).all() and (A<    0).all()

like

A.allPositive() / A.allNegative()

which would be built on top of the respective ei_p* functions, but I'm
not 100% sure and this is really specific use cases. I doubt that's
really critical even in your case.

gael

Is it a good idea at all to use Eigen, if one does not want to be more
than
about 20% slower than with a naive Intrinsics-implementation?

For this kind of low level stuff you can attack Eigen at two level:
using Array<>    API, or directly using the ei_p* functions. The former
has the advantage to allow you to use "blocks" of multiple packets
with automatic unrolling, while the later offers more flexibility for
fine tuning...

Greetings,
Christoph


On 07/03/2010 09:11 AM, Gael Guennebaud wrote:

Hi,

first things first, supporting a new vector engine in eigen is
relatively easy. However, you won't be able to right code
"templatized" on the vector engine, and even with a custom library I
doubt that's doable. The choice of the vector engine has to be
compiler options. So, the way to support multiple vector engines at
runtime (I guess this is what you want?), is to have a trivial
function foo(...) selecting the right implementation function e.g.,
foo_sse(...), foo_sse4(...), etc. The foo_* functions are implemented
only once, by putting it in its own file which will be compiled
multiple times with different compiler options (e.g., -msse, -msse4,
etc.). The actual of the function (foo_sse) will be built using the
preprocessor, e.g.:

EIGEN_CAT(foo_,VECTOR_ENGINE_SUFFIX)(...) { .... }

Now regarding the special interleaved packing of the data, there is
currently no such thing in eigen, however, I think you can easily add
that on top of, e.g., a
Array<float,Dynamic,ei_packet_traits<float>::size,RowMajor>. It will
be initialized with dimension *


((size/ei_packet_traits<float>::size)+((size%ei_packet_traits<float>::size)==0
? 0 : 1) rows where dimension is three in your example, and size is
the number of element. You can easily get the i-th element as follow:



underlying_array.block<dimension,1>((i/ei_packet_traits<float>::size)*dimension,
i%ei_packet_traits<float>::size) = Vector3f(x,y,z);

The idea would be to add a class warping this underlying_array to make
it convenient to use. The main questions are what is the set of
features which have to be supported? How do we want to use it? through
a manual loop over the (e.g., 3x4) blocks? through functors? through
high level expression template code? etc.

To finish, this is definitely something I planed to do the future, so
I'd be glad to discuss with you its design and help you to get it
right regarding Eigen.

cheers,

gael

On Fri, Jul 2, 2010 at 11:56 PM, keller<christoph-keller@xxxxxx>
  wrote:

Hello,

I have to implement a raytracer that uses the Vector units of modern
x86
cpus.

Intel will introduce AVX soon. I have to support AVX and SSE and i do
not
want all of my code replicated. I currently use the Intels intrinsics.
If
Intel adds the Larrabee extensions into normal x86 CPUs one would even
have
to write three versions.

As i use Eigen in parts of the software i wonder if one could support
this
with Eigen. The typical situation is like this.
-One has a ray bundle with 16 rays that are arranged in SoA format:
x1,..,x4
y1,...,y4
z1,...,z4
.....
x13,..,x16
y13,...,y16
z13,...,z16
This format has to be changed when using AVX/LRB of course.
-One source of the rays and an object like a sphere

Ideally i have a function like testCut(Bundles, Sphere) that uses a
template
parameter to set the Vector extension to use. I do not want to
partially
specialize this function but use some structures and functions in
testCut
that depend on the template parameter.

The question for me is: Develop a small library of my own, or use Eigen
(where i can contribute this functionality). I know i can do this with
a
custom library, but i dont know if it is easy to add this functionality
to
Eigen.

I think a lot of people will have the same problem soon.

Greetings,
Christoph

References:
- Re: [eigen] AVX/LRB and SSE with SoA support
  - From: keller
- Re: [eigen] AVX/LRB and SSE with SoA support
  - From: Rohit Garg

Messages sorted by: [ date | thread ]
Prev by Date: Re: [eigen] AVX/LRB and SSE with SoA support
Next by Date: [eigen] question on numerical stability of non-pivoting householder algos
Previous by thread: Re: [eigen] AVX/LRB and SSE with SoA support
Next by thread: Re: [eigen] InnerPanel change mis-detects alignment?

Mail converted by MHonArc 2.6.19+

http://listengine.tuxfamily.org/