Re: [eigen] AVX/LRB and SSE with SoA support



2010/8/6 keller <christoph-keller@xxxxxx>:
> Hello,
>
> I changed the code further and added a few things to Array, ArrayBase and
> Functors.h.
>
> The bitwise operations are quite important in my opinion, so in my version
> they are operators (|, &, ^, |=, &=, ^=) in ArrayBase. If you do not like
> that (because of possible mistakes), one can also use the bitwiseOr etc.
> functions. The operators could then be made available behind a macro such
> as USE_BITWISE_ARRAY_OPERATORS. I do not know how likely these mistakes are.
>
> The functions I added to Array, like soam, are not available with Map.
> Therefore frS in rayb.cpp cannot be a Map yet. This may not make a big
> difference, but it's not optimal. What is the best way to change that?
>
> I am not sure how to make the bitmask() and allNegative() functions in
> Array more general in a good, Eigen-consistent way. There may not even be
> an equivalent of _mm_movemask_ps in NEON and AltiVec.
>
> It is pretty amazing how close the speed of Eigen code is to inline asm,
> and it's far easier to read and write.
>
> I can make a patch using Mercurial and write some test functions, if needed.

Yes, both would be very useful (especially the patch).

Gael is a bit busy at the moment (just FYI).

Benoit

>
> Greetings,
> Christoph
>
> On 07/23/2010 01:19 PM, Gael Guennebaud wrote:
>>
>> Hi,
>>
>> I've attached a modified version of your code for Eigen3 as well as
>> the corresponding ASM generated with gcc 4.5.
>>
>>
>> On Fri, Jul 23, 2010 at 6:45 AM, keller <christoph-keller@xxxxxx> wrote:
>>
>>>
>>> Hello,
>>>
>>> I implemented a function where a sphere cuts a ray bundle (most problems
>>> can be demonstrated there). It may not be optimally implemented, but I
>>> think it works as a simple example.
>>>
>>> I did not yet put the data into blocks, as a few other problems occurred.
>>>
>>> I used eigen2 for this.
>>> - Using VS2008 and the Intel compiler, it compiles and runs correctly.
>>> - g++ prints errors with cwise().
>>>
>>
>> Yes, this is because when you call a member function template on an
>> object whose type depends on a template parameter, you have to use the
>> template keyword, e.g.:
>>
>> mat.template block<rows,cols>(i,j)
>>
>>
>>
>>>
>>> The resulting code does not use SSE instructions that often. I do not
>>> know if this is because I used eigen2 in the wrong way. The compiler
>>> (VS2008) has SSE instructions enabled.
>>>
>>> Also, _mm_or_ps is available through ei_por in Eigen3, but not via a
>>> normal interface function.
>>>
>>
>> This is something that would make sense to have upstream in the Array
>> API. The question is whether we should use operator| (i.e., A | B) or a
>> very explicit A.bitwiseOr(B) function. Since operator| is not defined for
>> non-integral types, I would go for the explicit bitwiseOr.
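To make the float case concrete: an SSE-style OR on float packets works on the raw bit patterns, which is exactly what operator| cannot do for float. The sketch below emulates _mm_or_ps and an SSE-style comparison mask in portable C++; Packet4f, bitwiseOr and lessThanMask are hypothetical names, not Eigen API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 4-lane float packet standing in for an SSE register.
using Packet4f = std::array<float, 4>;

// OR the bit patterns lane by lane, like _mm_or_ps. std::memcpy is the
// portable way to reinterpret float bits without undefined behaviour.
inline Packet4f bitwiseOr(const Packet4f& a, const Packet4f& b) {
    Packet4f out{};
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::uint32_t ua, ub;
        std::memcpy(&ua, &a[i], sizeof ua);
        std::memcpy(&ub, &b[i], sizeof ub);
        const std::uint32_t uo = ua | ub;
        std::memcpy(&out[i], &uo, sizeof uo);
    }
    return out;
}

// Comparison mask in the SSE convention: all bits set for "true",
// all bits clear for "false".
inline Packet4f lessThanMask(const Packet4f& a, const Packet4f& b) {
    Packet4f out{};
    for (std::size_t i = 0; i < a.size(); ++i) {
        const std::uint32_t bits = (a[i] < b[i]) ? 0xFFFFFFFFu : 0u;
        std::memcpy(&out[i], &bits, sizeof bits);
    }
    return out;
}
```

A typical use is combining two comparison masks, e.g. flagging lanes that are either below a lower bound or above an upper bound; the all-ones lanes are NaN when read as floats, which is another reason a dedicated, explicitly named function reads more safely than operator|.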
>>
>>
>>>
>>> _mm_movemask_ps is not available at all.
>>>
>>
>> This one is more difficult to expose in a useful and portable way.
>> Perhaps something like:
>>
>> ei_pallpositive(v) / ei_pallnegative(v)
>>
>> for the low level functions, and perhaps in second time we could have
>> shortcuts for:
>>
>> (A > 0).all() and (A < 0).all()
>>
>> like
>>
>> A.allPositive() / A.allNegative()
>>
>> which would be built on top of the respective ei_p* functions, but I'm
>> not 100% sure, and these are really specific use cases. I doubt that's
>> really critical even in your case.
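One portable way to express what _mm_movemask_ps computes is to gather the sign bit of each lane into an integer; an "all negative" test then becomes a single comparison. This is a scalar sketch under that assumption: movemask, allNegative and noneNegative are hypothetical helpers, and a real backend would map movemask onto the native instruction where one exists.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Packet4f = std::array<float, 4>;

// Portable stand-in for _mm_movemask_ps: bit k of the result is the sign
// bit of lane k (std::signbit reads the raw sign bit, as movemask does).
inline int movemask(const Packet4f& v) {
    int mask = 0;
    for (int k = 0; k < 4; ++k)
        if (std::signbit(v[k])) mask |= 1 << k;
    return mask;
}

// These test sign bits, so -0.0f counts as negative and +0.0f as
// non-negative; that differs from (A < 0).all() / (A > 0).all() on zeros
// and NaNs.
inline bool allNegative(const Packet4f& v) { return movemask(v) == 0xF; }
inline bool noneNegative(const Packet4f& v) { return movemask(v) == 0x0; }
```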
>>
>> gael
>>
>>
>>>
>>> Is it a good idea at all to use Eigen if one does not want to be more
>>> than about 20% slower than with a naive intrinsics implementation?
>>>
>>
>> For this kind of low-level stuff you can attack Eigen at two levels:
>> using the Array<> API, or directly using the ei_p* functions. The former
>> has the advantage of letting you use "blocks" of multiple packets with
>> automatic unrolling, while the latter offers more flexibility for fine
>> tuning...
>>
>>
>>
>>
>>>
>>> Greetings,
>>> Christoph
>>>
>>>
>>> On 07/03/2010 09:11 AM, Gael Guennebaud wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> first things first, supporting a new vector engine in eigen is
>>>> relatively easy. However, you won't be able to write code
>>>> "templatized" on the vector engine, and even with a custom library I
>>>> doubt that's doable: the choice of the vector engine has to be made
>>>> through compiler options. So, the way to support multiple vector
>>>> engines at runtime (I guess this is what you want?) is to have a
>>>> trivial function foo(...) selecting the right implementation function,
>>>> e.g., foo_sse(...), foo_sse4(...), etc. The foo_* functions are
>>>> implemented only once, by putting them in their own file, which is
>>>> compiled multiple times with different compiler options (e.g., -msse,
>>>> -msse4, etc.). The actual name of each function (foo_sse) is built
>>>> using the preprocessor, e.g.:
>>>>
>>>> EIGEN_CAT(foo_,VECTOR_ENGINE_SUFFIX)(...) { .... }
>>>>
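A compressed sketch of that dispatch scheme, assuming one translation unit stands in for the separate per-instruction-set builds. MY_CAT mirrors the two-level token pasting that lets a macro like EIGEN_CAT expand its arguments before pasting them; scale_sse, scale_avx and the Engine enum are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Two-level pasting: MY_CAT expands VECTOR_ENGINE_SUFFIX first,
// then MY_CAT2 glues the tokens together.
#define MY_CAT2(a, b) a##b
#define MY_CAT(a, b) MY_CAT2(a, b)

// In a real build this body lives in its own .cpp file, compiled once per
// instruction set (e.g. -DVECTOR_ENGINE_SUFFIX=sse -msse, then
// -DVECTOR_ENGINE_SUFFIX=avx -mavx). Here the two builds are simulated
// by redefining the macro.
#define VECTOR_ENGINE_SUFFIX sse
float MY_CAT(scale_, VECTOR_ENGINE_SUFFIX)(float x) { return 2.0f * x; }  // scale_sse
#undef VECTOR_ENGINE_SUFFIX

#define VECTOR_ENGINE_SUFFIX avx
float MY_CAT(scale_, VECTOR_ENGINE_SUFFIX)(float x) { return 2.0f * x; }  // scale_avx
#undef VECTOR_ENGINE_SUFFIX

// Hypothetical engine id; a real program would query the CPU once at startup.
enum class Engine { SSE, AVX };

// The trivial runtime dispatcher that picks an implementation.
float scale(Engine e, float x) {
    switch (e) {
        case Engine::SSE: return scale_sse(x);
        case Engine::AVX: return scale_avx(x);
    }
    throw std::invalid_argument("unknown engine");
}
```

Only the dispatcher is shared code; each scale_* body is compiled with its own flags, so the compiler is free to emit SSE or AVX instructions for it.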
>>>> Now regarding the special interleaved packing of the data, there is
>>>> currently no such thing in eigen; however, I think you can easily add
>>>> that on top of, e.g., an
>>>> Array<float,Dynamic,ei_packet_traits<float>::size,RowMajor>. It will
>>>> be initialized with
>>>>
>>>> dimension * ((size/ei_packet_traits<float>::size)+((size%ei_packet_traits<float>::size)==0 ? 0 : 1))
>>>>
>>>> rows, where dimension is three in your example, and size is the
>>>> number of elements. You can easily set (or read) the i-th element as
>>>> follows:
>>>>
>>>> underlying_array.block<dimension,1>((i/ei_packet_traits<float>::size)*dimension,
>>>>                                     i%ei_packet_traits<float>::size) = Vector3f(x,y,z);
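The index arithmetic of that layout can be checked with a small standalone class. This sketch uses a row-major std::vector<float> in place of the Eigen Array; PackedSoA is a hypothetical name, and the packet size P is a template parameter rather than ei_packet_traits<float>::size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major storage with P columns and Dimension * ceil(size / P) rows.
// Element i occupies column i % P of the Dimension consecutive rows
// starting at row (i / P) * Dimension.
template <std::size_t Dimension, std::size_t P>
class PackedSoA {
public:
    explicit PackedSoA(std::size_t size)
        : size_(size),
          rows_(Dimension * (size / P + (size % P == 0 ? 0 : 1))),
          data_(rows_ * P, 0.0f) {}

    std::size_t rows() const { return rows_; }

    // Flat offset of component d of element i.
    std::size_t offset(std::size_t i, std::size_t d) const {
        assert(i < size_ && d < Dimension);
        const std::size_t row = (i / P) * Dimension + d;
        const std::size_t col = i % P;
        return row * P + col;
    }

    float& at(std::size_t i, std::size_t d) { return data_[offset(i, d)]; }

private:
    std::size_t size_;
    std::size_t rows_;
    std::vector<float> data_;
};
```

With Dimension = 3 and P = 4, ten elements need 3 * 3 = 9 rows; the last packet is padded with two unused lanes.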
>>>>
>>>> The idea would be to add a class wrapping this underlying_array to
>>>> make it convenient to use. The main questions are: what set of
>>>> features has to be supported? How do we want to use it: through a
>>>> manual loop over the (e.g., 3x4) blocks? Through functors? Through
>>>> high-level expression template code? etc.
>>>>
>>>> To finish, this is definitely something I planned to do in the
>>>> future, so I'd be glad to discuss its design with you and help you
>>>> get it right regarding Eigen.
>>>>
>>>> cheers,
>>>>
>>>> gael
>>>>
>>>> On Fri, Jul 2, 2010 at 11:56 PM, keller <christoph-keller@xxxxxx> wrote:
>>>>
>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> I have to implement a raytracer that uses the vector units of modern
>>>>> x86 CPUs.
>>>>>
>>>>> Intel will introduce AVX soon. I have to support AVX and SSE, and I
>>>>> do not want all of my code replicated. I currently use Intel's
>>>>> intrinsics. If Intel adds the Larrabee extensions to normal x86 CPUs,
>>>>> one would even have to write three versions.
>>>>>
>>>>> As I use Eigen in parts of the software, I wonder if one could
>>>>> support this with Eigen. The typical situation is like this:
>>>>> - One has a ray bundle with 16 rays that are arranged in SoA format:
>>>>>   x1,...,x4
>>>>>   y1,...,y4
>>>>>   z1,...,z4
>>>>>   .....
>>>>>   x13,...,x16
>>>>>   y13,...,y16
>>>>>   z13,...,z16
>>>>>   This format has to be changed when using AVX/LRB, of course.
>>>>> - One source of the rays and an object like a sphere.
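For reference, a sketch of how 16 ray directions stored as an array of structures could be repacked into the SoA layout listed above. The packet width W is a template parameter, so the same code yields four packets of 4 for SSE or two packets of 8 for AVX. Vec3, Vec3Packet and toSoA are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Array-of-structures input: one ray direction per element.
struct Vec3 { float x, y, z; };

// Structure-of-arrays packet: W rays, each coordinate stored contiguously
// so one vector register can hold x1..xW, another y1..yW, and so on.
template <std::size_t W>
struct Vec3Packet {
    std::array<float, W> x, y, z;
};

// Repack N rays into N / W packets of width W.
template <std::size_t W, std::size_t N>
std::array<Vec3Packet<W>, N / W> toSoA(const std::array<Vec3, N>& rays) {
    static_assert(N % W == 0, "bundle size must be a multiple of the packet width");
    std::array<Vec3Packet<W>, N / W> packets{};
    for (std::size_t i = 0; i < N; ++i) {
        packets[i / W].x[i % W] = rays[i].x;
        packets[i / W].y[i % W] = rays[i].y;
        packets[i / W].z[i % W] = rays[i].z;
    }
    return packets;
}
```

Only W changes between toSoA<4>(rays) and toSoA<8>(rays); the ray count N is deduced from the input array.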
>>>>>
>>>>> Ideally I have a function like testCut(Bundles, Sphere) that uses a
>>>>> template parameter to set the vector extension to use. I do not want
>>>>> to partially specialize this function, but rather use some structures
>>>>> and functions in testCut that depend on the template parameter.
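A sketch of what such a testCut could look like when only the packet width depends on the template parameter. It assumes hypothetical SSE/AVX tag types that carry the width, a hypothetical DirPacket/Sphere layout, and a plain fixed-length loop the compiler may vectorize; the instruction set itself is still selected by compiler options. The test uses the usual discriminant of |o + t d - c|^2 = r^2.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical engine tags: only the packet width differs.
struct SSE { static constexpr std::size_t width = 4; };
struct AVX { static constexpr std::size_t width = 8; };

// One SoA packet of W ray directions.
template <std::size_t W>
struct DirPacket { std::array<float, W> x, y, z; };

struct Sphere { float cx, cy, cz, r; };

// Rays o + t*d with a shared origin o. Substituting into |p - c|^2 = r^2
// gives a*t^2 + 2*b*t + c0 = 0 with a = d.d, b = d.(o - c),
// c0 = |o - c|^2 - r^2; the line hits the sphere iff b^2 - a*c0 >= 0.
// This checks the infinite line only (t >= 0 is not tested).
// Bit k of the result is set when lane k hits.
template <class Engine>
unsigned testCut(const DirPacket<Engine::width>& d,
                 float ox, float oy, float oz, const Sphere& s) {
    const float lx = ox - s.cx, ly = oy - s.cy, lz = oz - s.cz;
    const float c0 = lx * lx + ly * ly + lz * lz - s.r * s.r;
    unsigned mask = 0;
    for (std::size_t k = 0; k < Engine::width; ++k) {
        const float a = d.x[k] * d.x[k] + d.y[k] * d.y[k] + d.z[k] * d.z[k];
        const float b = d.x[k] * lx + d.y[k] * ly + d.z[k] * lz;
        if (b * b - a * c0 >= 0.0f) mask |= 1u << k;
    }
    return mask;
}
```

The engine is named explicitly at the call site, e.g. testCut<SSE>(packet, ox, oy, oz, sphere), and the same source serves 4- and 8-wide packets without partial specialization.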
>>>>>
>>>>> The question for me is: develop a small library of my own, or use
>>>>> Eigen (where I can contribute this functionality)? I know I can do
>>>>> this with a custom library, but I don't know if it is easy to add
>>>>> this functionality to Eigen.
>>>>>
>>>>> I think a lot of people will have the same problem soon.
>>>>>
>>>>> Greetings,
>>>>> Christoph
>>>>>


