Re: [eigen]



I am not sure about the part on rounding up the memory size. Why can't you
have a float3 array aligned at a 16-byte boundary?
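
For reference, here is a minimal sketch of the size arithmetic behind that
question, in plain C++11 rather than Eigen. The struct names are hypothetical
stand-ins and a 4-byte float is assumed. Aligning a single float3 at a 16-byte
boundary is easy; the cost appears once the type itself carries the alignment,
because sizeof is then padded to a multiple of 16 so that every element of an
array stays aligned.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for fixed-size vectors; these are not Eigen types.
struct Packed3 { float v[3]; };               // 12 bytes, alignment of float
struct Packed5 { float v[5]; };               // 20 bytes, alignment of float
struct alignas(16) Aligned3 { float v[3]; };  // padded up to 16 bytes
struct alignas(16) Aligned5 { float v[5]; };  // padded up to 32 bytes

// sizeof is always a multiple of alignof; otherwise element 1 of an array
// would not start on a 16-byte boundary.
static_assert(sizeof(Aligned3) % alignof(Aligned3) == 0, "size follows alignment");
static_assert(sizeof(Aligned5) % alignof(Aligned5) == 0, "size follows alignment");

// Bytes used by an array of n elements of type T.
template <typename T>
constexpr std::size_t array_bytes(std::size_t n) { return n * sizeof(T); }

// Runtime check of the figures discussed in this thread (assumes 4-byte float).
inline void check_layout() {
    assert(sizeof(Packed5) == 20);
    assert(sizeof(Aligned5) == 32);
}
```

So an array of N such 5-float vectors takes 32N bytes instead of 20N, which
matches the figures quoted below.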

OTOH, if you are prepared to live with float8 storage to get
vectorization, you can get vectorization NOW.
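
A minimal sketch of what that padded layout amounts to, written with raw SSE
intrinsics instead of Eigen so that it stands alone. PaddedVec5 and
cwise_mul_padded are hypothetical names, not Eigen API. The idea is to store
the 5 coefficients in 8 floats, keep the 3 padding slots at zero, and process
two full packets. A scalar fallback is included for targets without SSE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical padded vector: 5 logical coefficients stored in 8 floats.
struct alignas(16) PaddedVec5 {
    float v[8] = {0, 0, 0, 0, 0, 0, 0, 0};  // v[5..7] are padding, kept at zero
};

// Coefficient-wise product over the whole padded storage.
inline PaddedVec5 cwise_mul_padded(const PaddedVec5& a, const PaddedVec5& b) {
    PaddedVec5 c;
    // Aligned loads below require 16-byte aligned storage.
    assert(reinterpret_cast<std::uintptr_t>(a.v) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b.v) % 16 == 0);
#if SKETCH_HAS_SSE
    for (std::size_t i = 0; i < 8; i += 4) {
        const __m128 pa = _mm_load_ps(a.v + i);     // 4 floats at once
        const __m128 pb = _mm_load_ps(b.v + i);
        _mm_store_ps(c.v + i, _mm_mul_ps(pa, pb));  // one mulps per packet
    }
#else
    for (std::size_t i = 0; i < 8; ++i) c.v[i] = a.v[i] * b.v[i];
#endif
    return c;
}
```

The trade-off is the one discussed in the quoted reply: 32 bytes of storage
per vector instead of 20, plus three wasted multiplications.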

On Mon, Oct 12, 2009 at 8:02 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2009/10/11 dilas dilas <espiritusantu@xxxxxxx>:
>> Hello. I think Eigen is a really good library, but I've found a problem that I can't live with. :)
>> I use an x86 + SSE2 platform and Visual C++.
>> When I use fixed-size matrices whose sizes are not a multiple of the packet size, Eigen doesn't use vector instructions.
>> I tried to force it by using the AutoAlign option, but it didn't work.
>> For example, for:
>>
>> Matrix<float,5,1,AutoAlign | ColMajor> a;
>> Matrix<float,5,1,AutoAlign | ColMajor> b;
>> Matrix<float,5,1,AutoAlign | ColMajor> c = a.cwise()  * b;
>>
>> Visual C++ generates five mulss instructions, but I want one mulps and one mulss.
>>
>> So I decided to round up the first dimension to a multiple of 4, and it worked. But in some cases, for example when we multiply a 3x3 matrix by a 3x1 one, this doesn't work: we would have to round up the second dimension as well, and that's not on.
>>
>> So I have two questions:
>> 1. Do the developers intend to implement the behavior I've described?
>
> In order to efficiently vectorize a Vector5f, we would have to align
> its array to 16-byte boundary. Which would make sizeof(Vector5f) grow
> from 20 to 32 bytes. Which would mean that if you allocate an array of
> N Vector5f's, the memory usage grows from 20N to 32N bytes. That's why
> we will never make that the default behavior. But yes, it would be
> nice to have that as a non-default option. Note that in the
> development branch, in unsupported/, we already have an AlignedVector3
> class that does something comparable for vectors of size 3,
> though that is a little different: here, it does it all with SIMD
> instructions, ignoring the last component. That's all one can do for
> Vector3f. For Vector5f, it is better to do as you suggest: 1 packet +
> 1 scalar. That at least works in all cases.
>
>> 2. How can I myself change sources to reach this behavior?
>
> - If you're happy about a quick hack like AlignedVector3, just check
> its sources.
> - If you want the real solution as you're suggesting, hm, there's a
> bit of changes to make! Here are some starting points:
>  -- in MatrixStorage.h, in ei_matrix_array, always make it align the array.
>  -- in all the files that have meta-unrolled loops, well our
> meta-unrollers aren't usable in your case, so your best bet is to not
> use them, instead force the usage of non-unrolled paths, cross your
> fingers that the compiler auto-unrolls in your case, or write new
> meta-unrollers for these cases.
>
> I'd be OK to add a new non-default matrix option ForceAlign in
> addition to the existing AutoAlign (default) and DontAlign. Again the
> bulk of the work will be to extend / write new unrolled loops for
> Assign, for the products, etc...
>
> Benoit
>
>
>
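
To make the "1 packet + 1 scalar" idea from the quoted reply concrete, here is
a minimal sketch with raw SSE intrinsics. AlignedVec5 and cwise_mul_5 are
hypothetical names; this is not how Eigen implements anything, only an
illustration of the instruction pattern asked for (one mulps and one mulss).
It assumes the storage is 16-byte aligned, and falls back to a scalar loop on
targets without SSE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical 5-float vector whose array is forced onto a 16-byte boundary.
// With a 4-byte float, sizeof(AlignedVec5) is 32 rather than 20.
struct alignas(16) AlignedVec5 {
    float v[5];
};

// c = a .* b, using one full packet for v[0..3] and one scalar op for v[4].
inline void cwise_mul_5(const AlignedVec5& a, const AlignedVec5& b, AlignedVec5& c) {
    // The aligned packet load/store below requires 16-byte aligned storage.
    assert(reinterpret_cast<std::uintptr_t>(a.v) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b.v) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(c.v) % 16 == 0);
#if SKETCH_HAS_SSE
    // Packet part: coefficients 0..3 (mulps).
    _mm_store_ps(c.v, _mm_mul_ps(_mm_load_ps(a.v), _mm_load_ps(b.v)));
    // Scalar part: coefficient 4 (mulss).
    _mm_store_ss(c.v + 4, _mm_mul_ss(_mm_load_ss(a.v + 4), _mm_load_ss(b.v + 4)));
#else
    for (std::size_t i = 0; i < 5; ++i) c.v[i] = a.v[i] * b.v[i];
#endif
}
```

Unlike the padded float8 layout, this never touches memory past the fifth
coefficient, so it does not depend on the padding holding any particular
value.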



-- 
Rohit Garg

http://rpg-314.blogspot.com/

Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay


