- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen]
- From: Rohit Garg <rpg.314@xxxxxxxxx>
- Date: Mon, 12 Oct 2009 14:26:10 +0530
I am not sure about the part on rounding up the memory size. Why can't you
have a float3 array aligned at a 16-byte boundary?
OTOH, if you are prepared to live with float8 storage to get the
vectorization, you can get vectorization NOW.
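As a rough illustration of the round-up question (a sketch with hypothetical stand-in types, not Eigen's own classes, and assuming a 4-byte float): in standard C++, sizeof is always a multiple of the alignment, so forcing 16-byte alignment on three floats pads the type from 12 to 16 bytes. That padding is what keeps every element of an array on a 16-byte boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration; these are not Eigen types.
struct PackedVec3 { float v[3]; };               // natural (4-byte) alignment
struct alignas(16) AlignedVec3 { float v[3]; };  // forced 16-byte alignment

// Assuming sizeof(float) == 4: the aligned type is padded from 12 to 16 bytes.
static_assert(sizeof(PackedVec3) == 12, "three floats, no padding");
static_assert(sizeof(AlignedVec3) == 16, "padded up to one 16-byte packet");
static_assert(alignof(AlignedVec3) == 16, "starts on a 16-byte boundary");

// Checks that every element of an array of AlignedVec3 is 16-byte aligned.
inline void check_array_alignment(const AlignedVec3* arr, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        assert(reinterpret_cast<std::uintptr_t>(&arr[i]) % 16 == 0);
    }
    (void)arr;  // silences unused warnings when NDEBUG disables assert
}
```

So an aligned float3 array is possible, but each element then occupies 16 bytes rather than 12.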
On Mon, Oct 12, 2009 at 8:02 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2009/10/11 dilas dilas <espiritusantu@xxxxxxx>:
>> Hello. I think Eigen is a really good library, but I've found a problem that I can't live with. :)
>> I use an x86 + SSE2 platform and Visual C++.
>> When I use fixed-size matrices whose sizes are not a multiple of the packet size, Eigen doesn't use vector instructions.
>> I tried to force it by using the AutoAlign option, but it didn't work.
>> For example, for:
>> Matrix<float,5,1,AutoAlign | ColMajor> a;
>> Matrix<float,5,1,AutoAlign | ColMajor> b;
>> Matrix<float,5,1,AutoAlign | ColMajor> c = a.cwise() * b;
>> Visual C++ generates five mulss instructions, but I want one mulps and one mulss.
>> So I decided to round up the first dimension to a multiple of 4, and it worked. But in some cases, for example when we multiply a 3x3 matrix by a 3x1 one, this doesn't work, and we must round up the second dimension too, but it's not on.
>> So I have two questions:
>> 1. Do the developers intend to support the behavior I've described?
> In order to efficiently vectorize a Vector5f, we would have to align
> its array to a 16-byte boundary. Which would make sizeof(Vector5f) grow
> from 20 to 32 bytes. Which would mean that if you allocate an array of
> N Vector5f's, the memory usage grows from 20N to 32N bytes. That's why
> we will never make that the default behavior. But yes, it would be
> nice to have that as a non-default option. Note that in the
> development branch, in unsupported/, we already have an AlignedVector3
> class that does something comparable for vectors of size 3,
> though that is a little different: here, it does it all with SIMD
> instructions, ignoring the last component. That's all one can do for
> Vector3f. For Vector5f, it is better to do as you suggest: 1 packet +
> 1 scalar. That at least works in all cases.
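To make the "1 packet + 1 scalar" idea concrete, here is a minimal sketch. It assumes a 4-byte float; the type Vec5 and the function mul5 are hypothetical names for illustration and are not part of Eigen. On non-SSE targets it falls back to scalar code so that it still compiles.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical 5-float vector, aligned so the first four floats form one packet.
struct alignas(16) Vec5 { float v[5]; };

// The cost mentioned above: 20 bytes of data are padded to 32 (4-byte float).
static_assert(sizeof(Vec5) == 32, "20 bytes of data padded to 32");

// Coefficient-wise c = a * b: one packet multiply plus one scalar multiply.
inline void mul5(const Vec5& a, const Vec5& b, Vec5& c) {
    // Aligned loads require 16-byte aligned storage.
    assert(reinterpret_cast<std::uintptr_t>(a.v) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b.v) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(c.v) % 16 == 0);
#if SKETCH_HAS_SSE
    const __m128 pa = _mm_load_ps(a.v);        // coefficients 0..3 of a
    const __m128 pb = _mm_load_ps(b.v);        // coefficients 0..3 of b
    _mm_store_ps(c.v, _mm_mul_ps(pa, pb));     // packet multiply (mulps)
#else
    for (int i = 0; i < 4; ++i) c.v[i] = a.v[i] * b.v[i];
#endif
    c.v[4] = a.v[4] * b.v[4];                  // remaining scalar multiply
}
```

Whether the compiler emits exactly one mulps and one mulss depends on the compiler and its flags; the sketch only shows the intended split.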
>> 2. How can I change the sources myself to get this behavior?
> - If you're happy about a quick hack like AlignedVector3, just check
> its sources.
> - If you want the real solution as you're suggesting, hm, there's a
> bit of changes to make! Here are some starting points:
> -- in MatrixStorage.h, in ei_matrix_array, always make it align the array.
> -- in all the files that have meta-unrolled loops, well our
> meta-unrollers aren't usable in your case, so your best bet is to not
> use them, instead force the usage of non-unrolled paths, cross your
> fingers that the compiler auto-unrolls in your case, or write new
> meta-unrollers for these cases.
> I'd be OK to add a new non-default matrix option ForceAlign in
> addition to the existing AutoAlign (default) and DontAlign. Again the
> bulk of the work will be to extend / write new unrolled loops for
> Assign, for the products, etc...
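For the "non-unrolled path" suggestion, a plain loop over whole packets followed by a scalar tail might look like the sketch below. The names VecN and cwise_mul_loop are hypothetical and not Eigen's; the code assumes SSE with 4-float packets and falls back to a scalar loop elsewhere. Since N is a compile-time constant, the compiler is free to unroll both loops, though nothing guarantees that it will.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical fixed-size vector whose storage is always 16-byte aligned.
template <std::size_t N>
struct alignas(16) VecN {
    static_assert(N > 0, "size must be positive");
    float v[N];
};

// Coefficient-wise c = a * b without meta-unrolling:
// whole packets first, then the leftover coefficients one by one.
template <std::size_t N>
inline void cwise_mul_loop(const VecN<N>& a, const VecN<N>& b, VecN<N>& c) {
    assert(reinterpret_cast<std::uintptr_t>(a.v) % 16 == 0);
    const std::size_t packed_end = N / 4 * 4;  // largest multiple of 4 <= N
    std::size_t i = 0;
#if SKETCH_HAS_SSE
    for (; i < packed_end; i += 4) {
        // a.v + i stays 16-byte aligned because i is a multiple of 4 floats.
        _mm_store_ps(c.v + i,
                     _mm_mul_ps(_mm_load_ps(a.v + i), _mm_load_ps(b.v + i)));
    }
#endif
    (void)packed_end;  // unused in the scalar fallback
    for (; i < N; ++i) c.v[i] = a.v[i] * b.v[i];  // scalar tail
}
```

With N = 5 this performs one packet multiply and one scalar multiply on SSE targets; with N = 3 it is entirely scalar, which is the case AlignedVector3 handles differently.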
Department of Physics
Indian Institute of Technology