- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen]
- From: Rohit Garg <rpg.314@xxxxxxxxx>
- Date: Mon, 12 Oct 2009 21:45:17 +0530
This is surprising. I didn't know that the compiler would pad
everything up to the next multiple of 16 bytes (4 floats). Even if one
aligns a float3 to 16 bytes, the last 4 bytes should still be usable on
the stack for other variables. It's not as if you typedef'ed a float3
and aligned it. If you then declared an array of such a type, I could
understand the compiler padding in a w component to keep every element
of that array aligned.
So why this behavior?
On Mon, Oct 12, 2009 at 9:17 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> 2009/10/12 Rohit Garg <rpg.314@xxxxxxxxx>:
>> I am not sure regarding the round up of mem size bit. Why can't you
>> have a float3 array aligned at 16 byte boundary?
>
> Because that makes its sizeof() increase to 16 bytes instead of 12, so
> that incurs a +33% memory overhead for everybody when you allocate an
> array of N such vectors --- and even so, this isn't quite optimal,
> e.g. this doesn't give you a very easy way to vectorize the product of
> 3x3 matrices with 3-vectors. For Vector5f, the situation is different,
> the vectorization can always work (you can always fit a packet) but
> the memory overhead is even bigger: 32/20 = 1.6 so it's a +60% memory
> overhead.
>
> Have a look at the example program (attached):
>
> #include <iostream>
>
> template<int N> struct foo
> {
>   __attribute__((aligned(16))) float f[N];
>   foo()
>   {
>     std::cout << "sizeof(foo<" << N << ">) is " << sizeof(foo)
>               << " instead of " << N*sizeof(float) << std::endl;
>   }
> };
>
> int main()
> {
>   foo<3>();
>   foo<5>();
> }
>
>
> Output:
>
> $ g++ sizeof_aligned.cpp -o s && ./s
> sizeof(foo<3>) is 16 instead of 12
> sizeof(foo<5>) is 32 instead of 20
>
>
> Cheers,
> Benoit
>
--
Rohit Garg
http://rpg-314.blogspot.com/
Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay