- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Status of AVX support
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Thu, 23 Feb 2012 13:07:56 +0100
Sure, if the performance penalty is really low, then that would considerably
simplify our work. So the first thing to do would be to benchmark the
real performance penalty of 16-byte versus 32-byte alignment.
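A minimal C++17 sketch of such a benchmark, assuming GCC or Clang on a platform that provides std::aligned_alloc. It times the same unaligned-load (_mm256_loadu_pd) summation over a buffer that starts on a 32-byte boundary and over one that starts 16 bytes past it. The helper names sum_doubles and time_sum are hypothetical, not part of Eigen. Build with -O2 -mavx to exercise the AVX path; without AVX the code falls back to a scalar loop, whose timings say nothing about AVX.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#ifdef __AVX__
#include <immintrin.h>
#endif

// Sum n doubles starting at p. When compiled for AVX, uses 256-bit unaligned
// loads, which accept any address; otherwise uses a plain scalar loop.
double sum_doubles(const double* p, std::size_t n) {
    std::size_t i = 0;
    double total = 0.0;
#ifdef __AVX__
    __m256d acc = _mm256_setzero_pd();
    for (; i + 4 <= n; i += 4)
        acc = _mm256_add_pd(acc, _mm256_loadu_pd(p + i));
    alignas(32) double lanes[4];
    _mm256_store_pd(lanes, acc);  // aligned store into a 32-byte-aligned array
    total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    for (; i < n; ++i) total += p[i];  // remaining elements (or all, without AVX)
    return total;
}

// Time `reps` passes of sum_doubles over n doubles that start `offset_bytes`
// past a 32-byte boundary (0 -> 32-byte aligned, 16 -> only 16-byte aligned).
// Returns elapsed seconds (or -1.0 if allocation fails) and writes the
// accumulated sum to *checksum so the work cannot be optimized away.
double time_sum(std::size_t n, std::size_t offset_bytes, int reps, double* checksum) {
    // std::aligned_alloc requires the size to be a multiple of the alignment.
    std::size_t bytes = ((n * sizeof(double) + offset_bytes + 31) / 32) * 32;
    void* raw = std::aligned_alloc(32, bytes);
    if (raw == nullptr) return -1.0;
    double* data = reinterpret_cast<double*>(static_cast<char*>(raw) + offset_bytes);
    for (std::size_t i = 0; i < n; ++i) data[i] = 1.0;

    double acc = 0.0;
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) acc += sum_doubles(data, n);
    auto stop = std::chrono::steady_clock::now();

    *checksum = acc;
    std::free(raw);
    return std::chrono::duration<double>(stop - start).count();
}
```

Comparing time_sum(n, 0, reps, &c) with time_sum(n, 16, reps, &c) for a few sizes (small enough to stay in L1 cache, and large) gives a rough per-machine answer; a single micro-benchmark like this is not conclusive on its own.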
gael
On Wed, Feb 22, 2012 at 2:55 PM, Eamon Nerbonne <eamon@xxxxxxxxxxxx> wrote:
> I happened across
> http://stackoverflow.com/questions/6546275/what-are-the-alignment-restrictions-on-the-new-haswell-avx-gather-instruction,
> which notes that most AVX instructions don't actually require
> alignment. Might it not thus be possible to simply use AVX everywhere, and
> opportunistically use 32-byte alignment only where easy (for the extra
> performance)?
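A minimal sketch of that idea, assuming C++17 and GCC/Clang: heap buffers are requested on a 32-byte boundary when that is cheap, while the kernel itself only uses unaligned AVX loads and stores (_mm256_loadu_pd / _mm256_storeu_pd). With VEX encoding these do not fault on misaligned addresses, so the same code is correct for 32-byte-aligned and merely 16-byte-aligned data. The names alignment_of, allocate_doubles and scale are hypothetical, not Eigen API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#ifdef __AVX__
#include <immintrin.h>
#endif

// Largest power-of-two alignment (capped at 32 bytes) that pointer p satisfies.
std::size_t alignment_of(const void* p) {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    std::size_t a = 32;
    while (a > 1 && addr % a != 0) a /= 2;
    return a;
}

// "Opportunistic" alignment: heap storage for n doubles on a 32-byte boundary.
// Release with std::free.
double* allocate_doubles(std::size_t n) {
    std::size_t bytes = ((n * sizeof(double) + 31) / 32) * 32;  // multiple of 32
    if (bytes == 0) bytes = 32;
    return static_cast<double*>(std::aligned_alloc(32, bytes));
}

// x[i] *= a for i in [0, n). Unaligned AVX loads/stores make this correct
// regardless of whether x is 32-byte or only 16-byte aligned.
void scale(double* x, std::size_t n, double a) {
    std::size_t i = 0;
#ifdef __AVX__
    const __m256d va = _mm256_set1_pd(a);
    for (; i + 4 <= n; i += 4)
        _mm256_storeu_pd(x + i, _mm256_mul_pd(va, _mm256_loadu_pd(x + i)));
#endif
    for (; i < n; ++i) x[i] *= a;  // tail (or everything, without AVX)
}
```

For example, scale(buf + 2, 8, 2.0) on a buffer from allocate_doubles(10) runs on data that is only 16-byte aligned and still produces correct results; whether it is as fast as the 32-byte-aligned case is exactly what would need benchmarking.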
>
> (Sorry for the dead thread revival, if that's objectionable)
>
> --Eamon
> eamon@xxxxxxxxxxxx - Tel#:+31-6-15142163
>
>
>
> On Sat, Dec 10, 2011 at 18:58, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
>>
>> On Wed, Dec 7, 2011 at 11:31 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
>> wrote:
>> > 2011/12/7 Rhys Ulerich <rhys.ulerich@xxxxxxxxx>:
>> >>> W.r.t. porting to AVX: be aware that there might be some pitfalls with
>> >>> AVX performance:
>> >>> http://www.agner.org/optimize/blog/read.php?i=142
>> >>
>> >> Interesting tidbit from that link "If the programmer inadvertently
>> >> mixes AVX and non-AVX vector instructions in the same code then there
>> >> is a penalty of 70 clock cycles for each transition between the two
>> >> forms."
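A minimal sketch of how this transition penalty is usually avoided, assuming GCC or Clang with -mavx: the AVX kernel executes vzeroupper (the _mm256_zeroupper() intrinsic) before returning, so legacy (non-VEX) SSE code that runs afterwards does not find dirty upper YMM halves. GCC and Clang normally insert vzeroupper automatically at function boundaries when compiling for AVX, so the explicit call here mostly documents the intent. dot_avx is a hypothetical name.

```cpp
#include <cstddef>
#ifdef __AVX__
#include <immintrin.h>
#endif

// Dot product of a and b using AVX when available, scalar code otherwise.
double dot_avx(const double* a, const double* b, std::size_t n) {
    std::size_t i = 0;
    double total = 0.0;
#ifdef __AVX__
    __m256d acc = _mm256_setzero_pd();
    for (; i + 4 <= n; i += 4)
        acc = _mm256_add_pd(acc, _mm256_mul_pd(_mm256_loadu_pd(a + i),
                                               _mm256_loadu_pd(b + i)));
    alignas(32) double lanes[4];
    _mm256_store_pd(lanes, acc);
    total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    // Zero the upper 128 bits of all YMM registers before leaving AVX code,
    // so any legacy SSE instructions executed next avoid the state transition.
    _mm256_zeroupper();
#endif
    for (; i < n; ++i) total += a[i] * b[i];  // remaining elements
    return total;
}
```

Compiling the whole translation unit with -mavx also helps, since SSE intrinsics then get VEX encodings; the penalty mainly arises when AVX code is mixed with SSE code compiled without AVX enabled.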
>> >
>> > Between this, and the fact that we can't 32-byte-align Vector4d
>> > without breaking the ABI, I'm starting to wonder if maybe we should
>> > treat AVX as a dynamic-size-only thing and completely give up on AVX
>> > for fixed-size objects? For dynamic-size objects, the situation is
>> > much simpler, we can increase the alignment without breaking the ABI
>> > and we can assume that objects are large so that AVX is always better
>> > than SSE.
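A small sketch of why raising the alignment of a fixed-size type changes the ABI while dynamic-size storage does not. Vec4d_16, Vec4d_32, Particle16, Particle32, DynVec and make_dynvec are hypothetical stand-ins, not Eigen's Vector4d or VectorXd: embedding the 32-byte-aligned vector in a struct moves its offset and grows the enclosing struct, whereas a heap buffer can be 32-byte aligned without touching the layout of the object that holds the pointer. The offsets assume a typical x86-64 ABI.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-ins for a fixed-size 4-double vector.
struct alignas(16) Vec4d_16 { double v[4]; };  // 16-byte (SSE-style) alignment
struct alignas(32) Vec4d_32 { double v[4]; };  // 32-byte (AVX-friendly) alignment

// A user type that embeds the vector after a 1-byte member.
struct Particle16 { char tag; Vec4d_16 pos; };
struct Particle32 { char tag; Vec4d_32 pos; };

// Same payload, different enclosing layout: code compiled against one layout
// cannot safely exchange Particle objects with code compiled against the other.
static_assert(sizeof(Vec4d_16) == sizeof(Vec4d_32), "same 32-byte payload");
static_assert(offsetof(Particle16, pos) == 16, "15 bytes of padding");
static_assert(offsetof(Particle32, pos) == 32, "31 bytes of padding");
static_assert(sizeof(Particle16) == 48 && sizeof(Particle32) == 64, "size grows");

// Hypothetical dynamic-size vector: only a pointer and a size live inline, so
// the buffer's alignment can change without changing sizeof(DynVec).
struct DynVec {
    double* data;
    std::size_t size;
};

DynVec make_dynvec(std::size_t n) {
    // std::aligned_alloc needs a size that is a multiple of the alignment.
    std::size_t bytes = ((n * sizeof(double) + 31) / 32) * 32;
    if (bytes == 0) bytes = 32;
    return DynVec{static_cast<double*>(std::aligned_alloc(32, bytes)), n};
}
```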
>>
>> That is a good idea. The fixed size objects would be very small
>> anyway, so not using AVX wouldn't hurt much.
>>
>> Would it be any easier to implement AVX for just the dynamic objects?
>> >
>> > In any case, I think we should start by doing AVX for dynamic-size
>> > objects only, it will be time to think about fixed-size later.
>> >
>> > Benoit
>> >
>> >>
>> >> Thank you for the pointer to the blog,
>> >> Rhys
>> >>
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Rohit Garg
>>
>> http://rpg-314.blogspot.com/
>>
>> Graduate Student
>> Applied and Engineering Physics
>> Cornell University
>>
>>
>