Re: [eigen] 32 byte alignment for avx
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] 32 byte alignment for avx
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Sun, 27 Nov 2011 17:31:39 +0100
Sure, at some point this information will have to be carried by
the packet_traits class; that's the plan.
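
To make the idea concrete, here is a minimal sketch of a traits class that carries the required alignment alongside the packet size. The names simd_traits, simd_traits_base, SseTag and AvxTag are hypothetical and chosen only for illustration; this is not Eigen's actual packet_traits, just one way such information could be exposed.

```cpp
#include <cstddef>

// Hypothetical tags naming the instruction set in use (not Eigen identifiers).
struct SseTag {};
struct AvxTag {};

// Hypothetical helper: derives packet size and alignment from the register width.
template <typename Scalar, std::size_t RegisterBytes>
struct simd_traits_base {
    // Number of scalars that fit in one SIMD register.
    static constexpr std::size_t size = RegisterBytes / sizeof(Scalar);
    // Alignment (in bytes) required for aligned loads and stores.
    static constexpr std::size_t alignment = RegisterBytes;
};

// Primary template is left undefined so unsupported combinations fail to compile.
template <typename Scalar, typename Isa>
struct simd_traits;

// SSE: 128-bit registers, so 16-byte alignment.
template <typename Scalar>
struct simd_traits<Scalar, SseTag> : simd_traits_base<Scalar, 16> {};

// AVX: 256-bit registers, so 32-byte alignment.
template <typename Scalar>
struct simd_traits<Scalar, AvxTag> : simd_traits_base<Scalar, 32> {};
```

With something like this, allocation and vectorization code could query simd_traits<float, AvxTag>::alignment instead of hard-coding 16.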
On Sun, Nov 27, 2011 at 4:16 PM, Mark Borgerding <mark@xxxxxxxxxxxxxx> wrote:
> On 11/26/2011 05:50 PM, Benoit Jacob wrote:
>> We never align to more than 16 bytes, so requiring higher alignment than
>> that is wrong.
> Has anyone talked about aligning to 32 byte boundaries for AVX SIMD?
> For those who don't know: Starting with Sandy Bridge, Intel processors have
> a new flavor of SIMD: AVX, which uses 256-bit registers. This allows us to
> do the same operation on 8 floats at once, vs. 4 floats at once using SSE
> through SSE4.2. The instruction set is structured such that future processor
> generations may use 512 bits or beyond.
> I've done a bit of this for my work and the speedups are impressive. Not quite the
> 2x that the increase in register width would suggest, but often 30-60%
> speedups on CPU-intensive problems.
> -- Mark
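
For reference, the 32-byte alignment discussed above can be requested with the C++17 aligned operator new. The sketch below is only an illustration under that assumption; kAvxAlignment, is_aligned and AlignedFloatBuffer are hypothetical names, not part of Eigen, and the code uses no AVX intrinsics, so it only demonstrates the allocation side.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical constant: 256-bit AVX registers are 32 bytes wide.
constexpr std::size_t kAvxAlignment = 32;

// Returns true if ptr is a multiple of `alignment` bytes.
inline bool is_aligned(const void* ptr, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Hypothetical RAII buffer whose float storage is 32-byte aligned.
// Requires C++17 for the std::align_val_t overloads of operator new/delete.
class AlignedFloatBuffer {
public:
    explicit AlignedFloatBuffer(std::size_t n)
        : size_(n),
          data_(static_cast<float*>(::operator new(
              n * sizeof(float), std::align_val_t(kAvxAlignment)))) {}

    ~AlignedFloatBuffer() {
        // Must be released with the matching aligned operator delete.
        ::operator delete(data_, std::align_val_t(kAvxAlignment));
    }

    // Non-copyable to keep ownership of the raw storage simple.
    AlignedFloatBuffer(const AlignedFloatBuffer&) = delete;
    AlignedFloatBuffer& operator=(const AlignedFloatBuffer&) = delete;

    float* data() { return data_; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    float* data_;  // uninitialized storage for size_ floats
};
```

A buffer created as AlignedFloatBuffer buf(8) holds exactly one 256-bit packet of floats, and is_aligned(buf.data(), kAvxAlignment) should hold for it.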