Re: [eigen] unaligned or not unaligned vectorization ?



OK, thanks for the benchmark.

From your numbers it looks like unaligned loads are even more expensive than 
unaligned stores, though!

Another interesting thing is how even U/U is at least as fast as no 
vectorization -- at least it's not slower.

Benoit

On Thursday 03 July 2008 20:07:05 Gael Guennebaud wrote:
> Hi,
>
> today we had a discussion about the usefulness of unaligned
> vectorization. So here are some benchmarks for a += a.cwiseProduct(b),
> where, e.g. U/A means Unaligned loads / Aligned stores:
>
>
> float:
>
> eigen A/A : 1.2163s   1.31546 GFlops
> eigen U/A : 1.71109s   0.935079 GFlops
> eigen U/U : 2.16024s   0.74066 GFlops
> Loop peeling + A/A : 0.932119s  1.71652 GFlops
> Loop peeling + U/A : 1.48324s  1.07872 GFlops
> Loop peeling + A/U : 1.1676s  1.37033 GFlops
> Loop peeling + U/U : 1.68971s  0.946908 GFlops
>
>
> float (no vectorization):
>
> eigen : 2.05874s   0.777173 GFlops
> Loop peeling : 2.27903s  0.702053 GFlops
>
>
>
> double:
>
> eigen A/A : 2.70669s   0.591128 GFlops
> eigen U/U : 2.75419s   0.580933 GFlops
> eigen U/A : 2.82088s   0.567199 GFlops
> Loop peeling + A/A : 1.98525s  0.805943 GFlops
> Loop peeling + U/A : 3.07734s  0.51993 GFlops
> Loop peeling + A/U : 2.44861s  0.653431 GFlops
> Loop peeling + U/U : 3.48922s  0.458555 GFlops
>
>
> double (no vectorization):
>
> eigen : 2.86233s   0.558985 GFlops
> Loop peeling : 3.10623s  0.515094 GFlops
>
> So, at least for SSE, there is currently no gain in doing unaligned
> vectorization, but it is worth removing the unaligned stores by first
> processing the unaligned coefficients of the result. So let's do it!
>
>
> cheers,
> Gael.
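
For readers following along, here is a minimal sketch of the loop-peeling idea
described above for a += a.cwiseProduct(b) on floats: process the leading
coefficients of the result one by one until the destination is 16-byte aligned,
then use aligned stores in the packet loop. The function name peeled_mul_add is
hypothetical, and this is not Eigen's actual implementation, only an
illustration under the assumption of 4-float SSE packets (with a scalar
fallback when SSE is not available). It corresponds to the "U/A" case for b,
since only the result a is known to be aligned after peeling.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical helper (not part of Eigen): computes a[i] += a[i] * b[i]
// for i in [0, n), peeling leading scalars so every packet store is aligned.
void peeled_mul_add(float* a, const float* b, std::size_t n)
{
    std::size_t i = 0;

    // 1. Peel: handle coefficients one by one until a + i is 16-byte aligned.
    while (i < n && reinterpret_cast<std::uintptr_t>(a + i) % 16 != 0) {
        a[i] += a[i] * b[i];
        ++i;
    }

#if defined(__SSE__)
    // 2. Packet loop: a + i is aligned here, so loads and stores on a can use
    //    the aligned instructions; b may still be unaligned, so load it with
    //    the unaligned instruction.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_load_ps(a + i);    // aligned load
        __m128 vb = _mm_loadu_ps(b + i);   // unaligned load
        _mm_store_ps(a + i, _mm_add_ps(va, _mm_mul_ps(va, vb)));  // aligned store
    }
#endif

    // 3. Tail: remaining coefficients that do not fill a whole packet
    //    (or everything after the peel when SSE is not available).
    for (; i < n; ++i) {
        a[i] += a[i] * b[i];
    }
}
```

If b happens to share the same alignment offset as a, the unaligned load could
be replaced by an aligned one, which would correspond to the "Loop peeling +
A/A" rows in the benchmark.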




