Re: [eigen] unaligned or not unaligned vectorization ?



Hi all,
First, I agree: at least on AltiVec, unaligned stores have a tremendous impact
on performance (unaligned loads, not so much).
Regarding the benchmark, just one question: was it done using a totally random
(i.e., non-aligned) alignment for each iteration?

Konstantinos
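To make the question concrete, a benchmark with a random alignment on every iteration could look roughly like the sketch below. It is a hypothetical harness, not the code behind the numbers quoted below. The names `cwise_update_uu` and `bench_random_alignment` are made up for illustration, and the sketch assumes an x86 target with SSE (`<xmmintrin.h>`). Before timing, it draws an independent random offset of 0 to 3 floats for each operand per iteration, so the kernel sees a different 16-byte misalignment each time instead of one fixed offset. The offsets are relative to the start of the `std::vector` storage, which is usually, but not guaranteed to be, 16-byte aligned.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_storeu_ps, _mm_mul_ps, _mm_add_ps

// Hypothetical U/U kernel: a[i] += a[i] * b[i] with unaligned loads and stores.
void cwise_update_uu(float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(a + i, _mm_add_ps(va, _mm_mul_ps(va, vb)));
    }
    for (; i < n; ++i) a[i] += a[i] * b[i];  // scalar tail
}

// Hypothetical harness: returns elapsed wall-clock time in seconds.
double bench_random_alignment(std::size_t n, int iterations, unsigned seed) {
    // 4 spare floats so any offset in [0, 3] stays in bounds.
    // b is all zeros, so a stays at 1.0f and never overflows across iterations.
    std::vector<float> abuf(n + 4, 1.0f), bbuf(n + 4, 0.0f);

    // Draw the per-iteration (a, b) offsets up front so the RNG is not timed.
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> offset(0, 3);
    std::vector<std::pair<int, int>> offsets(static_cast<std::size_t>(iterations));
    for (auto& o : offsets) o = {offset(rng), offset(rng)};

    auto start = std::chrono::steady_clock::now();
    for (const auto& o : offsets) {
        cwise_update_uu(abuf.data() + o.first, bbuf.data() + o.second, n);
    }
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;

    // Read one result back so the stores cannot be optimized away.
    volatile float sink = abuf[n / 2];
    (void)sink;
    return elapsed.count();
}
```

For example, `bench_random_alignment(1 << 20, 100, 42)` times 100 calls on about a million floats; comparing it with a variant whose offsets are fixed at zero would isolate the cost of the random misalignment itself.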

On Thursday 03 July 2008 21:07:05 Gael Guennebaud wrote:
> Hi,
>
> Today we had a discussion about the usefulness of unaligned
> vectorization, so here are some benchmarks for a += a.cwiseProduct(b),
> where, e.g., U/A means Unaligned loads / Aligned stores:
>
>
> float:
>
> eigen A/A          : 1.2163s     1.31546 GFlops
> eigen U/A          : 1.71109s    0.935079 GFlops
> eigen U/U          : 2.16024s    0.74066 GFlops
> Loop peeling + A/A : 0.932119s   1.71652 GFlops
> Loop peeling + U/A : 1.48324s    1.07872 GFlops
> Loop peeling + A/U : 1.1676s     1.37033 GFlops
> Loop peeling + U/U : 1.68971s    0.946908 GFlops
>
>
> float (no vectorization):
>
> eigen        : 2.05874s   0.777173 GFlops
> Loop peeling : 2.27903s   0.702053 GFlops
>
>
>
> double:
>
> eigen A/A          : 2.70669s   0.591128 GFlops
> eigen U/U          : 2.75419s   0.580933 GFlops
> eigen U/A          : 2.82088s   0.567199 GFlops
> Loop peeling + A/A : 1.98525s   0.805943 GFlops
> Loop peeling + U/A : 3.07734s   0.51993 GFlops
> Loop peeling + A/U : 2.44861s   0.653431 GFlops
> Loop peeling + U/U : 3.48922s   0.458555 GFlops
>
>
> double (no vectorization):
>
> eigen        : 2.86233s   0.558985 GFlops
> Loop peeling : 3.10623s   0.515094 GFlops
>
> So, at least for SSE, there is currently no gain in doing unaligned
> vectorization, but it is worth removing the unaligned stores by first
> processing the leading unaligned coefficients of the result (loop
> peeling). So let's do it!
>
>
> cheers,
> Gael.
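The loop-peeling strategy proposed above (process the leading coefficients one at a time until the destination is aligned, then run an aligned vector loop) can be sketched with SSE intrinsics as follows. This is a minimal illustration, not Eigen's actual implementation: `peeled_cwise_update` is a hypothetical name, the sketch assumes an x86 target with SSE and `float` data, and it keeps unaligned loads on `b`, which roughly matches the "Loop peeling + U/A" line in the tables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE: _mm_load_ps, _mm_loadu_ps, _mm_store_ps, ...

// Hypothetical helper: a[i] += a[i] * b[i] for i in [0, n), peeling the
// leading scalars so that every vector store to `a` is 16-byte aligned.
// `b` may have a different misalignment, so it is read with unaligned loads.
void peeled_cwise_update(float* a, const float* b, std::size_t n) {
    std::size_t i = 0;

    // 1) Prologue (peel): scalar updates until a + i sits on a 16-byte boundary.
    while (i < n && (reinterpret_cast<std::uintptr_t>(a + i) % 16) != 0) {
        a[i] += a[i] * b[i];
        ++i;
    }

    // 2) Vector body: aligned load/store on `a`, unaligned load on `b`.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_load_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_store_ps(a + i, _mm_add_ps(va, _mm_mul_ps(va, vb)));
    }

    // 3) Epilogue: remaining tail coefficients.
    for (; i < n; ++i) {
        a[i] += a[i] * b[i];
    }
}
```

Since a `float*` is always 4-byte aligned, the prologue runs at most 3 scalar iterations. When `a` and `b` happen to share the same misalignment, the `_mm_loadu_ps` on `b` could also become `_mm_load_ps`, which corresponds to the "Loop peeling + A/A" case.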



