Re: [eigen] unaligned or not unaligned vectorization ?
Hi all,
First, I agree: at least on AltiVec, unaligned stores have a tremendous impact
on performance (unaligned loads much less so).
Regarding the benchmark, just one question: was it run with a totally random
(unaligned) alignment at each iteration?
Konstantinos
On Thursday 03 July 2008 21:07:05 Gael Guennebaud wrote:
> Hi,
>
> today we had a discussion about the usefulness of unaligned
> vectorization. So here are some benchmarks for a += a.cwiseProduct(b),
> where, e.g., U/A means Unaligned loads / Aligned stores:
>
>
> float:
>
> eigen A/A : 1.2163s 1.31546 GFlops
> eigen U/A : 1.71109s 0.935079 GFlops
> eigen U/U : 2.16024s 0.74066 GFlops
> Loop peeling + A/A : 0.932119s 1.71652 GFlops
> Loop peeling + U/A : 1.48324s 1.07872 GFlops
> Loop peeling + A/U : 1.1676s 1.37033 GFlops
> Loop peeling + U/U : 1.68971s 0.946908 GFlops
>
>
> float (no vectorization):
>
> eigen : 2.05874s 0.777173 GFlops
> Loop peeling : 2.27903s 0.702053 GFlops
>
>
>
> double:
>
> eigen A/A : 2.70669s 0.591128 GFlops
> eigen U/U : 2.75419s 0.580933 GFlops
> eigen U/A : 2.82088s 0.567199 GFlops
> Loop peeling + A/A : 1.98525s 0.805943 GFlops
> Loop peeling + U/A : 3.07734s 0.51993 GFlops
> Loop peeling + A/U : 2.44861s 0.653431 GFlops
> Loop peeling + U/U : 3.48922s 0.458555 GFlops
>
>
> double (no vectorization):
>
> eigen : 2.86233s 0.558985 GFlops
> Loop peeling : 3.10623s 0.515094 GFlops
>
> So, at least for SSE, there is currently no gain in doing unaligned
> vectorization, but it is worth removing the unaligned stores by first
> processing the unaligned coefficients of the result. So let's do it!
>
>
> cheers,
> Gael.
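
For readers following the thread, the idea of "first processing the unaligned coefficients of the result" can be sketched roughly as below. This is only an illustration under stated assumptions, not Eigen's actual implementation: the helper names first_aligned_index and peeled_mul_add are hypothetical, the packet size is fixed to 4 floats (16 bytes, as with SSE), and the SSE path is compiled only when __SSE__ is defined. It corresponds most closely to the "Loop peeling + U/A" case: after the scalar prologue, a is loaded and stored aligned, while b is still loaded unaligned.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical helper: how many leading floats must be handled one by one
// before `p` reaches a 16-byte boundary (capped at n).
inline std::size_t first_aligned_index(const float* p, std::size_t n) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    const std::size_t misalign = addr % 16;           // bytes past the last boundary
    if (misalign % sizeof(float) != 0) return n;      // can never align: stay scalar
    const std::size_t skip = ((16 - misalign) % 16) / sizeof(float);
    return skip < n ? skip : n;
}

// Computes a[i] += a[i] * b[i] with the unaligned head of `a` peeled off,
// so that every vector store in the main loop is aligned.
inline void peeled_mul_add(float* a, const float* b, std::size_t n) {
    const std::size_t start = first_aligned_index(a, n);
    std::size_t i = 0;

    // Scalar prologue: the unaligned coefficients of the result.
    for (; i < start; ++i) a[i] += a[i] * b[i];

#if defined(__SSE__)
    // Main loop: `a + i` is 16-byte aligned here; `b + i` may not be.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_load_ps(a + i);         // aligned load
        const __m128 vb = _mm_loadu_ps(b + i);        // unaligned load
        _mm_store_ps(a + i, _mm_add_ps(va, _mm_mul_ps(va, vb)));  // aligned store
    }
#endif

    // Scalar epilogue: whatever is left (everything, if SSE is unavailable).
    for (; i < n; ++i) a[i] += a[i] * b[i];
}
```

The prologue costs at most three scalar iterations for float, which is negligible for long vectors. Whether it pays off on a given CPU still has to be measured, as in the benchmark quoted above.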