[eigen] unaligned or not unaligned vectorization?



Hi,

today we had a discussion about the usefulness of unaligned
vectorization. So here are some benchmarks for a += a.cwiseProduct(b),
where X/Y means X loads / Y stores, with A = aligned and U = unaligned
(e.g., U/A means unaligned loads / aligned stores):
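To make the notation concrete, here is a minimal sketch of the four load/store combinations for a += a.cwiseProduct(b) on floats, written directly with SSE intrinsics rather than Eigen's own packet abstraction. The function name cwise_madd and its template flags are hypothetical; the sketch assumes an x86 target with SSE, 16-byte-aligned pointers for any side marked "A", and n a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_load_ps, _mm_loadu_ps, ...

// Hypothetical kernel for a += a.cwiseProduct(b), i.e. a[i] += a[i] * b[i],
// processing 4 floats per SSE packet.
//   LoadAligned  = true -> _mm_load_ps  (A), false -> _mm_loadu_ps  (U)
//   StoreAligned = true -> _mm_store_ps (A), false -> _mm_storeu_ps (U)
// Any side marked aligned requires 16-byte-aligned pointers; n must be a
// multiple of 4 (no scalar tail in this sketch).
template <bool LoadAligned, bool StoreAligned>
void cwise_madd(float* a, const float* b, std::size_t n) {
  assert(n % 4 == 0);
  for (std::size_t i = 0; i < n; i += 4) {
    __m128 va = LoadAligned ? _mm_load_ps(a + i) : _mm_loadu_ps(a + i);
    __m128 vb = LoadAligned ? _mm_load_ps(b + i) : _mm_loadu_ps(b + i);
    __m128 r = _mm_add_ps(va, _mm_mul_ps(va, vb));  // a + a * b
    if (StoreAligned) {
      _mm_store_ps(a + i, r);
    } else {
      _mm_storeu_ps(a + i, r);
    }
  }
}
```

The aligned forms are only valid when the pointers really are 16-byte aligned (_mm_load_ps and _mm_store_ps fault otherwise), which is why a generic expression has to either use the unaligned forms or first reach an aligned address.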


float:

variant               time (s)   GFlops
eigen A/A             1.2163     1.31546
eigen U/A             1.71109    0.935079
eigen U/U             2.16024    0.74066
Loop peeling + A/A    0.932119   1.71652
Loop peeling + U/A    1.48324    1.07872
Loop peeling + A/U    1.1676     1.37033
Loop peeling + U/U    1.68971    0.946908


float (no vectorization):

variant               time (s)   GFlops
eigen                 2.05874    0.777173
Loop peeling          2.27903    0.702053



double:

variant               time (s)   GFlops
eigen A/A             2.70669    0.591128
eigen U/U             2.75419    0.580933
eigen U/A             2.82088    0.567199
Loop peeling + A/A    1.98525    0.805943
Loop peeling + U/A    3.07734    0.51993
Loop peeling + A/U    2.44861    0.653431
Loop peeling + U/U    3.48922    0.458555


double (no vectorization):

variant               time (s)   GFlops
eigen                 2.86233    0.558985
Loop peeling          3.10623    0.515094
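The post does not state the vector size or the number of repetitions, but multiplying each time by its GFlops rate gives about 1.6 GFlop for every row of both tables, which is consistent with the same workload in every run. So the GFlops column and the inverse of the times rank the variants identically. A small check, with a few rows copied from the tables (the names Run, kRuns, implied_gflop and speedup are made up for this sketch):

```cpp
#include <cassert>

// A few rows copied from the benchmark tables.
struct Run {
  const char* name;
  double seconds;  // reported time
  double gflops;   // reported throughput
};

const Run kRuns[] = {
    {"float eigen A/A", 1.2163, 1.31546},
    {"float loop peeling + A/A", 0.932119, 1.71652},
    {"float eigen (no vectorization)", 2.05874, 0.777173},
    {"double eigen A/A", 2.70669, 0.591128},
    {"double loop peeling + A/A", 1.98525, 0.805943},
    {"double eigen (no vectorization)", 2.86233, 0.558985},
};

// Total work implied by one row: GFlop = s * GFlop/s.
double implied_gflop(const Run& r) { return r.seconds * r.gflops; }

// With a fixed workload, the speedup of `fast` over `slow` is just the
// ratio of their times (equivalently, fast.gflops / slow.gflops).
double speedup(const Run& fast, const Run& slow) {
  assert(fast.seconds > 0.0);
  return slow.seconds / fast.seconds;
}
```

For example, the float "Loop peeling + A/A" row comes out about 2.2x faster than the non-vectorized Eigen code, and the double one about 1.44x.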

So, at least with SSE, there is currently no gain from unaligned
vectorization, but it is worth removing the unaligned stores by first
processing the leading unaligned coefficients of the result with scalar
code (loop peeling). So let's do it!
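A minimal sketch of that idea, assuming an x86 target with SSE and single-precision data; it is not Eigen's actual implementation, and the name peeled_cwise_madd is hypothetical. Because a is both read and written in a += a.cwiseProduct(b), peeling the leading coefficients until a is 16-byte aligned makes both the loads and the stores of a aligned; b gets aligned loads only if it happens to share a's misalignment, and unaligned loads otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical loop-peeled kernel for a += a.cwiseProduct(b) on floats.
void peeled_cwise_madd(float* a, const float* b, std::size_t n) {
  assert(n == 0 || (a != nullptr && b != nullptr));
  std::size_t i = 0;

  // 1) Peel: scalar updates until a + i sits on a 16-byte boundary,
  //    so that every store in the packet loop below is aligned.
  while (i < n && reinterpret_cast<std::uintptr_t>(a + i) % 16 != 0) {
    a[i] += a[i] * b[i];
    ++i;
  }

  // 2) Packet loop: aligned loads/stores on a. b is loaded aligned only if
  //    it shares a's misalignment; otherwise it needs unaligned loads.
  const bool b_aligned = reinterpret_cast<std::uintptr_t>(b + i) % 16 == 0;
  for (; i + 4 <= n; i += 4) {
    __m128 va = _mm_load_ps(a + i);
    __m128 vb = b_aligned ? _mm_load_ps(b + i) : _mm_loadu_ps(b + i);
    _mm_store_ps(a + i, _mm_add_ps(va, _mm_mul_ps(va, vb)));
  }

  // 3) Tail: scalar updates for the last (fewer than 4) coefficients.
  for (; i < n; ++i) {
    a[i] += a[i] * b[i];
  }
}
```

With a destination that starts 4 bytes past a 16-byte boundary, the first three coefficients go through the scalar peel, the bulk of the array through aligned packet stores, and any leftover through the scalar tail.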


cheers,
Gael.


