Re: [eigen] unaligned or not unaligned vectorization ?
OK, thanks for the benchmark. From your numbers it looks like unaligned
loads are even more expensive than unaligned stores, though! Another
interesting thing is how even U/U is at least as fast as no
vectorization -- at least it's not slower.

Benoit

On Thursday 03 July 2008 20:07:05 Gael Guennebaud wrote:
> Hi,
>
> today we had a discussion about the usefulness of unaligned
> vectorization. So here are some benchmarks for a += a.cwiseProduct(b),
> where, e.g., U/A means Unaligned loads / Aligned stores:
>
>
> float:
>
> eigen A/A          : 1.2163s    1.31546 GFlops
> eigen U/A          : 1.71109s   0.935079 GFlops
> eigen U/U          : 2.16024s   0.74066 GFlops
> Loop peeling + A/A : 0.932119s  1.71652 GFlops
> Loop peeling + U/A : 1.48324s   1.07872 GFlops
> Loop peeling + A/U : 1.1676s    1.37033 GFlops
> Loop peeling + U/U : 1.68971s   0.946908 GFlops
>
>
> float (no vectorization):
>
> eigen              : 2.05874s   0.777173 GFlops
> Loop peeling       : 2.27903s   0.702053 GFlops
>
>
>
> double:
>
> eigen A/A          : 2.70669s   0.591128 GFlops
> eigen U/U          : 2.75419s   0.580933 GFlops
> eigen U/A          : 2.82088s   0.567199 GFlops
> Loop peeling + A/A : 1.98525s   0.805943 GFlops
> Loop peeling + U/A : 3.07734s   0.51993 GFlops
> Loop peeling + A/U : 2.44861s   0.653431 GFlops
> Loop peeling + U/U : 3.48922s   0.458555 GFlops
>
>
> double (no vectorization):
>
> eigen              : 2.86233s   0.558985 GFlops
> Loop peeling       : 3.10623s   0.515094 GFlops
>
> So, at least for SSE, there is currently no gain in doing unaligned
> vectorization, but it is worth removing the unaligned stores by first
> processing the unaligned coefficients of the result. So let's do it!
>
>
> cheers,
> Gael.
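
For readers following the thread, here is a rough sketch of what "first processing the unaligned coefficients of the result" could look like for a += a.cwiseProduct(b) on floats. It is not the benchmark code and not Eigen's implementation: the function name add_cwise_product_peeled is hypothetical, the packet width of 4 floats and the 16-byte boundary are SSE assumptions, and on targets without SSE the sketch simply falls back to the scalar loop.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical helper (not part of Eigen): computes a[i] += a[i] * b[i]
// for i in [0, n), peeling leading coefficients so that stores to `a`
// are 16-byte aligned.
void add_cwise_product_peeled(float* a, const float* b, std::size_t n)
{
    // Leading floats to process one by one until `a + peel` sits on a
    // 16-byte boundary. Assumes `a` is at least 4-byte aligned, as any
    // valid float pointer is.
    const std::size_t misalign = reinterpret_cast<std::uintptr_t>(a) % 16;
    std::size_t peel = (misalign == 0) ? 0 : (16 - misalign) / sizeof(float);
    if (peel > n) peel = n;

    std::size_t i = 0;

    // Peeled scalar prologue.
    for (; i < peel; ++i)
        a[i] += a[i] * b[i];

#if defined(__SSE__)
    // Vectorized body: `a` is aligned here, so its load and store are
    // aligned; `b` may still be misaligned, so it uses an unaligned load.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_load_ps(a + i);
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_store_ps(a + i, _mm_add_ps(va, _mm_mul_ps(va, vb)));
    }
#endif

    // Scalar epilogue for the remaining coefficients (or for everything
    // after the prologue when SSE is not available).
    for (; i < n; ++i)
        a[i] += a[i] * b[i];
}
```

Only the destination is aligned by the peeling; whether b ends up aligned depends on its offset relative to a, which is why the sketch keeps the unaligned load for it.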