[eigen] unaligned or not unaligned vectorization ?
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: [eigen] unaligned or not unaligned vectorization ?
- From: "Gael Guennebaud" <gael.guennebaud@xxxxxxxxx>
- Date: Thu, 3 Jul 2008 20:07:05 +0200
Hi,
Today we had a discussion about the usefulness of unaligned
vectorization. So here are some benchmarks for a += a.cwiseProduct(b),
where, e.g., U/A means unaligned loads / aligned stores:
float:
  eigen A/A          : 1.2163s    1.31546 GFlops
  eigen U/A          : 1.71109s   0.935079 GFlops
  eigen U/U          : 2.16024s   0.74066 GFlops
  Loop peeling + A/A : 0.932119s  1.71652 GFlops
  Loop peeling + U/A : 1.48324s   1.07872 GFlops
  Loop peeling + A/U : 1.1676s    1.37033 GFlops
  Loop peeling + U/U : 1.68971s   0.946908 GFlops

float (no vectorization):
  eigen              : 2.05874s   0.777173 GFlops
  Loop peeling       : 2.27903s   0.702053 GFlops

double:
  eigen A/A          : 2.70669s   0.591128 GFlops
  eigen U/U          : 2.75419s   0.580933 GFlops
  eigen U/A          : 2.82088s   0.567199 GFlops
  Loop peeling + A/A : 1.98525s   0.805943 GFlops
  Loop peeling + U/A : 3.07734s   0.51993 GFlops
  Loop peeling + A/U : 2.44861s   0.653431 GFlops
  Loop peeling + U/U : 3.48922s   0.458555 GFlops

double (no vectorization):
  eigen              : 2.86233s   0.558985 GFlops
  Loop peeling       : 3.10623s   0.515094 GFlops
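
For reference, time multiplied by rate comes to about 1.6 Gflop in every row above, so each run does the same amount of work and the two columns are two views of one measurement. Below is a minimal sketch of the scalar kernel and the rate computation. It assumes 2 flops per coefficient (one multiply, one add), which would put the total at about 0.8e9 coefficients processed per run. The function names are illustrative and are not part of Eigen or of the actual benchmark code.

```cpp
#include <cstddef>

// Scalar reference for a += a.cwiseProduct(b):
// one multiply and one add per coefficient.
inline void mul_add_scalar(float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        a[i] += a[i] * b[i];
    }
}

// Rate in GFlops for a run that processed `coeffs_processed` coefficients
// in `seconds`. ASSUMPTION: 2 flops per coefficient (multiply + add).
inline double gflops(double coeffs_processed, double seconds) {
    return 2.0 * coeffs_processed / seconds / 1e9;
}
```

Under that assumption, gflops(0.8e9, 1.2163) gives roughly the 1.315 GFlops reported for the float "eigen A/A" row.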
So, at least for SSE, there is currently no gain from doing unaligned
vectorization, but it is worth removing the unaligned stores by first
processing the unaligned leading coefficients of the result one by one
(loop peeling). So let's do it!
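
To make the idea concrete, here is a rough sketch of loop peeling for this kernel with SSE intrinsics. It is not Eigen's implementation. The names peel_count and mul_add_peeled are hypothetical, and the sketch assumes float data and a 16-byte packet. The leading coefficients of a are processed one by one until a reaches a 16-byte boundary, so that all packet stores (and loads) on a are aligned. b is read with an unaligned load, since its alignment relative to a is not known.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0  // no SSE available: everything runs in the scalar loops
#endif

// Hypothetical helper: number of leading floats to process one by one
// before `p` reaches a 16-byte boundary, capped at n.
inline std::size_t peel_count(const float* p, std::size_t n) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    const std::size_t misalign = addr % 16;  // bytes past the last boundary
    if (misalign % sizeof(float) != 0) {
        return n;  // pointer can never become aligned: stay scalar throughout
    }
    const std::size_t peel =
        (misalign == 0) ? 0 : (16 - misalign) / sizeof(float);
    return peel < n ? peel : n;
}

// a[i] += a[i] * b[i], with aligned packet access on `a` after peeling.
inline void mul_add_peeled(float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
    const std::size_t start = peel_count(a, n);

    // Peeled prologue: scalar code up to the first aligned coefficient of a.
    for (; i < start; ++i) {
        a[i] += a[i] * b[i];
    }

#if SKETCH_HAS_SSE
    // Main loop: 4 floats per iteration.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_load_ps(a + i);   // aligned load
        const __m128 vb = _mm_loadu_ps(b + i);  // unaligned load
        _mm_store_ps(a + i, _mm_add_ps(va, _mm_mul_ps(va, vb)));  // aligned store
    }
#endif

    // Epilogue: remaining coefficients that do not fill a packet.
    for (; i < n; ++i) {
        a[i] += a[i] * b[i];
    }
}
```

If b happens to share a's alignment, the unaligned load could be replaced with an aligned one, which would correspond to the A/A case.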
cheers,
Gael.