[eigen] sse4 and integer multiplication
- To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
- Subject: [eigen] sse4 and integer multiplication
- From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- Date: Tue, 24 Nov 2009 15:21:27 -0500
Hi,
I just added SSE4 integer mul support. It is an improvement over the
current vectorized integer multiplication where SSE4 is available, but
I am puzzled. Here is my benchmark:
#include <iostream>
#include <Eigen/Dense>
using namespace Eigen;
using namespace std;

EIGEN_DONT_INLINE void foo()
{
  // I was wondering if the CPU could be clever enough to
  // optimize when the ints are 0 or 1; it's not so easy to
  // ensure that we don't end up with only 0 and 1...
  Vector4i v(5,-7,11,13);
  Vector4i w(9,3,-5,-7);
  for(int i = 0; i < 100000000; i++)
  {
    EIGEN_ASM_COMMENT("begin");
    v = v.cwise()*v;
    v = v.cwise()*w;
    EIGEN_ASM_COMMENT("end");
  }
  cout << v << endl;
}

int main()
{
  foo();
}
OK, so I'm puzzled because the fastest variant is... the one with no vectorization at all:
No vectorization: 0.57 sec
With SSE4.1: 0.81 sec
With SSE2: 1.21 sec
So I did what I usually do in such circumstances: dump the assembly
and go whine until daddy Gael takes care of me.
Without vec:
imull %edx, %edx
imull %eax, %eax
leal 0(,%rdx,8), %edi
imull %ebx, %ebx
leal (%rax,%rax,4), %eax
imull %ecx, %ecx
subl %edi, %edx
negl %eax
leal (%rbx,%rbx,8), %ebx
leal (%rcx,%rcx,2), %ecx
With SSE4.1:
movdqa %xmm1, %xmm0
pmulld %xmm1, %xmm0
pmulld (%rdx), %xmm0
movdqa %xmm0, %xmm1
movdqa %xmm0, (%rbp)
Cheers,
Benoit