Re: [eigen] Vectorized(SSE) integer multiplication
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Vectorized(SSE) integer multiplication
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Sun, 8 Mar 2009 11:18:30 +0100
On Sun, Mar 8, 2009 at 7:02 AM, Rohit Garg <rpg.314@xxxxxxxxx> wrote:
> This file has my vectorized (SSE2) implementation of the multiplication
> of 4 integers. The Eigen routine was taken from the packetMath.h file.
> The benchmarks show a small but noticeable difference.
> ~/Documents/numerical@rpg> g++ vec4i_mul.cpp -msse3 -O3 -march=native
> ~/Documents/numerical@rpg> ./a.out > /dev/null
> 1236491601 ei mul begins
> 1236491618 ei mul ends
> 1236491618my mul
> 1236491633 end
> The macros could be defined better, I admit. They were taken from my
> implementation of vec4i multiplication, which I wrote earlier for my
> own needs. They are the same as for the quaternion routine I sent
> earlier, so please consider unifying them.
Thanks. Actually, I expected the bitwise ops to be faster than the
shuffles, but that's not the case. I committed your change.
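Since the attached file isn't reproduced in the archive, here is a minimal sketch of the usual shuffle-based SSE2 technique, under the assumption that it resembles what the thread describes (it is not Rohit's actual code, and the names mul4i_sse2, mul4i and mul_wrap are hypothetical). SSE2 has no instruction that returns four 32-bit products: _mm_mul_epu32 multiplies only lanes 0 and 2 into two 64-bit results. So it is called twice (the second time on inputs shifted right by 32 bits within each 64-bit half, to reach lanes 1 and 3), and the low 32 bits of each product are gathered with _mm_shuffle_epi32 and _mm_unpacklo_epi32.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (also provides _MM_SHUFFLE)

// Hypothetical helper: multiply four packed 32-bit integers lane by lane,
// keeping the low 32 bits of each product (same as a wrapping scalar multiply).
inline __m128i mul4i_sse2(__m128i a, __m128i b) {
    // Lanes 0 and 2: 64-bit products {p0, p2}.
    __m128i even = _mm_mul_epu32(a, b);
    // Lanes 1 and 3: move them into the low half of each 64-bit element first.
    __m128i odd = _mm_mul_epu32(_mm_srli_epi64(a, 32), _mm_srli_epi64(b, 32));
    // Gather the low 32 bits of the two 64-bit products into lanes 0 and 1.
    even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));  // {p0, p2, x, x}
    odd  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));  // {p1, p3, x, x}
    // Interleave into {p0, p1, p2, p3}.
    return _mm_unpacklo_epi32(even, odd);
}

// Hypothetical wrapper over plain arrays (unaligned loads and stores).
inline void mul4i(const int32_t* a, const int32_t* b, int32_t* out) {
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), mul4i_sse2(va, vb));
}

// Scalar reference: multiply as unsigned so overflow wraps without undefined
// behaviour, then reinterpret the bit pattern as signed (two's complement).
inline int32_t mul_wrap(int32_t x, int32_t y) {
    return static_cast<int32_t>(static_cast<uint32_t>(x) * static_cast<uint32_t>(y));
}
```

For example, mul4i on {1, -2, 70000, -123456} and {5, 3, 70000, -7} yields {5, -6, 605032704, 864192}; the third lane is 4900000000 wrapped modulo 2^32.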
> BTW, the multiplication instruction that you (and I) are using does
> only unsigned multiplication. Signed multiplication is available as a
> single instruction in SSE4.1, so a small patch could be added for that
> too. The exact intrinsic is _mm_mul_epi32. My CPU doesn't have it,
> so I can't test it.
Actually, here it works for negative integers too... and like you, my
CPU does not support SSE4.1, so I cannot try it...
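Why the unsigned instruction also works for negative integers: in two's complement, the low 32 bits of a product are the same whether the operands are read as signed or unsigned; only the high 32 bits differ. Since the packet multiply keeps only the low halves, _mm_mul_epu32 already gives correct results for negative inputs. The scalar sketch below checks this (the helper names are hypothetical). As a side note, SSE4.1's _mm_mul_epi32 (PMULDQ) is the signed counterpart of _mm_mul_epu32 and likewise uses only lanes 0 and 2; the SSE4.1 intrinsic that returns four low 32-bit products directly is _mm_mullo_epi32 (PMULLD).

```cpp
#include <cassert>
#include <cstdint>

// Full 64-bit product with both operands read as unsigned 32-bit values
// (the per-lane behaviour of _mm_mul_epu32 / PMULUDQ).
inline uint64_t mul_as_unsigned(int32_t x, int32_t y) {
    return uint64_t(uint32_t(x)) * uint64_t(uint32_t(y));
}

// Full 64-bit product with both operands read as signed 32-bit values
// (the per-lane behaviour of _mm_mul_epi32 / PMULDQ), returned as a bit pattern.
// The signed multiply cannot overflow: |x * y| <= 2^62.
inline uint64_t mul_as_signed(int32_t x, int32_t y) {
    return uint64_t(int64_t(x) * int64_t(y));
}

inline uint32_t low32(uint64_t p)  { return uint32_t(p); }
inline uint32_t high32(uint64_t p) { return uint32_t(p >> 32); }
```

With -2 and 3, both low halves are 0xFFFFFFFA (that is, -6), while the high halves are 2 for the unsigned reading and 0xFFFFFFFF for the signed one.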
> Rohit Garg
> Senior Undergraduate
> Department of Physics
> Indian Institute of Technology