Re: [eigen] Re: sse4 and integer multiplication
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Re: sse4 and integer multiplication
- From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- Date: Wed, 25 Nov 2009 07:25:58 -0500
2009/11/25 Rohit Garg <rpg.314@xxxxxxxxx>:
> On Wed, Nov 25, 2009 at 2:42 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
>> ok, last email, i promise.
>>
>> my last benchmark was stupid: by constantly reallocating v it
>> prevented it from being cached, making the whole thing memory-bound.
>> Plus the time spent waiting for malloc.
>>
>> New benchmark:
>>
>>
>>
>> #include <Eigen/Dense>
>> using namespace Eigen;
>> using namespace std;
>>
>> EIGEN_DONT_INLINE int foo(VectorXi& v, VectorXi& w)
>> {
>>   EIGEN_ASM_COMMENT("begin");
>>   v += (v.cwise()*v).cwise()*w;     // coefficient-wise v += v*v*w
>>   EIGEN_ASM_COMMENT("end");
>>   return v(ei_random<int>(0,999));  // read back one random coefficient
>> }
>>
>> int main()
>> {
>>   // allocate once, outside the loop
>>   VectorXi v = VectorXi::Random(1000);
>>   VectorXi w = VectorXi::Random(1000);
>>   for(int i = 0; i<1000000; i++) foo(v,w);
>> }
>>
>>
>> No vec: 6.797s
>> SSE4.1: 1.819s
>> SSE2:   2.024s
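For archive readers, the point about reallocation can be sketched without Eigen. The snippet below is only an illustration under stated assumptions: `cube_like_update` and `time_reused_buffers` are hypothetical names, plain `std::vector` stands in for `VectorXi`, and unsigned 32-bit values are used so that overflow wraps rather than being undefined. The buffers are allocated and filled before the clock starts, so the timed loop measures the arithmetic and not malloc.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the coefficient-wise update v += v*v*w.
// Unsigned lanes are an assumption made here so that overflow wraps.
void cube_like_update(std::vector<std::uint32_t>& v,
                      const std::vector<std::uint32_t>& w)
{
    assert(v.size() == w.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] += v[i] * v[i] * w[i];
}

// Runs `iterations` updates on buffers that are allocated once, before
// the clock starts. Returns the elapsed time in seconds. `checksum`
// receives one element of the result so the work has an observable use.
double time_reused_buffers(std::size_t n, int iterations,
                           std::uint32_t& checksum)
{
    assert(n > 0);
    std::vector<std::uint32_t> v(n), w(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = static_cast<std::uint32_t>(i + 1);
        w[i] = static_cast<std::uint32_t>(2 * i + 1);
    }

    const auto start = std::chrono::steady_clock::now();
    for (int k = 0; k < iterations; ++k)
        cube_like_update(v, w);
    const auto stop = std::chrono::steady_clock::now();

    checksum = v[n / 2];
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_reused_buffers(1000, 1000000, sum)` would roughly mirror the sizes used in the benchmark above, though the absolute timings will not be comparable to the Eigen numbers.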
>
> This is definitely better. Both as a benchmark and as an end result.
> Just out of curiosity, how do the results change when you use profile
> guided optimization?
I don't know. I hadn't heard of profile-guided optimization and had to
google it. I found this:
http://mituzas.lt/2009/07/27/profile-guided-optimization-with-gcc/
I'll let you know if I try it...
Benoit
>>
>>
>>
>> So SSE4.1 is faster... but when one has clever guys like Rohit on the
>> team, SSE2 is good enough ;)
>>
>> Benoit
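Some background on why SSE2 can get this close: SSE4.1 adds a packed 32-bit low multiply (`_mm_mullo_epi32`), while SSE2 only offers `_mm_mul_epu32`, which multiplies lanes 0 and 2 into 64-bit products. The sketch below shows one commonly used way to build the four-lane multiply from that. It is an assumption for illustration, not necessarily the code used in Eigen; `mul_lo32x4` and `HAVE_SSE2_SKETCH` are hypothetical names, and a scalar fallback is included for non-SSE2 targets.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2_SKETCH 1  // hypothetical macro, local to this sketch
#endif

// Multiplies four 32-bit lanes, keeping the low 32 bits of each product.
void mul_lo32x4(const std::uint32_t a[4], const std::uint32_t b[4],
                std::uint32_t out[4])
{
    assert(a && b && out);
#ifdef HAVE_SSE2_SKETCH
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));

    // Lanes 0 and 2: two full 64-bit products.
    const __m128i even = _mm_mul_epu32(va, vb);
    // Shift by 4 bytes so lanes 1 and 3 land in positions 0 and 2.
    const __m128i odd = _mm_mul_epu32(_mm_srli_si128(va, 4),
                                      _mm_srli_si128(vb, 4));

    // Move the low 32 bits of each product into the two lowest lanes,
    // then interleave even and odd results back into lane order.
    const __m128i even_lo = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
    const __m128i odd_lo  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out),
                     _mm_unpacklo_epi32(even_lo, odd_lo));
#else
    // Scalar fallback: unsigned multiplication wraps modulo 2^32.
    for (int i = 0; i < 4; ++i) out[i] = a[i] * b[i];
#endif
}
```

The low 32 bits of a product are the same for signed and unsigned operands, which is why the unsigned `_mm_mul_epu32` can serve for `int` data as well.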
>>
>> 2009/11/24 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
>>> 2009/11/24 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
>>>> Non-vectorized: 1.91 s
>>>> SSE4.1: 2.41 s
>>>
>>> oops, i meant the reverse:
>>>
>>> Non-vectorized: 2.41 s
>>> SSE4.1: 1.91 s
>>>
>>>>
>>>> so this time it's 26% faster...
>>>
>>> so yes this time sse4.1 is faster than nothing....
>>>
>>>> Cheers to Intel's marketing dept.
>>>
>>> ... but not as fast as Intel would have you believe, that's what i meant.
>>>
>>>
>>>> Benoit
>>>>
>>>
>>
>>
>>
>
>
>
> --
> Rohit Garg
>
> http://rpg-314.blogspot.com/
>
> Senior Undergraduate
> Department of Physics
> Indian Institute of Technology
> Bombay
>
>
>