Re: [eigen] Re: sse4 and integer multiplication

To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: Re: [eigen] Re: sse4 and integer multiplication
From: Rohit Garg <rpg.314@xxxxxxxxx>
Date: Wed, 25 Nov 2009 10:59:36 +0530

On Wed, Nov 25, 2009 at 2:42 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
> ok, last email, i promise.
>
> my last benchmark was stupid: by constantly reallocating v it
> prevented it from being cached, making the whole thing memory-bound.
> Plus the time spent waiting for malloc.
>
> New benchmark:
>
>
>
> #include <Eigen/Dense>
> using namespace Eigen;
> using namespace std;
>
> EIGEN_DONT_INLINE int foo(VectorXi& v, VectorXi& w)
> {
>  EIGEN_ASM_COMMENT("begin");
>  v += (v.cwise()*v).cwise()*w;
>  EIGEN_ASM_COMMENT("end");
>  return v(ei_random<int>(0,999));
> }
>
> int main()
> {
>  VectorXi v = VectorXi::Random(1000);
>  VectorXi w = VectorXi::Random(1000);
>  for(int i = 0; i<1000000; i++) foo(v,w);
> }
>
>
> No vec:   6.797s
> SSE4.1:   1.819s
> SSE2:     2.024s
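
For scale, the timings quoted above work out to roughly 3.7x for SSE4.1 and
3.4x for SSE2 over the non-vectorized build, with SSE4.1 about 11% ahead of
SSE2. A quick sanity check of those ratios, as a sketch (the helper name
`speedup` and the constants are illustrative only, not part of the benchmark
or of Eigen):

```cpp
// Speedup = baseline time / optimized time.
// `speedup` is a hypothetical helper introduced for this check.
double speedup(double baseline_seconds, double optimized_seconds)
{
    return baseline_seconds / optimized_seconds;
}

// Timings copied from the quoted benchmark, in seconds.
const double kNoVec = 6.797;
const double kSse41 = 1.819;
const double kSse2  = 2.024;
```

Calling speedup(kNoVec, kSse41), speedup(kNoVec, kSse2) and
speedup(kSse2, kSse41) gives the three ratios mentioned above.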

This is definitely better, both as a benchmark and as an end result.
Just out of curiosity, how do the results change when you use
profile-guided optimization?
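
For readers without the earlier messages at hand: the operation under
discussion is a packed 32-bit integer multiply, which SSE4.1 provides as a
single instruction (_mm_mullo_epi32) and which SSE2 has to emulate. Below is
a minimal sketch of one commonly used SSE2 emulation. It is an assumption
that this resembles what the SSE2 build in the benchmark does; the function
name `mul4_i32` and the macro `SKETCH_HAVE_SSE2` are hypothetical, and a
scalar fallback is included so the sketch also compiles where SSE2 is
unavailable.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define SKETCH_HAVE_SSE2 1
#endif

// Multiply four packed 32-bit integers lane by lane, keeping the low 32 bits
// of each product (the result _mm_mullo_epi32 gives on SSE4.1).
void mul4_i32(const std::int32_t* a, const std::int32_t* b, std::int32_t* out)
{
#ifdef SKETCH_HAVE_SSE2
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    // _mm_mul_epu32 multiplies lanes 0 and 2 into two 64-bit products.
    __m128i even = _mm_mul_epu32(va, vb);
    // Shift right by 4 bytes so lanes 1 and 3 land in positions 0 and 2.
    __m128i odd = _mm_mul_epu32(_mm_srli_si128(va, 4), _mm_srli_si128(vb, 4));
    // Move the low 32 bits of each 64-bit product into lanes 0 and 1 ...
    even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
    odd  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));
    // ... then interleave them back into the original lane order.
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out),
                     _mm_unpacklo_epi32(even, odd));
#else
    // Portable fallback: unsigned multiply avoids signed-overflow UB.
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = static_cast<std::int32_t>(
            static_cast<std::uint32_t>(a[i]) * static_cast<std::uint32_t>(b[i]));
#endif
}
```

The unsigned multiply is fine for signed inputs because only the low 32 bits
of each product are kept, and those are the same for signed and unsigned
operands. The emulation costs two multiplies plus shifts and shuffles, which
would be consistent with SSE2 trailing SSE4.1 by a modest margin.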
>
>
>
> So SSE4.1 is faster... but when one has clever guys like Rohit on the
> team, SSE2 is good enough ;)
>
> Benoit
>
> 2009/11/24 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
>> 2009/11/24 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
>>> Non-vectorized:    1.91 s
>>> SSE4.1:             2.41 s
>>
>> oops, i meant the reverse:
>>
>> Non-vectorized:    2.41 s
>> SSE4.1:             1.91 s
>>
>>>
>>> so this time it's 26% faster...
>>
>> so yes this time sse4.1 is faster than nothing....
>>
>>> Cheers to Intel's marketing dept.
>>
>> ... but not as fast as Intel would have you believe, that's what i meant..
>>
>>
>>> Benoit
>>>
>>
>
>
>



-- 
Rohit Garg

http://rpg-314.blogspot.com/

Senior Undergraduate
Department of Physics
Indian Institute of Technology
Bombay
