Re: [eigen] Re: sse4 and integer multiplication



ok, last email, i promise.

my last benchmark was stupid: by constantly reallocating v, it
prevented v from being cached, making the whole thing memory-bound,
on top of the time spent waiting for malloc.
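To make the pitfall concrete, here is a minimal sketch of the two loop shapes, written with plain std::vector rather than Eigen. The previous benchmark is not reproduced here; the function names checksum_realloc and checksum_reuse and the trivial workload are made up for illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Variant A: a fresh buffer is allocated and freed on every iteration,
// so each pass pays for the allocator and may start on cold memory.
std::uint32_t checksum_realloc(std::size_t n, int iterations)
{
  std::uint32_t sum = 0;
  for (int it = 0; it < iterations; ++it) {
    std::vector<std::uint32_t> v(n, 1u);  // allocation inside the timed loop
    for (std::size_t i = 0; i < n; ++i)
      sum += v[i] * v[i] + static_cast<std::uint32_t>(it);
  }
  return sum;
}

// Variant B: the buffer is allocated once and reused, so a small
// vector can stay in cache across iterations.
std::uint32_t checksum_reuse(std::size_t n, int iterations)
{
  std::uint32_t sum = 0;
  std::vector<std::uint32_t> v(n);        // allocation outside the timed loop
  for (int it = 0; it < iterations; ++it) {
    std::fill(v.begin(), v.end(), 1u);
    for (std::size_t i = 0; i < n; ++i)
      sum += v[i] * v[i] + static_cast<std::uint32_t>(it);
  }
  return sum;
}
```

Both variants compute the same checksum; only where the allocation happens differs, and that is the part a benchmark of the arithmetic should keep out of the timed loop.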

New benchmark:



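#include <Eigen/Dense>
using namespace Eigen;

// Not inlined, so the loop under test shows up as its own function
// in the generated assembly.
EIGEN_DONT_INLINE int foo(VectorXi& v, VectorXi& w)
{
  EIGEN_ASM_COMMENT("begin");
  // Coefficient-wise v[i] += v[i]*v[i]*w[i] (Eigen 2-era .cwise() API).
  v += (v.cwise()*v).cwise()*w;
  EIGEN_ASM_COMMENT("end");
  // Return a random coefficient so the compiler cannot discard the work.
  return v(ei_random<int>(0,999));
}

int main()
{
  // v and w are allocated once, outside the timed loop.
  VectorXi v = VectorXi::Random(1000);
  VectorXi w = VectorXi::Random(1000);
  for(int i = 0; i<1000000; i++) foo(v,w);
}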
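For reference, the expression inside foo() is a coefficient-wise update. A scalar sketch of the same arithmetic without Eigen could look like this; the name scalar_update is hypothetical, and unsigned 32-bit integers are used here (an assumption, the benchmark uses int) so that overflow wraps around with defined behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar equivalent of v += (v .* v) .* w, one coefficient at a time.
// Two integer multiplications and one addition per coefficient.
void scalar_update(std::vector<std::uint32_t>& v,
                   const std::vector<std::uint32_t>& w)
{
  for (std::size_t i = 0; i < v.size() && i < w.size(); ++i)
    v[i] += v[i] * v[i] * w[i];
}
```

With 1000 coefficients per call, the whole working set is a few kilobytes, so the loop is dominated by the integer multiplications rather than by memory traffic.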


No vec:   6.797s
SSE4.1:   1.819s
SSE2:     2.024s
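To put these timings in relative terms, a tiny helper is enough. The name speedup is made up for this sketch; the inputs are just the three times listed above.

```cpp
// Speedup of a run relative to a baseline: baseline time / run time.
constexpr double speedup(double baseline_seconds, double run_seconds)
{
  return baseline_seconds / run_seconds;
}
```

speedup(6.797, 1.819) is about 3.74 for SSE4.1 over no vectorization, speedup(6.797, 2.024) is about 3.36 for SSE2, and speedup(2.024, 1.819) is about 1.11, so SSE4.1 gains roughly 11% over SSE2 here.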



So SSE4.1 is faster... but when one has clever guys like Rohit on the
team, SSE2 is good enough ;)
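For context: SSE4.1 has a single instruction for a packed 32-bit integer multiply (pmulld, i.e. _mm_mullo_epi32), while SSE2 has to emulate it. Below is a sketch of one well-known emulation built on _mm_mul_epu32. It is not necessarily the code Rohit wrote for Eigen; the names mul_epi32_sse2 and mul4 are invented for this example, and it assumes an x86 target with SSE2.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Low 32 bits of each of the four 32-bit products, using only SSE2.
inline __m128i mul_epi32_sse2(__m128i a, __m128i b)
{
  // _mm_mul_epu32 multiplies lanes 0 and 2 and yields two 64-bit products.
  __m128i even = _mm_mul_epu32(a, b);
  // Shift lanes 1 and 3 down into lanes 0 and 2, then multiply again.
  __m128i odd = _mm_mul_epu32(_mm_srli_si128(a, 4), _mm_srli_si128(b, 4));
  // Keep the low 32 bits of each 64-bit product, packed into lanes 0 and 1.
  even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
  odd  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));
  // Interleave to restore the original lane order 0, 1, 2, 3.
  return _mm_unpacklo_epi32(even, odd);
}

// Convenience wrapper on plain arrays of four ints (unaligned access).
inline void mul4(const std::int32_t* a, const std::int32_t* b,
                 std::int32_t* out)
{
  __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
  __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out), mul_epi32_sse2(va, vb));
}
```

The unsigned multiply is fine for signed inputs because only the low 32 bits of each product are kept, and those bits are the same for signed and unsigned multiplication. The cost is two multiplies plus a few shuffles in place of one instruction, which would be consistent with SSE2 trailing SSE4.1 only slightly in the timings above.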

Benoit

2009/11/24 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
> 2009/11/24 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
>> Non-vectorized:    1.91 s
>> SSE4.1:             2.41 s
>
> oops, i meant the reverse:
>
> Non-vectorized:    2.41 s
> SSE4.1:             1.91 s
>
>>
>> so this time it's 26% faster...
>
> so yes this time sse4.1 is faster than nothing....
>
>> Cheers to Intel's marketing dept.
>
> ... but not as fast as Intel would have you believe, that's what i meant.
>
>
>> Benoit
>>
>


