- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] New(?) way to make using SIMD easier
- From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
- Date: Wed, 25 Nov 2009 12:43:23 -0500
2009/11/25 Mark Borgerding <mark@xxxxxxxxxxxxxx>:
> On 11/24/2009 11:51 AM, Benoit Jacob wrote:
>>
>> ah and also.
>>
>> if you just want a generic easy-to-use way of performing a SIMD
>> operation on arrays in memory... then you can do even much simpler:
>> just use Map and do your operation on that. like:
>>
>> VectorXf::Map(dstPtr,num)
>> = VectorXf::Map(srcPtr1,num)
>> + VectorXf::Map(srcPtr2,num);
>>
>> that compiles to just what you wanted. well except that it adds some
>> code to deal with unaligned boundaries; but if 'num' is known at
>> compile time then you avoid that by using Matrix<float,num,1> instead
>> of VectorXf.
>>
>>
>
> I love this syntax and was excited to start to use it more in some of our
> legacy code.
> ...
> Then, I did a benchmark comparing the speed of the above to that of a very
> simple C-style function using SSE (see "vector_add" in the attached testmap.cc).
> The simple function was *much* faster with both the intel compiler (11.0
> 20081105) and with g++ (4.4.1 20090725). See the output below.
That's because I forgot to mention that when the pointers are known to
be aligned, you have to tell Eigen so; it can't guess that on its own
(at least not without incurring a constant run-time overhead).
So just use MapAligned() instead of Map() (note: that requires the
development branch). I tried it, and it now runs at exactly the same
speed as your simple version (the one-line change is sketched right
after the timings):
$ g++ testmap.cc -I ../eigen -O2 -DNDEBUG -o t && ./t
With simple function, iterations=6000000, elements=512 took 1.39999s. rate=2194.3 MS/s
With VectorXf::Map, iterations=6000000, elements=512 took 1.40002s. rate=2194.25 MS/s
With simple function, iterations=6000000, elements=512 took 1.39974s. rate=2194.69 MS/s
With VectorXf::Map, iterations=6000000, elements=512 took 1.39972s. rate=2194.72 MS/s
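For clarity, the only change I made to your attached file is this, as a
minimal sketch (it assumes dstPtr, srcPtr1 and srcPtr2 really are 16-byte
aligned, e.g. obtained from memalign(16, ...), and MapAligned needs the
development branch):

    // Same as the Map() version, but promises 16-byte alignment to Eigen,
    // so it uses aligned SSE loads/stores with no run-time boundary handling.
    VectorXf::MapAligned(dstPtr, num)
        = VectorXf::MapAligned(srcPtr1, num)
        + VectorXf::MapAligned(srcPtr2, num);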
> I'm aware that this simple case does not showcase the metaprogramming
> goodies that allow one to chain more complicated operations together. With
> that said, why cannot Eigen come close to the speed of a simple function
> when all one wants to do is add two vectors together?
As you see, it can, provided you give it the same amount of
information at compile-time ;)
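To spell out the other half of that remark (the bit from my earlier mail
about sizes known at compile time): if the length is a compile-time
constant, you can also use a fixed-size type, so Eigen drops the
unaligned-tail handling entirely. A sketch, assuming a length of 512 and
aligned pointers; Vec512f is just a local typedef I'm making up here:

    // Fixed-size and aligned: both the size and the alignment are known
    // at compile time, so the generated loop is pure aligned SSE.
    typedef Matrix<float, 512, 1> Vec512f;
    Vec512f::MapAligned(dstPtr)
        = Vec512f::MapAligned(srcPtr1)
        + Vec512f::MapAligned(srcPtr2);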
Benoit
#include <malloc.h>     // memalign()
#include <sys/time.h>   // gettimeofday()
#include <time.h>
#include <cstdlib>      // rand()
#include <string>
#include <iostream>
#ifdef __SSE__
#include <xmmintrin.h>  // SSE intrinsics used in vector_add()
#endif
#include <Eigen/Core>

using namespace std;
using namespace Eigen;
// Wall-clock time in seconds, with microsecond resolution.
inline double curtime(void)
{
    struct timeval tv;
    if (gettimeofday(&tv, NULL) != 0)
        perror("gettimeofday");
    return (double)tv.tv_sec + (double)tv.tv_usec * .000001;
}

// Reinterpret a pointer as an integer, for alignment checks.
inline ptrdiff_t ptr2int(const void * ptr)
{
    return (ptrdiff_t)ptr;
}
// Plain C/SSE reference implementation: dst[k] = src1[k] + src2[k] for k in [0,n).
void vector_add(float * dst, const float * src1, const float * src2, int n)
{
    int k = 0;
#ifdef __SSE__
    // Use aligned SSE loads/stores only if all three pointers are 16-byte aligned.
    bool all_aligned = (0 == (15 & (ptr2int(dst) | ptr2int(src1) | ptr2int(src2))));
    if (all_aligned) {
        for (; k + 4 <= n; k += 4)
            _mm_store_ps(dst + k, _mm_add_ps(_mm_load_ps(src1 + k), _mm_load_ps(src2 + k)));
    }
#endif
    // Scalar tail (and fallback when unaligned or SSE is unavailable).
    for (; k < n; ++k)
        dst[k] = src1[k] + src2[k];
}
int main(int argc, char ** argv)
{
    const unsigned int nel = 512;      // elements per vector
    const unsigned int nit = 6000000;  // additions per test
    double t0, t1;
    // 16-byte aligned buffers, so both code paths can use aligned loads/stores.
    float * dstPtr  = (float*)memalign(16, nel*sizeof(float));
    float * srcPtr1 = (float*)memalign(16, nel*sizeof(float));
    float * srcPtr2 = (float*)memalign(16, nel*sizeof(float));

    for (int testcase = 0; testcase < 4; ++testcase) {
        for (unsigned int k = 0; k < nel; ++k) {
            dstPtr[k] = 0;
            srcPtr1[k] = rand();
            srcPtr2[k] = rand();
        }
        string testname;
        t0 = curtime();
        if (testcase & 1) {
            testname = "VectorXf::Map";
            for (unsigned int i = 0; i < nit; ++i) {
                // MapAligned promises 16-byte alignment, so Eigen skips the unaligned handling.
                VectorXf::MapAligned(dstPtr, nel)
                    = VectorXf::MapAligned(srcPtr1, nel) + VectorXf::MapAligned(srcPtr2, nel);
                //srcPtr1[i&(nel-1)] = dstPtr[0]; // keep the compiler from knowing that it is doing the same thing over and over
            }
        } else {
            testname = "simple function";
            for (unsigned int i = 0; i < nit; ++i) {
                vector_add(dstPtr, srcPtr1, srcPtr2, nel);
                //srcPtr1[i&(nel-1)] = dstPtr[0]; // keep the compiler from knowing that it is doing the same thing over and over
            }
        }
        t1 = curtime();
        cout << " With " << testname << ", iterations=" << nit << ", elements=" << nel
             << " took " << (t1 - t0) << "s. rate=" << (1e-6*(nit*nel)/(t1 - t0)) << " MS/s\n";
    }

    free(dstPtr);
    free(srcPtr1);
    free(srcPtr2);
    return 0;
}