Re: [eigen] New(?) way to make using SIMD easier


To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: Re: [eigen] New(?) way to make using SIMD easier
From: Benoit Jacob <jacob.benoit.1@xxxxxxxxx>
Date: Wed, 25 Nov 2009 16:08:57 -0500

2009/11/25 Mark Borgerding <mark@xxxxxxxxxxxxxx>:
> On 11/25/2009 12:43 PM, Benoit Jacob wrote:
>>
>> 2009/11/25 Mark Borgerding<mark@xxxxxxxxxxxxxx>:
>>
>>>
>>> On 11/24/2009 11:51 AM, Benoit Jacob wrote:
>>>
>>>>
>>>> VectorXf::Map(dstPtr,num)
>>>>   = VectorXf::Map(srcPtr1,num)
>>>>   + VectorXf::Map(srcPtr2,num);
>>>>
>>>
>>> I love this syntax and was excited to start to use it more in some of our
>>> legacy code.
>>> ...
>>> Then,  I did a benchmark  comparing the speed of the above to that of a
>>> very
>>>  simple C-style function using SSE(see "vector_add" in attached
>>> testmap.cc).
>>> The simple function was *much* faster with both the intel compiler (11.0
>>> 20081105)  and with g++ (4.4.1 20090725). See the output below.
>>>
>>
>> That's because I forgot to tell you that when the pointers are known
>> to be aligned, you need to tell that to Eigen, otherwise it can't
>> guess it (at least not without incurring a constant overhead).
>>
>> So just use MapAligned() instead of Map()  (note: that requires the
>> development branch). Actually I tried and now it has exactly the same
>> speed as your simple version:
>>
>> $ g++ testmap.cc -I ../eigen -O2 -DNDEBUG -o t&&  ./t
>>
>
> You did not use any -msse* flags.  So neither version is using SIMD.

I am on a 64-bit machine, so SSE2 is implicit. Both versions are using
SIMD. Actually here is the assembly generated by the Eigen version:

	xorl	%eax, %eax
	.p2align 4,,10
	.p2align 3
.L21:
	movaps	(%rbp,%rax), %xmm0
	addps	(%r12,%rax), %xmm0
	movaps	%xmm0, (%rbx,%rax)
	addq	$16, %rax
	cmpq	$2048, %rax
	jne	.L21
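
A quick way to check the "SSE2 is implicit" point on a given toolchain is to look at the compiler's predefined macros. This is only a minimal sketch: it assumes GCC or Clang (which define __SSE2__ and __x86_64__) or MSVC (which defines _M_X64), and the function names sse2_enabled() and is_x86_64() are made up for illustration. They are not part of Eigen.

```cpp
// Hypothetical helpers (not part of Eigen): report what the compiler was
// told to target, based on predefined macros only.

// True when SSE2 code generation is enabled for this translation unit.
// GCC/Clang define __SSE2__; MSVC targeting x64 defines _M_X64.
constexpr bool sse2_enabled()
{
#if defined(__SSE2__) || defined(_M_X64)
    return true;
#else
    return false;
#endif
}

// True when compiling for a 64-bit x86 target.
constexpr bool is_x86_64()
{
#if defined(__x86_64__) || defined(_M_X64)
    return true;
#else
    return false;
#endif
}
```

Printing both values from main() on a 64-bit build without any -msse* flag should show SSE2 as enabled, unless SSE was explicitly switched off on the command line. On a 32-bit x86 build the result depends on the flags passed.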

> After switching to MapAligned ( from hg tip), it helped a little, but I
> still see almost a 2x difference.
>
> g++ -I.. -O3 -msse -msse2 -msse3    -c -o testmap.o testmap.cc
> g++ -o testmap testmap.o
> ./testmap
>  With simple function, iterations=6000000, elements=512 took 0.690981s.
> rate=4445.85 MS/s
>  With VectorXf::Map, iterations=6000000, elements=512 took 1.29193s..
> rate=2377.84 MS/s
>  With simple function, iterations=6000000, elements=512 took 0.671556s.
> rate=4574.45 MS/s
>  With VectorXf::Map, iterations=6000000, elements=512 took 1.27064s..
> rate=2417.67 MS/s

Strange. I can't reproduce this here, although I too have gcc 4.4,
even when using exactly the same command lines as you.

Can you try -DNDEBUG ? Here it makes a small but noticeable difference.

Otherwise the most likely explanation is the difference between x86
and x86-64. Can you generate the asm and send it? Find attached a
modified source file that emits asm comments at the right places (as I
did to get the listing above).
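
For anyone unfamiliar with the asm-comment trick, here is a minimal sketch of the idea, assuming GCC or Clang on x86, where '#' starts a comment in the assembler output. MY_ASM_COMMENT and add_arrays are hypothetical names used only for illustration; the macro mirrors the idea behind EIGEN_ASM_COMMENT and is not Eigen's definition.

```cpp
#include <cassert>

// Hypothetical marker macro: emits a comment line into the generated
// assembly so the surrounding code is easy to find in the .s file.
// On compilers without GNU-style inline asm it expands to nothing.
#if defined(__GNUC__)
#define MY_ASM_COMMENT(text) __asm__ __volatile__("# " text)
#else
#define MY_ASM_COMMENT(text)
#endif

// Element-wise dst = a + b, with markers around the loop of interest.
void add_arrays(float* dst, const float* a, const float* b, int n)
{
    assert(n >= 0);  // compiled out when NDEBUG is defined
    MY_ASM_COMMENT("begin add_arrays loop");
    for (int k = 0; k < n; ++k)
        dst[k] = a[k] + b[k];
    MY_ASM_COMMENT("end add_arrays loop");
}
```

Compiling with something like "g++ -O2 -S file.cc" and searching the resulting .s file for "begin add_arrays loop" should lead straight to the loop body.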

Cheers
Benoit

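#include <malloc.h>     // memalign
#include <sys/time.h>   // gettimeofday
#include <time.h>
#include <cstdio>       // perror
#include <cstdlib>      // rand, free
#include <iostream>
#include <string>
#include <Eigen/Core>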

using namespace std;
using namespace Eigen;

inline double curtime(void)
{
    struct timeval tv;
    if ( gettimeofday(&tv, NULL) != 0)
        perror("gettimeofday");
    return (double)tv.tv_sec + (double)tv.tv_usec*.000001;
}

inline
ptrdiff_t ptr2int(const void * ptr)
{
    return (ptrdiff_t)ptr;
}

void vector_add(float * dst,const float * src1,const float * src2,int n)
{
    int k=0;
#ifdef __SSE__
    bool all_aligned = (0 == (15 & ( ptr2int(dst) | ptr2int(src1) | ptr2int(src2) ) ) );
    if (all_aligned) {
        for (; k+4<=n;k+=4)
            _mm_store_ps(dst+k, _mm_add_ps(_mm_load_ps(src1+k),_mm_load_ps(src2+k) ) );
    }
#endif
    for (;k<n;++k) 
        dst[k] = src1[k] + src2[k];
}

int main(int argc, char ** argv)
{
    const unsigned int nel = 512;
    const unsigned int nit = 6000000;
    double t0,t1;
    float * dstPtr = (float*)memalign(16,nel*sizeof(float));
    float * srcPtr1 = (float*)memalign(16,nel*sizeof(float));
    float * srcPtr2 = (float*)memalign(16,nel*sizeof(float));

    for (int testcase=0;testcase<4;++testcase) {
        for (int k=0;k<nel;++k) {
            dstPtr[k] = 0;
            srcPtr1[k] = rand();
            srcPtr2[k] = rand();
        }

        string testname;
        t0 = curtime();
        if (testcase&1) {
            testname = "VectorXf::Map";
            for (int i=0;i<nit;++i) {
                EIGEN_ASM_COMMENT("begin eigen");
                VectorXf::MapAligned(dstPtr,nel) = VectorXf::MapAligned(srcPtr1,nel) + VectorXf::MapAligned(srcPtr2,nel);
                //srcPtr1[i&(nel-1)] = dstPtr[0]; // keep the compiler from noticing that it is doing the same thing over and over
                EIGEN_ASM_COMMENT("end eigen");
            }
        }else{
            testname = "simple function";
            for (int i=0;i<nit;++i) {
                vector_add(dstPtr,srcPtr1,srcPtr2,nel);
                //srcPtr1[i&(nel-1)] = dstPtr[0]; // keep the compiler from noticing that it is doing the same thing over and over
            }
        }
        t1 = curtime();
        cout << " With " << testname << ", iterations=" << nit << ", elements=" << nel 
            << " took " << (t1-t0) <<"s. rate=" << (1e-6*(nit*nel)/(t1-t0))<<" MS/s\n";
    }
    free(dstPtr);
    free(srcPtr1);
    free(srcPtr2);
    return 0;
}
