Re: [eigen] New(?) way to make using SIMD easier



On 11/25/2009 12:43 PM, Benoit Jacob wrote:
2009/11/25 Mark Borgerding<mark@xxxxxxxxxxxxxx>:
On 11/24/2009 11:51 AM, Benoit Jacob wrote:
VectorXf::Map(dstPtr,num)
   = VectorXf::Map(srcPtr1,num)
   + VectorXf::Map(srcPtr2,num);
I love this syntax and was excited to start to use it more in some of our
legacy code.
...
Then I ran a benchmark comparing the speed of the above to that of a very
simple C-style function using SSE (see "vector_add" in attached testmap.cc).
The simple function was *much* faster with both the Intel compiler (11.0
20081105) and with g++ (4.4.1 20090725). See the output below.
That's because I forgot to tell you that when the pointers are known
to be aligned, you need to tell that to Eigen, otherwise it can't
guess it (at least not without incurring a constant overhead).
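
To illustrate the alignment point, here is a minimal sketch of a hand-written SSE add that chooses between aligned and unaligned loads at run time. It is not Eigen's implementation and not the vector_add from testmap.cc (which is not reproduced here); the names is_aligned16 and vector_add_sketch are hypothetical. The pointer check at the top is an example of the kind of per-call cost a library would pay if it had to discover alignment on its own.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical helper: true if p sits on a 16-byte boundary.
static bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical stand-in for a "simple function": dst[i] = a[i] + b[i].
// Uses aligned SSE loads/stores when all three pointers are 16-byte aligned,
// unaligned ones otherwise, and a scalar loop for the tail (or for the whole
// array when the build has no SSE support).
static void vector_add_sketch(float* dst, const float* a, const float* b,
                              std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE__)
    if (is_aligned16(dst) && is_aligned16(a) && is_aligned16(b)) {
        for (; i + 4 <= n; i += 4)
            _mm_store_ps(dst + i,
                         _mm_add_ps(_mm_load_ps(a + i), _mm_load_ps(b + i)));
    } else {
        for (; i + 4 <= n; i += 4)
            _mm_storeu_ps(dst + i,
                          _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    }
#endif
    for (; i < n; ++i) dst[i] = a[i] + b[i];  // remaining elements
}
```

When the caller already knows the buffers are aligned, the check and the unaligned branch can be dropped entirely, which is what stating the alignment up front makes possible.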

So just use MapAligned() instead of Map()  (note: that requires the
development branch). Actually I tried and now it has exactly the same
speed as your simple version:

$ g++ testmap.cc -I ../eigen -O2 -DNDEBUG -o t && ./t
You did not use any -msse* flags.  So neither version is using SIMD.


After switching to MapAligned (from hg tip), the gap narrowed a little, but I still see almost a 2x difference.

g++ -I.. -O3 -msse -msse2 -msse3    -c -o testmap.o testmap.cc
g++ -o testmap testmap.o
../testmap
With simple function, iterations=6000000, elements=512 took 0.690981s. rate=4445.85 MS/s
With VectorXf::Map, iterations=6000000, elements=512 took 1.29193s. rate=2377.84 MS/s
With simple function, iterations=6000000, elements=512 took 0.671556s. rate=4574.45 MS/s
With VectorXf::Map, iterations=6000000, elements=512 took 1.27064s. rate=2417.67 MS/s

icpc -I.. -O3 -msse3    -c -o testmap.o testmap.cc
icpc -o testmap testmap.o
../testmap
With simple function, iterations=6000000, elements=512 took 0.803989s. rate=3820.95 MS/s
With VectorXf::Map, iterations=6000000, elements=512 took 1.55667s. rate=1973.44 MS/s
With simple function, iterations=6000000, elements=512 took 0.803499s. rate=3823.28 MS/s
With VectorXf::Map, iterations=6000000, elements=512 took 1.55634s. rate=1973.87 MS/s
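
For reading the figures above: the printed rates are consistent with iterations times elements divided by elapsed seconds, in millions of samples per second. A small sketch of that arithmetic follows; rate_msps is a hypothetical helper, and the formula is inferred from the numbers, not taken from testmap.cc.

```cpp
// Hypothetical helper: throughput in millions of samples per second (MS/s),
// assuming one "sample" is one output element of the vector add.
static double rate_msps(double iterations, double elements, double seconds) {
    return iterations * elements / seconds / 1e6;
}
```

For example, rate_msps(6000000, 512, 0.690981) gives roughly 4445.85, and rate_msps(6000000, 512, 1.29193) roughly 2377.84, matching the first two g++ lines up to rounding of the printed times.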



