Re: [eigen] New(?) way to make using SIMD easier
On 11/25/2009 12:43 PM, Benoit Jacob wrote:
> 2009/11/25 Mark Borgerding <mark@xxxxxxxxxxxxxx>:
>> On 11/24/2009 11:51 AM, Benoit Jacob wrote:
>>> VectorXf::Map(dstPtr,num) = VectorXf::Map(srcPtr1,num) + VectorXf::Map(srcPtr2,num);
>>
>> I love this syntax and was excited to start to use it more in some of our legacy code. ... Then, I did a benchmark comparing the speed of the above to that of a very simple C-style function using SSE (see "vector_add" in attached testmap.cc). The simple function was *much* faster with both the Intel compiler (11.0 20081105) and with g++ (4.4.1 20090725). See the output below.
>
> That's because I forgot to tell you that when the pointers are known to be aligned, you need to tell that to Eigen, otherwise it can't guess it (at least not without incurring a constant overhead). So just use MapAligned() instead of Map() (note: that requires the development branch). Actually I tried and now it has exactly the same speed as your simple version:
>
> $ g++ testmap.cc -I ../eigen -O2 -DNDEBUG -o t && ./t
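[Archive note] The attachment testmap.cc is not reproduced here, so the following is only a rough sketch of the aligned-versus-unaligned distinction discussed above. It uses raw SSE intrinsics rather than Eigen, and it is not the "vector_add" from the attachment. The names add_aligned and add_unaligned are hypothetical. Both assume num is a multiple of 4, and add_aligned additionally assumes 16-byte aligned pointers, which is the kind of guarantee MapAligned() is meant to convey.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Assumes dst, a and b are 16-byte aligned and num is a multiple of 4.
// The aligned load/store intrinsics may fault on misaligned addresses.
void add_aligned(float* dst, const float* a, const float* b, std::size_t num) {
    assert(num % 4 == 0);
    assert(is_aligned16(dst) && is_aligned16(a) && is_aligned16(b));
#if defined(__SSE__)
    for (std::size_t i = 0; i < num; i += 4) {
        __m128 va = _mm_load_ps(a + i);             // aligned load
        __m128 vb = _mm_load_ps(b + i);             // aligned load
        _mm_store_ps(dst + i, _mm_add_ps(va, vb));  // aligned store
    }
#else
    // Scalar fallback so the sketch still builds without SSE.
    for (std::size_t i = 0; i < num; ++i) dst[i] = a[i] + b[i];
#endif
}

// Makes no alignment assumption; only requires num to be a multiple of 4.
void add_unaligned(float* dst, const float* a, const float* b, std::size_t num) {
    assert(num % 4 == 0);
#if defined(__SSE__)
    for (std::size_t i = 0; i < num; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);             // unaligned load
        __m128 vb = _mm_loadu_ps(b + i);             // unaligned load
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));  // unaligned store
    }
#else
    for (std::size_t i = 0; i < num; ++i) dst[i] = a[i] + b[i];
#endif
}
```

When nothing is known about the pointers, a library has to fall back to something like add_unaligned or add run-time alignment checks. That is one plausible reading of the "constant overhead" mentioned above.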
You did not use any -msse* flags. So neither version is using SIMD.

After switching to MapAligned (from hg tip), it helped a little, but I still see almost a 2x difference.

g++ -I.. -O3 -msse -msse2 -msse3 -c -o testmap.o testmap.cc
g++ -o testmap testmap.o
../testmap
With simple function, iterations=6000000, elements=512 took 0.690981s. rate=4445.85 MS/s
With VectorXf::Map, iterations=6000000, elements=512 took 1.29193s. rate=2377.84 MS/s
With simple function, iterations=6000000, elements=512 took 0.671556s. rate=4574.45 MS/s
With VectorXf::Map, iterations=6000000, elements=512 took 1.27064s. rate=2417.67 MS/s

icpc -I.. -O3 -msse3 -c -o testmap.o testmap.cc
icpc -o testmap testmap.o
../testmap
With simple function, iterations=6000000, elements=512 took 0.803989s. rate=3820.95 MS/s
With VectorXf::Map, iterations=6000000, elements=512 took 1.55667s. rate=1973.44 MS/s
With simple function, iterations=6000000, elements=512 took 0.803499s. rate=3823.28 MS/s
With VectorXf::Map, iterations=6000000, elements=512 took 1.55634s. rate=1973.87 MS/s
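
[Archive note] The benchmark source is not shown, but the reported rates are consistent with iterations * elements / elapsed seconds, expressed in millions of samples per second (MS/s). The sketch below reproduces that arithmetic. The function names rate_msps and slowdown are hypothetical, and the formula is inferred from the printed numbers rather than taken from testmap.cc.

```cpp
#include <cstdint>

// Inferred formula: samples processed per second, in millions (MS/s).
double rate_msps(std::uint64_t iterations, std::uint64_t elements, double seconds) {
    const double samples = static_cast<double>(iterations) * static_cast<double>(elements);
    return samples / seconds / 1e6;
}

// Ratio of two elapsed times for the same amount of work.
double slowdown(double slower_seconds, double faster_seconds) {
    return slower_seconds / faster_seconds;
}
```

For the first g++ run, rate_msps(6000000, 512, 0.690981) comes out at about 4445.85 and rate_msps(6000000, 512, 1.29193) at about 2377.84, matching the log. slowdown(1.29193, 0.690981) is roughly 1.87, in line with the "almost a 2x difference" remark.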