[eigen] perf issue with vector of size 2 ?


Hello

I am running some performance tests comparing hand-written loops
with Eigen operations.

The code is attached. It contains a component-wise test and a
scalar product test.
With C1=2 the Eigen code is about 3 to 4 times slower than the
hand-written loops (the third time in each group is the Eigen version):
g++ -mssse3 -I/usr/include/eigen2 -O3 -o e2 e2.cpp
./e2
time: 1.04
time: 1.28
time: 4.47
scalar product
time: 1.48
time: 1.33
time: 4.12

With C1=3 the Eigen code is much faster:
g++ -mssse3 -I/usr/include/eigen2 -O3 -o e2 e2.cpp
./e2
time: 1.39
time: 1.9
time: 1.03
scalar product
time: 2.03
time: 1.97
time: 0.44

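Is something badly wrong with the C1=2 case, or am I missing something here?
With Eigen, C1=3 is about 4 times faster than C1=2 on the component-wise
test (1.03 s vs 4.47 s) and about 9 times faster on the scalar product
(0.44 s vs 4.12 s).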
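
A side note on methodology: the results computed in the timed loops are never read back, so an optimizer is in principle allowed to simplify or drop some of the work. The following is a rough sketch of one way to guard against that, under the assumption that accumulating a checksum and returning it is enough to keep the computation live. `TimedResult` and `time_dot` are hypothetical names, and `std::chrono::steady_clock` (C++11) is used here in place of `boost::timer`.

```cpp
#include <array>
#include <chrono>
#include <cstddef>

// Hypothetical result type and function name, for illustration only.
struct TimedResult {
    double seconds;   // wall-clock time spent in the loop
    double checksum;  // sum of all computed dot products
};

// Repeats a C-component dot product `repeats` times and reports the
// elapsed time together with a checksum of the results.
template <std::size_t C>
TimedResult time_dot(const std::array<double, C>& a,
                     const std::array<double, C>& b,
                     int repeats)
{
    const auto start = std::chrono::steady_clock::now();
    double checksum = 0.0;
    for (int e = 0; e < repeats; ++e) {
        double sum = 0.0;
        for (std::size_t c = 0; c < C; ++c)
            sum += a[c] * b[c];
        checksum += sum;  // the result is used, so it cannot simply be discarded
    }
    const auto stop = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(stop - start).count(), checksum};
}
```

Printing the checksum next to the time makes it visible that each variant computed the same thing. With constant inputs the compiler may still hoist the inner product out of the repeat loop, so this is a starting point rather than a rigorous harness.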

Best regards
C.
-- 
Christophe Prud'homme
Université de Grenoble      christophe.prudhomme@xxxxxxxxxxxxxxx
LJK - Room 55                  Tel: +33476635497
51, rue des Mathématiques      Fax: +33476631263
BP53 38041 Grenoble Cedex 9
      <http://ljk.imag.fr/membres/Christophe.Prudhomme/>
#include <cmath>     // std::cos
#include <iostream>  // std::cout
#include <boost/multi_array.hpp>
#include <boost/timer.hpp>
#include <Eigen/Eigen>

int main()
{
	const int P = 100000;
	static const int N=21;
	static const int Q=25;
	static const int C1=2;
	static const int C2=1;

	typedef Eigen::Matrix<double,C1,C2> vector_type;
	boost::multi_array<double,4> x( boost::extents[N][Q][1][1] );
	boost::multi_array<double,4> x1( boost::extents[N][Q][C1][C2] );
	boost::multi_array<double,4> x2( boost::extents[N][Q][C1][C2] );
	boost::multi_array<double,4> w( boost::extents[N][Q][1][1] );
	boost::multi_array<double,4> w1( boost::extents[N][C1][C2][Q] );
	boost::multi_array<double,4> w2( boost::extents[N][C1][C2][Q] );
	boost::multi_array<double,2> y( boost::extents[N][Q] );
	boost::multi_array<vector_type,2> y1( boost::extents[N][Q] );
	boost::multi_array<vector_type,2> y2( boost::extents[N][Q] );

	// Component-wise test, by hand: x1 = cos(x2)*x2, components as innermost indices.
	boost::timer ti;
	for(int e = 0; e < P;++e )
	for(int i = 0; i < N; ++i )
		for(int q = 0; q < Q; ++q )
		{
			for(int c1 = 0; c1 < C1; ++c1 )
				for(int c2 = 0; c2 < C2; ++c2)
				{
					x1[i][q][c1][c2] = std::cos(x2[i][q][c1][c2])*x2[i][q][c1][c2];
				}
		}
	std::cout << "time: " << ti.elapsed() << "\n";

	// Component-wise test, by hand: same computation with q as the innermost (contiguous) index.
	ti.restart();
	for(int e = 0; e < P;++e )
	for(int i = 0; i < N; ++i )
	{
		for(int c1 = 0; c1 < C1; ++c1 )
			for(int c2 = 0; c2 < C2; ++c2)
				for(int q = 0; q < Q; ++q )
				{
					w1[i][c1][c2][q] = std::cos(w2[i][c1][c2][q])*w2[i][c1][c2][q];
				}
	}
	std::cout << "time: " << ti.elapsed() << "\n";

	// Component-wise test, Eigen: one fixed-size vector per (i, q) entry.
	ti.restart();
	for(int e = 0; e < P;++e )
	for(int i = 0; i < N; ++i )
		for(int q = 0; q < Q; ++q )
		{
			y1[i][q] = y2[i][q].cwise()*y2[i][q].cwise().cos();
		}
	std::cout << "time: " << ti.elapsed() << "\n";
	std::cout << "scalar product\n";

	// Scalar product, by hand: components as innermost indices, accumulator reset per entry.
	ti.restart();
	for(int e = 0; e < P;++e )
	for(int i = 0; i < N; ++i )
		for(int q = 0; q < Q; ++q )
		{
			x[i][q][0][0] = 0;
			for(int c1 = 0; c1 < C1; ++c1 )
				for(int c2 = 0; c2 < C2; ++c2)
				{
					x[i][q][0][0] += x2[i][q][c1][c2]*x1[i][q][c1][c2];
				}
		}
	std::cout << "time: " << ti.elapsed() << "\n";

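	// Scalar product, by hand: q as the innermost index.
	// Note: unlike the loop above, w is not reset to 0 here, so it accumulates across the e iterations.
	ti.restart();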
	for(int e = 0; e < P;++e )
	for(int i = 0; i < N; ++i )
	{

		for(int c1 = 0; c1 < C1; ++c1 )
			for(int c2 = 0; c2 < C2; ++c2)
			{
				for(int q = 0; q < Q; ++q )
				{

					w[i][q][0][0] += w2[i][c1][c2][q]*w1[i][c1][c2][q];
				}
			}

	}
	std::cout << "time: " << ti.elapsed() << "\n";

	// Scalar product, Eigen: dot() on the fixed-size vectors.
	ti.restart();
	for(int e = 0; e < P;++e )
		for(int i = 0; i < N; ++i )
			for(int q = 0; q < Q; ++q )
			{
				y[i][q] = y2[i][q].dot(y1[i][q]);
				//y[i][q] = y2[i][q].transpose()*y1[i][q];
			}
	std::cout << "time: " << ti.elapsed() << "\n";

}

