Re: [eigen] Re: perf issue with vector of size 2 ?



There are a couple of issues with your benchmark:

1 - Your data is not initialized, so you are triggering many SSE
floating-point exceptions (slow handling of denormal numbers). You can
disable them by putting the following at the beginning of your main
function:

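{
  // Set the FTZ (flush-to-zero, bit 15 = 0x8000) and DAZ
  // (denormals-are-zero, bit 6 = 0x0040) flags of the SSE control
  // register MXCSR: 0x8000 | 0x0040 == 32832.
  unsigned int aux;
  __asm__ __volatile__(
    "stmxcsr   %[aux]         \n\t"  // save MXCSR to aux
    "orl       $32832, %[aux] \n\t"  // set FTZ and DAZ
    "ldmxcsr   %[aux]         \n\t"  // load the modified value back
    : [aux] "=m" (aux));
}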

2 - The "manual" and Eigen code for the second part of your benchmark
do not compute the same thing.

3 - It is a bad idea to benchmark several pieces of code in the main
function: the compiler might remove some parts entirely or partially
because the result is not used, or because the for(int e = 0; e <
P; ++e) loop has no effect, etc. I recommend putting each piece of
code in a non-inlined function, e.g.:

#include <cmath>
#include <boost/multi_array.hpp>
#include <Eigen/Core>  // for EIGEN_DONT_INLINE

static const int C1=2;
static const int C2=2;
static const int Q=25;
static const int N=21;
typedef boost::multi_array<double,4> mad4;

// x1 = cos(x2) * x2, element-wise
EIGEN_DONT_INLINE void t1(mad4& x1, mad4& x2)
{
	for(int i = 0; i < N; ++i )
		for(int q = 0; q < Q; ++q )
		{
			for(int c1 = 0; c1 < C1; ++c1 )
				for(int c2 = 0; c2 < C2; ++c2)
				{
					x1[i][q][c1][c2] = std::cos(x2[i][q][c1][c2])*x2[i][q][c1][c2];
				}
		}
}

int main()
{
  // ...
  for(int e = 0; e < P;++e )
      t1(x1, x2);
  std::cout << "time: " << ti.elapsed() << "\n";
 // ...
}


This way you know better what you are benchmarking.

Also, you should not expect any difference for the first example,
because it is largely dominated by the computation of the cosines,
which are not vectorized yet for double.

cheers,
gael

On Wed, Jul 21, 2010 at 12:19 PM, Christophe Prud'homme
<christophe.prudhomme@xxxxxxxxxxxxxxx> wrote:
> Now in the case C1=C2=2 or C1=C2=3 a similar behavior occurs.
> I slightly changed the code.
>
> # C1=C2=2
> ./e2
> time: 1.66
> time: 2.56
> time: 8.35
> scalar product
> time: 2.97
> time: 2.68
> time: 7.92
> # C1=C2=3
> ./e2
> time: 4.44
> time: 5.67
> time: 2.42
> scalar product
> time: 5.81
> time: 5.91
> time: 0.74
>
> Again, Eigen2 is very fast for C1=C2=3 but way slower than the "by
> hand" implementations for C1=C2=2.
>
> Anything to explain that behavior?
>
> Best regards
> C.
> --
> Christophe Prud'homme
> Université de Grenoble      christophe.prudhomme@xxxxxxxxxxxxxxx
> LJK - Room 55                  Tel: +33476635497
> 51, rue des Mathématiques      Fax: +33476631263
> BP53 38041 Grenoble Cedex 9
>       <http://ljk.imag.fr/membres/Christophe.Prudhomme/>
>


