[eigen] Re: small sums: vectorization not worth it


To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: [eigen] Re: small sums: vectorization not worth it
From: "Benoit Jacob" <jacob.benoit.1@xxxxxxxxx>
Date: Sat, 17 Jan 2009 00:40:47 +0100

Act 1, 4th and last scene:

I wanted to see at which size vectorization starts to pay off...

Attached is one more example (sum.cpp), this time with size 64 and type
double, with unrolling forced.

Result: x87 is still 50% faster than SSE2.

The SSE2 asm is:

#APP
# 12 "dot.cpp" 1
	#a
# 0 "" 2
#NO_APP
	movapd	(%esi), %xmm1
	movapd	-104(%ebp), %xmm0
	addpd	-88(%ebp), %xmm0
	addpd	-120(%ebp), %xmm0
	addpd	-136(%ebp), %xmm0
	addpd	-152(%ebp), %xmm0
	addpd	-168(%ebp), %xmm0
	addpd	-184(%ebp), %xmm0
	addpd	-200(%ebp), %xmm0
	addpd	-216(%ebp), %xmm0
	addpd	-232(%ebp), %xmm0
	addpd	-248(%ebp), %xmm0
	addpd	-264(%ebp), %xmm0
	addpd	-280(%ebp), %xmm0
	addpd	-296(%ebp), %xmm0
	addpd	-312(%ebp), %xmm0
	addpd	-328(%ebp), %xmm0
	addpd	-344(%ebp), %xmm0
	addpd	-360(%ebp), %xmm0
	addpd	-376(%ebp), %xmm0
	addpd	-392(%ebp), %xmm0
	addpd	-408(%ebp), %xmm0
	addpd	-424(%ebp), %xmm0
	addpd	-440(%ebp), %xmm0
	addpd	-456(%ebp), %xmm0
	addpd	-472(%ebp), %xmm0
	addpd	-488(%ebp), %xmm0
	addpd	-504(%ebp), %xmm0
	addpd	-520(%ebp), %xmm0
	addpd	-536(%ebp), %xmm0
	addpd	-552(%ebp), %xmm0
	addpd	-568(%ebp), %xmm0
	addpd	%xmm0, %xmm1
	movapd	%xmm1, %xmm2
	unpckhpd	%xmm1, %xmm2
	addsd	%xmm2, %xmm1
	movapd	%xmm1, %xmm2
	movsd	%xmm2, -584(%ebp)
#APP
# 14 "dot.cpp" 1
	#b
# 0 "" 2
#NO_APP
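
For readers who do not read AT&T-syntax assembly, here is a rough C++ sketch of what the listing above does: a chain of packet additions (addpd) followed by a horizontal fold of the two lanes (unpckhpd + addsd). The function name packet_sum is made up for illustration and this is not Eigen's implementation; it assumes an even element count and 16-byte-aligned data, and falls back to two scalar accumulators when SSE2 is not available.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper (not part of Eigen): sums n doubles two lanes at a time.
// Assumes n is even, n >= 2, and data is 16-byte aligned.
inline double packet_sum(const double* data, std::size_t n)
{
#if defined(__SSE2__)
  __m128d acc = _mm_load_pd(data);                 // movapd: first packet
  for (std::size_t i = 2; i < n; i += 2)
    acc = _mm_add_pd(acc, _mm_load_pd(data + i));  // addpd: one per packet
  // Horizontal reduction: copy the high lane down, then add the low lanes.
  __m128d hi = _mm_unpackhi_pd(acc, acc);          // unpckhpd
  return _mm_cvtsd_f64(_mm_add_sd(acc, hi));       // addsd
#else
  // Portable fallback that mimics the two independent lanes.
  double lane0 = data[0], lane1 = data[1];
  for (std::size_t i = 2; i < n; i += 2) {
    lane0 += data[i];
    lane1 += data[i + 1];
  }
  return lane0 + lane1;
#endif
}
```

Note that every addpd in the listing depends on the previous one through %xmm0, so the additions form a single serial chain; the sketch reproduces that with its single accumulator.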



2009/1/16 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
> and in case you wonder: the situation is almost the same for dot product
>
> (attached file dot.cpp)
>
> runs 2x slower with SSE...
>
> Cheers,
> Benoit
>
> 2009/1/16 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
>> Note: i think the old heuristic was wrong anyway.
>>
>> Maybe take this occasion to introduce a EIGEN_COST_OF_PREDUX (since
>> this cost depends greatly on the simd platform) ?
>> And then do a natural heuristic rather than a quick hack like we used to have?
>>
>> Benoit
>>
>> 2009/1/16 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
>>> Hi Gael *cough* List,
>>>
>>> ei_predux is costly because it consists of >1 SIMD instruction.
>>>
>>> So until recently we had sum() only vectorize if the size was big enough.
>>> However this was recently changed.
>>>
>>> Attached is a benchmark that runs 2.5x slower with SSE (2 or 3) than
>>> without. It's just Vector2d::sum().
>>>
>>> So, revert to old behavior?
>>>
>>> Moreover: matrix product innerVectorization also uses a ei_predux. Same here?
>>>
>>> Cheers,
>>> Benoit
>>>
>>
>
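
To make the EIGEN_COST_OF_PREDUX idea quoted above concrete, here is a minimal sketch of what a cost-based heuristic could look like. Everything in it is hypothetical: the struct, the function names and the cost numbers are invented for illustration and are not Eigen API. It assumes the size is a multiple of the packet size and simply counts operations.

```cpp
// Hypothetical per-platform costs (illustrative only, not Eigen's values).
struct SimdCosts {
  int packet_size;      // scalars per SIMD packet, e.g. 2 doubles with SSE2
  int add_cost;         // cost of one scalar addition
  int packet_add_cost;  // cost of one packet addition
  int predux_cost;      // cost of reducing one packet to a scalar
};

// Scalar path: size - 1 additions.
inline int scalar_sum_cost(int size, const SimdCosts& c)
{
  return (size - 1) * c.add_cost;
}

// Vectorized path: (number of packets - 1) packet additions plus one predux.
// Assumes size is a positive multiple of c.packet_size.
inline int vector_sum_cost(int size, const SimdCosts& c)
{
  return (size / c.packet_size - 1) * c.packet_add_cost + c.predux_cost;
}

// Vectorize only when the estimate says it is strictly cheaper.
inline bool should_vectorize_sum(int size, const SimdCosts& c)
{
  return vector_sum_cost(size, c) < scalar_sum_cost(size, c);
}
```

With made-up costs such as {2, 1, 1, 3}, this rejects vectorization for size 2 and accepts it for size 64. The size-64 measurement reported in this thread shows that a pure operation count like this can be too optimistic, so any real cost value would have to be tuned against benchmarks.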

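// sum.cpp: times sum() on a fixed-size vector of 64 doubles.
#define EIGEN_UNROLLING_LIMIT 10000 // raised so that the sum is fully unrolled
#include <Eigen/Core>
// #include <iostream> // only needed for the debug print below

typedef Eigen::Matrix<double, 64, 1> T;

int main()
{
  T v; v.setZero(); v[0] = 1;
  for(int i = 0; i < 10000000; i++)
  {
    v = T::Ones() + v * 1e-10; // refresh v so each iteration depends on the last
    asm("#a"); // markers delimiting the first sum in the generated assembly
    v[0] = v.sum();
    asm("#b");
    v[1] = v.sum();
    //std::cout << v << "\n"; // check it's not inf...
  }
  return int(v[0]); // use the result so the loop is not optimized away
}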

Follow-Ups:
- Re: [eigen] Re: small sums: vectorization not worth it
  - From: Gael Guennebaud

References:
- [eigen] small sums: vectorization not worth it
  - From: Benoit Jacob
- [eigen] Re: small sums: vectorization not worth it
  - From: Benoit Jacob
- [eigen] Re: small sums: vectorization not worth it
  - From: Benoit Jacob
