Re: [eigen] vectorization bug


To: eigen@xxxxxxxxxxxxxxxxxxx
Subject: Re: [eigen] vectorization bug
From: "Gael Guennebaud" <gael.guennebaud@xxxxxxxxx>
Date: Sun, 24 Aug 2008 15:57:37 +0200
Cc: "Tim Vandermeersch" <tim.vandermeersch@xxxxxxxxx>

Hi Benoit,

Secondly, with Eigen, gcc 4.3 sucks compared to gcc 4.2; I have observed
that in all benchmarks.
In your case, here are my results (Core 2, 64-bit system):

               gcc 4.2   gcc 4.3
explicit vec   0.38s     0.5s
implicit vec   0.48s     0.46s

The "implicit vec" row actually means no vectorization for gcc 4.2 and
gcc's default auto-vectorization for gcc 4.3.
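To make the two rows concrete, here is a minimal sketch of the difference between "implicit" and "explicit" vectorization for a vector addition. This is not Eigen's code: the function names are made up for illustration, and the explicit version assumes SSE2 and 16-byte aligned pointers (it falls back to the plain loop when SSE2 is not available).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics: _mm_load_pd, _mm_add_pd, _mm_store_pd
#endif

// "Implicit": a plain scalar loop. Whether it becomes SIMD code depends
// entirely on the compiler and its flags.
void add_implicit(const double* a, const double* b, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// "Explicit": packets of two doubles are loaded, added and stored by hand.
// Assumption: a, b and out are all 16-byte aligned.
void add_explicit(const double* a, const double* b, double* out, std::size_t n) {
#if defined(__SSE2__)
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        const __m128d va = _mm_load_pd(a + i);      // aligned load (movapd)
        const __m128d vb = _mm_load_pd(b + i);
        _mm_store_pd(out + i, _mm_add_pd(va, vb));  // addpd, then aligned store
    }
    for (; i < n; ++i)  // scalar tail when n is odd
        out[i] = a[i] + b[i];
#else
    add_implicit(a, b, out, n);  // no SSE2: fall back to the scalar loop
#endif
}
```

Both functions should produce identical results; only the generated instructions (and hence the timings) may differ.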

so here is the core of the vector addition:

gcc 4.3:

.L57:
	movq	32(%rsp), %rax
	addl	$2, %ecx
	movapd	(%rax,%rdx), %xmm0
	movq	16(%rsp), %rax
	addpd	(%rax,%rdx), %xmm0
	movq	(%rsp), %rax
	movapd	%xmm0, (%rax,%rdx)
	addq	$16, %rdx
	cmpl	%ecx, %r8d
	jg	.L57

As we can see, gcc should have moved the 3 movq instructions (which load
the addresses of the data) out of the loop!
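At the source level, hoisting those address loads looks roughly like the sketch below. The VecRef struct and both function names are hypothetical stand-ins, not Eigen types, and I have not checked whether writing the loop this way changes what gcc 4.3 emits; it only illustrates what "moving the loads out of the loop" means.

```cpp
#include <cstddef>

// Hypothetical stand-in for an object that stores a pointer to its data.
struct VecRef {
    double* data;
    std::size_t size;
};

// The base addresses are read through the objects inside the loop body.
// A compiler is free to hoist these loads, but it is not obliged to.
void add_through_refs(const VecRef& a, const VecRef& b, VecRef& out) {
    for (std::size_t i = 0; i < out.size; ++i)
        out.data[i] = a.data[i] + b.data[i];
}

// Hand-hoisted variant: each base address is read once into a local
// before the loop, so the loop body only indexes off those locals.
void add_hoisted(const VecRef& a, const VecRef& b, VecRef& out) {
    const double* pa = a.data;
    const double* pb = b.data;
    double* po = out.data;
    const std::size_t n = out.size;
    for (std::size_t i = 0; i < n; ++i)
        po[i] = pa[i] + pb[i];
}
```

The two functions compute the same result; the second just makes the loop invariants explicit instead of relying on the optimizer.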

Now let's compare with gcc 4.2 code:

.L73:
	movapd	(%rax,%rbp), %xmm0
	addpd	(%rax,%rbx), %xmm0
	movapd	%xmm0, (%rax,%rdi)
	addq	$16, %rax
	cmpq	$24000, %rax
	jne	.L73

yeah much much better !!

FYI, current gcc trunk (the future 4.4) generates good code here, so let's
not bother... Also, I'm using g++-4.3 (GCC) 4.3.0 20080215 (experimental),
which is not the most recent one...



About Ones: here it is well vectorized (gcc 4.2 and 4.4):

.L62:
	movapd	%xmm0, (%rax,%rdx)
	addq	$16, %rax
	cmpq	$24000, %rax
	jne	.L62

And for some weird reason, it seems gcc 4.3 drops the middle
vectorized loop here... very strange!
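For reference, the "middle vectorized loop" is the aligned part of the three-part loop in the Assign.h code quoted below (unaligned start, aligned middle, unaligned end). Here is a rough standalone sketch of that split for a fill-with-ones. The helper split_for_packets is hypothetical and is not Eigen's ei_alignmentOffset; it assumes the pointer is at least aligned on sizeof(double), and the "packet" store is emulated with two scalar stores to keep the sketch portable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Split {
    std::size_t alignedStart;  // first index whose address is packet-aligned
    std::size_t alignedEnd;    // one past the last index covered by whole packets
};

// Hypothetical helper: computes the bounds of the aligned middle section.
// Assumption: p is a multiple of sizeof(double).
Split split_for_packets(const double* p, std::size_t size, std::size_t packetSize) {
    const std::size_t packetBytes = packetSize * sizeof(double);
    const std::size_t misalignedBytes =
        static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(p) % packetBytes);
    std::size_t start =
        misalignedBytes == 0 ? 0 : (packetBytes - misalignedBytes) / sizeof(double);
    start = std::min(start, size);  // a tiny vector may never reach alignment
    const std::size_t end = start + ((size - start) / packetSize) * packetSize;
    return {start, end};
}

void fill_ones(double* dst, std::size_t size) {
    const std::size_t packetSize = 2;  // two doubles per 16-byte SSE2 packet
    const Split s = split_for_packets(dst, size, packetSize);
    assert(s.alignedStart <= s.alignedEnd && s.alignedEnd <= size);

    // Unaligned start: scalar stores until the address is packet-aligned.
    for (std::size_t i = 0; i < s.alignedStart; ++i)
        dst[i] = 1.0;

    // Aligned middle: one packet per iteration. Real SIMD code would issue
    // a single aligned store here (the movapd in the listing above).
    for (std::size_t i = s.alignedStart; i < s.alignedEnd; i += packetSize) {
        dst[i] = 1.0;
        dst[i + 1] = 1.0;
    }

    // Unaligned end: scalar stores for what does not fill a whole packet.
    for (std::size_t i = s.alignedEnd; i < size; ++i)
        dst[i] = 1.0;
}
```

With a 16-byte aligned buffer the start section is empty; shifting the pointer by one double produces a one-element start section.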

cheers,
gael.


2008/8/24  <jacob@xxxxxxxxxxxxxxx>:
> Hi List,
>
> Here's a simple benchmark, a.cpp. It runs faster without vectorization than
> with!
>
> Trying to understand this I added some asm comments in Assign.h, so my copy
> looks like this:
>
> template<typename Derived1, typename Derived2>
> struct ei_assign_impl<Derived1, Derived2, LinearVectorization, NoUnrolling>
> {
>  static void run(Derived1 &dst, const Derived2 &src)
>  {
>    asm("#begin");
>    const int size = dst.size();
>    const int packetSize = ei_packet_traits<typename Derived1::Scalar>::size;
>    const int alignedStart = ei_assign_traits<Derived1,Derived2>::DstIsAligned ? 0
>                           : ei_alignmentOffset(&dst.coeffRef(0), size);
>    const int alignedEnd = alignedStart + ((size-alignedStart)/packetSize)*packetSize;
>
>    asm("#unaligned start");
>
>    for(int index = 0; index < alignedStart; index++)
>      dst.copyCoeff(index, src);
>    asm("#aligned middle");
>
>    for(int index = alignedStart; index < alignedEnd; index += packetSize)
>    {
>      dst.template copyPacket<Derived2, Aligned,
>        ei_assign_traits<Derived1,Derived2>::SrcAlignment>(index, src);
>    }
>
>    asm("#unaligned end");
>
>    for(int index = alignedEnd; index < size; index++)
>      dst.copyCoeff(index, src);
>    asm("#end");
>  }
> };
>
> I attach the resulting assembly (a.s). Can you see what's wrong?
>
> Another thing. The ones() part compiles to this:
>
>        xorl    %edx, %edx
> .L107:
>        movl    -24(%ebp), %eax
>        fld1
>        fstl    (%eax,%edx)
>        fstpl   8(%eax,%edx)
>        addl    $16, %edx
>        cmpl    $24000, %edx
>        jne     .L107
>
> This is not vectorized, right??
>
> Cheers,
> Benoit
>
