Re: [eigen] vectorization bug
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] vectorization bug
- From: "Gael Guennebaud" <gael.guennebaud@xxxxxxxxx>
- Date: Sun, 24 Aug 2008 15:57:37 +0200
- Cc: "Tim Vandermeersch" <tim.vandermeersch@xxxxxxxxx>
Hi Benoit,
Secondly, with Eigen, gcc 4.3 generates slower code than gcc 4.2. I observed
that in all benchmarks.
In your case, here are my results (Core 2, 64-bit system):
               gcc 4.2   gcc 4.3
explicit vec   0.38s     0.5s
implicit vec   0.48s     0.46s
The "implicit vec" row actually means no vectorization for gcc 4.2 and
gcc's default auto-vectorization for gcc 4.3.
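To make the two rows concrete, here is a minimal sketch of the distinction, not Eigen's actual code. The function names `add_implicit` and `add_explicit` are made up for illustration, and the explicit version assumes 16-byte aligned pointers and SSE2 (it falls back to the scalar loop otherwise).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// "Implicit": a plain scalar loop. Whether it gets vectorized is left
// entirely to the compiler and its flags.
void add_implicit(const double* a, const double* b, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// "Explicit": hand-written SSE2 packets of two doubles.
// Assumption: a, b and out are all 16-byte aligned.
void add_explicit(const double* a, const double* b, double* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE2__)
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);
    const std::size_t packetEnd = n - (n % 2);
    for (; i < packetEnd; i += 2) {
        const __m128d pa = _mm_load_pd(a + i);      // aligned load (typically movapd)
        const __m128d pb = _mm_load_pd(b + i);
        _mm_store_pd(out + i, _mm_add_pd(pa, pb));  // packed add, aligned store
    }
#endif
    // Scalar tail (or the whole range when SSE2 is unavailable).
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Both functions compute the same result; only who is responsible for emitting the packed instructions differs.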
So here is the core of the vector addition:
gcc 4.3:
.L57:
movq 32(%rsp), %rax
addl $2, %ecx
movapd (%rax,%rdx), %xmm0
movq 16(%rsp), %rax
addpd (%rax,%rdx), %xmm0
movq (%rsp), %rax
movapd %xmm0, (%rax,%rdx)
addq $16, %rdx
cmpl %ecx, %r8d
jg .L57
As we can see, gcc 4.3 reloads the three data addresses from the stack on
every iteration; it should move those 3 movq instructions out of the loop!
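The transformation gcc is expected to perform can be written out by hand. This is only an illustrative sketch: the struct `AddOperands` and both function names are hypothetical and have nothing to do with Eigen's internals, and an optimizer is free to compile the two versions to identical code.

```cpp
#include <cstddef>

// Hypothetical stand-in for the three data addresses used by the loop.
struct AddOperands {
    const double* lhs;
    const double* rhs;
    double* dst;
};

// As written, each iteration reads the three addresses through `ops`.
// A compiler that fails to hoist them reloads them every time around.
void add_reloading(const AddOperands& ops, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        ops.dst[i] = ops.lhs[i] + ops.rhs[i];
}

// Hoisted by hand: the addresses are copied into locals once, so the
// loop body only needs the two loads, the add and the store.
void add_hoisted(const AddOperands& ops, std::size_t n) {
    const double* lhs = ops.lhs;
    const double* rhs = ops.rhs;
    double* dst = ops.dst;
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = lhs[i] + rhs[i];
}
```

Comparing the assembly of the two functions (for example with `g++ -O2 -S`) is one way to check whether a given compiler version does the hoisting on its own.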
Now let's compare with gcc 4.2 code:
.L73:
movapd (%rax,%rbp), %xmm0
addpd (%rax,%rbx), %xmm0
movapd %xmm0, (%rax,%rdi)
addq $16, %rax
cmpq $24000, %rax
jne .L73
Yeah, much, much better!
FYI, current gcc trunk (future 4.4) generates good code here, so let's not
bother... Also, I'm using g++-4.3 (GCC) 4.3.0 20080215 (experimental),
which is not the most recent one.
About Ones, here it is well vectorized (gcc 4.2 and 4.4):
.L62:
movapd %xmm0, (%rax,%rdx)
addq $16, %rax
cmpq $24000, %rax
jne .L62
And for some weird reason, it seems gcc 4.3 drops the middle
vectorized loop here... very strange!
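For reference, the "middle vectorized loop" is the aligned part of a three-phase loop: scalar start, packet middle, scalar end. Below is a standalone sketch of that split for a fill with ones. It is not Eigen's implementation: `first_aligned_index` and `fill_ones` are hypothetical names, the packet size is fixed at two doubles, and SSE2 is assumed for the packet store (otherwise everything runs through the scalar loops).

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper: index of the first 16-byte aligned element,
// capped at n. Assumes p is at least aligned to sizeof(double).
std::size_t first_aligned_index(const double* p, std::size_t n) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    const std::size_t misalign = (addr % 16) / sizeof(double);  // 0 or 1
    const std::size_t start = (2 - misalign) % 2;
    return start < n ? start : n;
}

void fill_ones(double* dst, std::size_t n) {
    const std::size_t alignedStart = first_aligned_index(dst, n);

    std::size_t i = 0;
    // Unaligned start: scalar stores until dst + i is 16-byte aligned.
    for (; i < alignedStart; ++i)
        dst[i] = 1.0;
#if defined(__SSE2__)
    // Aligned middle: one packet store per two doubles.
    const std::size_t packetSize = 2;
    const std::size_t alignedEnd =
        alignedStart + ((n - alignedStart) / packetSize) * packetSize;
    const __m128d ones = _mm_set1_pd(1.0);
    for (; i < alignedEnd; i += packetSize)
        _mm_store_pd(dst + i, ones);
#endif
    // Unaligned end: whatever does not fill a whole packet.
    for (; i < n; ++i)
        dst[i] = 1.0;
}
```

If the compiler drops the middle loop, all elements go through scalar stores, which would be consistent with non-vectorized output for the same source.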
cheers,
gael.
2008/8/24 <jacob@xxxxxxxxxxxxxxx>:
> Hi List,
>
> Here's a simple benchmark, a.cpp. It runs faster without vectorization than
> with!
>
> Trying to understand this I added some asm comments in Assign.h, so my copy
> looks like this:
>
> template<typename Derived1, typename Derived2>
> struct ei_assign_impl<Derived1, Derived2, LinearVectorization, NoUnrolling>
> {
>   static void run(Derived1 &dst, const Derived2 &src)
>   {
>     asm("#begin");
>     const int size = dst.size();
>     const int packetSize = ei_packet_traits<typename Derived1::Scalar>::size;
>     const int alignedStart =
>       ei_assign_traits<Derived1,Derived2>::DstIsAligned ? 0
>       : ei_alignmentOffset(&dst.coeffRef(0), size);
>     const int alignedEnd = alignedStart +
>       ((size-alignedStart)/packetSize)*packetSize;
>
>     asm("#unaligned start");
>
>     for(int index = 0; index < alignedStart; index++)
>       dst.copyCoeff(index, src);
>     asm("#aligned middle");
>
>     for(int index = alignedStart; index < alignedEnd; index += packetSize)
>     {
>       dst.template copyPacket<Derived2, Aligned,
>         ei_assign_traits<Derived1,Derived2>::SrcAlignment>(index, src);
>     }
>
>     asm("#unaligned end");
>
>     for(int index = alignedEnd; index < size; index++)
>       dst.copyCoeff(index, src);
>     asm("#end");
>   }
> };
>
> I attach the resulting assembly (a.s). Can you see what's wrong?
>
> Another thing. The ones() part compiles to this:
>
> xorl %edx, %edx
> .L107:
> movl -24(%ebp), %eax
> fld1
> fstl (%eax,%edx)
> fstpl 8(%eax,%edx)
> addl $16, %edx
> cmpl $24000, %edx
> jne .L107
>
> This is not vectorized, right??
>
> Cheers,
> Benoit