AW: Eigen 3.3 vs 3.2 Performance (was RE: [eigen] 3.3-beta2 released!)
Hello Gael,
if I only use SSE2 (that is, without -march=native on a Haswell Xeon with AVX & FMA), then I also see no difference in the benchmark. The same is true for "-msse42".
But if I use "-march=native" (which then enables AVX and FMA for Eigen-3.3) for *this particular example*, I see this ~13% slowdown. Can you confirm this?
For reference, the compilation command used was
g++ eigen_bench2.cpp -std=c++11 -Ofast -fno-finite-math-only -DNDEBUG -march=native -I eigen-3.2.9 / 3.3.5
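To double-check which instruction sets a given set of flags actually enables, something like the following could be compiled with the same options. This is only a minimal sketch: it relies on the macros GCC and Clang predefine (`__SSE2__`, `__SSE4_2__`, `__AVX__`, `__FMA__`), it does not look at Eigen's own vectorization settings, and `simd_flags` is a made-up helper name.

```cpp
#include <string>

// Hypothetical helper (not part of Eigen): lists the x86 SIMD extensions the
// compiler was allowed to use for this translation unit, based on the macros
// that GCC and Clang predefine for the active -m... / -march=... flags.
std::string simd_flags() {
    std::string s;
#ifdef __SSE2__
    s += "SSE2 ";
#endif
#ifdef __SSE4_2__
    s += "SSE4.2 ";
#endif
#ifdef __AVX__
    s += "AVX ";
#endif
#ifdef __FMA__
    s += "FMA ";
#endif
    // Report "none" rather than an empty string when no macro is defined.
    return s.empty() ? std::string("none") : s;
}
```

Printing `simd_flags()` from a small `main` once per flag combination (plain, "-msse42", "-march=native") should show whether AVX and FMA are really only active in the last case.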
I am neither an x86 assembly nor a vectorization expert, so take the following with a grain of salt.
I have attached the generated assembly for the "hot loop". The first 49 lines are nearly the same, except that eigen-3.2 uses %rbp-relative addressing and 3.3 uses %rsp (with different offsets).
The remainder differs more. As far as I can see, Eigen-3.3 doesn't use AVX registers, but it does use more "...packed-double" instructions (the Eigen 3.2 assembly doesn't seem to use any). Even so, the sequence still seems to be (slightly) slower overall for Eigen-3.3.
eigen-3.2:
vmovsd      -168(%rbp), %xmm3
vxorpd      %xmm6, %xmm6, %xmm6
vmovsd      .LC4(%rip), %xmm2
vmovsd      -136(%rbp), %xmm8
vcvtsi2sd   %eax, %xmm6, %xmm6
vfmadd132sd .LC3(%rip), %xmm2, %xmm6
vmulsd      %xmm3, %xmm3, %xmm0
vmovsd      -160(%rbp), %xmm2
vmovsd      -176(%rbp), %xmm4
vmulsd      %xmm8, %xmm3, %xmm1
vmovsd      -144(%rbp), %xmm9
vmovsd      -184(%rbp), %xmm5
vmovsd      -152(%rbp), %xmm7
vfmadd231sd %xmm2, %xmm2, %xmm0
vfmadd231sd %xmm6, %xmm2, %xmm1
vfmadd231sd %xmm4, %xmm4, %xmm0
vmulsd      .LC5(%rip), %xmm0, %xmm0
vfmadd231sd %xmm4, %xmm9, %xmm1
vdivsd      %xmm5, %xmm0, %xmm0
vdivsd      %xmm5, %xmm1, %xmm1
vsubsd      %xmm0, %xmm7, %xmm0
vmulsd      .LC6(%rip), %xmm0, %xmm0
vfmadd213sd -104(%rbp), %xmm1, %xmm4
vfmadd213sd -96(%rbp), %xmm1, %xmm3
vfmadd213sd -88(%rbp), %xmm1, %xmm2
vfmadd213sd -112(%rbp), %xmm1, %xmm5
vfmadd231sd %xmm9, %xmm0, %xmm4
vfmadd231sd %xmm8, %xmm0, %xmm3
vfmadd231sd %xmm6, %xmm0, %xmm2
vaddsd      %xmm7, %xmm0, %xmm0
vmovsd      %xmm5, -112(%rbp)
vfmadd213sd -80(%rbp), %xmm0, %xmm1
vmovsd      %xmm4, -104(%rbp)
vmovsd      %xmm3, -96(%rbp)
vmovsd      %xmm2, -88(%rbp)
vmovsd      %xmm1, -80(%rbp)
cmpl        %ebx, %r13d
jl          .L91
eigen-3.3:
vmovupd     120(%rsp), %xmm7
vmovapd     32(%rsp), %xmm8
vxorpd      %xmm5, %xmm5, %xmm5
vcvtsi2sd   %eax, %xmm5, %xmm5
vmovsd      .LC4(%rip), %xmm4
vfmadd132sd .LC3(%rip), %xmm4, %xmm5
vmulpd      %xmm7, %xmm7, %xmm1
vmovsd      16(%rsp), %xmm3
vmovsd      24(%rsp), %xmm4
vmovsd      8(%rsp), %xmm6
vunpckhpd   %xmm1, %xmm1, %xmm2
vaddsd      %xmm2, %xmm1, %xmm2
vmulpd      %xmm8, %xmm7, %xmm1
vfmadd231sd %xmm3, %xmm3, %xmm2
vmulsd      .LC5(%rip), %xmm2, %xmm2
vunpckhpd   %xmm1, %xmm1, %xmm0
vaddsd      %xmm0, %xmm1, %xmm0
vdivsd      %xmm4, %xmm2, %xmm2
vfmadd231sd %xmm5, %xmm3, %xmm0
vdivsd      %xmm4, %xmm0, %xmm0
vsubsd      %xmm2, %xmm6, %xmm2
vmulsd      .LC6(%rip), %xmm2, %xmm2
vfmadd213sd 88(%rsp), %xmm0, %xmm3
vfmadd213sd 64(%rsp), %xmm0, %xmm4
vmovddup    %xmm0, %xmm1
vfmadd213pd 72(%rsp), %xmm1, %xmm7
vmovddup    %xmm2, %xmm1
vfmadd231sd %xmm5, %xmm2, %xmm3
vaddsd      %xmm6, %xmm2, %xmm2
vmovsd      %xmm4, 64(%rsp)
vfmadd213sd 96(%rsp), %xmm2, %xmm0
vfmadd231pd %xmm1, %xmm8, %xmm7
vmovsd      %xmm3, 88(%rsp)
vmovsd      %xmm0, 96(%rsp)
vmovups     %xmm7, 72(%rsp)
cmpl        %ebx, %r12d
jl          .L91
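To make the difference a bit more concrete, the packed pattern in the eigen-3.3 listing (vmulpd, then vunpckhpd and vaddsd, then a scalar FMA) can be sketched with SSE2 intrinsics next to the plain scalar chain that the eigen-3.2 listing uses. That these instructions compute a 3-component squared norm is only my reading of the listing, and the function names are made up for illustration.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86 only)

// Squared norm of a 3-vector, following the packed pattern: one packed
// multiply for the first two components, a horizontal add of the two lanes,
// then the third component handled as a scalar.
double squared_norm3_packed(const double* v) {
    __m128d xy  = _mm_loadu_pd(v);          // v[0], v[1]          (vmovupd)
    __m128d sq  = _mm_mul_pd(xy, xy);       // v0*v0, v1*v1        (vmulpd)
    __m128d hi  = _mm_unpackhi_pd(sq, sq);  // v1*v1 in both lanes (vunpckhpd)
    __m128d sum = _mm_add_sd(sq, hi);       // lane 0: v0*v0+v1*v1 (vaddsd)
    return _mm_cvtsd_f64(sum) + v[2] * v[2];
}

// The same quantity as a purely scalar chain of multiplies and adds.
double squared_norm3_scalar(const double* v) {
    return v[0] * v[0] + v[1] * v[1] + v[2] * v[2];
}
```

The packed version saves one multiply but needs the extra shuffle (vunpckhpd) and add to combine the two lanes, so for only three doubles it is not obvious that it comes out ahead.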
If there's anything else I could try to pinpoint the causes, I'm all ears. :)
Best regards
Daniel Vollmer

--------------------------
Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR)
German Aerospace Center
Institute of Aerodynamics and Flow Technology | Lilienthalplatz 7 | 38108 Braunschweig | Germany
Daniel Vollmer | AS C²A²S²E
www.DLR.de

From: Gael Guennebaud [gael.guennebaud@xxxxxxxxx]
Sent: Thursday, 2 August 2018 0:10
To: eigen
Subject: Re: Eigen 3.3 vs 3.2 Performance (was RE: [eigen] 3.3-beta2 released!)

Hi,
I tried your little benchmark with gcc 7 and got no difference at all (-O3 -DNDEBUG):
// 3.2: 7.75s
// 3.3: 7.68s
// 3.2 -march=native: 7.46s
// 3.3 -march=native: 7.48s
I ran each test 4 times and kept the best of each; the variations were about 0.2s.
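For reference, a "best of N runs" measurement like this could be sketched as below. It is not the actual benchmark harness; `best_of` is a hypothetical helper, and `work` stands for whatever callable wraps the benchmark body.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical helper: calls `work` `runs` times and returns the fastest
// wall-clock time in seconds. Taking the minimum filters out runs that were
// slowed down by other activity on the machine.
template <typename F>
double best_of(int runs, F&& work) {
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double seconds = std::chrono::duration<double>(stop - start).count();
        best = std::min(best, seconds);
    }
    return best;
}
```

With something like `best_of(4, run_benchmark)`, where `run_benchmark` is a placeholder for the loop being timed, each configuration gets a single number to compare.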
gael
On Wed, Aug 1, 2018 at 1:28 PM <Daniel.Vollmer@xxxxxx> wrote:
Hi,
Attachment: eigen_32.S
Attachment: eigen_33.S