Re: [eigen] Speed issues, array min,max
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Speed issues, array min,max
- From: Gabriel <gnuetzi@xxxxxxxxx>
- Date: Mon, 5 Oct 2015 18:08:17 +0200
Thanks a lot!
I thought something was flawed.
I have now compared the commented assembler output of the following
(hopefully unflawed) problem, which uses random numbers, the same for both variants:
http://pastebin.com/qDvLGBfU
The timings are now close, but Eigen3 is still slower, which can also be seen
in the assembler output below.
Is my example still flawed, or is the Eigen internal dense assignment loop
simply not being inlined?
(SLOWER)
#BEGIN1
# 0 "" 2
#NO_APP
leaq 128(%rsp), %r13
leaq 32(%rsp), %r12
leaq 64(%rsp), %rbp
movl $100000000, %ebx
.p2align 4,,10
.p2align 3
.L367:
call rand
subl $1073741824, %eax
cltq
movq %rax, 64(%rsp)
call rand
subl $1073741824, %eax
cltq
movq %rax, 72(%rsp)
call rand
leal -1073741824(%rax), %edx
leaq 160(%rsp), %rsi
leaq 96(%rsp), %rdi
movq %r13, 160(%rsp)
movq %r12, 168(%rsp)
movslq %edx, %rdx
movq %rbp, 176(%rsp)
movq %rdx, 80(%rsp)
leaq 31(%rsp), %rdx
call _ZN5Eigen8internal26call_dense_assignment_loopINS_5ArrayIxLi3ELi1ELi0ELi3ELi1EEENS_13CwiseBinaryOpINS0_13scalar_max_opIxEEKS3_KNS4_INS0_13scalar_min_opIxEES7_S7_EEEENS0_13add_assign_opIxEEEEvRKT_RKT0_RKT1_
subl $1, %ebx
jne .L367
#APP
# 123 "/home/zfmgpu/Desktop/Repository/SimulationFramework/SourceCode/Projects/TestBench/Projects/Test/src/main.cpp" 1
#END1
(FASTER)
#BEGIN2
# 0 "" 2
#NO_APP
movl $100000000, %ebx
xorl %r12d, %r12d
.p2align 4,,10
.p2align 3
.L368:
call rand
subl $1073741824, %eax
cltq
movq %rax, 64(%rsp)
call rand
subl $1073741824, %eax
cltq
movq %rax, 72(%rsp)
call rand
leal -1073741824(%rax), %edx
movq 64(%rsp), %rax
cmpq %rax, 32(%rsp)
cmovle 32(%rsp), %rax
movslq %edx, %rdx
movq %rdx, 80(%rsp)
testq %rax, %rax
cmovs %r12, %rax
addq %rax, 96(%rsp)
movq 72(%rsp), %rax
cmpq %rax, 40(%rsp)
cmovle 40(%rsp), %rax
testq %rax, %rax
cmovs %r12, %rax
addq %rax, 104(%rsp)
movq 48(%rsp), %rax
cmpq %rax, %rdx
cmovg %rax, %rdx
testq %rdx, %rdx
cmovs %r12, %rdx
addq %rdx, 112(%rsp)
subl $1, %ebx
jne .L368
#APP
# 138 "/home/zfmgpu/Desktop/Repository/SimulationFramework/SourceCode/Projects/TestBench/Projects/Test/src/main.cpp" 1
#END2
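For readers without access to the pastebin, the FASTER listing appears to compute, per component, acc[i] += max(min(a[i], b[i]), 0) on three 64-bit integers (the cmovle/cmovg pairs are the min, the testq/cmovs pairs clamp at zero). The following is only a sketch of that hand-coded variant, reconstructed from the assembler and not taken from the original source; the names Vec3 and clampedMinAccumulate are hypothetical, and std::array stands in for the Eigen array type.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a fixed-size array of three 64-bit integers.
using Vec3 = std::array<long long, 3>;

// Hand-coded kernel (assumed from the FASTER listing):
//   acc[i] += max(min(a[i], b[i]), 0)
inline void clampedMinAccumulate(Vec3& acc, const Vec3& a, const Vec3& b) {
    for (std::size_t i = 0; i < acc.size(); ++i) {
        // Component-wise minimum, then clamp negative results to zero.
        const long long term = std::max(std::min(a[i], b[i]), 0LL);
        assert(term >= 0);  // the clamped term is never negative
        acc[i] += term;
    }
}
```

With a fixed trip count of three, a compiler will usually unroll this loop fully, which matches the three straight-line min/clamp/add groups in the listing.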
On 10/05/2015 04:32 PM, Christoph Hertzberg wrote:
Your example is flawed, since it is trivial enough for the compiler to
optimize away (almost) entirely. Add
EIGEN_ASM_COMMENT("begin/end..."); lines and have a look at the
generated assembler to see what I mean.
If the values of the vectors are not known at compile-time (and not
the same for each iteration), you should get essentially the same
assembler with Eigen and your hand-coded version -- but with fewer
lines of code.
Christoph
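A minimal sketch of the advice above, written without Eigen so that it stands alone: the inputs come from std::rand() at run time and change every iteration, the result is returned so the loop cannot be discarded, and comment markers bracket the loop in the generated assembler. ASM_COMMENT, Vec3 and runKernel are hypothetical names for this illustration; the macro assumes a GCC-compatible compiler and plays the role that EIGEN_ASM_COMMENT plays in the message.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical marker macro: emits a comment line into the assembler output
// on GCC-compatible compilers and expands to nothing elsewhere.
#if defined(__GNUC__)
#define ASM_COMMENT(X) __asm__ volatile("#" X)
#else
#define ASM_COMMENT(X)
#endif

using Vec3 = std::array<long long, 3>;

// Runs the min/clamp/accumulate kernel on values that are only known at
// run time and differ in every iteration, so the work cannot be folded away.
inline Vec3 runKernel(unsigned seed, std::size_t iterations) {
    assert(iterations > 0);
    std::srand(seed);
    Vec3 acc{0, 0, 0};
    Vec3 limit{std::rand(), std::rand(), std::rand()};

    ASM_COMMENT("BEGIN kernel");
    for (std::size_t n = 0; n < iterations; ++n) {
        for (std::size_t i = 0; i < acc.size(); ++i) {
            // Fresh input per iteration, centred around zero.
            const long long a = std::rand() - RAND_MAX / 2;
            acc[i] += std::max(std::min(a, limit[i]), 0LL);
        }
    }
    ASM_COMMENT("END kernel");

    return acc;  // returning the result keeps the loop observable
}
```

Compiling with -S and searching the output for "BEGIN kernel" shows which instructions the loop actually turned into; the caller should print or otherwise use the returned value.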
On 05.10.2015 15:21, Gabriel wrote:
Why is this test code so slow with Eigen3?
(see the simple main)
http://pastebin.com/11XzzNFs
Output with gcc 4.9.2, full optimization turned on:
Eigen3: time: 0.150045 ms
Seperate: time: 0.000131 ms
So it seems that it is not beneficial to use Eigen for these simple index
calculations, but why?
Thanks for the help! :-)