Re: [eigen] Speed issues, array min,max

On Mon, Oct 5, 2015 at 10:08 AM, Gabriel <gnuetzi@xxxxxxxxx> wrote:

Thanks a lot!
I suspected something was flawed, so
I compared the assembler output, with marker comments, of the following test (hopefully not flawed this time: it uses random numbers, the same for both versions):

http://pastebin.com/qDvLGBfU

The timings are now close, but Eigen3 is still slower, which can also be seen in the assembler output below.
Is my example still flawed, or is the Eigen-internal dense assignment loop simply not being inlined?

(SLOWER)
#BEGIN1
# 0 "" 2
#NO_APP
leaq 128(%rsp), %r13
leaq 32(%rsp), %r12
leaq 64(%rsp), %rbp
movl $100000000, %ebx
.p2align 4,,10
.p2align 3
.L367:
call rand
subl $1073741824, %eax
cltq
movq %rax, 64(%rsp)
call rand
subl $1073741824, %eax
cltq
movq %rax, 72(%rsp)
call rand
leal -1073741824(%rax), %edx
leaq 160(%rsp), %rsi
leaq 96(%rsp), %rdi
movq %r13, 160(%rsp)
movq %r12, 168(%rsp)
movslq %edx, %rdx
movq %rbp, 176(%rsp)
movq %rdx, 80(%rsp)
leaq 31(%rsp), %rdx
call _ZN5Eigen8internal26call_dense_assignment_loopINS_5ArrayIxLi3ELi1ELi0ELi3ELi1EEENS_13CwiseBinaryOpINS0_13scalar_max_opIxEEKS3_KNS4_INS0_13scalar_min_opIxEES7_S7_EEEENS0_13add_assign_opIxEEEEvRKT_RKT0_RKT1_
subl $1, %ebx
jne .L367
#APP
# 123 "/home/zfmgpu/Desktop/Repository/SimulationFramework/SourceCode/Projects/TestBench/Projects/Test/src/main.cpp" 1
#END1

(FASTER)
#BEGIN2
# 0 "" 2
#NO_APP
movl $100000000, %ebx
xorl %r12d, %r12d
.p2align 4,,10
.p2align 3
.L368:
call rand
subl $1073741824, %eax
cltq
movq %rax, 64(%rsp)
call rand
subl $1073741824, %eax
cltq
movq %rax, 72(%rsp)
call rand
leal -1073741824(%rax), %edx
movq 64(%rsp), %rax
cmpq %rax, 32(%rsp)
cmovle 32(%rsp), %rax
movslq %edx, %rdx
movq %rdx, 80(%rsp)
testq %rax, %rax
cmovs %r12, %rax
addq %rax, 96(%rsp)
movq 72(%rsp), %rax
cmpq %rax, 40(%rsp)
cmovle 40(%rsp), %rax
testq %rax, %rax
cmovs %r12, %rax
addq %rax, 104(%rsp)
movq 48(%rsp), %rax
cmpq %rax, %rdx
cmovg %rax, %rdx
testq %rdx, %rdx
cmovs %r12, %rdx
addq %rdx, 112(%rsp)
subl $1, %ebx
jne .L368
#APP
# 138 "/home/zfmgpu/Desktop/Repository/SimulationFramework/SourceCode/Projects/TestBench/Projects/Test/src/main.cpp" 1
#END2
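For readers without access to the pastebin: the FASTER listing appears to perform a per-component clamp-and-accumulate. Each random component is clamped from above against a second vector (`cmpq`/`cmovle`), clamped from below at zero (`testq`/`cmovs` with the zeroed `%r12`), and added to an accumulator (`addq`). The mangled symbol in the SLOWER listing suggests the Eigen variant is an expression of roughly the form `acc += lo.max(hi.min(r))` on `Array<long long,3,1>`, presumably with `lo` equal to zero. The sketch below is a hypothetical reconstruction inferred from the assembly only; `Vec3`, `clampAccumulate`, `acc`, `r` and `hi` are illustrative names, not taken from the original code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the 3-component Array<long long,3,1> used in the thread.
using Vec3 = std::array<std::int64_t, 3>;

// Hypothetical reconstruction of the hand-coded loop body in the FASTER listing:
// clamp each component of r into [0, hi[i]] and add it to acc.
// Inferred (not confirmed) Eigen counterpart: acc += lo.max(hi.min(r)) with lo == 0.
inline void clampAccumulate(Vec3& acc, const Vec3& r, const Vec3& hi) {
    for (std::size_t i = 0; i < 3; ++i) {
        const std::int64_t upper = std::min(r[i], hi[i]);  // cmpq + cmovle
        acc[i] += std::max<std::int64_t>(upper, 0);         // testq + cmovs, then addq
    }
}
```

Calling it once per iteration with a freshly drawn `r` mirrors the loop body between `#BEGIN2` and `#END2`.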

On 10/05/2015 04:32 PM, Christoph Hertzberg wrote:

Your example is flawed, since it is trivial enough for the compiler to optimize away (almost) entirely. Add EIGEN_ASM_COMMENT("begin/end..."); lines and have a look at the generated assembler to see what I mean.
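A minimal sketch of the technique Christoph describes. With GCC on x86, Eigen's EIGEN_ASM_COMMENT injects a `#`-prefixed comment into the generated assembly via inline asm, which GCC wraps in `#APP`/`#NO_APP` (compare the `#BEGIN1`/`#END1` lines in the listings above). So that the example compiles without Eigen, it uses a hypothetical stand-in macro, `ASM_MARKER`; this is not Eigen's actual definition. Because `r` and `hi` do not change between iterations, the optimizer is allowed to compute the clamped sum once and scale it by `iterations`. That is exactly the kind of shortcut that makes such a benchmark meaningless, and the markers show whether it happened.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for EIGEN_ASM_COMMENT: on GCC/Clang targeting x86 it emits
// "#<text>" into the .s output; on other compilers/targets it expands to nothing.
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define ASM_MARKER(text) __asm__("#" text)
#else
#define ASM_MARKER(text)
#endif

using Vec3 = std::array<std::int64_t, 3>;

// Build with e.g. `g++ -O2 -S bench.cpp` (file name is illustrative) and inspect the
// code between #BEGIN_CLAMP and #END_CLAMP. r and hi are loop-invariant here, so the
// compiler may hoist the clamping out of the loop or remove the loop entirely.
std::int64_t sumClamped(const Vec3& r, const Vec3& hi, int iterations) {
    std::int64_t total = 0;
    ASM_MARKER("BEGIN_CLAMP");
    for (int it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i < 3; ++i) {
            const std::int64_t v = r[i] < hi[i] ? r[i] : hi[i];  // min(r, hi)
            total += v < 0 ? 0 : v;                              // max(., 0)
        }
    }
    ASM_MARKER("END_CLAMP");
    return total;
}
```

Note that GCC does not guarantee that ordinary code stays strictly between two asm statements, so the markers are a guide for reading the listing, not a hard boundary.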

If the values of the vectors are not known at compile time (and not the same for each iteration), you should get essentially the same assembler with Eigen and your hand-coded version, but with fewer lines of code.
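One way to follow that advice, sketched under assumptions: the helper name `benchClamp`, the value range and the use of `std::mt19937_64` are illustrative choices, not from the thread. The inputs are generated from a runtime seed before the timed region, so they are not compile-time constants, and the accumulated result is returned, so the loop is not dead code. Generating the inputs up front also keeps the three `rand()` calls per iteration, visible in both listings above, out of the measurement; those calls likely account for a large share of the measured time in the original loops.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

using Vec3 = std::array<std::int64_t, 3>;

struct BenchResult {
    Vec3 acc;       // accumulated result; consume it (e.g. print it) so the work stays observable
    double millis;  // wall-clock time of the timed loop only
};

// Hypothetical harness: inputs come from a runtime seed, so the optimizer cannot
// constant-fold them, and only the clamp-and-accumulate loop is timed.
BenchResult benchClamp(std::uint64_t seed, std::size_t n, const Vec3& hi) {
    std::mt19937_64 gen(seed);
    std::uniform_int_distribution<std::int64_t> dist(-1000, 1000);
    std::vector<Vec3> inputs(n);
    for (auto& v : inputs)
        for (auto& x : v) x = dist(gen);

    Vec3 acc{0, 0, 0};
    const auto t0 = std::chrono::steady_clock::now();
    for (const Vec3& r : inputs) {
        for (std::size_t i = 0; i < 3; ++i) {
            const std::int64_t v = r[i] < hi[i] ? r[i] : hi[i];  // min(r, hi)
            acc[i] += v < 0 ? 0 : v;                             // max(., 0)
        }
    }
    const auto t1 = std::chrono::steady_clock::now();
    return {acc, std::chrono::duration<double, std::milli>(t1 - t0).count()};
}
```

To compare against Eigen, the inner `for` loop can be swapped for the equivalent Eigen array expression inside the same harness, so both versions see identical inputs and are timed the same way.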

Christoph

On 05.10.2015 15:21, Gabriel wrote:

Why is this test code so slow with Eigen3?

(see simple main)
http://pastebin.com/11XzzNFs

Output with gcc 4.9.2, full optimization turned on:

Eigen3:   time: 0.150045 ms
Separate: time: 0.000131 ms
So it seems that it is not beneficial to use Eigen for these simple index
calculations, but why?

Thanks for the help! :-)

Nate Yonkee

Graduate research assistant

All around great guy