Re: [eigen] Speed issues, array min,max



Gabriel,

Why long long int? With doubles, Eigen was 2x faster for me, but with long long ints Eigen was significantly slower. I replaced the constant zero vector with one that changes, to (hopefully) minimize compiler magic.

Here is the code I used:

http://pastebin.com/euFbnAxU 

On Mon, Oct 5, 2015 at 10:08 AM, Gabriel <gnuetzi@xxxxxxxxx> wrote:
Thanks a lot!
I thought something was flawed, but
I compared the assembler output, marked with comments, for the following (hopefully unflawed) problem (random numbers, all the same):

http://pastebin.com/qDvLGBfU

I get close timings, but Eigen3 is still slower, which can also be seen in the assembler output below.
Is my example still flawed, or is the Eigen internal dense assignment loop simply not inlined?


(SLOWER)
#BEGIN1
# 0 "" 2
#NO_APP
    leaq    128(%rsp), %r13
    leaq    32(%rsp), %r12
    leaq    64(%rsp), %rbp
    movl    $100000000, %ebx
    .p2align 4,,10
    .p2align 3
.L367:
    call    rand
    subl    $1073741824, %eax
    cltq
    movq    %rax, 64(%rsp)
    call    rand
    subl    $1073741824, %eax
    cltq
    movq    %rax, 72(%rsp)
    call    rand
    leal    -1073741824(%rax), %edx
    leaq    160(%rsp), %rsi
    leaq    96(%rsp), %rdi
    movq    %r13, 160(%rsp)
    movq    %r12, 168(%rsp)
    movslq    %edx, %rdx
    movq    %rbp, 176(%rsp)
    movq    %rdx, 80(%rsp)
    leaq    31(%rsp), %rdx
    call _ZN5Eigen8internal26call_dense_assignment_loopINS_5ArrayIxLi3ELi1ELi0ELi3ELi1EEENS_13CwiseBinaryOpINS0_13scalar_max_opIxEEKS3_KNS4_INS0_13scalar_min_opIxEES7_S7_EEEENS0_13add_assign_opIxEEEEvRKT_RKT0_RKT1_
    subl    $1, %ebx
    jne    .L367
#APP
# 123 "/home/zfmgpu/Desktop/Repository/SimulationFramework/SourceCode/Projects/TestBench/Projects/Test/src/main.cpp" 1
    #END1

(FASTER)
    #BEGIN2
# 0 "" 2
#NO_APP
    movl    $100000000, %ebx
    xorl    %r12d, %r12d
    .p2align 4,,10
    .p2align 3
.L368:
    call    rand
    subl    $1073741824, %eax
    cltq
    movq    %rax, 64(%rsp)
    call    rand
    subl    $1073741824, %eax
    cltq
    movq    %rax, 72(%rsp)
    call    rand
    leal    -1073741824(%rax), %edx
    movq    64(%rsp), %rax
    cmpq    %rax, 32(%rsp)
    cmovle    32(%rsp), %rax
    movslq    %edx, %rdx
    movq    %rdx, 80(%rsp)
    testq    %rax, %rax
    cmovs    %r12, %rax
    addq    %rax, 96(%rsp)
    movq    72(%rsp), %rax
    cmpq    %rax, 40(%rsp)
    cmovle    40(%rsp), %rax
    testq    %rax, %rax
    cmovs    %r12, %rax
    addq    %rax, 104(%rsp)
    movq    48(%rsp), %rax
    cmpq    %rax, %rdx
    cmovg    %rax, %rdx
    testq    %rdx, %rdx
    cmovs    %r12, %rdx
    addq    %rdx, 112(%rsp)
    subl    $1, %ebx
    jne    .L368
#APP
# 138 "/home/zfmgpu/Desktop/Repository/SimulationFramework/SourceCode/Projects/TestBench/Projects/Test/src/main.cpp" 1
    #END2
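
For readers who do not want to decode the listing: as far as can be read from the FASTER block, each of the three components computes a min (cmpq plus cmovle/cmovg), clamps the result at zero (testq plus cmovs with the zeroed %r12), and adds it to an accumulator (addq). A minimal plain C++ sketch of that operation follows. The names `Vec3` and `accumulate_clamped_min` are hypothetical and do not come from the pastebin code.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical alias: three 64-bit signed components, as in the listing.
using Vec3 = std::array<long long, 3>;

// Hypothetical helper: acc[i] += max(min(a[i], b[i]), 0) for each component.
// This mirrors what the FASTER listing appears to do; it is not the
// original source code.
inline void accumulate_clamped_min(Vec3& acc, const Vec3& a, const Vec3& b) {
    for (std::size_t i = 0; i < acc.size(); ++i) {
        const long long m = std::min(a[i], b[i]);  // cmpq + conditional move
        acc[i] += std::max(m, 0LL);                // testq + cmovs, then addq
    }
}
```

In the SLOWER block the same work sits behind the out-of-line call to call_dense_assignment_loop, whose mangled name contains scalar_max_op, scalar_min_op and add_assign_op on an Array of three long long values.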




On 10/05/2015 04:32 PM, Christoph Hertzberg wrote:
Your example is flawed, since it is trivial enough for the compiler to optimize away (almost) entirely. Add EIGEN_ASM_COMMENT("begin/end..."); lines and have a look at the generated assembler to see what I mean.

If the values of the vectors are not known at compile-time (and not the same for each iteration), you should get essentially the same assembler with Eigen and your hand-coded version -- but with fewer lines of code.
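
A rough sketch of such a benchmark kernel, written without Eigen so it stands alone: the inputs come from a runtime seed and change every iteration, the result is returned so the loop cannot be discarded, and the loop is bracketed by assembler comments. `ASM_MARKER` and `run_kernel` are hypothetical names; the marker assumes GCC/Clang inline-asm syntax and an assembler that treats `#` as a comment, and only imitates what EIGEN_ASM_COMMENT is used for above.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <random>

// Assumption: GCC/Clang inline asm; on other compilers the marker is a no-op.
#if defined(__GNUC__)
#define ASM_MARKER(text) __asm__ volatile("# " text)
#else
#define ASM_MARKER(text) ((void)0)
#endif

using Vec3 = std::array<long long, 3>;

// Hypothetical kernel: accumulates max(min(x, upper), 0) per component.
// All inputs depend on the runtime seed, so the compiler cannot fold the
// loop into a constant, and the caller must use the returned value.
inline Vec3 run_kernel(std::uint32_t seed, int iterations) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<long long> dist(-1000, 1000);

    const Vec3 upper{{dist(gen), dist(gen), dist(gen)}};
    Vec3 acc{{0, 0, 0}};

    ASM_MARKER("BEGIN kernel");
    for (int k = 0; k < iterations; ++k) {
        const Vec3 x{{dist(gen), dist(gen), dist(gen)}};  // new values each pass
        for (std::size_t i = 0; i < acc.size(); ++i) {
            acc[i] += std::max(std::min(x[i], upper[i]), 0LL);
        }
    }
    ASM_MARKER("END kernel");
    return acc;
}
```

Compiling with -S and searching the output for the BEGIN/END comments shows which instructions the loop really produces. Printing or otherwise consuming the returned vector keeps the computation alive.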

Christoph

On 05.10.2015 15:21, Gabriel wrote:
Why is this test code so slow with Eigen3?

(see simple main)
http://pastebin.com/11XzzNFs


Output with gcc 4.9.2, full optimization turned on:

Eigen3: time: 0.150045 ms
Seperate: time: 0.000131 ms

So it seems that it is not beneficial to use Eigen for these simple index
calculations, but why?


Thanks for the help! :-)










--
Nate Yonkee
Graduate research assistant
All around great guy

