RE: Eigen 3.3 vs 3.2 Performance (was RE: [eigen] 3.3-beta2 released!)

On Mon, 6 Aug 2018, Daniel.Vollmer@xxxxxx wrote:

I've been trying to understand a bit better what is happening with the performance regression I'm seeing, and at the moment I am under the impression that Eigen-3.3 makes it harder (impossible?) for gcc to recognize when no aliasing is happening.

Nah, it is just gcc being silly.

I've further reduced my original example to essentially the following loop (see eigen_bench3.cpp for a self-contained version).
 using Vec = Eigen::Matrix<double, 2, 1>;
 Vec sum = Vec::Zero();
 for (int i = 0; i < num; ++i)
 {
   const Vec dirA = sum;
   const Vec dirB = dirA;

   sum += dirA.dot(dirB) * dirA;
 }

Without vectors, the main loop at -O3 starts with

        movdqu  (%rax), %xmm0
        addl    $1, %edx
        movaps  %xmm0, -40(%rsp)
        movsd   -40(%rsp), %xmm1
        movsd   -32(%rsp), %xmm4
        movaps  %xmm0, -24(%rsp)
        movsd   -16(%rsp), %xmm0
        movsd   -24(%rsp), %xmm5

so: read from memory, write to memory and re-read piecewise, and do it a second time just for the sake of it.

The corresponding internal representation at the end of the high-level optimization phase is

  MEM[(struct DenseStorage *)&dirA].m_data = MEM[(const struct DenseStorage &)sum_5(D)].m_data;
  dirA_31 = MEM[(struct plain_array *)&dirA];
  dirA$8_30 = MEM[(struct plain_array *)&dirA + 8B];
  MEM[(struct DenseStorage *)&dirB].m_data = MEM[(const struct DenseStorage &)&dirA].m_data;
  dirB_37 = MEM[(struct plain_array *)&dirB];
  dirB$8_38 = MEM[(struct plain_array *)&dirB + 8B];

This involves some direct mem-to-mem assignments, which is something that gcc handles super badly. If the copy was done piecewise, each element would be an SSA variable and optimizations would work. Even if the copy was done with memcpy there would be code to simplify it. But mem-to-mem...
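
To make the distinction concrete, here is a minimal sketch in plain C++ of the two copy styles, applied to the same loop. Vec2, copy_aggregate, copy_piecewise and run_loop are hypothetical names made up for illustration; they are not Eigen's types or its actual copy code. The sketch only shows that both styles compute the same thing. Whether a given gcc version emits better code for the piecewise variant would have to be checked on the generated assembly.

```cpp
#include <cassert>

// Hypothetical stand-in for a fixed-size vector stored in a plain array.
// This is NOT Eigen's DenseStorage, only an illustration of the same shape.
struct Vec2 {
    double data[2];
};

// Whole-aggregate copy: one struct assignment, i.e. the "mem-to-mem"
// form that shows up in the dump above.
inline Vec2 copy_aggregate(const Vec2& src) {
    Vec2 dst = src;
    return dst;
}

// Piecewise copy: every element is read and written as a scalar, so each
// one can become its own SSA variable.
inline Vec2 copy_piecewise(const Vec2& src) {
    Vec2 dst;
    dst.data[0] = src.data[0];
    dst.data[1] = src.data[1];
    return dst;
}

// The reduced loop, parameterised on the copy style under test.
template <Vec2 (*Copy)(const Vec2&)>
Vec2 run_loop(Vec2 sum, int num) {
    assert(num >= 0);
    for (int i = 0; i < num; ++i) {
        const Vec2 dirA = Copy(sum);
        const Vec2 dirB = Copy(dirA);
        const double dot = dirA.data[0] * dirB.data[0]
                         + dirA.data[1] * dirB.data[1];
        sum.data[0] += dot * dirA.data[0];
        sum.data[1] += dot * dirA.data[1];
    }
    return sum;
}
```

Calling run_loop<copy_aggregate>(start, num) and run_loop<copy_piecewise>(start, num) with the same start value gives identical results, so the two instantiations can be compiled with -O3 -S and compared side by side.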

I strongly encourage you to report this testcase to gcc's bugzilla.

(This doesn't mean that people can't work around it in Eigen somehow, but any workaround will likely not be nice and will not catch all cases.)

--
Marc Glisse


