RE: Eigen 3.3 vs 3.2 Performance (was RE: [eigen] 3.3-beta2 released!)
On Mon, 6 Aug 2018, Daniel.Vollmer@xxxxxx wrote:
I've been trying to understand a bit better what is happening with the
performance regression I'm seeing, and at the moment I am under the
impression that Eigen-3.3 makes it harder (impossible?) for gcc to
recognize when no aliasing is happening.
Nah, it is just gcc being silly.
I've further reduced my original example to essentially the following loop (see eigen_bench3.cpp for a self-contained version).
using Vec = Eigen::Matrix<double, 2, 1>;
Vec sum = Vec::Zero();
for (int i = 0; i < num; ++i)
{
  const Vec dirA = sum;
  const Vec dirB = dirA;
  sum += dirA.dot(dirB) * dirA;
}
Without vectors, the main loop at -O3 starts with
movdqu (%rax), %xmm0
addl $1, %edx
movaps %xmm0, -40(%rsp)
movsd -40(%rsp), %xmm1
movsd -32(%rsp), %xmm4
movaps %xmm0, -24(%rsp)
movsd -16(%rsp), %xmm0
movsd -24(%rsp), %xmm5
so: read from memory, write to memory and re-read piecewise, and do it a
second time just for the sake of it.
The corresponding internal representation at the end of the high-level
optimization phase is
MEM[(struct DenseStorage *)&dirA].m_data = MEM[(const struct DenseStorage &)sum_5(D)].m_data;
dirA_31 = MEM[(struct plain_array *)&dirA];
dirA$8_30 = MEM[(struct plain_array *)&dirA + 8B];
MEM[(struct DenseStorage *)&dirB].m_data = MEM[(const struct DenseStorage &)&dirA].m_data;
dirB_37 = MEM[(struct plain_array *)&dirB];
dirB$8_38 = MEM[(struct plain_array *)&dirB + 8B];
This involves some direct mem-to-mem assignments, which is something that
gcc handles very badly. If the copy were done piecewise, each element
would be an SSA variable and optimizations would work. Even if the copy were
done with memcpy, there would be code to simplify it. But mem-to-mem...
I strongly encourage you to report this testcase to gcc's bugzilla.
(That doesn't mean people can't work around it in Eigen somehow, but a
workaround will likely not be pretty and will not catch all cases.)