Re: [eigen] Performance regression with Matrix4f multiplication?


In some cases your loop gets over-optimized by the compiler, leading to inconsistent results that depend on the compiler version and flags. See the attached file for a more correct version. Also, it is better to use 3.3.4 than 3.3.0.
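
To illustrate the kind of over-optimization meant here, below is a minimal sketch that does not use Eigen or Google Benchmark. It is not the attached file. The names `Mat4`, `mul_acc` and `time_mul_acc` are hypothetical, and both the `noinline` attribute and the empty `asm` statement are GCC/Clang extensions, so this assumes one of those compilers. The idea is to keep the multiplication in a function the compiler cannot fold into the timing loop, and to make the result look observed on every iteration.

```cpp
#include <array>
#include <cassert>
#include <chrono>

// Row-major 4x4 float matrix; a stand-in for Eigen::Matrix4f (assumption).
using Mat4 = std::array<float, 16>;

// noinline keeps the call from being merged into the timing loop, where the
// compiler could otherwise hoist or simplify the repeated work.
__attribute__((noinline))
void mul_acc(const Mat4& a, const Mat4& b, Mat4& c) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k) {
                s += a[i * 4 + k] * b[k * 4 + j];
            }
            c[i * 4 + j] += s;  // accumulate, like c.noalias() += a * b
        }
    }
}

// Runs c += a * b `iterations` times and returns the elapsed nanoseconds.
long long time_mul_acc(const Mat4& a, const Mat4& b, Mat4& c, int iterations) {
    assert(iterations > 0);
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        mul_acc(a, b, c);
        // Empty asm with a memory clobber: the compiler must assume the
        // contents of c are read and written here, so it cannot drop the work.
        asm volatile("" : : "r"(c.data()) : "memory");
    }
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}
```

Calling `time_mul_acc(a, b, c, 1000)` with `c` zero-initialized gives one timing sample. Whether this removes the inconsistency on a given compiler version still has to be checked against the generated assembly.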


On Sun, Nov 26, 2017 at 5:03 AM, Ryo Miyajima <sergeant.wizard@xxxxxxxxx> wrote:

I was testing Matrix4{d,f} multiplication performance across different Eigen versions and found that, since 3.3.0, Matrix4f multiplication has slowed down significantly when compiled with gcc's `-march=native` flag.
The performance deteriorated on a Core i5 and a Core i7 but not on a Xeon.
Is this expected behavior (for example, because Eigen optimizes for matrices larger than 4x4), or am I doing something wrong, such as not providing the right compilation flags?

The benchmarks are in the following repo and can be reproduced by docker images:
The assembly code is also provided in the repository.

I searched through the bug tracker, and this may or may not be related to this issue:

Thanks in advance for your help.

Ryo Miyajima

#include <Eigen/Core>
#include <benchmark/benchmark.h>

static const int num_iterations = 1000;

// Kept as a separate function so the product is not evaluated inline in the loop body.
template<typename A, typename B, typename C>
void prod(const A& a, const B& b, C& c)
{
  c.noalias() += a * b;
}

template<class T>
static inline void BM_EigenMatrix4(benchmark::State& state) {
    Eigen::Matrix<T, 4, 4> mat1 = Eigen::Matrix<T, 4, 4>::Random(4, 4);
    Eigen::Matrix<T, 4, 4> mat2 = Eigen::Matrix<T, 4, 4>::Random(4, 4);
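    // Zero-initialize: prod() accumulates into mat3, so it must not start uninitialized.
    Eigen::Matrix<T, 4, 4> mat3 = Eigen::Matrix<T, 4, 4>::Zero();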
    for (auto _ : state) {
        for (int i = 0; i < num_iterations; ++i) {
            prod(mat1, mat2, mat3);
        }
    }
}

BENCHMARK_TEMPLATE(BM_EigenMatrix4, float);
BENCHMARK_TEMPLATE(BM_EigenMatrix4, double);
