[eigen] Slowdown on MSVC 2013



Hi Guys,

We are observing a significant slowdown with Eigen on MSVC 2013 compared to Clang, and the source seems to be inlining behaviour that differs between the two compilers.

Bjorn Piltz did some digging and his findings are described below. Any thoughts/suggestions for how we should do things differently would be very useful.

The code for the automatic differentiation lives in

https://ceres-solver.googlesource.com/ceres-solver/+/master/include/ceres/jet.h

Thank you,
Sameer
 
On Thu, Jun 19, 2014 at 10:56 AM, Björn Piltz <bjornpiltz@xxxxxxxxxx> wrote:
Hi,
This is quite interesting. I quickly tested this and can confirm that this is an issue. I think I might have a fix as well.

On the same hardware I got:
Clang:
Time (in seconds):
Preprocessor                            0.085

  Residual evaluation                   0.127
  Jacobian evaluation                   0.748
  Linear solver                         1.122
Minimizer                               2.157

MSVC 2013:
Time (in seconds):
Preprocessor                            0.061

  Residual evaluation                   0.191
  Jacobian evaluation                   4.097
  Linear solver                         3.079
Minimizer                               8.285

I have used the default CMake configuration "Release". This should almost always be enough to produce reasonable code.

I profiled it on Windows and got the following:

The application spends 50% of the time in the following constructor:

template<typename Derived>
Jet(const T& value, const Eigen::DenseBase<Derived> &vIn)
  : a(value),
    v(vIn)
{
}

with different "Derived" types. This was also always the bottom of the stack, i.e. everything below it had been inlined.
This led me to believe that this is a problem with automatic inlining. To test this, I changed the declaration to

template<typename Derived>
EIGEN_STRONG_INLINE Jet(const T& value, const Eigen::DenseBase<Derived> &vIn)
  : a(value),
    v(vIn)
{
}

And the timings changed to:

Time (in seconds):
Preprocessor                            0.071

  Residual evaluation                   0.196
  Jacobian evaluation                   1.107
  Linear solver                         2.943
Minimizer                               5.401

This goes a long way towards fixing the problem, even if it is still much worse than Clang.
This kind of thing is something the Eigen guys have a lot of experience with; perhaps you can ask them for some input.
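
For anyone who wants to reproduce the effect without pulling in Ceres, here is a minimal sketch of the same idea. It is not the code from jet.h: SKETCH_STRONG_INLINE, SketchJet and sketch_usage are hypothetical names made up for this illustration, std::array stands in for the Eigen vector, and the GCC/Clang branch of the macro is an assumption about what a comparable "force inline" hint could look like there. As far as I understand, EIGEN_STRONG_INLINE maps to __forceinline on MSVC, but please check Eigen's own headers for the real definition.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical macro for illustration only; not Eigen's definition.
#if defined(_MSC_VER)
#  define SKETCH_STRONG_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define SKETCH_STRONG_INLINE inline __attribute__((always_inline))
#else
#  define SKETCH_STRONG_INLINE inline
#endif

// Minimal stand-in for a Jet: a scalar value plus N partial derivatives.
template <typename T, std::size_t N>
struct SketchJet {
  T a;                 // function value
  std::array<T, N> v;  // derivative part

  // The small constructor that the profiler showed as not inlined on MSVC;
  // here it carries the force-inline hint.
  SKETCH_STRONG_INLINE SketchJet(const T& value, const std::array<T, N>& vIn)
      : a(value), v(vIn) {}
};

// Product rule: (f * g)' = f * g' + f' * g, applied per derivative slot.
template <typename T, std::size_t N>
SKETCH_STRONG_INLINE SketchJet<T, N> operator*(const SketchJet<T, N>& f,
                                               const SketchJet<T, N>& g) {
  std::array<T, N> dv{};
  for (std::size_t i = 0; i < N; ++i) {
    dv[i] = f.a * g.v[i] + f.v[i] * g.a;
  }
  return SketchJet<T, N>(f.a * g.a, dv);
}

// Usage: differentiate x * y at (x, y) = (3, 4).
inline void sketch_usage() {
  const SketchJet<double, 2> x(3.0, std::array<double, 2>{{1.0, 0.0}});
  const SketchJet<double, 2> y(4.0, std::array<double, 2>{{0.0, 1.0}});
  const SketchJet<double, 2> p = x * y;
  assert(p.a == 12.0);    // value: 3 * 4
  assert(p.v[0] == 4.0);  // d(xy)/dx = y
  assert(p.v[1] == 3.0);  // d(xy)/dy = x
}
```

The hint only changes whether the compiler inlines the call, not the result, so the way to see a difference is to build in Release with and without the macro and compare the profile.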

I hope that helps!

Björn

www.blikken.de


