Re: [eigen] Slow matrix-matrix multiply

To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [eigen] Slow matrix-matrix multiply
From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
Date: Wed, 3 Apr 2013 10:19:57 +0200

I also have difficulty observing significant differences with Apple's
default compiler:

-- default - clang - macbookpro --

Time (in seconds):
Preprocessor             0.043

Residual Evaluations     0.076
Jacobian Evaluations     0.866
Linear Solver            0.740
Minimizer                1.816

Postprocessor            0.002
Total                    1.906

-- CERES_NO_CUSTOM_BLAS - clang-inline-threshold - macbookpro --

Time (in seconds):
Preprocessor             0.043

Residual Evaluations     0.070
Jacobian Evaluations     0.859
Linear Solver            0.779
Minimizer                1.837

Postprocessor            0.002
Total                    1.926

-- CERES_NO_CUSTOM_BLAS - clang - macbookpro --

Time (in seconds):
Preprocessor             0.043

Residual Evaluations     0.075
Jacobian Evaluations     0.863
Linear Solver            0.896
Minimizer                1.970

Postprocessor            0.002
Total                    2.060

On Wed, Apr 3, 2013 at 9:42 AM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> still cannot reproduce with gcc:
>
> -- default - gcc47 - Core2 Q9400 @2.66GHz --
>
> Time (in seconds):
> Preprocessor             0.093
>
> Residual Evaluations     0.117
> Jacobian Evaluations     1.067
> Linear Solver            0.809
> Minimizer                2.237
>
> Postprocessor            0.005
> Total                    2.371
>
> -- CERES_NO_CUSTOM_BLAS - gcc47 - Core2 Q9400 @2.66GHz --
>
> Time (in seconds):
> Preprocessor             0.089
>
> Residual Evaluations     0.108
> Jacobian Evaluations     1.054
> Linear Solver            0.803
> Minimizer                2.206
>
> Postprocessor            0.005
> Total                    2.335
>
> -- default - gcc47 - Xeon X5570 @2.93GHz --
>
> Time (in seconds):
> Preprocessor             0.067
>
> Residual Evaluations     0.085
> Jacobian Evaluations     0.720
> Linear Solver            0.600
> Minimizer                1.557
>
> Postprocessor            0.001
> Total                    1.645
>
> -- CERES_NO_CUSTOM_BLAS - gcc47 - Xeon X5570 @2.93GHz --
>
> Time (in seconds):
> Preprocessor             0.067
>
> Residual Evaluations     0.085
> Jacobian Evaluations     0.734
> Linear Solver            0.599
> Minimizer                1.570
>
> Postprocessor            0.001
> Total                    1.658
>
> gael
>
> On Wed, Apr 3, 2013 at 5:58 AM, Sameer Agarwal
> <sameeragarwal@xxxxxxxxxx> wrote:
>> In case there is still interest, the change has been merged into the
>> master branch.
>>
>> Sameer
>>
>> On Tue, Apr 2, 2013 at 12:25 PM, Sameer Agarwal
>> <sameeragarwal@xxxxxxxxxx> wrote:
>>>
>>> On Keir's suggestion, I have updated this CL to optionally compile
>>> Eigen based routines in and out.
>>>
>>> Passing -DCUSTOM_BLAS=ON/OFF to cmake switches between custom loops
>>> and Eigen inside blas.h.
>>>
>>> Sameer
>>>
>>> On Tue, Apr 2, 2013 at 11:42 AM, Sameer Agarwal
>>> <sameeragarwal@xxxxxxxxxx> wrote:
>>>>
>>>> Here is the gerrit CL that is used for generating these numbers:
>>>>
>>>> https://ceres-solver-review.googlesource.com/#/c/2870/
>>>>
>>>> Sameer
>>>>
>>>> On Tue, Apr 2, 2013 at 11:34 AM, Sameer Agarwal
>>>> <sameeragarwal@xxxxxxxxxx> wrote:
>>>>>
>>>>> Gael and Christoph,
>>>>>
>>>>> Thank you for looking into this.
>>>>>
>>>>> Yes, adding -mllvm -inline-threshold=600 makes the timing of Eigen
>>>>> comparable to CUSTOM_GEMM.
>>>>>
>>>>> However, I went ahead and replaced all use of small block
>>>>> operations in the eliminator with simple gemm and gemv
>>>>> implementations, and the time has dropped even further, which
>>>>> would not be the case if inlining were the only thing at work here.
>>>>>
>>>>> With the increased inlining 1.02s
>>>>> With custom blas 0.634s
>>>>>
>>>>> I get roughly similar numbers with g++4.2 on macos. I also tested
>>>>> this on linux with g++ 4.6.3, where the linear solver time goes
>>>>> from 0.8 to .5 seconds.
>>>>>
>>>>> Sameer
>>>>>
>>>>> On Tue, Apr 2, 2013 at 5:23 AM, Gael Guennebaud
>>>>> <gael.guennebaud@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On Tue, Apr 2, 2013 at 1:58 PM, Gael Guennebaud
>>>>>> <gael.guennebaud@xxxxxxxxx> wrote:
>>>>>> > After adding a few always_inline attributes
>>>>>>
>>>>>> An alternative is to add the following compiler option:
>>>>>>
>>>>>> -mllvm -inline-threshold=600
>>>>>>
>>>>>> gael
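
For readers of the archive, the "simple gemm and gemv implementations" mentioned in the quoted thread can be pictured as plain hand-written loops over small dense blocks. The sketch below is illustrative only and is not the code from the Ceres CL or from blas.h: the names NaiveGemm and NaiveGemv, the row-major layout, and the accumulate-into-output convention are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from Ceres or Eigen): C += A * B for small
// dense blocks stored row-major. A is rows_a x cols_a, B is rows_b x cols_b,
// and C must already hold rows_a x cols_b values.
inline void NaiveGemm(const double* a, std::size_t rows_a, std::size_t cols_a,
                      const double* b, std::size_t rows_b, std::size_t cols_b,
                      double* c) {
  assert(cols_a == rows_b);  // inner dimensions must agree
  for (std::size_t i = 0; i < rows_a; ++i) {
    for (std::size_t j = 0; j < cols_b; ++j) {
      double sum = 0.0;
      for (std::size_t k = 0; k < cols_a; ++k) {
        sum += a[i * cols_a + k] * b[k * cols_b + j];
      }
      c[i * cols_b + j] += sum;  // accumulate into the existing output
    }
  }
}

// Hypothetical helper: y += A * x, with A stored row-major (rows x cols),
// x holding cols values and y holding rows values.
inline void NaiveGemv(const double* a, std::size_t rows, std::size_t cols,
                      const double* x, double* y) {
  for (std::size_t i = 0; i < rows; ++i) {
    double sum = 0.0;
    for (std::size_t k = 0; k < cols; ++k) {
      sum += a[i * cols + k] * x[k];
    }
    y[i] += sum;  // accumulate into the existing output
  }
}
```

Loops like these leave the compiler little to inline, so their speed should not depend on the inlining threshold. That matches the argument in the thread that inlining alone may not explain the gap.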

