still cannot reproduce with gcc:

-- default - gcc47 - Core2 Q9400 @2.66GHz --
Time (in seconds):
Preprocessor             0.093
Residual Evaluations     0.117
Jacobian Evaluations     1.067
Linear Solver            0.809
Minimizer                2.237
Postprocessor            0.005
Total                    2.371

-- CERES_NO_CUSTOM_BLAS - gcc47 - Core2 Q9400 @2.66GHz --
Time (in seconds):
Preprocessor             0.089
Residual Evaluations     0.108
Jacobian Evaluations     1.054
Linear Solver            0.803
Minimizer                2.206
Postprocessor            0.005
Total                    2.335

-- default - gcc47 - Xeon X5570 @2.93GHz --
Time (in seconds):
Preprocessor             0.067
Residual Evaluations     0.085
Jacobian Evaluations     0.720
Linear Solver            0.600
Minimizer                1.557
Postprocessor            0.001
Total                    1.645

-- CERES_NO_CUSTOM_BLAS - gcc47 - Xeon X5570 @2.93GHz --
Time (in seconds):
Preprocessor             0.067
Residual Evaluations     0.085
Jacobian Evaluations     0.734
Linear Solver            0.599
Minimizer                1.570
Postprocessor            0.001
Total                    1.658

gael

On Wed, Apr 3, 2013 at 5:58 AM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx> wrote:
> In case there is still interest, the change has been merged into the master
> branch.
> Sameer
>
> On Tue, Apr 2, 2013 at 12:25 PM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx>
> wrote:
>>
>> On Keir's suggestion, I have updated this CL to optionally compile Eigen
>> based routines in and out.
>>
>> passing -DCUSTOM_BLAS=ON/OFF to cmake switches between custom loops and
>> eigen inside blas.h
>>
>> Sameer
>>
>> On Tue, Apr 2, 2013 at 11:42 AM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx>
>> wrote:
>>>
>>> Here is the gerrit CL that is used for generating these numbers
>>>
>>> https://ceres-solver-review.googlesource.com/#/c/2870/
>>>
>>> Sameer
>>>
>>> On Tue, Apr 2, 2013 at 11:34 AM, Sameer Agarwal
>>> <sameeragarwal@xxxxxxxxxx> wrote:
>>>>
>>>> Gael and Christoph,
>>>>
>>>> Thank you for looking into this.
>>>>
>>>> Yes adding -mllvm -inline-threshold=600 makes the timing of Eigen
>>>> comparable to CUSTOM_GEMM.
>>>>
>>>> However, I went ahead and replaced all use of small block operations in
>>>> the eliminator with simple gemm and gemv implementations. And the time has
>>>> dropped even further. Which would not be the case if inlining were the only
>>>> thing at work here.
>>>>
>>>> With the increased inlining 1.02s
>>>> With custom blas 0.634s
>>>>
>>>> I get roughly similar numbers with g++4.2 on macos. I also tested this on
>>>> linux with g++ 4.6.3, where the linear solver time goes from 0.8 to .5
>>>> seconds.
>>>>
>>>> Sameer
>>>>
>>>> On Tue, Apr 2, 2013 at 5:23 AM, Gael Guennebaud
>>>> <gael.guennebaud@xxxxxxxxx> wrote:
>>>>>
>>>>> On Tue, Apr 2, 2013 at 1:58 PM, Gael Guennebaud
>>>>> <gael.guennebaud@xxxxxxxxx> wrote:
>>>>> > After adding a few always_inline attributes
>>>>>
>>>>> An alternative is to add the following compiler option:
>>>>>
>>>>> -mllvm -inline-threshold=600
>>>>>
>>>>> gael
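
For anyone following the thread, the "simple gemm and gemv implementations" mentioned above can be pictured as plain loops over small dense blocks. The sketch below is only an illustration under stated assumptions: the names SmallGemmAdd and SmallGemvAdd, the row-major layout, and the accumulate-into-output convention are hypothetical and are not taken from the CL or from blas.h.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not the code from the CL.
// Computes C += A * B for small dense row-major blocks,
// where A is m x k, B is k x n and C is m x n.
inline void SmallGemmAdd(const double* A, const double* B, double* C,
                         std::size_t m, std::size_t k, std::size_t n) {
  assert(A != nullptr && B != nullptr && C != nullptr);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      double sum = 0.0;
      for (std::size_t p = 0; p < k; ++p) {
        sum += A[i * k + p] * B[p * n + j];
      }
      C[i * n + j] += sum;  // accumulate into the existing block
    }
  }
}

// Hypothetical helper, not the code from the CL.
// Computes y += A * x, where A is m x n (row-major),
// x has n entries and y has m entries.
inline void SmallGemvAdd(const double* A, const double* x, double* y,
                         std::size_t m, std::size_t n) {
  assert(A != nullptr && x != nullptr && y != nullptr);
  for (std::size_t i = 0; i < m; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
      sum += A[i * n + j] * x[j];
    }
    y[i] += sum;  // accumulate into the existing vector
  }
}
```

Loops like these give the compiler one small, fully visible kernel per call, which is one plausible reason the result could differ from what inlining thresholds alone achieve; the timings quoted in the thread are the only measurements, and this sketch was not benchmarked.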

