Re: [eigen] Slow matrix-matrix multiply


- *To*: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
- *Subject*: Re: [eigen] Slow matrix-matrix multiply
- *From*: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- *Date*: Wed, 3 Apr 2013 10:19:57 +0200

I also have difficulty observing significant differences with Apple's
default compiler:

-- default - clang - macbookpro --

Time (in seconds):
Preprocessor                 0.043

  Residual Evaluations       0.076
  Jacobian Evaluations       0.866
  Linear Solver              0.740
Minimizer                    1.816

Postprocessor                0.002
Total                        1.906

-- CERES_NO_CUSTOM_BLAS - clang-inline-threshold - macbookpro --

Time (in seconds):
Preprocessor                 0.043

  Residual Evaluations       0.070
  Jacobian Evaluations       0.859
  Linear Solver              0.779
Minimizer                    1.837

Postprocessor                0.002
Total                        1.926

-- CERES_NO_CUSTOM_BLAS - clang - macbookpro --

Time (in seconds):
Preprocessor                 0.043

  Residual Evaluations       0.075
  Jacobian Evaluations       0.863
  Linear Solver              0.896
Minimizer                    1.970

Postprocessor                0.002
Total                        2.060

On Wed, Apr 3, 2013 at 9:42 AM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
> still cannot reproduce with gcc:
>
> -- default - gcc47 - Core2 Q9400 @2.66GHz --
>
> Time (in seconds):
> Preprocessor                 0.093
>
>   Residual Evaluations       0.117
>   Jacobian Evaluations       1.067
>   Linear Solver              0.809
> Minimizer                    2.237
>
> Postprocessor                0.005
> Total                        2.371
>
> -- CERES_NO_CUSTOM_BLAS - gcc47 - Core2 Q9400 @2.66GHz --
>
> Time (in seconds):
> Preprocessor                 0.089
>
>   Residual Evaluations       0.108
>   Jacobian Evaluations       1.054
>   Linear Solver              0.803
> Minimizer                    2.206
>
> Postprocessor                0.005
> Total                        2.335
>
> -- default - gcc47 - Xeon X5570 @2.93GHz --
>
> Time (in seconds):
> Preprocessor                 0.067
>
>   Residual Evaluations       0.085
>   Jacobian Evaluations       0.720
>   Linear Solver              0.600
> Minimizer                    1.557
>
> Postprocessor                0.001
> Total                        1.645
>
> -- CERES_NO_CUSTOM_BLAS - gcc47 - Xeon X5570 @2.93GHz --
>
> Time (in seconds):
> Preprocessor                 0.067
>
>   Residual Evaluations       0.085
>   Jacobian Evaluations       0.734
>   Linear Solver              0.599
> Minimizer                    1.570
>
> Postprocessor                0.001
> Total                        1.658
>
> gael
>
> On Wed, Apr 3, 2013 at 5:58 AM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx> wrote:
>> In case there is still interest, the change has been merged into the
>> master branch.
>>
>> Sameer
>>
>> On Tue, Apr 2, 2013 at 12:25 PM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx> wrote:
>>>
>>> On Keir's suggestion, I have updated this CL to optionally compile
>>> Eigen based routines in and out.
>>>
>>> Passing -DCUSTOM_BLAS=ON/OFF to cmake switches between custom loops
>>> and Eigen inside blas.h.
>>>
>>> Sameer
>>>
>>> On Tue, Apr 2, 2013 at 11:42 AM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx> wrote:
>>>>
>>>> Here is the gerrit CL that is used for generating these numbers:
>>>>
>>>> https://ceres-solver-review.googlesource.com/#/c/2870/
>>>>
>>>> Sameer
>>>>
>>>> On Tue, Apr 2, 2013 at 11:34 AM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx> wrote:
>>>>>
>>>>> Gael and Christoph,
>>>>>
>>>>> Thank you for looking into this.
>>>>>
>>>>> Yes, adding -mllvm -inline-threshold=600 makes the timing of Eigen
>>>>> comparable to CUSTOM_GEMM.
>>>>>
>>>>> However, I went ahead and replaced all use of small block operations
>>>>> in the eliminator with simple gemm and gemv implementations, and the
>>>>> time has dropped even further, which would not be the case if
>>>>> inlining were the only thing at work here.
>>>>>
>>>>> With the increased inlining 1.02s
>>>>> With custom blas 0.634s
>>>>>
>>>>> I get roughly similar numbers with g++ 4.2 on macos. I also tested
>>>>> this on linux with g++ 4.6.3, where the linear solver time goes from
>>>>> 0.8 to 0.5 seconds.
>>>>>
>>>>> Sameer
>>>>>
>>>>> On Tue, Apr 2, 2013 at 5:23 AM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On Tue, Apr 2, 2013 at 1:58 PM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
>>>>>> > After adding a few always_inline attributes
>>>>>>
>>>>>> An alternative is to add the following compiler option:
>>>>>>
>>>>>> -mllvm -inline-threshold=600
>>>>>>
>>>>>> gael
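
For readers who want a concrete picture of the "simple gemm and gemv implementations" and the always_inline attributes discussed in the quoted messages, here is a minimal sketch. It is not the code from the Ceres CL or from blas.h; the names small_gemm, small_gemv and SKETCH_ALWAYS_INLINE are hypothetical, and row-major storage with compile-time block sizes is an assumption made for illustration only.

```cpp
#include <cassert>

// Hypothetical macro: force inlining on GCC/Clang, plain inline elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_ALWAYS_INLINE inline
#endif

// C += A * B for small row-major blocks with compile-time sizes:
// A is M x K, B is K x N, C is M x N. Illustrative only.
template <int M, int K, int N>
SKETCH_ALWAYS_INLINE void small_gemm(const double* A, const double* B,
                                     double* C) {
  assert(A != nullptr && B != nullptr && C != nullptr);
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      double sum = 0.0;
      for (int k = 0; k < K; ++k) {
        sum += A[i * K + k] * B[k * N + j];
      }
      C[i * N + j] += sum;
    }
  }
}

// y += A * x for a small row-major M x N block A. Illustrative only.
template <int M, int N>
SKETCH_ALWAYS_INLINE void small_gemv(const double* A, const double* x,
                                     double* y) {
  assert(A != nullptr && x != nullptr && y != nullptr);
  for (int i = 0; i < M; ++i) {
    double sum = 0.0;
    for (int j = 0; j < N; ++j) {
      sum += A[i * N + j] * x[j];
    }
    y[i] += sum;
  }
}
```

Because the sizes are template parameters, the compiler can fully unroll these loops for small blocks. Whether that beats Eigen's fixed-size products depends on the compiler and inlining settings, which is exactly what the timings in this thread disagree on.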

