Re: [eigen] Slow matrix-matrix multiply



Gael,

I need to run some experiments on my laptop and desktop, and I will post what I think are statistically meaningful numbers (with multiple problems), but it's going to take me a bit of time.

Sameer
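
[Editorial sketch] One way to collect timings that are less sensitive to run-to-run noise is to repeat each measurement and report the median rather than a single run. The helper names below (TimeRepeated, Median) are hypothetical and are not part of Ceres or Eigen; this is a minimal sketch, assuming the workload can be wrapped in a callable.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (an assumption, not a Ceres or Eigen API):
// runs `work` `repeats` times and returns the wall-clock time of each
// run in seconds, sorted in ascending order.
template <typename Work>
std::vector<double> TimeRepeated(Work&& work, std::size_t repeats) {
  assert(repeats > 0);
  std::vector<double> seconds;
  seconds.reserve(repeats);
  for (std::size_t r = 0; r < repeats; ++r) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    seconds.push_back(std::chrono::duration<double>(stop - start).count());
  }
  std::sort(seconds.begin(), seconds.end());
  return seconds;
}

// Median of a non-empty sample that is already sorted in ascending order.
inline double Median(const std::vector<double>& sorted) {
  assert(!sorted.empty());
  const std::size_t n = sorted.size();
  return (n % 2 == 1) ? sorted[n / 2]
                      : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
}
```

Calling TimeRepeated once per problem and per build configuration, then comparing medians (and the spread between the fastest and slowest runs), makes it easier to tell whether a difference of a few hundredths of a second is real.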



On Wed, Apr 3, 2013 at 1:19 AM, Gael Guennebaud <gael.guennebaud@xxxxxxxxx> wrote:
I also have difficulty observing significant differences with
Apple's default compiler:

-- default - clang - macbookpro --

Time (in seconds):
Preprocessor                            0.043

  Residual Evaluations                  0.076
  Jacobian Evaluations                  0.866
  Linear Solver                         0.740
Minimizer                               1.816

Postprocessor                           0.002
Total                                   1.906


-- CERES_NO_CUSTOM_BLAS - clang-inline-threshold - macbookpro --

Time (in seconds):
Preprocessor                            0.043

  Residual Evaluations                  0.070
  Jacobian Evaluations                  0.859
  Linear Solver                         0.779
Minimizer                               1.837

Postprocessor                           0.002
Total                                   1.926


-- CERES_NO_CUSTOM_BLAS - clang - macbookpro --

Time (in seconds):
Preprocessor                            0.043

  Residual Evaluations                  0.075
  Jacobian Evaluations                  0.863
  Linear Solver                         0.896
Minimizer                               1.970

Postprocessor                           0.002
Total                                   2.060

On Wed, Apr 3, 2013 at 9:42 AM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> still cannot reproduce with gcc:
>
> -- default - gcc47 - Core2 Q9400 @2.66GHz --
>
> Time (in seconds):
> Preprocessor                            0.093
>
>   Residual Evaluations                  0.117
>   Jacobian Evaluations                  1.067
>   Linear Solver                         0.809
> Minimizer                               2.237
>
> Postprocessor                           0.005
> Total                                   2.371
>
> -- CERES_NO_CUSTOM_BLAS - gcc47 - Core2 Q9400 @2.66GHz --
>
> Time (in seconds):
> Preprocessor                            0.089
>
>   Residual Evaluations                  0.108
>   Jacobian Evaluations                  1.054
>   Linear Solver                         0.803
> Minimizer                               2.206
>
> Postprocessor                           0.005
> Total                                   2.335
>
>
> -- default - gcc47 - Xeon X5570 @2.93GHz --
>
> Time (in seconds):
> Preprocessor                            0.067
>
>   Residual Evaluations                  0.085
>   Jacobian Evaluations                  0.720
>   Linear Solver                         0.600
> Minimizer                               1.557
>
> Postprocessor                           0.001
> Total                                   1.645
>
> -- CERES_NO_CUSTOM_BLAS - gcc47 - Xeon X5570 @2.93GHz --
>
> Time (in seconds):
> Preprocessor                            0.067
>
>   Residual Evaluations                  0.085
>   Jacobian Evaluations                  0.734
>   Linear Solver                         0.599
> Minimizer                               1.570
>
> Postprocessor                           0.001
> Total                                   1.658
>
> gael
>
> On Wed, Apr 3, 2013 at 5:58 AM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx> wrote:
>> In case there is still interest, the change has been merged into the master
>> branch.
>> Sameer
>>
>>
>>
>>
>> On Tue, Apr 2, 2013 at 12:25 PM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx>
>> wrote:
>>>
>>> On Keir's suggestion, I have updated this CL to optionally compile the
>>> Eigen-based routines in and out.
>>>
>>> Passing -DCUSTOM_BLAS=ON/OFF to cmake switches between the custom loops
>>> and Eigen inside blas.h.
>>>
>>> Sameer
>>>
>>>
>>>
>>> On Tue, Apr 2, 2013 at 11:42 AM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx>
>>> wrote:
>>>>
>>>> Here is the Gerrit CL that was used to generate these numbers:
>>>>
>>>> https://ceres-solver-review.googlesource.com/#/c/2870/
>>>>
>>>> Sameer
>>>>
>>>>
>>>>
>>>> On Tue, Apr 2, 2013 at 11:34 AM, Sameer Agarwal
>>>> <sameeragarwal@xxxxxxxxxx> wrote:
>>>>>
>>>>> Gael and Christoph,
>>>>>
>>>>> Thank you for looking into this.
>>>>>
>>>>> Yes, adding -mllvm -inline-threshold=600 makes the timing of Eigen
>>>>> comparable to CUSTOM_GEMM.
>>>>>
>>>>> However, I went ahead and replaced all uses of small block operations in
>>>>> the eliminator with simple gemm and gemv implementations, and the time has
>>>>> dropped even further, which would not be the case if inlining were the only
>>>>> thing at work here.
>>>>>
>>>>> With the increased inlining 1.02s
>>>>> With custom blas            0.634s
>>>>>
>>>>> I get roughly similar numbers with g++ 4.2 on macOS. I also tested this on
>>>>> Linux with g++ 4.6.3, where the linear solver time goes from 0.8 to 0.5
>>>>> seconds.
>>>>>
>>>>> Sameer
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Apr 2, 2013 at 5:23 AM, Gael Guennebaud
>>>>> <gael.guennebaud@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On Tue, Apr 2, 2013 at 1:58 PM, Gael Guennebaud
>>>>>> <gael.guennebaud@xxxxxxxxx> wrote:
>>>>>> > After adding a few always_inline attributes
>>>>>>
>>>>>> An alternative is to add the following compiler option:
>>>>>>
>>>>>> -mllvm -inline-threshold=600
>>>>>>
>>>>>> gael
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
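
[Editorial sketch] For readers unfamiliar with what "simple gemm and gemv implementations" might look like, here is a minimal illustration of plain-loop kernels for small dense blocks. It is not the code from the CL or from Ceres' blas.h; the function names, the row-major layout, and the accumulate-into-output convention are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel (not the Ceres blas.h API): C += A * B.
// All matrices are dense, row-major, and stored contiguously.
// A is m x k, B is k x n, C is m x n.
inline void SmallGemmAccumulate(const double* A, const double* B, double* C,
                                std::size_t m, std::size_t k, std::size_t n) {
  assert(A != nullptr && B != nullptr && C != nullptr);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t p = 0; p < k; ++p) {
      // Load one element of A and reuse it across a full row of B, so the
      // innermost loop walks B and C with unit stride.
      const double a_ip = A[i * k + p];
      for (std::size_t j = 0; j < n; ++j) {
        C[i * n + j] += a_ip * B[p * n + j];
      }
    }
  }
}

// Hypothetical kernel: y += A * x, where A is m x n (row-major),
// x has n entries, and y has m entries.
inline void SmallGemvAccumulate(const double* A, const double* x, double* y,
                                std::size_t m, std::size_t n) {
  assert(A != nullptr && x != nullptr && y != nullptr);
  for (std::size_t i = 0; i < m; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
      sum += A[i * n + j] * x[j];
    }
    y[i] += sum;
  }
}
```

Loops like these involve no expression templates or dispatch logic, so they do not depend on the compiler's inlining heuristics. Whether they actually beat Eigen on a given compiler and machine is exactly what the timings in this thread are trying to establish.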




