Re: [eigen] Slow matrix-matrix multiply
I also have difficulty observing any significant difference with
Apple's default compiler:
-- default - clang - macbookpro --
Time (in seconds):
Preprocessor 0.043
Residual Evaluations 0.076
Jacobian Evaluations 0.866
Linear Solver 0.740
Minimizer 1.816
Postprocessor 0.002
Total 1.906
-- CERES_NO_CUSTOM_BLAS - clang-inline-threshold - macbookpro --
Time (in seconds):
Preprocessor 0.043
Residual Evaluations 0.070
Jacobian Evaluations 0.859
Linear Solver 0.779
Minimizer 1.837
Postprocessor 0.002
Total 1.926
-- CERES_NO_CUSTOM_BLAS - clang - macbookpro --
Time (in seconds):
Preprocessor 0.043
Residual Evaluations 0.075
Jacobian Evaluations 0.863
Linear Solver 0.896
Minimizer 1.970
Postprocessor 0.002
Total 2.060
On Wed, Apr 3, 2013 at 9:42 AM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> still cannot reproduce with gcc:
>
> -- default - gcc47 - Core2 Q9400 @2.66GHz --
>
> Time (in seconds):
> Preprocessor 0.093
>
> Residual Evaluations 0.117
> Jacobian Evaluations 1.067
> Linear Solver 0.809
> Minimizer 2.237
>
> Postprocessor 0.005
> Total 2.371
>
> -- CERES_NO_CUSTOM_BLAS - gcc47 - Core2 Q9400 @2.66GHz --
>
> Time (in seconds):
> Preprocessor 0.089
>
> Residual Evaluations 0.108
> Jacobian Evaluations 1.054
> Linear Solver 0.803
> Minimizer 2.206
>
> Postprocessor 0.005
> Total 2.335
>
>
> -- default - gcc47 - Xeon X5570 @2.93GHz --
>
> Time (in seconds):
> Preprocessor 0.067
>
> Residual Evaluations 0.085
> Jacobian Evaluations 0.720
> Linear Solver 0.600
> Minimizer 1.557
>
> Postprocessor 0.001
> Total 1.645
>
> -- CERES_NO_CUSTOM_BLAS - gcc47 - Xeon X5570 @2.93GHz --
>
> Time (in seconds):
> Preprocessor 0.067
>
> Residual Evaluations 0.085
> Jacobian Evaluations 0.734
> Linear Solver 0.599
> Minimizer 1.570
>
> Postprocessor 0.001
> Total 1.658
>
> gael
>
> On Wed, Apr 3, 2013 at 5:58 AM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx> wrote:
>> In case there is still interest, the change has been merged into the master
>> branch.
>> Sameer
>>
>>
>>
>>
>> On Tue, Apr 2, 2013 at 12:25 PM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx>
>> wrote:
>>>
>>> On Keir's suggestion, I have updated this CL to optionally compile Eigen
>>> based routines in and out.
>>>
>>> Passing -DCUSTOM_BLAS=ON/OFF to cmake switches between custom loops and
>>> Eigen inside blas.h.
>>>
>>> Sameer
>>>
>>>
>>>
>>> On Tue, Apr 2, 2013 at 11:42 AM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx>
>>> wrote:
>>>>
>>>> Here is the gerrit CL that is used for generating these numbers
>>>>
>>>> https://ceres-solver-review.googlesource.com/#/c/2870/
>>>>
>>>> Sameer
>>>>
>>>>
>>>>
>>>> On Tue, Apr 2, 2013 at 11:34 AM, Sameer Agarwal
>>>> <sameeragarwal@xxxxxxxxxx> wrote:
>>>>>
>>>>> Gael and Christoph,
>>>>>
>>>>> Thank you for looking into this.
>>>>>
>>>>> Yes, adding -mllvm -inline-threshold=600 makes the timing of Eigen
>>>>> comparable to CUSTOM_GEMM.
>>>>>
>>>>> However, I went ahead and replaced all use of small block operations in
>>>>> the eliminator with simple gemm and gemv implementations. And the time has
>>>>> dropped even further. Which would not be the case if inlining were the only
>>>>> thing at work here.
>>>>>
>>>>> With the increased inlining 1.02s
>>>>> With custom blas 0.634s
>>>>>
>>>>> I get roughly similar numbers with g++ 4.2 on macos. I also tested this on
>>>>> linux with g++ 4.6.3, where the linear solver time goes from 0.8 to 0.5
>>>>> seconds.
>>>>>
>>>>> Sameer
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Apr 2, 2013 at 5:23 AM, Gael Guennebaud
>>>>> <gael.guennebaud@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On Tue, Apr 2, 2013 at 1:58 PM, Gael Guennebaud
>>>>>> <gael.guennebaud@xxxxxxxxx> wrote:
>>>>>> > After adding a few always_inline attributes
>>>>>>
>>>>>> An alternative is to add the following compiler option:
>>>>>>
>>>>>> -mllvm -inline-threshold=600
>>>>>>
>>>>>> gael
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
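For readers following the thread, the "simple gemm and gemv implementations" mentioned in the quoted message of Apr 2 could look roughly like the sketch below. This is an illustration under assumptions, not the code from the Ceres CL: the function names, the row-major layout, and the accumulate-into-output convention are all hypothetical choices made for the example.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only; these are NOT the routines
// from Ceres' blas.h. All matrices are assumed dense and row-major.

// C += A * B, where A is m x k, B is k x n, and C is m x n.
inline void SmallGemmAccumulate(const double* A, const double* B, double* C,
                                std::size_t m, std::size_t k, std::size_t n) {
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t p = 0; p < k; ++p) {
      // Hoist A(i, p) so the inner loop walks B and C contiguously.
      const double a = A[i * k + p];
      for (std::size_t j = 0; j < n; ++j) {
        C[i * n + j] += a * B[p * n + j];
      }
    }
  }
}

// y += A * x, where A is m x n, x has n entries, and y has m entries.
inline void SmallGemvAccumulate(const double* A, const double* x, double* y,
                                std::size_t m, std::size_t n) {
  for (std::size_t i = 0; i < m; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
      sum += A[i * n + j] * x[j];
    }
    y[i] += sum;
  }
}
```

Plain loops like these leave the compiler little to inline or dispatch, which is one plausible reason they behave differently from a general-purpose expression-template path on very small blocks; the timings quoted in this thread are the only evidence offered here either way.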