Re: [eigen] Slow matrix-matrix multiply
- To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
- Subject: Re: [eigen] Slow matrix-matrix multiply
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Wed, 3 Apr 2013 10:19:57 +0200
I also have difficulty observing any significant difference with
Apple's default compiler:
-- default - clang - macbookpro --
Time (in seconds):
Preprocessor 0.043
Residual Evaluations 0.076
Jacobian Evaluations 0.866
Linear Solver 0.740
Minimizer 1.816
Postprocessor 0.002
Total 1.906
-- CERES_NO_CUSTOM_BLAS - clang-inline-threshold - macbookpro --
Time (in seconds):
Preprocessor 0.043
Residual Evaluations 0.070
Jacobian Evaluations 0.859
Linear Solver 0.779
Minimizer 1.837
Postprocessor 0.002
Total 1.926
-- CERES_NO_CUSTOM_BLAS - clang - macbookpro --
Time (in seconds):
Preprocessor 0.043
Residual Evaluations 0.075
Jacobian Evaluations 0.863
Linear Solver 0.896
Minimizer 1.970
Postprocessor 0.002
Total 2.060
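
To put the three clang runs above side by side, here is a minimal sketch that computes the relative change of each run against the default build. The helper name `relative_change_percent` and the `Run` struct are hypothetical; the numbers are copied from the "Linear Solver" and "Total" rows above.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: percentage change of `value` relative to `baseline`.
double relative_change_percent(double baseline, double value) {
  assert(baseline > 0.0);
  return 100.0 * (value - baseline) / baseline;
}

// One row per clang run reported above (times in seconds).
struct Run {
  const char* label;
  double linear_solver;
  double total;
};

const Run kClangRuns[] = {
    {"default", 0.740, 1.906},
    {"CERES_NO_CUSTOM_BLAS + inline-threshold", 0.779, 1.926},
    {"CERES_NO_CUSTOM_BLAS", 0.896, 2.060},
};

// Prints each run's change relative to the first (default) run.
void print_comparison() {
  const Run& base = kClangRuns[0];
  for (const Run& run : kClangRuns) {
    std::printf("%-42s linear solver %+6.1f%%  total %+6.1f%%\n", run.label,
                relative_change_percent(base.linear_solver, run.linear_solver),
                relative_change_percent(base.total, run.total));
  }
}
```

For example, `relative_change_percent(0.740, 0.896)` gives roughly 21 for the linear solver of the plain CERES_NO_CUSTOM_BLAS run, and `relative_change_percent(1.906, 2.060)` gives roughly 8 for its total.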
On Wed, Apr 3, 2013 at 9:42 AM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> still cannot reproduce with gcc:
>
> -- default - gcc47 - Core2 Q9400 @2.66GHz --
> Time (in seconds):
> Preprocessor 0.093
> Residual Evaluations 0.117
> Jacobian Evaluations 1.067
> Linear Solver 0.809
> Minimizer 2.237
> Postprocessor 0.005
> Total 2.371
>
> -- CERES_NO_CUSTOM_BLAS - gcc47 - Core2 Q9400 @2.66GHz --
> Time (in seconds):
> Preprocessor 0.089
> Residual Evaluations 0.108
> Jacobian Evaluations 1.054
> Linear Solver 0.803
> Minimizer 2.206
> Postprocessor 0.005
> Total 2.335
>
>
> -- default - gcc47 - Xeon X5570 @2.93GHz --
> Time (in seconds):
> Preprocessor 0.067
> Residual Evaluations 0.085
> Jacobian Evaluations 0.720
> Linear Solver 0.600
> Minimizer 1.557
> Postprocessor 0.001
> Total 1.645
>
> -- CERES_NO_CUSTOM_BLAS - gcc47 - Xeon X5570 @2.93GHz --
> Time (in seconds):
> Preprocessor 0.067
> Residual Evaluations 0.085
> Jacobian Evaluations 0.734
> Linear Solver 0.599
> Minimizer 1.570
> Postprocessor 0.001
> Total 1.658
>
> gael
>
> On Wed, Apr 3, 2013 at 5:58 AM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx> wrote:
>> In case there is still interest, the change has been merged into the master
>> branch.
>> Sameer
>>
>>
>>
>>
>> On Tue, Apr 2, 2013 at 12:25 PM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx>
>> wrote:
>>>
>>> On Keir's suggestion, I have updated this CL so that the Eigen-based
>>> routines can optionally be compiled in or out.
>>>
>>> Passing -DCUSTOM_BLAS=ON/OFF to cmake switches between the custom loops and
>>> Eigen inside blas.h.
>>>
>>> Sameer
>>>
>>>
>>>
>>> On Tue, Apr 2, 2013 at 11:42 AM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx>
>>> wrote:
>>>>
>>>> Here is the Gerrit CL that was used to generate these numbers:
>>>>
>>>> https://ceres-solver-review.googlesource.com/#/c/2870/
>>>>
>>>> Sameer
>>>>
>>>>
>>>>
>>>> On Tue, Apr 2, 2013 at 11:34 AM, Sameer Agarwal
>>>> <sameeragarwal@xxxxxxxxxx> wrote:
>>>>>
>>>>> Gael and Christoph,
>>>>>
>>>>> Thank you for looking into this.
>>>>>
>>>>> Yes, adding -mllvm -inline-threshold=600 makes the timing of Eigen
>>>>> comparable to CUSTOM_GEMM.
>>>>>
>>>>> However, I went ahead and replaced all uses of small block operations in
>>>>> the eliminator with simple gemm and gemv implementations, and the time has
>>>>> dropped even further, which would not be the case if inlining were the only
>>>>> thing at work here.
>>>>>
>>>>> With the increased inlining 1.02s
>>>>> With custom blas 0.634s
>>>>>
>>>>> I get roughly similar numbers with g++ 4.2 on Mac OS. I also tested this on
>>>>> Linux with g++ 4.6.3, where the linear solver time goes from 0.8 to 0.5
>>>>> seconds.
>>>>>
>>>>> Sameer
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Apr 2, 2013 at 5:23 AM, Gael Guennebaud
>>>>> <gael.guennebaud@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On Tue, Apr 2, 2013 at 1:58 PM, Gael Guennebaud
>>>>>> <gael.guennebaud@xxxxxxxxx> wrote:
>>>>>> > After adding a few always_inline attributes
>>>>>>
>>>>>> An alternative is to add the following compiler option:
>>>>>>
>>>>>> -mllvm -inline-threshold=600
>>>>>>
>>>>>> gael
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
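
The thread mentions two techniques: forcing inlining (via always_inline attributes or -mllvm -inline-threshold=600) and replacing small block operations with simple gemm and gemv loops. The following is a minimal sketch of what such loops might look like, assuming row-major storage and accumulate semantics. It is not the code from the Ceres CL or from Eigen; the macro `SKETCH_ALWAYS_INLINE` and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: on GCC/Clang, request inlining regardless of the
// compiler's cost heuristics; elsewhere fall back to plain `inline`.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_ALWAYS_INLINE inline
#endif

// C += A * B, with A (m x k), B (k x n), C (m x n), all row-major.
SKETCH_ALWAYS_INLINE void small_gemm_accumulate(const double* A,
                                                const double* B, double* C,
                                                std::size_t m, std::size_t k,
                                                std::size_t n) {
  assert(A != nullptr && B != nullptr && C != nullptr);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t p = 0; p < k; ++p) {
      const double a = A[i * k + p];
      // Innermost loop walks rows of B and C contiguously.
      for (std::size_t j = 0; j < n; ++j) {
        C[i * n + j] += a * B[p * n + j];
      }
    }
  }
}

// y += A * x, with A (m x n) row-major, x of length n, y of length m.
SKETCH_ALWAYS_INLINE void small_gemv_accumulate(const double* A,
                                                const double* x, double* y,
                                                std::size_t m, std::size_t n) {
  assert(A != nullptr && x != nullptr && y != nullptr);
  for (std::size_t i = 0; i < m; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
      sum += A[i * n + j] * x[j];
    }
    y[i] += sum;
  }
}
```

A caller would pass raw pointers to small dense blocks, for example `small_gemm_accumulate(A, B, C, 2, 2, 2)` for 2x2 blocks. Whether loops like these beat a library kernel depends on the compiler, the block sizes, and the inlining decisions discussed in the thread, so any gain should be confirmed by measurement.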