Re: [eigen] Slow matrix-matrix multiply
- To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
- Subject: Re: [eigen] Slow matrix-matrix multiply
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Wed, 3 Apr 2013 09:42:22 +0200
Still cannot reproduce with gcc; the default and CERES_NO_CUSTOM_BLAS builds give nearly identical timings:
-- default - gcc47 - Core2 Q9400 @2.66GHz --
Time (in seconds):
Preprocessor            0.093
Residual Evaluations    0.117
Jacobian Evaluations    1.067
Linear Solver           0.809
Minimizer               2.237
Postprocessor           0.005
Total                   2.371
-- CERES_NO_CUSTOM_BLAS - gcc47 - Core2 Q9400 @2.66GHz --
Time (in seconds):
Preprocessor            0.089
Residual Evaluations    0.108
Jacobian Evaluations    1.054
Linear Solver           0.803
Minimizer               2.206
Postprocessor           0.005
Total                   2.335
-- default - gcc47 - Xeon X5570 @2.93GHz --
Time (in seconds):
Preprocessor            0.067
Residual Evaluations    0.085
Jacobian Evaluations    0.720
Linear Solver           0.600
Minimizer               1.557
Postprocessor           0.001
Total                   1.645
-- CERES_NO_CUSTOM_BLAS - gcc47 - Xeon X5570 @2.93GHz --
Time (in seconds):
Preprocessor            0.067
Residual Evaluations    0.085
Jacobian Evaluations    0.734
Linear Solver           0.599
Minimizer               1.570
Postprocessor           0.001
Total                   1.658
gael
On Wed, Apr 3, 2013 at 5:58 AM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx> wrote:
> In case there is still interest, the change has been merged into the master
> branch.
> Sameer
>
>
>
>
> On Tue, Apr 2, 2013 at 12:25 PM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx>
> wrote:
>>
>> On Keir's suggestion, I have updated this CL so that the Eigen-based
>> routines can optionally be compiled in or out.
>>
>> Passing -DCUSTOM_BLAS=ON/OFF to cmake switches between the custom loops and
>> Eigen inside blas.h.
>>
>> Sameer
>>
>>
>>
>> On Tue, Apr 2, 2013 at 11:42 AM, Sameer Agarwal <sameeragarwal@xxxxxxxxxx>
>> wrote:
>>>
>>> Here is the Gerrit CL that was used to generate these numbers:
>>>
>>> https://ceres-solver-review.googlesource.com/#/c/2870/
>>>
>>> Sameer
>>>
>>>
>>>
>>> On Tue, Apr 2, 2013 at 11:34 AM, Sameer Agarwal
>>> <sameeragarwal@xxxxxxxxxx> wrote:
>>>>
>>>> Gael and Christoph,
>>>>
>>>> Thank you for looking into this.
>>>>
>>>> Yes, adding -mllvm -inline-threshold=600 makes the timing of Eigen
>>>> comparable to CUSTOM_GEMM.
>>>>
>>>> However, I went ahead and replaced all uses of small block operations in
>>>> the eliminator with simple gemm and gemv implementations, and the time has
>>>> dropped even further, which would not be the case if inlining were the only
>>>> thing at work here.
>>>>
>>>> With the increased inlining: 1.02s
>>>> With custom blas:            0.634s
>>>>
>>>> I get roughly similar numbers with g++ 4.2 on macOS. I also tested this on
>>>> Linux with g++ 4.6.3, where the linear solver time goes from 0.8 to 0.5
>>>> seconds.
>>>>
>>>> Sameer
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Apr 2, 2013 at 5:23 AM, Gael Guennebaud
>>>> <gael.guennebaud@xxxxxxxxx> wrote:
>>>>>
>>>>> On Tue, Apr 2, 2013 at 1:58 PM, Gael Guennebaud
>>>>> <gael.guennebaud@xxxxxxxxx> wrote:
>>>>> > After adding a few always_inline attributes
>>>>>
>>>>> An alternative is to add the following compiler option:
>>>>>
>>>>> -mllvm -inline-threshold=600
>>>>>
>>>>> gael
>>>>>
>>>>>
>>>>
>>>
>>
>
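
For readers following the thread, here is a rough sketch of what "simple gemm and gemv implementations" for small dense blocks could look like. This is not the code from the Ceres CL or from blas.h; the function names, the row-major layout, and the accumulate-into-output convention are all assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not the Ceres or Eigen API).
// Computes C += A * B for dense row-major blocks, where
// A is m x k, B is k x n, and C is m x n.
inline void NaiveGemmAccumulate(const double* A, const double* B, double* C,
                                std::size_t m, std::size_t k, std::size_t n) {
  assert(A != nullptr && B != nullptr && C != nullptr);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t p = 0; p < k; ++p) {
      // Hoist A(i, p) so the inner loop walks B and C contiguously.
      const double a_ip = A[i * k + p];
      for (std::size_t j = 0; j < n; ++j) {
        C[i * n + j] += a_ip * B[p * n + j];
      }
    }
  }
}

// Hypothetical helper (not the Ceres or Eigen API).
// Computes y += A * x, where A is m x n (row-major), x has n entries,
// and y has m entries.
inline void NaiveGemvAccumulate(const double* A, const double* x, double* y,
                                std::size_t m, std::size_t n) {
  assert(A != nullptr && x != nullptr && y != nullptr);
  for (std::size_t i = 0; i < m; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
      sum += A[i * n + j] * x[j];
    }
    y[i] += sum;
  }
}
```

Plain loops like these leave the compiler very little to inline, which may be part of why they are less sensitive to inlining thresholds than an expression-template library. Whether they actually beat Eigen on a given block size and compiler has to be measured, as the differing results in this thread show.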