Re: [eigen] Slow matrix-matrix multiply
- To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
- Subject: Re: [eigen] Slow matrix-matrix multiply
- From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
- Date: Tue, 2 Apr 2013 13:58:12 +0200
OK, so the problem you hit with clang is simply that clang/LLVM does a
bad job of inlining here. After adding a few always_inline attributes, I
get:
-- Eigen with always_inline --
Time (in seconds):
Preprocessor          0.042
Residual Evaluations  0.074
Jacobian Evaluations  0.872
Linear Solver         1.458
Minimizer             2.539
Postprocessor         0.002
Total                 2.628

-- custom gemm --
Time (in seconds):
Preprocessor          0.043
Residual Evaluations  0.075
Jacobian Evaluations  0.862
Linear Solver         1.540
Minimizer             2.612
Postprocessor         0.002
Total                 2.702

So, as with gcc, Eigen is faster. I still need to find a cleaner workaround, though.
gael
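
For readers unfamiliar with the attribute mentioned above, here is a minimal sketch of what forcing inlining can look like with gcc/clang. It is not the patch applied to Eigen: the macro SKETCH_ALWAYS_INLINE and the functions madd and gemm3x3 are hypothetical names chosen for illustration, and the 3x3 row-major layout is an assumption.

```cpp
#include <cassert>

// Hypothetical macro (not Eigen's): request forced inlining where supported.
#if defined(_MSC_VER)
  #define SKETCH_ALWAYS_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
  #define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#else
  #define SKETCH_ALWAYS_INLINE inline
#endif

// Tiny helper of the kind an inner product loop calls repeatedly.
// If the compiler leaves it as a real call, the call overhead dominates.
SKETCH_ALWAYS_INLINE double madd(double a, double b, double acc) {
  return acc + a * b;
}

// Fixed-size 3x3 product C = A * B, all matrices stored row-major.
SKETCH_ALWAYS_INLINE void gemm3x3(const double* A, const double* B, double* C) {
  assert(A && B && C);
  for (int i = 0; i < 3; ++i) {
    for (int j = 0; j < 3; ++j) {
      double acc = 0.0;
      for (int k = 0; k < 3; ++k) {
        acc = madd(A[i * 3 + k], B[k * 3 + j], acc);
      }
      C[i * 3 + j] = acc;
    }
  }
}
```

With gcc and clang the attribute overrides the inlining heuristics for the marked functions. This is also why it is only a workaround: it has to be applied function by function.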
On Tue, Apr 2, 2013 at 1:00 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> On Tue, Apr 2, 2013 at 11:26 AM, Gael Guennebaud
> <gael.guennebaud@xxxxxxxxx> wrote:
>> I can also reproduce the performance drop with linux/gcc-4.7. However,
>> the generated assembly is extremely similar in both cases (see the
>> attached files), and Eigen even has an advantage, with only 18
>> additions compared to 27 for custom_gemm. Frankly, I cannot explain
>> the perf difference.
>
> Oops, actually my system was a bit too loaded and the results were too
> random. Stable results with gcc 4.7 on an Intel(R) Xeon(R) CPU X5570 @
> 2.93GHz:
>
> -- Eigen --
> Time (in seconds):
> Preprocessor          0.050
> Residual Evaluations  0.077
> Jacobian Evaluations  0.695
> Linear Solver         0.945
> Minimizer             1.839
> Postprocessor         0.001
> Total                 1.907
>
> -- Custom GEMM --
> Time (in seconds):
> Preprocessor          0.067
> Residual Evaluations  0.085
> Jacobian Evaluations  0.712
> Linear Solver         0.952
> Minimizer             1.901
> Postprocessor         0.001
> Total                 1.990
>
> gael
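
The quoted message notes that timings taken on a loaded machine were too random to compare. One common way to reduce that noise is to repeat the measurement and keep the fastest run. The sketch below illustrates the idea; best_of is a hypothetical helper, not part of Eigen or of the benchmark used in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper: call `f` `repeats` times and return the fastest
// wall-clock time in seconds. The minimum is less sensitive to other
// processes stealing CPU time than a single run or the mean.
template <typename F>
double best_of(int repeats, F&& f) {
  assert(repeats > 0);
  double best = std::numeric_limits<double>::infinity();
  for (int r = 0; r < repeats; ++r) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
  }
  return best;
}
```

A call such as best_of(10, [&] { run_solver(); }) would time a workload, where run_solver is a placeholder for whatever is being measured. The workload should produce a result the compiler cannot discard, otherwise the optimizer may remove it entirely.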