Re: [eigen] Slow matrix-matrix multiply



Ok, so the problem you hit with clang is simply that clang/llvm does a
bad job at inlining. After adding a few always_inline attributes, I
get:

-- Eigen with always_inline --

Time (in seconds):
Preprocessor                            0.042

  Residual Evaluations                  0.074
  Jacobian Evaluations                  0.872
  Linear Solver                         1.458
Minimizer                               2.539

Postprocessor                           0.002
Total                                   2.628

-- custom gemm --

Time (in seconds):
Preprocessor                            0.043

  Residual Evaluations                  0.075
  Jacobian Evaluations                  0.862
  Linear Solver                         1.540
Minimizer                               2.612

Postprocessor                           0.002
Total                                   2.702

So, as with gcc, Eigen is slightly faster (2.628 s vs. 2.702 s in total).
I still need to find a cleaner workaround, though.
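
For illustration, the kind of annotation meant by "always_inline
attributes" could look roughly like the sketch below. It is only a
minimal example under stated assumptions: SKETCH_ALWAYS_INLINE, madd and
small_gemm are hypothetical names made up for this message, and they are
not the functions that were actually annotated in Eigen.

```cpp
#include <cassert>

// Hypothetical macro (not Eigen's): ask the compiler to inline the function
// regardless of its own cost heuristics.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYS_INLINE __attribute__((always_inline)) inline
#elif defined(_MSC_VER)
#define SKETCH_ALWAYS_INLINE __forceinline
#else
#define SKETCH_ALWAYS_INLINE inline
#endif

// Small helper of the kind that should disappear into the caller's loop.
template <typename Scalar>
SKETCH_ALWAYS_INLINE Scalar madd(Scalar a, Scalar b, Scalar c) {
  return a * b + c;
}

// C = A * B for fixed-size, row-major N x N matrices (illustrative only).
template <typename Scalar, int N>
void small_gemm(const Scalar* A, const Scalar* B, Scalar* C) {
  assert(A && B && C);
  for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; ++j) {
      Scalar acc = Scalar(0);
      for (int k = 0; k < N; ++k) {
        acc = madd(A[i * N + k], B[k * N + j], acc);
      }
      C[i * N + j] = acc;
    }
  }
}
```

With a plain "inline", the compiler is free to leave madd as a real call
inside the innermost loop; the attribute removes that choice, which is
why it works as a workaround but is not a very clean one.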

gael


On Tue, Apr 2, 2013 at 1:00 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> On Tue, Apr 2, 2013 at 11:26 AM, Gael Guennebaud
> <gael.guennebaud@xxxxxxxxx> wrote:
>> I can also reproduce the performance drop with linux/gcc-4.7. However,
>> the generated assembly in both cases is extremely similar (see the
>> attached files), with even an advantage to Eigen with only 18
>> additions compared to 27 for custom_gemm. Frankly, I cannot explain
>> the perf difference.
>
> oops, actually my system was a bit too loaded and the results too
> random. Stable results with gcc4.7 on an Intel(R) Xeon(R) CPU X5570  @
> 2.93GHz:
>
> -- Eigen --
>
> Time (in seconds):
> Preprocessor                            0.050
>
>   Residual Evaluations                  0.077
>   Jacobian Evaluations                  0.695
>   Linear Solver                         0.945
> Minimizer                               1.839
>
> Postprocessor                           0.001
> Total                                   1.907
>
>
>
> -- Custom GEMM --
>
> Time (in seconds):
> Preprocessor                            0.067
>
>   Residual Evaluations                  0.085
>   Jacobian Evaluations                  0.712
>   Linear Solver                         0.952
> Minimizer                               1.901
>
> Postprocessor                           0.001
> Total                                   1.990
>
>
> gael


