Ok, so the problem you hit with clang is simply that clang/llvm does a
bad job at inlining. After adding a few always_inline attributes, I get:

-- Eigen with always_inline --

Time (in seconds):
Preprocessor            0.042
Residual Evaluations    0.074
Jacobian Evaluations    0.872
Linear Solver           1.458
Minimizer               2.539
Postprocessor           0.002
Total                   2.628

-- custom gemm --

Time (in seconds):
Preprocessor            0.043
Residual Evaluations    0.075
Jacobian Evaluations    0.862
Linear Solver           1.540
Minimizer               2.612
Postprocessor           0.002
Total                   2.702

So as with gcc, Eigen is faster. Need to find a cleaner workaround though.

gael

On Tue, Apr 2, 2013 at 1:00 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> On Tue, Apr 2, 2013 at 11:26 AM, Gael Guennebaud
> <gael.guennebaud@xxxxxxxxx> wrote:
>> I can also reproduce the performance drop with linux/gcc-4.7. However,
>> the generated assembly in both cases are extremely similar (see the
>> attached files), with even an advantage to Eigen with only 18
>> additions compared to 27 for custom_gemm. Frankly, I cannot explain
>> the perf difference.
>
> oops, actually my system was a bit too loaded and the results too
> random. Stable results with gcc4.7 on an Intel(R) Xeon(R) CPU X5570 @
> 2.93GHz:
>
> -- Eigen --
>
> Time (in seconds):
> Preprocessor            0.050
>
> Residual Evaluations    0.077
> Jacobian Evaluations    0.695
> Linear Solver           0.945
> Minimizer               1.839
>
> Postprocessor           0.001
> Total                   1.907
>
>
>
> -- Custom GEMM --
>
> Time (in seconds):
> Preprocessor            0.067
>
> Residual Evaluations    0.085
> Jacobian Evaluations    0.712
> Linear Solver           0.952
> Minimizer               1.901
>
> Postprocessor           0.001
> Total                   1.990
>
>
> gael
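
For reference, the kind of always_inline annotation mentioned at the top
of this message can be sketched as below. This is only an illustration
and not the actual change made to Eigen: the macro FORCE_INLINE and the
kernel small_gemm are hypothetical names invented for the example, and
the row-major layout is an assumption.

```cpp
// Hypothetical macro (not Eigen's own): force inlining where the
// compiler supports it, fall back to a plain inline hint otherwise.
#if defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#else
#define FORCE_INLINE inline
#endif

// Hypothetical small fixed-size kernel: C += A * B, with row-major
// A (M x K), B (K x N) and C (M x N). With the sizes known at compile
// time, forcing the call to be inlined lets the compiler unroll the
// loops at each call site instead of emitting an out-of-line call.
template <int M, int K, int N>
FORCE_INLINE void small_gemm(const double* a, const double* b, double* c) {
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      double acc = 0.0;
      for (int k = 0; k < K; ++k) {
        acc += a[i * K + k] * b[k * N + j];
      }
      c[i * N + j] += acc;
    }
  }
}
```

A call such as small_gemm<2, 2, 2>(a, b, c) would then be expanded in
place. Whether this helps in a given build still has to be measured, as
the timings above show.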

