Re: [eigen] Slow matrix-matrix multiply



On 02.04.2013 11:26, Gael Guennebaud wrote:
> For small dynamic-size matrices, I agree there is room for
> optimization. However, for small fixed-size matrices, Eigen should
> already be at least as fast as a naive implementation.
>
> I can also reproduce the performance drop with Linux/gcc-4.7. However,
> the generated assembly in both cases is extremely similar (see the
> attached files), with even an advantage to Eigen with only 18
> additions compared to 27 for custom_gemm. Frankly, I cannot explain
> the perf difference.

Did you (or anybody else) check how well instruction latencies are compensated? An interesting tool for that seems to be this one (I have never tried it myself, though):
http://software.intel.com/en-us/articles/intel-architecture-code-analyzer-download/
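To illustrate why fewer additions need not mean faster code, here is a minimal C++ sketch. It is not Eigen's kernel and not the custom_gemm from the attached files (those are not reproduced here). The names Mat, gemm_single_chain and gemm_two_chains are hypothetical. It contrasts a product where each output element is one serial chain of dependent adds with one that splits the chain over two independent accumulators, so that more adds can be in flight while earlier ones are still in the pipeline.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size, row-major N x N matrix (illustration only;
// this is not Eigen's storage layout or API).
template <std::size_t N>
using Mat = std::array<double, N * N>;

// Naive product: each C(i,j) is accumulated in a single variable, so the
// k-loop is one serial chain of dependent adds. The CPU can only hide the
// add latency by overlapping work from other (i,j) iterations.
template <std::size_t N>
Mat<N> gemm_single_chain(const Mat<N>& A, const Mat<N>& B) {
    Mat<N> C{};
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            double acc = 0.0;
            for (std::size_t k = 0; k < N; ++k) {
                acc += A[i * N + k] * B[k * N + j];  // depends on previous acc
            }
            C[i * N + j] = acc;
        }
    }
    return C;
}

// Same product with the k-loop split over two independent accumulators:
// each dependency chain is about half as long, so adds belonging to the
// same C(i,j) can be in flight at the same time. This changes the
// summation order, so for general inputs the result may differ from
// gemm_single_chain in the last bits.
template <std::size_t N>
Mat<N> gemm_two_chains(const Mat<N>& A, const Mat<N>& B) {
    Mat<N> C{};
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            double acc0 = 0.0;
            double acc1 = 0.0;
            std::size_t k = 0;
            for (; k + 1 < N; k += 2) {
                acc0 += A[i * N + k] * B[k * N + j];
                acc1 += A[i * N + k + 1] * B[(k + 1) * N + j];
            }
            if (k < N) {  // odd N: one leftover term
                acc0 += A[i * N + k] * B[k * N + j];
            }
            C[i * N + j] = acc0 + acc1;
        }
    }
    return C;
}
```

Timing both versions for a small N such as 4 on different CPUs, or feeding them to the analyzer above, is one way to check whether latency rather than instruction count explains a gap like the one observed. Without -ffast-math, GCC should not reassociate the adds of the first version on its own, so the serial chain in the source is what ends up in the assembly.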

The bad thing about going down to that level of optimization is that latencies are quite CPU-dependent. My favorite resource for that:
http://www.agner.org/optimize/instruction_tables.pdf

Christoph


--
----------------------------------------------
Dipl.-Inf., Dipl.-Math. Christoph Hertzberg
Cartesium 0.049
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen

Tel: +49 (421) 218-64252
----------------------------------------------


