Re: [eigen] Slow matrix-matrix multiply

For small dynamic-size matrices, I agree there is room for
optimization. However, for small fixed-size matrices, Eigen should
already be at least as fast as a naive implementation.

I can also reproduce the performance drop with linux/gcc-4.7. However,
the generated assembly is extremely similar in both cases (see the
attached files), and Eigen even has an advantage, with only 18
additions compared to 27 for custom_gemm. Frankly, I cannot explain
the perf difference.
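
For reference, here is a minimal sketch of the kind of textbook
three-loop GEMM with compile-time sizes that is being discussed. It is
not the custom_gemm from the attached files: the name naive_gemm, the
row-major raw-pointer layout, and the double scalar type are
assumptions made purely for illustration.

```cpp
#include <cstddef>

// Hypothetical helper for illustration only (not Eigen API, and not the
// custom_gemm from this thread).
// Computes C = A * B where A is M x K, B is K x N, C is M x N.
// All three matrices are assumed to be stored row-major and must not alias.
template <std::size_t M, std::size_t K, std::size_t N>
inline void naive_gemm(const double* A, const double* B, double* C) {
  for (std::size_t i = 0; i < M; ++i) {        // rows of A and C
    for (std::size_t j = 0; j < N; ++j) {      // columns of B and C
      double sum = 0.0;
      for (std::size_t k = 0; k < K; ++k) {    // inner (shared) dimension
        sum += A[i * K + k] * B[k * N + j];
      }
      C[i * N + j] = sum;
    }
  }
}
```

With the sizes from the original report, a 6x3 times 3x3 product would
be called as naive_gemm<6, 3, 3>(A, B, C). Because M, K and N are
template parameters, the trip counts are compile-time constants, so the
compiler is free to unroll all three loops completely.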

Side note: it's amazing to see how good compilers have become at loop
unrolling. Clearly, this was not the case at the time we started


On Tue, Apr 2, 2013 at 10:26 AM, Christoph Hertzberg
wrote:
> On 02.04.2013 03:21, Sameer Agarwal wrote:
>> We replaced one of the more frequently called Eigen expressions with a
>> simple three-loop GEMM implementation (with some template sizing tricks)
>> and it instantly gives us >10% speedups. Doing the same to two other GEMM
>> expressions gives us an overall 30% speedup. The matrices involved are
>> fairly small; in our benchmark, they are of sizes 6x3, 3x3, and 3x6,
>> and are sized at compile time.
> Yes, small matrices leave a lot of room for optimization, see this bug:
> For small fixed sizes it should be possible to solve this with template
> specializations (i.e. fall back to text-book GEMM, if vectorization/blocking
> gives no benefit).
> Another thing that bugs me is that dynamic matrices (even if only one
> dimension is dynamic and the other fixed and small) always fall back to the
> generic matrix multiplication, which is mostly optimized for very large
> products.
> Maybe it would be possible to fall back to a very simple "three loop GEMM"
> if the sizes are small. This could be checked at runtime or indicated by the
> user somehow (maybe configurable by a compile flag). If a program only uses
> small matrix products this might also reduce the binary size noticeably.
> Christoph
