On 02.04.2013 03:21, Sameer Agarwal wrote:

We replaced one of the more frequently called Eigen expressions with a
simple three-loop GEMM implementation (with some template sizing tricks)
and it instantly gives us a >10% speedup. Doing the same to two other GEMM
expressions gives us an overall 30% speedup. The matrices involved are
fairly small; in our benchmark they are 6x3, 3x3 and 3x6, with sizes fixed
at compile time.
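Below is a minimal sketch of a textbook three-loop GEMM whose sizes are template parameters. It is only an illustration, not the implementation referred to above. `FixedMatrix` and `gemm_fixed` are hypothetical names, and the row-major `std::array` storage is an assumption. Because the sizes are part of the types, the compiler can deduce M, K and N, reject mismatched inner dimensions, and may fully unroll the loops for sizes like 6x3 * 3x3.

```cpp
#include <array>
#include <cstddef>

// Row-major matrix whose dimensions are template parameters (hypothetical
// helper type, not an Eigen class).
template <std::size_t Rows, std::size_t Cols>
struct FixedMatrix {
  std::array<double, Rows * Cols> data{};  // zero-initialized storage
  double& operator()(std::size_t r, std::size_t c) { return data[r * Cols + c]; }
  double operator()(std::size_t r, std::size_t c) const { return data[r * Cols + c]; }
};

// Textbook three-loop GEMM, C = A * B. M, K and N are deduced from the
// argument types, so mismatched inner dimensions fail to compile, and the
// constant trip counts allow the compiler to unroll the loops.
template <std::size_t M, std::size_t K, std::size_t N>
FixedMatrix<M, N> gemm_fixed(const FixedMatrix<M, K>& a,
                             const FixedMatrix<K, N>& b) {
  FixedMatrix<M, N> c;
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      double sum = 0.0;  // dot product of row i of A with column j of B
      for (std::size_t k = 0; k < K; ++k) {
        sum += a(i, k) * b(k, j);
      }
      c(i, j) = sum;
    }
  }
  return c;
}
```

For example, `gemm_fixed(j, r)` with `FixedMatrix<6, 3> j` and `FixedMatrix<3, 3> r` yields a `FixedMatrix<6, 3>` with no runtime size checks.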

Yes, products of small matrices leave a lot of room for optimization; see this bug:
http://eigen.tuxfamily.org/bz/show_bug.cgi?id=404

For small fixed sizes it should be possible to solve this with template
specializations (i.e. fall back to a text-book GEMM if
vectorization/blocking gives no benefit).
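One way the template-specialization idea could look is sketched below, under stated assumptions: a class template `Product` picks its kernel at compile time from the product size, and a partial specialization supplies the text-book GEMM for small products. `FixedMatrix`, `Product`, `multiply` and the cutoff `kSmallProductLimit = 512` are all hypothetical; the cutoff is a guess, not a measured break-even point, and the "large" path is only a stand-in i-k-j loop, not a real blocked or vectorized kernel.

```cpp
#include <array>
#include <cstddef>

// Row-major fixed-size matrix (hypothetical helper type).
template <std::size_t Rows, std::size_t Cols>
struct FixedMatrix {
  std::array<double, Rows * Cols> data{};  // zero-initialized storage
  double& operator()(std::size_t r, std::size_t c) { return data[r * Cols + c]; }
  double operator()(std::size_t r, std::size_t c) const { return data[r * Cols + c]; }
};

// Illustrative cutoff on M*K*N below which the text-book kernel is chosen.
constexpr std::size_t kSmallProductLimit = 512;

// Primary template: the "large product" path. Placeholder for a blocked,
// vectorized kernel; here just an i-k-j loop accumulating into C.
template <std::size_t M, std::size_t K, std::size_t N,
          bool Small = (M * K * N <= kSmallProductLimit)>
struct Product {
  static constexpr bool kTextbook = false;
  static FixedMatrix<M, N> run(const FixedMatrix<M, K>& a,
                               const FixedMatrix<K, N>& b) {
    FixedMatrix<M, N> c;  // starts at zero, accumulated below
    for (std::size_t i = 0; i < M; ++i)
      for (std::size_t k = 0; k < K; ++k) {
        const double aik = a(i, k);
        for (std::size_t j = 0; j < N; ++j) c(i, j) += aik * b(k, j);
      }
    return c;
  }
};

// Partial specialization selected for small products: text-book GEMM.
template <std::size_t M, std::size_t K, std::size_t N>
struct Product<M, K, N, true> {
  static constexpr bool kTextbook = true;
  static FixedMatrix<M, N> run(const FixedMatrix<M, K>& a,
                               const FixedMatrix<K, N>& b) {
    FixedMatrix<M, N> c;
    for (std::size_t i = 0; i < M; ++i)
      for (std::size_t j = 0; j < N; ++j) {
        double sum = 0.0;
        for (std::size_t k = 0; k < K; ++k) sum += a(i, k) * b(k, j);
        c(i, j) = sum;
      }
    return c;
  }
};

// Entry point: the kernel choice is resolved entirely at compile time.
template <std::size_t M, std::size_t K, std::size_t N>
FixedMatrix<M, N> multiply(const FixedMatrix<M, K>& a,
                           const FixedMatrix<K, N>& b) {
  return Product<M, K, N>::run(a, b);
}
```

With this cutoff, a 6x3 * 3x3 product (M*K*N = 54) uses the specialization, while 10x10 * 10x10 (1000) takes the primary template.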

Another thing that bugs me is that dynamic matrices (even if only one
dimension is dynamic and the other is fixed and small) always fall back to
the generic matrix multiplication, which is mostly optimized for very
large products.

Maybe it would be possible to fall back to a very simple "three loop
GEMM" if the sizes are small. This could be checked at runtime or
indicated by the user somehow (maybe configurable by a compile flag). If
a program only uses small matrix products, this might also reduce the
binary size noticeably.
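A rough sketch of the runtime check for dynamically sized matrices follows, assuming a simple rule: if every dimension is at most a threshold, use the three-loop GEMM, otherwise take the general path. The macro `SMALL_GEMM_THRESHOLD` (default 16) is a hypothetical compile flag in the spirit of the suggestion above, not an existing Eigen option, and `DynMatrix`, `gemm_naive`, `gemm_blocked` and `multiply` are made-up names. `gemm_blocked` only tiles the i and k loops and stands in for a real blocked, vectorized kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compile-time knob; override with -DSMALL_GEMM_THRESHOLD=...
#ifndef SMALL_GEMM_THRESHOLD
#define SMALL_GEMM_THRESHOLD 16
#endif

// Dynamically sized, row-major, zero-initialized matrix.
struct DynMatrix {
  std::size_t rows = 0, cols = 0;
  std::vector<double> data;
  DynMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double& operator()(std::size_t r, std::size_t c) { return data[r * cols + c]; }
  double operator()(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Text-book three-loop GEMM: each C(i, j) is one dot product.
void gemm_naive(const DynMatrix& a, const DynMatrix& b, DynMatrix& c) {
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t j = 0; j < b.cols; ++j) {
      double sum = 0.0;
      for (std::size_t k = 0; k < a.cols; ++k) sum += a(i, k) * b(k, j);
      c(i, j) = sum;
    }
}

// Placeholder "large" kernel: tiles i and k, accumulates into zeroed C.
void gemm_blocked(const DynMatrix& a, const DynMatrix& b, DynMatrix& c) {
  constexpr std::size_t kTile = 32;  // illustrative tile size
  for (std::size_t i0 = 0; i0 < a.rows; i0 += kTile) {
    const std::size_t i1 = std::min(i0 + kTile, a.rows);
    for (std::size_t k0 = 0; k0 < a.cols; k0 += kTile) {
      const std::size_t k1 = std::min(k0 + kTile, a.cols);
      for (std::size_t i = i0; i < i1; ++i)
        for (std::size_t k = k0; k < k1; ++k) {
          const double aik = a(i, k);
          for (std::size_t j = 0; j < b.cols; ++j) c(i, j) += aik * b(k, j);
        }
    }
  }
}

// Runtime size check: "small" means every dimension is within the threshold.
bool is_small_product(std::size_t m, std::size_t k, std::size_t n) {
  const std::size_t limit = SMALL_GEMM_THRESHOLD;
  return m <= limit && k <= limit && n <= limit;
}

DynMatrix multiply(const DynMatrix& a, const DynMatrix& b) {
  assert(a.cols == b.rows);  // inner dimensions must agree
  DynMatrix c(a.rows, b.cols);
  if (is_small_product(a.rows, a.cols, b.cols))
    gemm_naive(a, b, c);
  else
    gemm_blocked(a, b, c);
  return c;
}
```

The branch costs one comparison per product, and a program built with a high threshold that never hits the general path would, in principle, need less code for large products.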
