Re: [eigen] Slow matrix-matrix multiply



For small dynamic-size matrices, I agree there is room for
optimization. However, for small fixed-size matrices, Eigen should
already be at least as fast as a naive implementation.
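For dynamic sizes, the runtime check Christoph suggests below could look roughly like the following sketch. It works on raw column-major buffers rather than Eigen types. The threshold `kSmallProductThreshold`, the block size, and all function names are hypothetical. `gemm_blocked` is only a placeholder for Eigen's optimized large-product path, not its actual implementation.

```cpp
#include <algorithm>

// Hypothetical cutoff, in multiply-adds (m * n * k), at or below which the
// simple loop is used. A real value would have to be found by benchmarking.
constexpr long kSmallProductThreshold = 16L * 16 * 16;

// Textbook three-loop GEMM on column-major buffers:
// C (m x n) += A (m x k) * B (k x n).
void gemm_three_loop(int m, int n, int k, const double* A, const double* B, double* C) {
  for (int j = 0; j < n; ++j)
    for (int p = 0; p < k; ++p) {
      const double b = B[p + j * k];
      for (int i = 0; i < m; ++i) C[i + j * m] += A[i + p * m] * b;
    }
}

// Stand-in for an optimized large-product kernel. Here it is only a simple
// cache-blocked loop so the sketch compiles on its own; it computes the same
// C += A * B as gemm_three_loop.
void gemm_blocked(int m, int n, int k, const double* A, const double* B, double* C) {
  const int bs = 64;  // hypothetical block size
  for (int jj = 0; jj < n; jj += bs)
    for (int pp = 0; pp < k; pp += bs)
      for (int ii = 0; ii < m; ii += bs) {
        const int j_end = std::min(jj + bs, n);
        const int p_end = std::min(pp + bs, k);
        const int i_end = std::min(ii + bs, m);
        for (int j = jj; j < j_end; ++j)
          for (int p = pp; p < p_end; ++p) {
            const double b = B[p + j * k];
            for (int i = ii; i < i_end; ++i) C[i + j * m] += A[i + p * m] * b;
          }
      }
}

// Runtime dispatch on the actual sizes. Returns true if the small-size path
// was taken, which makes the choice easy to check in a test.
bool gemm_dispatch(int m, int n, int k, const double* A, const double* B, double* C) {
  if (static_cast<long>(m) * n * k <= kSmallProductThreshold) {
    gemm_three_loop(m, n, k, A, B, C);
    return true;
  }
  gemm_blocked(m, n, k, A, B, C);
  return false;
}
```

The check costs one multiplication and one comparison per product, which is small even next to a 3x3 GEMM. A compile-time flag, which Christoph also mentions, would remove it entirely.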

I can also reproduce the performance drop with Linux/gcc-4.7. However,
the generated assembly is extremely similar in both cases (see the
attached files), and Eigen's version even has an edge: only 18
additions, compared to 27 for custom_gemm. Frankly, I cannot explain
the performance difference.
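The fixed-size "three loop GEMM" that Sameer describes below can be sketched as follows. Neither his code nor the source of custom_gemm appears in this thread, so `Mat` and `naive_gemm` are hypothetical names. `Mat` merely stands in for a fixed-size `Eigen::Matrix` with column-major storage.

```cpp
#include <array>

// Hypothetical fixed-size matrix type with column-major storage, standing in
// for a fixed-size Eigen::Matrix<double, R, C>.
template <int R, int C>
struct Mat {
  std::array<double, R * C> data{};  // value-initialized to zeros
  double& operator()(int i, int j) { return data[i + j * R]; }
  double operator()(int i, int j) const { return data[i + j * R]; }
};

// Textbook three-loop product (M x K) * (K x N) -> (M x N). All trip counts
// are template parameters, so the compiler is free to unroll every loop.
template <int M, int K, int N>
Mat<M, N> naive_gemm(const Mat<M, K>& A, const Mat<K, N>& B) {
  Mat<M, N> C;  // starts at zero via the default member initializer
  for (int j = 0; j < N; ++j)      // columns of C and B
    for (int k = 0; k < K; ++k)    // inner dimension
      for (int i = 0; i < M; ++i)  // rows: contiguous in column-major A and C
        C(i, j) += A(i, k) * B(k, j);
  return C;
}
```

For sizes such as 6x3 times 3x3, every loop has a constant trip count. That is what allows the compiler to unroll the whole product.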

Side note: it's amazing to see how good compilers have become at loop
unrolling. That was clearly not the case when we started Eigen.


gael

On Tue, Apr 2, 2013 at 10:26 AM, Christoph Hertzberg
<chtz@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> On 02.04.2013 03:21, Sameer Agarwal wrote:
>>
>> We replaced one of the more frequently called Eigen expressions with a
>> simple three-loop GEMM implementation (with some template sizing tricks)
>> and it instantly gives us >10% speedups. Doing the same to two other GEMM
>> expressions gives us an overall 30% speedup. The matrices involved are
>> fairly small; in our benchmark, they are 6x3, 3x3 and 3x6, and are sized
>> at compile time.
>
>
> Yes, small matrices have very much room for optimization, see this bug:
>
> http://eigen.tuxfamily.org/bz/show_bug.cgi?id=404
> For small fixed sizes it should be possible to solve this with template
> specializations (i.e., fall back to a textbook GEMM if vectorization/blocking
> gives no benefit).
>
>
> Another thing that bugs me is that dynamic matrices (even if only one
> dimension is dynamic and the other is fixed and small) always fall back to
> the generic matrix multiplication, which is mostly optimized for very large
> products.
>
> Maybe it would be possible to fall back to a very simple "three loop GEMM"
> if the sizes are small. This could be checked at runtime or indicated by the
> user somehow (maybe configurable by a compile flag). If a program only uses
> small matrix products this might also reduce the binary size noticeably.
>
>
> Christoph
>
> --
> ----------------------------------------------
> Dipl.-Inf., Dipl.-Math. Christoph Hertzberg
> Cartesium 0.049
> Universität Bremen
> Enrique-Schmidt-Straße 5
> 28359 Bremen
>
> Tel: +49 (421) 218-64252
> ----------------------------------------------
>
>

Attachment: custom_gemm_2_3_9.S
Description: Binary data

Attachment: eigen_gemm_2_3_9.S
Description: Binary data


