Re: [eigen] Slow matrix-matrix multiply


To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [eigen] Slow matrix-matrix multiply
From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
Date: Tue, 2 Apr 2013 11:26:42 +0200

For small dynamic-size matrices, I agree there is room for
optimization. However, for small fixed-size matrices, Eigen should
already be at least as fast as a naive implementation.

I can also reproduce the performance drop with linux/gcc-4.7. However,
the generated assembly is extremely similar in both cases (see the
attached files), and Eigen even has an advantage, with only 18
additions compared to 27 for custom_gemm. Frankly, I cannot explain
the perf difference.
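
[Editorial sketch, not part of the original message.] The source of
custom_gemm is not shown in this message, so the following is only an
assumption about what a "simple three loop GEMM" with compile-time sizes
might look like. The name naive_gemm, the row-major layout, and the use of
std::array are all hypothetical choices made for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-size, row-major three-loop GEMM: C = A * B,
// with A of size M x K, B of size K x N, and C of size M x N.
// All sizes are template parameters, so the trip counts are known at
// compile time and the compiler is free to unroll the loops.
template <typename T, std::size_t M, std::size_t K, std::size_t N>
std::array<T, M * N> naive_gemm(const std::array<T, M * K>& a,
                                const std::array<T, K * N>& b) {
  std::array<T, M * N> c{};  // value-initialized, i.e. all zeros
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      T sum = T(0);
      for (std::size_t k = 0; k < K; ++k) {
        sum += a[i * K + k] * b[k * N + j];
      }
      c[i * N + j] = sum;
    }
  }
  return c;
}
```

The sizes cannot be deduced from the array arguments, so a call has to
spell them out, for example naive_gemm<double, 6, 3, 3>(a, b) for a 6x3
times 3x3 product.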

Side note: it's amazing to see how good compilers have become at loop
unrolling. Clearly, this was not the case at the time we started
Eigen.


gael

On Tue, Apr 2, 2013 at 10:26 AM, Christoph Hertzberg
<chtz@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> On 02.04.2013 03:21, Sameer Agarwal wrote:
>>
>> We replaced one of the more frequently called eigen expressions with a
>> simple three loop GEMM implementation (with some template sizing tricks)
>> and it instantly gives us >10% speedups. Doing the same to two other GEMM
>> expressions gives us an overall 30% speedup. The sizes of the matrices
>> involved are fairly small; in our benchmark, our matrices are of sizes 6x3,
>> 3x3, 3x6, and are sized at compile time.
>
>
> Yes, there is a lot of room for optimization for small matrices, see this bug:
>
> http://eigen.tuxfamily.org/bz/show_bug.cgi?id=404
> For small fixed sizes it should be possible to solve this with template
> specializations (i.e. fall back to text-book GEMM, if vectorization/blocking
> gives no benefit).
>
>
> Another thing that bugs me is that dynamic matrices (even if only one
> dimension is dynamic and the other fixed and small) always fall back to the
> generic matrix multiplication, which is mostly optimized for very large
> products.
>
> Maybe it would be possible to fall back to a very simple "three loop GEMM"
> if the sizes are small. This could be checked at runtime or indicated by the
> user somehow (maybe configurable by a compile flag). If a program only uses
> small matrix products this might also reduce the binary size noticeably.
>
>
> Christoph
>
> --
> ----------------------------------------------
> Dipl.-Inf., Dipl.-Math. Christoph Hertzberg
> Cartesium 0.049
> Universität Bremen
> Enrique-Schmidt-Straße 5
> 28359 Bremen
>
> Tel: +49 (421) 218-64252
> ----------------------------------------------
>
>
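
[Editorial sketch, not part of the original message.] The runtime check
suggested in the quoted text could look roughly like the following. This is
not Eigen's API or implementation: the threshold kSmallDim, the function
names, and the row-major layout are hypothetical, and blocked_gemm is only a
placeholder standing in for a kernel tuned for large products.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cutoff: products whose dimensions are all at or below this
// value take the simple path. A real value would have to come from benchmarks.
constexpr std::size_t kSmallDim = 8;

// Runtime check for "small" products.
inline bool is_small_product(std::size_t m, std::size_t k, std::size_t n) {
  return m <= kSmallDim && k <= kSmallDim && n <= kSmallDim;
}

// Plain three-loop product, row-major: C (m x n) = A (m x k) * B (k x n).
inline void three_loop_gemm(const double* a, const double* b, double* c,
                            std::size_t m, std::size_t k, std::size_t n) {
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      double sum = 0.0;
      for (std::size_t p = 0; p < k; ++p) {
        sum += a[i * k + p] * b[p * n + j];
      }
      c[i * n + j] = sum;
    }
  }
}

// Placeholder for a blocked, vectorized kernel tuned for large products.
// It forwards to the simple loop here only so that the sketch is runnable.
inline void blocked_gemm(const double* a, const double* b, double* c,
                         std::size_t m, std::size_t k, std::size_t n) {
  three_loop_gemm(a, b, c, m, k, n);
}

// Chooses a kernel from the runtime sizes and returns C = A * B.
inline std::vector<double> multiply(const std::vector<double>& a,
                                    const std::vector<double>& b,
                                    std::size_t m, std::size_t k,
                                    std::size_t n) {
  std::vector<double> c(m * n);
  if (is_small_product(m, k, n)) {
    three_loop_gemm(a.data(), b.data(), c.data(), m, k, n);
  } else {
    blocked_gemm(a.data(), b.data(), c.data(), m, k, n);
  }
  return c;
}
```

Making kSmallDim a compile-time constant corresponds to the "compile flag"
variant mentioned in the quoted text; whether the extra branch pays off for a
given workload would need to be measured.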

Attachment: custom_gemm_2_3_9.S
Description: Binary data

Attachment: eigen_gemm_2_3_9.S
Description: Binary data
