Re: [eigen] Slow matrix-matrix multiply


*To*: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
*Subject*: Re: [eigen] Slow matrix-matrix multiply
*From*: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
*Date*: Tue, 2 Apr 2013 11:26:42 +0200

For small dynamic-sized matrices, I agree there is room for optimization. However, for small fixed-sized matrices, Eigen should already be at least as fast as a naive implementation.

I can also reproduce the performance drop with linux/gcc-4.7. However, the generated assembly in both cases is extremely similar (see the attached files), with even an advantage to Eigen: only 18 additions compared to 27 for custom_gemm. Frankly, I cannot explain the performance difference.

Side note: it's amazing to see how good compilers have become at loop unrolling. Clearly, this was not the case at the time we started Eigen.

gael

On Tue, Apr 2, 2013 at 10:26 AM, Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> On 02.04.2013 03:21, Sameer Agarwal wrote:
>> We replaced one of the more frequently called Eigen expressions with a
>> simple three-loop GEMM implementation (with some template sizing tricks),
>> and it instantly gives us >10% speedups. Doing the same to two other GEMM
>> expressions gives us an overall 30% speedup. The sizes of the matrices
>> involved are fairly small; in our benchmark, our matrices are of sizes
>> 6x3, 3x3, and 3x6, and are sized at compile time.
>
> Yes, small matrices have very much room for optimization; see this bug:
> http://eigen.tuxfamily.org/bz/show_bug.cgi?id=404
>
> For small fixed sizes it should be possible to solve this with template
> specializations (i.e. fall back to text-book GEMM if vectorization/blocking
> gives no benefit).
>
> Another thing that bugs me is that dynamic matrices (even if only one
> dimension is dynamic and the other fixed and small) always fall back to
> the generic matrix multiplication, which is mostly optimized for very
> large products.
>
> Maybe it would be possible to fall back to a very simple "three-loop GEMM"
> if the sizes are small. This could be checked at runtime or indicated by
> the user somehow (maybe configurable by a compile flag). If a program only
> uses small matrix products, this might also reduce the binary size
> noticeably.
>
> Christoph
>
> --
> ----------------------------------------------
> Dipl.-Inf., Dipl.-Math. Christoph Hertzberg
> Cartesium 0.049
> Universität Bremen
> Enrique-Schmidt-Straße 5
> 28359 Bremen
>
> Tel: +49 (421) 218-64252
> ----------------------------------------------

**Attachment:** custom_gemm_2_3_9.S

**Attachment:** eigen_gemm_2_3_9.S

**Follow-Ups**:

- **Re: [eigen] Slow matrix-matrix multiply** *From:* Christoph Hertzberg
- **Re: [eigen] Slow matrix-matrix multiply** *From:* Gael Guennebaud

**References**:

- **[eigen] Slow matrix-matrix multiply** *From:* Sameer Agarwal
- **Re: [eigen] Slow matrix-matrix multiply** *From:* Christoph Hertzberg


Mail converted by MHonArc 2.6.19+ | http://listengine.tuxfamily.org/