Re: [eigen] Slow matrix-matrix multiply
For small dynamic-size matrices, I agree there is room for optimization. However, for small fixed-size matrices, Eigen should already be at least as fast as a naive implementation. I can also reproduce the performance drop with Linux/gcc-4.7. However, the generated assembly in both cases is extremely similar (see the attached files), with even an advantage to Eigen: only 18 additions, compared to 27 for custom_gemm. Frankly, I cannot explain the performance difference.

Side note: it's amazing to see how good compilers have become at loop unrolling. Clearly, this was not the case when we started Eigen.

gael

On Tue, Apr 2, 2013 at 10:26 AM, Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> On 02.04.2013 03:21, Sameer Agarwal wrote:
>>
>> We replaced one of the more frequently called Eigen expressions with a
>> simple three-loop GEMM implementation (with some template sizing tricks)
>> and it instantly gives us >10% speedups. Doing the same to two other GEMM
>> expressions gives us an overall 30% speedup. The matrices involved are
>> fairly small; in our benchmark, our matrices are of sizes 6x3,
>> 3x3, 3x6, and are sized at compile time.
>
>
> Yes, small matrices have a lot of room for optimization, see this bug:
>
> http://eigen.tuxfamily.org/bz/show_bug.cgi?id=404
> For small fixed sizes it should be possible to solve this with template
> specializations (i.e. fall back to text-book GEMM if vectorization/blocking
> gives no benefit).
>
>
> Another thing that bugs me is that dynamic matrices (even if only one
> dimension is dynamic and the other fixed and small) always fall back to the
> generic matrix multiplication, which is mostly optimized for very large
> products.
>
> Maybe it would be possible to fall back to a very simple "three loop GEMM"
> if the sizes are small. This could be checked at runtime or indicated by the
> user somehow (maybe configurable by a compile flag).
> If a program only uses small matrix products, this might also reduce the
> binary size noticeably.
>
>
> Christoph
>
> --
> ----------------------------------------------
> Dipl.-Inf., Dipl.-Math. Christoph Hertzberg
> Cartesium 0.049
> Universität Bremen
> Enrique-Schmidt-Straße 5
> 28359 Bremen
>
> Tel: +49 (421) 218-64252
> ----------------------------------------------
Attachment: custom_gemm_2_3_9.S
Description: Binary data

Attachment: eigen_gemm_2_3_9.S
Description: Binary data