Re: [eigen] Parallel matrix multiplication causes heap allocation

[ Thread Index | Date Index | More Archives ]

> Did someone have a look at what blaze [1] does? They seem to be pretty advanced regarding parallelism -- they also parallelize "simple" things like vector addition, which if we trust their benchmarks [2] seems to be beneficial starting at something like 50000 doubles

You should not trust benchmarks, especially where they have been done by people who wrote the software :-)

Here is my attempt at this game, multiplying 10 000 x 10 000 "double" matrices on a Dual Xeon 2660-v4 (Haswell), 2x14 cores, Broadwell, 2400 Mhz memory, with the latest release of every single library (as of today) :
- MKL: 740 GFlops
- OpenBLAS: 540 GFlops
- Eigen: 440 GFlops
- Blaze: 440 GFlops
This machine is capable of: 2 x 14 (cores) x 2 (2 FMA ports) x 2 (FMA) x 4 (AVX2) x 2 (GHz) = 896 GFlops. Hyperthreading is turned off and the CPU frequency is blocked at 2 GHz.
On a MacBook Pro (2014), with a 4 core Haswell, I get:
- MKL: 170 GFlops
- Eigen: 130 GFlops

For Blaze, for things such as vector addition, they also use streaming stores to speed up the process.


Mail converted by MHonArc 2.6.19+