|Re: [eigen] Parallel matrix multiplication causes heap allocation|
> Did someone have a look at what Blaze does? They seem to be pretty advanced regarding parallelism -- they also parallelize "simple" things like vector addition, which, if we trust their benchmarks, seems to be beneficial starting at something like 50000 doubles
You should not trust benchmarks, especially when they were run by the people who wrote the software :-)
Here is my attempt at this game, multiplying 10 000 x 10 000 "double" matrices on a Dual Xeon 2660-v4 (Broadwell), 2x14 cores, 2400 MHz memory, with the latest release of every library (as of today):
- MKL: 740 GFlops
- OpenBLAS: 540 GFlops
- Eigen: 440 GFlops
- Blaze: 440 GFlops
This machine is capable of: 2 (sockets) x 14 (cores) x 2 (FMA ports) x 2 (flops per FMA) x 4 (doubles per AVX2 register) x 2 (GHz) = 896 GFlops. Hyperthreading is turned off and the CPU frequency is locked at 2 GHz.
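To make the peak estimate easy to redo for another machine, here is a minimal sketch of the same arithmetic in C++. The function and parameter names (`peak_gflops`, `efficiency`, and so on) are made up for illustration; only the numbers come from the figures above. With them, MKL lands at roughly 83% of peak, OpenBLAS at roughly 60%, and Eigen and Blaze at roughly 49%.

```cpp
#include <cassert>

// Theoretical double-precision peak in GFlops.
// Hypothetical helper: the names are illustrative, the factors are the ones
// listed in the message (sockets, cores, FMA ports, flops per FMA,
// doubles per SIMD register, clock in GHz).
constexpr double peak_gflops(int sockets, int cores_per_socket, int fma_ports,
                             int flops_per_fma, int simd_doubles, double ghz) {
    return sockets * cores_per_socket * fma_ports * flops_per_fma *
           simd_doubles * ghz;
}

// Fraction of the theoretical peak reached by a measured throughput.
constexpr double efficiency(double measured_gflops, double peak) {
    assert(peak > 0.0);
    return measured_gflops / peak;
}
```

For example, `efficiency(740.0, peak_gflops(2, 14, 2, 2, 4, 2.0))` gives the MKL ratio for the dual-Xeon run. The MacBook Pro numbers cannot be normalized the same way, since its clock frequency during the run is not given.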
On a MacBook Pro (2014), with a 4-core Haswell, I get:
- MKL: 170 GFlops
- Eigen: 130 GFlops
For things such as vector addition, Blaze also uses streaming stores to speed up the process.
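For readers unfamiliar with streaming (non-temporal) stores, here is a minimal sketch of a vector addition that uses them. This is not Blaze's code: `add_stream` is a hypothetical name, and the sketch uses plain SSE2 intrinsics so that it builds on any x86-64 compiler without extra flags, falling back to a scalar loop elsewhere. The idea is that the destination is written straight to memory without first being loaded into the cache, which saves memory traffic when the output is too large to be reused from cache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Computes c[i] = a[i] + b[i] for i in [0, n).
// Assumption of this sketch: c is 16-byte aligned, as _mm_stream_pd requires.
void add_stream(const double* a, const double* b, double* c, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE2__)
    assert(reinterpret_cast<std::uintptr_t>(c) % 16 == 0);
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);          // unaligned loads are fine
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_stream_pd(c + i, _mm_add_pd(va, vb));  // non-temporal store
    }
    _mm_sfence();  // order the streamed stores before later stores
#endif
    for (; i < n; ++i) c[i] = a[i] + b[i];         // scalar tail or fallback
}
```

Whether this helps depends on the vector size: for vectors that fit in cache and are read again right away, regular stores are usually the better choice.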