Re: [eigen] Parallel matrix multiplication causes heap allocation
> On 19 Dec 2016, at 18:31, Jeff Hammond <jeff.science@xxxxxxxxx> wrote:
> More than just that, OpenMP runtimes are nontrivial beasts to control and any multithreaded performance data that does not include a complete list of compiler and runtime versions, affinity information, complete processor details, and OS+distro version should be viewed with skepticism.
> For example, most OpenMP runtimes do not set affinity by default, and I've seen this reduce performance by ~2x in DGEMM, and once affinity is enabled, breadth- vs depth-first placement makes a large difference in some cases.
For my tests with MKL, I used the TBB-threaded version of MKL, which gives consistent results.
I never managed to get consistent results with OpenMP, even with KMP_AFFINITY set to compact or scatter.
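
To make results like these easier to compare, a benchmark can log the threading-related settings it ran with. The following is only a minimal sketch: the helper names env_or_unset and threading_report are hypothetical, and the set of variables is an assumption about what is worth recording (the standard OpenMP variables plus KMP_AFFINITY and MKL_NUM_THREADS). It reports what was requested through the environment, not where the threads actually ended up.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: value of an environment variable, or "<unset>".
std::string env_or_unset(const char* name) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string("<unset>");
}

// Hypothetical helper: one "name=value" line per setting, to be stored
// next to the timing results of a benchmark run.
std::string threading_report() {
    std::ostringstream out;
#ifdef __VERSION__
    // GCC and Clang define __VERSION__; other compilers may not.
    out << "compiler=" << __VERSION__ << '\n';
#else
    out << "compiler=<unknown>\n";
#endif
    // Assumed list of variables; extend it for the runtime in use.
    const char* vars[] = {"OMP_NUM_THREADS", "OMP_PROC_BIND", "OMP_PLACES",
                          "KMP_AFFINITY", "MKL_NUM_THREADS"};
    for (const char* name : vars) {
        out << name << '=' << env_or_unset(name) << '\n';
    }
    return out.str();
}
```

Printing threading_report() once at the start of each run, together with the processor and OS details, would at least show whether two runs were launched with the same affinity request.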