Re: [eigen] Status of non-OpenMP-based multithreading


Here are some numbers for the matrix-matrix product (OpenMP) versus the tensor contraction (C++11 threads) for 100x100 matrices / 2D tensors, using 4 threads, on OS X on a Haswell CPU:

Compiler    Matrix product (OpenMP)   Tensor contraction (C++11)
gcc 4.7     0.0438714 ms              0.165256 ms
gcc 5       0.0452736 ms              0.169825 ms
clang 3.7   0.0287972 ms              0.163851 ms

As the matrix sizes increase, the two versions converge to the same performance.

The performance of small contractions is almost entirely determined by the performance of the thread pool, which currently performs very poorly on some systems. We're working on a fix, and preliminary measurements already show significant gains. For example, our internal benchmarks show a 10x reduction in total CPU time and a 5x reduction in wall time for the contraction of two 64x64 2D tensors using 6 threads.
