Re: [eigen] Status of non-OpenMP-based multithreading


2016-03-07 3:52 GMT-05:00 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:


The lack of OpenMP support would indeed be a good argument for providing alternatives, but which common platforms are not supported yet?

For instance, neither iOS nor Android supports OpenMP, so "mobile" is out.

Android is an interesting case because its compilers have been OpenMP-capable for a long time, yet there seems to be no intention to provide an OpenMP runtime.

I understand that MSVC 2015 now supports OpenMP 2.0 and that Clang 3.7 supports OpenMP 3, but even though Apple toolchains are based on Clang, there's no indication that there will be an OpenMP runtime on OSX / iOS.

Regarding the performance issues, again, which platforms are suffering from those? I never really observed such issues, and some of my HPC colleagues have also moved to OpenMP after performing extensive benchmarks...

Sorry, after some testing, I can't reproduce the issues that I heard about myself; I'll have to ask for more details.

Here are some timings for matrix-matrix product (OpenMP) versus tensor contraction (C++11) for 100x100 matrices / 2D tensors, with 4 threads, on OSX on Haswell:

gcc 4.7: 0.0438714ms / 0.165256ms
gcc 5  : 0.0452736ms / 0.169825ms
clang 3.7: 0.0287972ms / 0.163851ms

Thanks, it's very useful to measure performance at such small-ish sizes as 100x100. That's a range of sizes that I care a lot about.

As the matrix sizes increase, the two versions converge to the same performance.

Certainly, for large enough sizes, we fall back into OpenMP's canonical use case. Good to know, too, that it performs well on smaller sizes as well.

So let's focus on the platform support issue: regardless of performance, OpenMP isn't available at least in the default toolchain on Android, iOS, OSX.




On Fri, Mar 4, 2016 at 9:31 PM, Mark Borgerding <mark@xxxxxxxxxxxxxx> wrote:
It looks like TF's threadpool uses C++11 constructs.  Won't that be a barrier to wide acceptance in Eigen?

What about a solution/discipline that uses boost::thread for C++03 and maybe std::thread for C++11? They are very similar.  While boost::thread would be a new dependency for Eigen, it would only be necessary for those still tied to older compilers (like me), and boost::thread might be an easier dependency to swallow than OpenMP.


PS. I echo the dislike for OpenMP.  The implementations I've tried in gcc and intel seem like a collection of half-broken promises.

On 03/04/2016 02:51 PM, Rasmus Larsen wrote:
This is something that frustrates me as well. The tensor library has its own threadpool implementation and most operations are multi-threaded. In particular it provides a multi-threaded contraction, which is a superset of GEMM - Benoit Steiner can give more details. I would be in favor of making this mechanism available to the rest of Eigen (maybe implement a simple ParallelFor mechanism on top of it for developers to use), such that we can provide multi-threading independent of OpenMP.  I'd be happy to help.
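
To make the "simple ParallelFor" idea concrete, here is a minimal sketch using only C++11 std::thread. It is not the tensor module's threadpool and not an Eigen API: the name parallel_for and its signature are hypothetical, and it spawns fresh threads on every call rather than reusing a pool, which a real implementation would presumably avoid.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (assumption, not part of Eigen): splits [begin, end)
// into contiguous chunks and calls fn(chunk_begin, chunk_end) for each chunk
// on its own std::thread, then waits for all of them to finish.
inline void parallel_for(
    std::size_t begin, std::size_t end, std::size_t num_threads,
    const std::function<void(std::size_t, std::size_t)>& fn) {
  if (end <= begin) return;  // empty range: nothing to do
  const std::size_t n = end - begin;
  // Use at least one thread and never more threads than elements.
  num_threads = std::max<std::size_t>(1, std::min(num_threads, n));
  // Ceiling division so that the chunks cover the whole range.
  const std::size_t chunk = (n + num_threads - 1) / num_threads;

  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (std::size_t t = 0; t < num_threads; ++t) {
    const std::size_t lo = begin + t * chunk;
    if (lo >= end) break;  // rounding can leave trailing threads without work
    const std::size_t hi = std::min(end, lo + chunk);
    workers.emplace_back([&fn, lo, hi] { fn(lo, hi); });
  }
  for (std::thread& w : workers) w.join();
}
```

A caller would pass a lambda that processes the index range [lo, hi), for example one block of rows of a matrix product. Each chunk must write to disjoint output, since the sketch does no synchronization beyond the final join.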


On Fri, Mar 4, 2016 at 11:33 AM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:

I keep running into Eigen users who can't get good multithreading, either because their toolchain doesn't support OpenMP, or because they run into performance bugs in their OpenMP implementation.

What is the current status of non-OpenMP-based multithreading in Eigen?

In particular, does the Tensor module offer its own implementation of a multi-threaded GEMM? Can that be merged into Core?

