Re: [eigen] Status of non-OpenMP-based multithreading





2016-03-07 3:54 GMT-05:00 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:


On Fri, Mar 4, 2016 at 9:45 PM, Benoit Steiner <benoit.steiner.goog@xxxxxxxxx> wrote:
This is correct: we currently rely on C++11 threads, mutexes and condition variables to implement the thread pool used in the tensor module. The main reason to do this is that C++11 is now very commonly available, so this makes the code very portable (at least more so than having to implement the functionality for every target platform).

A lot, if not all, of the functionality is already abstracted: for example, the Notification class wraps std::condition_variable, and the ThreadPool hides the std::thread class. It should be possible to reimplement both classes using boost to remove the dependency on C++11.
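
To make the wrapping idea concrete, here is a minimal sketch of a one-shot notification built on std::condition_variable. The class name SimpleNotification and the helper RunWorkerAndWait are hypothetical names chosen for illustration; this is not Eigen's actual Notification class, only an assumption about what such a wrapper could look like.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot notification (illustration only, not Eigen's class).
class SimpleNotification {
 public:
  // Record that the event happened and wake every waiting thread.
  void Notify() {
    std::lock_guard<std::mutex> lock(mu_);
    notified_ = true;
    cv_.notify_all();
  }

  // Block until Notify() has been called; returns at once if it already was.
  void Wait() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return notified_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool notified_ = false;
};

// Hypothetical usage: a worker writes a result, then signals the waiter.
inline int RunWorkerAndWait() {
  SimpleNotification done;
  int result = 0;
  std::thread worker([&] {
    result = 42;
    done.Notify();
  });
  done.Wait();           // Notify/Wait synchronize through the mutex
  assert(result == 42);  // so the worker's write is visible here
  worker.join();
  return result;
}
```

Because callers only see Notify() and Wait(), the members could in principle be swapped for boost or pthread equivalents without touching the calling code.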

I guess that one could also implement one on top of OpenMP.
 
Last but not least, all the multithreading code is guarded by an EIGEN_USE_THREADS #define. I think it's pretty reasonable to require either C++11 or boost to be able to enable this #define.
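
As a rough illustration of this kind of opt-in guard, the sketch below compiles a threaded path only when a macro is defined and falls back to a serial path otherwise. DEMO_USE_THREADS and SumRange are hypothetical names standing in for the real mechanism; nothing here is taken from Eigen's sources.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// DEMO_USE_THREADS is a hypothetical stand-in for a guard such as
// EIGEN_USE_THREADS; define it before this code to opt in to threading.
inline long SumRange(const std::vector<int>& v) {
#ifdef DEMO_USE_THREADS
  // Threaded path: one helper thread sums the first half while the
  // calling thread sums the second half.
  const auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
  long lo = 0;
  std::thread helper([&] { lo = std::accumulate(v.begin(), mid, 0L); });
  const long hi = std::accumulate(mid, v.end(), 0L);
  helper.join();
  return lo + hi;
#else
  // Default path: no thread headers are actually exercised.
  return std::accumulate(v.begin(), v.end(), 0L);
#endif
}
```

Both paths return the same value, so a user who never defines the macro gets the serial behavior and needs no threading support from the toolchain at run time.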

What about enabling threads by default and rather adding an EIGEN_NO_THREADS?

I actually like having threading opt-in. This also fits better with having multiple coexisting paths, letting the user choose between them. Indeed, I think it's important here to acknowledge the absence of a single universal solution: neither OpenMP nor C++11 is universal as far as Eigen is concerned, and both hide enough 'magic' in their implementation that we can easily imagine users running into trouble on a particular implementation (concretely, at least some Android toolchains have a very poorly performing implementation of C++11 threads).

There is also the option of using pthreads directly. It's not universal either, but outside of Windows, it's a lot more universal than either OpenMP or C++11 threads, and concretely, precisely because it exposes a more direct mapping to OS-level threading primitives, it's at an inherent advantage in situations where we have to trick the OS into giving more throughput than it normally would, which is the case typically on mobile platforms (where the OS often favors power savings over throughput).

Here's a story of the things we ended up doing to get a multi-threaded GEMM to perform well on Android (near-linear scaling to 4 cores at size ~= 300), and here's the code; it's quite likely that those things could have been done as well with C++11 threads, but that would have been much harder to debug in that more abstract context, and the result would have partly defeated the abstraction of C++11 threads. It would have been very cumbersome to implement such tricks with OpenMP, if OpenMP had been available on that platform.

Cheers,
Benoit
 


gael


