This is something that frustrates me as well. The tensor library has its own
threadpool implementation, and most of its operations are multi-threaded. In particular, it provides a multi-threaded contraction, which is a superset of GEMM (Benoit Steiner can give more details). I would be in favor of making this mechanism available to the rest of Eigen, perhaps with a simple ParallelFor mechanism implemented on top of it for developers to use, so that we can provide multi-threading independent of OpenMP. I'd be happy to help.
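
To make the ParallelFor idea concrete, here is a rough sketch of the kind of interface I have in mind. The name `parallel_for` and its signature are hypothetical, not an existing Eigen API. For simplicity it spawns plain `std::thread`s instead of reusing the tensor library's threadpool, which a real version would do.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper, not part of Eigen: splits [begin, end) into contiguous
// blocks and calls fn(block_begin, block_end) for each block on its own thread.
// Assumption: fn is safe to call concurrently on disjoint index ranges.
inline void parallel_for(
    std::size_t begin, std::size_t end, std::size_t num_threads,
    const std::function<void(std::size_t, std::size_t)>& fn) {
  if (end <= begin) return;  // empty range: nothing to do
  const std::size_t n = end - begin;
  // Use at least one thread and never more threads than elements.
  num_threads = std::max<std::size_t>(1, std::min(num_threads, n));
  // Block size rounded up so that all n elements are covered.
  const std::size_t block = (n + num_threads - 1) / num_threads;

  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (std::size_t start = begin; start < end; start += block) {
    const std::size_t stop = std::min(end, start + block);
    workers.emplace_back([&fn, start, stop] { fn(start, stop); });
  }
  // Block until every worker has finished its range.
  for (auto& w : workers) w.join();
}
```

A caller would pass a lambda that processes one index range, for example `parallel_for(0, v.size(), 4, [&](std::size_t b, std::size_t e) { for (std::size_t i = b; i < e; ++i) v[i] *= 2; });`. Spawning threads on every call is too expensive for small workloads, which is why building this on the existing threadpool would be the better design.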
Rasmus