I keep running into Eigen users who can't get good multithreaded performance, either because their toolchain doesn't support OpenMP or because they hit performance bugs in their OpenMP implementation.
What is the current status of non-OpenMP-based multithreading in Eigen?
In particular, does the Tensor module offer its own implementation of a multithreaded GEMM? If so, could it be merged into Core?