Dear all,
I hope that this new implementation also improves the scaling of Eigen's iterative solvers, e.g. conjugate gradient and biconjugate gradient. What I usually get is a factor-2 speed-up, independently of the number of threads.
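For reference, this is roughly how I measure it (a minimal sketch with a 1D Laplacian as a stand-in SPD matrix; the size and thread counts are placeholders, not my actual setup):

#include <Eigen/Sparse>
#include <chrono>
#include <iostream>

int main()
{
  const int n = 200000;  // placeholder size
  // Simple SPD test matrix: 1D Laplacian, tridiagonal.
  Eigen::SparseMatrix<double> A(n, n);
  A.reserve(Eigen::VectorXi::Constant(n, 3));
  for (int j = 0; j < n; ++j) {
    if (j > 0) A.insert(j - 1, j) = -1.0;
    A.insert(j, j) = 2.0;
    if (j < n - 1) A.insert(j + 1, j) = -1.0;
  }
  A.makeCompressed();
  const Eigen::VectorXd b = Eigen::VectorXd::Ones(n);

  for (int threads : {1, 2, 4, 8}) {
    Eigen::setNbThreads(threads);
    Eigen::ConjugateGradient<Eigen::SparseMatrix<double>> cg(A);
    auto t0 = std::chrono::steady_clock::now();
    const Eigen::VectorXd x = cg.solve(b);
    auto t1 = std::chrono::steady_clock::now();
    std::cout << threads << " threads: "
              << std::chrono::duration<double>(t1 - t0).count()
              << " s, " << cg.iterations() << " iterations\n";
  }
}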
Thanks for the effort.
Best regards.
Lorenzo
Hi Peter,
I would recommend that you hold off on this change for a bit. We are planning to move the non-blocking threadpool from unsupported into core in the near future and to use it as the basis for multi-threading without depending on OpenMP.
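For anyone who wants to experiment with it in the meantime, the pool is already usable from unsupported. A minimal sketch (header path and API as of current master; treat the details as tentative, since exactly this may change when it moves into core):

#include <unsupported/Eigen/CXX11/ThreadPool>

int main()
{
  Eigen::ThreadPool pool(4);   // non-blocking pool with 4 worker threads
  Eigen::Barrier barrier(8);   // wait for the 8 tasks scheduled below
  for (int i = 0; i < 8; ++i) {
    pool.Schedule([&barrier] {
      // ... per-task work, e.g. one block of columns ...
      barrier.Notify();
    });
  }
  barrier.Wait();  // block until every scheduled task has finished
}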
Rasmus
Dear all,
I'm currently playing with the sparse matrix implementation in Eigen,
namely Eigen::SparseMatrix<double, Eigen::ColMajor> combined with a self-adjoint view.
In my application I need to multiply a symmetric sparse matrix with a dense matrix,
where the dimensions of the matrices are on the order of up to a few tens of thousands.
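For concreteness, the operation in question is essentially this (a stripped-down sketch of my setup; the sizes are placeholders):

#include <Eigen/Sparse>
#include <Eigen/Dense>

int main()
{
  const int n = 30000, m = 200;  // placeholder sizes
  Eigen::SparseMatrix<double, Eigen::ColMajor> A(n, n);
  // ... fill the lower triangle of A ...
  A.makeCompressed();
  Eigen::MatrixXd B = Eigen::MatrixXd::Random(n, m);
  // This product ends up in sparse_selfadjoint_time_dense_product:
  Eigen::MatrixXd C = A.selfadjointView<Eigen::Lower>() * B;
}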
I tried to parallelize the product in SparseSelfAdjointView.h by copying an OpenMP directive
from other parts of Eigen, and I also tried to parallelize the outer loop directly
with std::execution::par; see the diff below.
In summary, I do not see any effect from the parallelization.
Before digging deeper into it and building a minimal working example
(a sketch of what I have in mind follows the diff below),
has someone already achieved this?
Could one actually call the corresponding MKL routine directly, or are the internal storage schemes different?
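In case it helps the discussion: Eigen's compressed ColMajor storage is plain zero-based CSC (outerIndexPtr/innerIndexPtr/valuePtr), so one could in principle hand the arrays to MKL's inspector-executor sparse BLAS without copying. An untested sketch, assuming LP64 MKL so that MKL_INT matches Eigen's default int StorageIndex, and with error checking omitted:

#include <Eigen/Sparse>
#include <Eigen/Dense>
#include <mkl_spblas.h>

// C = A * B with A symmetric (only the lower triangle stored).
// C must be preallocated to A.rows() x B.cols().
void mkl_sym_mm(Eigen::SparseMatrix<double, Eigen::ColMajor>& A,
                const Eigen::MatrixXd& B, Eigen::MatrixXd& C)
{
  A.makeCompressed();  // MKL needs the compressed (no-gap) layout

  sparse_matrix_t mklA;
  mkl_sparse_d_create_csc(&mklA, SPARSE_INDEX_BASE_ZERO,
                          (MKL_INT)A.rows(), (MKL_INT)A.cols(),
                          (MKL_INT*)A.outerIndexPtr(),      // column starts
                          (MKL_INT*)A.outerIndexPtr() + 1,  // column ends
                          (MKL_INT*)A.innerIndexPtr(),      // row indices
                          A.valuePtr());

  matrix_descr descr;
  descr.type = SPARSE_MATRIX_TYPE_SYMMETRIC;
  descr.mode = SPARSE_FILL_MODE_LOWER;  // matches selfadjointView<Lower>
  descr.diag = SPARSE_DIAG_NON_UNIT;

  mkl_sparse_d_mm(SPARSE_OPERATION_NON_TRANSPOSE, 1.0, mklA, descr,
                  SPARSE_LAYOUT_COLUMN_MAJOR,
                  B.data(), (MKL_INT)B.cols(), (MKL_INT)B.rows(),
                  0.0, C.data(), (MKL_INT)C.rows());

  mkl_sparse_destroy(mklA);
}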
Best regards
Peter
P.S.: This is what I tried, as a diff against the current master branch:
:~/Eigen/eigen/Eigen/src/SparseCore$ git diff SparseSelfAdjointView.h
diff --git a/Eigen/src/SparseCore/SparseSelfAdjointView.h b/Eigen/src/SparseCore/SparseSelfAdjointView.h
index 0302ef3a4..91e7d495b 100644
--- a/Eigen/src/SparseCore/SparseSelfAdjointView.h
+++ b/Eigen/src/SparseCore/SparseSelfAdjointView.h
@@ -10,6 +10,11 @@
#ifndef EIGEN_SPARSE_SELFADJOINTVIEW_H
#define EIGEN_SPARSE_SELFADJOINTVIEW_H
+#include <iostream> /// only for debugging
+#include <boost/range/irange.hpp>
+
+#include <execution>
+
#include "./InternalHeaderCheck.h"
namespace Eigen {
@@ -295,8 +300,19 @@ inline void sparse_selfadjoint_time_dense_product(const SparseLhsType& lhs, cons
SparseLhsTypeNested lhs_nested(lhs);
LhsEval lhsEval(lhs_nested);
+ Eigen::initParallel();
+ Index threads = Eigen::nbThreads();
+
+ std::cout << "\ndot threads: "<< threads << " rhs-cols: " << rhs.cols() << std::endl;
+
+ // #pragma omp parallel for
+ // #pragma omp parallel for schedule(dynamic,(rhs.cols()+threads*4-1)/(threads*4)) num_threads(threads)
+ // for (Index k=0; k<rhs.cols(); ++k)
+
+ auto r = boost::irange(rhs.cols());
+
+ std::for_each(std::execution::par, r.begin(), r.end(), [&](Index k)
// work on one column at once
- for (Index k=0; k<rhs.cols(); ++k)
{
for (Index j=0; j<lhs.outerSize(); ++j)
{
@@ -330,6 +346,7 @@ inline void sparse_selfadjoint_time_dense_product(const SparseLhsType& lhs, cons
res.coeffRef(j,k) += alpha * i.value() * rhs.coeff(j,k);
}
}
+ );
}
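P.P.S.: The minimal working example I have in mind would look roughly like this (untested sketch; sizes, density, and thread counts are placeholders):

#include <Eigen/Sparse>
#include <Eigen/Dense>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

int main()
{
  const int n = 30000, m = 100;  // placeholder sizes
  // Random symmetric sparse matrix: store the lower triangle only.
  std::vector<Eigen::Triplet<double>> trips;
  std::srand(42);
  for (int k = 0; k < 10 * n; ++k) {
    const int i = std::rand() % n, j = std::rand() % n;
    if (i >= j) trips.emplace_back(i, j, 1.0);
  }
  Eigen::SparseMatrix<double, Eigen::ColMajor> A(n, n);
  A.setFromTriplets(trips.begin(), trips.end());
  A.makeCompressed();

  const Eigen::MatrixXd B = Eigen::MatrixXd::Random(n, m);
  Eigen::MatrixXd C(n, m);

  for (int threads : {1, 2, 4, 8}) {
    Eigen::setNbThreads(threads);
    auto t0 = std::chrono::steady_clock::now();
    C = A.selfadjointView<Eigen::Lower>() * B;
    auto t1 = std::chrono::steady_clock::now();
    std::cout << threads << " threads: "
              << std::chrono::duration<double>(t1 - t0).count() << " s\n";
  }
}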