Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen

On Fri, Apr 6, 2018 at 3:06 AM, Edward Lam <edward@xxxxxxxxxx> wrote:

On a related topic, I have similar questions about this with the solver and matrix class too. What happens if the matrix is destroyed before the solver in Eigen? I've always been careful that doesn't happen when I write code because I'm not sure of the consequences.

Basically, no dependency for direct solvers whereas iterative solvers store a reference. There are red warnings in the reference doc, e.g.:

https://eigen.tuxfamily.org/dox/classEigen_1_1ConjugateGradient.html#ac10f778fcd137eca1f6057c8ddd3d644

https://eigen.tuxfamily.org/dox/classEigen_1_1IterativeSolverBase.html#a7dfa55c55e82d697bde227696a630914

but I agree that that's not clear from the user manual pages.

gael

-Edward

...

Actually, this last example is not conceivable at the moment because the facto
Anyways, all these variants are essentially syntactic sugar/implementation details, and at this stage I'm more concerned about possible shortcomings in the general approach. Is it flexible enough? I'm also not fan of introducing a MklSparseMatrix inheriting SparseMatrix, but I don't have a better offer for now that is generic and easily applicable to decompositions.

gael

However, if this is required, I would suggest to add this method not only to matrices but also to the decomposition (and pass it through to the internal matrices). Otherwise, this does not scale for decompositions which use more than one matrix (like SparseLU). And we could even let the sparseAnalyze() functions return a proxy to the decomposition which would allow writing something like:

x = llt.sparseAnalyzeAndRun(100)->solve(b);
// equivalent to
llt.sparseAnalyzeAndRun(100, [&]{x=llt.solve(b);});

or
{
auto llt_ = llt.sparseAnalyzeAndRun(100);
x = llt_-> solve(b);
y = llt_-> matrixL() * c;
} // destructor of llt_ calls `endAnalyze`

The naming of this method is still debatable, of course.
And I have no idea what actually happens inside MKL when it 'analyzes' an operation (after how many iterations do you actually benefit from the overhead of analyzing the operation?)

Christoph

Cheers,
-Edward

On 4/5/2018 8:01 AM, Gael Guennebaud wrote:

Thank you for opening this discussion on the public mailing list.

So let's discuss about the public API, which currently is not very convenient as already noticed by others. Issues are:

(i1) - Storing MKL's handle in SparseMatrix breaks ABI and does not sounds very generic.
   - We need a way to control:
(i2) - which operations are going to be analyzed/optimized,
(i3) - and specify the 'expected_calls' parameter.

In order to discuss these issues, let's consider the following typical pattern: (e.g., non-linear optimization, eigenvalues, ...)

SimplicialLDLT<SparseMatrix<double> > llt(A);

while(...) {
   ...
x = llt.solve(b);
....
}

Here the triangular L factor is going to be used for triangular and transposed-triangular solves dozens to hundreds of time but only the user of SimplicialLDLT knowns that, not SimplicialLDLT, nor SparseMatrix. Moreover, the user does not own the SparseMatrix that we want to analyze/optimize for. Other patterns are likely easier to handle, so let's focus on it for now.

Regarding (i1), I would suggest to introduce a new type, say MklSparseMatrix<> that would enhance SparseMatrix<> through inheritance. Then for (i2) and (i3) we could imagine something like:

MklSparseMatrix::beginAnalysis(Index expected_calls) const {
// turn *this to compressed mode
// create handle
// store expected_calls
// enable recording mode
}
MklSparseMatrix::endAnalysis() const {
// disable recording mode
// [optional] call mkl_sparse_optimize
}

All states in MklSparseMatrix would be mutable.

Between a pair of beginAnalysis/endAnalysis each call to a supported operation would trigger calls to mkl_sparse_set_*_hint()/mkl_sparse_optimize..
Optionally, we could even add a "dryrun" mode for which no operation would be performed, only calls to mkl_sparse_set_*_hint() and then mkl_sparse_optimize would be called in endAnalysis(). This way mkl_sparse_optimize() would be called only once.

And that's it. Our example would look-like:

SimplicialLDLT<MklSparseMatrix<double> > llt(A);
int it=0;
while(...) {
   ...
   if(it==0) llt.matrixL().beginAnalysis(100);
x = llt.solve(b);
if(it==0) llt.matrixL().endAnalysis();
....
++it;
}

or using a "dry-run" mode:

SimplicialLDLT<MklSparseMatrix<double> > llt(A);

llt.matrixL().beginAnalysis(100, DryRun);
x = llt.solve(b); // permutation and division by the diagonal matrix D would still be performed, but calls to actual triangular solves would be by-passed
llt.matrixL().endAnalysis();

while(...) {
   ...
   x = llt.solve(b);
....
}

If someone directly deal with the factor L, then we could follow the same pattern or copy the SparseMatrix factor L to a MklSparseMatrix:

SimplicialLLT<SparseMatrix<double> > llt(A);

MklSparseMatrix L(llt.matrixL());
L.beginAnalysis(100,DryRun);
y = L.triangularView<Lower>() * x;
L.endAnalysis();
while(...) {
   ...
y = L.triangularView<Lower>() * x;
....
}

This design in quite general and expendable to any sparse-optimizers, even built-in ones in the future.

In contrast to the current proposal, only selected operations would be passed to MKL (need to use a MklSparseMatrix + begin/end recording phase).

What do you think?

gael

On Tue, Apr 3, 2018 at 11:39 PM, Zhukova, Maria <maria.zhukova@xxxxxxxxx <mailto:maria.zhukova@intel.com>> wrote:

    Hello Eigen community,

    My name is Maria Zhukova and I’m a software development engineer at Intel ®
    MKL Sparse team.

    My team is interested in contributing into Eigen, so I’ve investigated our
    possibilities and so far this is what I have:
    Eigen support different operations for sparse matrices stored in CSR and CSC
    format which can be implemented on a basis of IE SpBLAS kernels (please,
    refer to
https://software.intel.com/en-us/mkl-developer-reference-c-inspector-executor-sparse-blas-routines
<https://software.intel.com/en-us/mkl-developer-reference-c-inspector-executor-sparse-blas-routines>
    for the general idea of interfaces)
    , basically we want to implement calls to our IE SpBLAS into next
    operations:____

                     SparseMatrix + SparseMatrix (mkl_sparse_?_add)
                     SparseMatrix * DenseVector (mkl_sparse_?_mv)____

                     SparseMatrix * DenseMatrix   (mkl_sparse_?_mm)____

                     SparseMatrix * SparseMatrix (mkl_sparse_spmm),
    and Triangular solve (mkl_sparse_?_trsv).____

    I’ve already started with implementation of sparse_time_dense_impl_mkl
    kernel which is based on mkl_sparse_?_mv (included in patch).____

    This is how it will look like for user:
    *#include <Eigen/SpBLASSupport> *<-- *NEW:* IE SpBLAS include module ____

    void main () {
     SparseMatrix<double, RowMajor> A;
     Matrix<double, Dynamic, 1> x, y;

     A.makeCompressed(); /* Convert matrix A into CSR/CSC format */
    *A.createSparseHandle();*/* *NEW*: is used to create handle required for all
    IE SpBLAS routines */____

    // support of IE SpBLAS is here
    y = beta*y + alpha*A*x; /* call to mkl_sparse_?_mv with operation =
    SPARSE_OPERATION_NON_TRANSPOSE */
    y = beta*y + alpha*A.transpose()*x; /* call to mkl_sparse_?_mv with
    operation = SPARSE_OPERATION_TRANSPOSE */
    y = beta*y + alpha*A.adjoint()*x; /* call to mkl_sparse_?_mv with operation
    = SPARSE_OPERATION_CONJUGATE_TRANSPOSE */____

    *A.destroySparseHandle();* /* *NEW*: is used to delete created handle */
    }____

    __ __

    I’ve attached a draft patch including all necessary changes and would like
    to hear your feedback.
    Please, let me know if you have any questions and comments.____

    __ __

    Best regards,
    Maria____

    __ __

    __ __

    __ __

--
Dr.-Ing. Christoph Hertzberg

Besuchsadresse der Nebengeschäftsstelle:
DFKI GmbH
Robotics Innovation Center
Robert-Hooke-Straße 5
28359 Bremen, Germany

Postadresse der Hauptgeschäftsstelle Standort Bremen:
DFKI GmbH
Robotics Innovation Center
Robert-Hooke-Straße 1
28359 Bremen, Germany

Tel.: +49 421 178 45-4021
Zentrale: +49 421 178 45-0
E-Mail: christoph.hertzberg@xxxxxxx

Weitere Informationen: http://www.dfki.de/robotik
-----------------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Firmensitz: Trippstadter Straße 122, D-67663 Kaiserslautern
Geschaeftsfuehrung: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster
(Vorsitzender) Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
Sitz der Gesellschaft: Kaiserslautern (HRB 2313)
USt-Id.Nr.: DE 148646973
Steuernummer: 19/672/50006
-----------------------------------------------------------------------