Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen |

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]

*To*: eigen <eigen@xxxxxxxxxxxxxxxxxxx>*Subject*: Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen*From*: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>*Date*: Fri, 6 Apr 2018 13:42:11 +0200*Dkim-signature*: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=vLm4J9wmnTjZsHoeLO5rv5C7wI6it71dS2G0tlK13pY=; b=c6TLVf6urqWmVP9cIhXjkugDqEDBRUHJcNvW0fFP6nBaHCRdmlVUH3ugmZi+AUBda1 gP4A1+4VAEKxHFWmr8BnLnjls4sPQmfYFmZASzfWdLBkjOZj1K1ZjK6cammAez3KrpfE xnGzvahQtdZrSzMG7nI1jnAj3LA2uLMeTaVBTF1RaJGhi8HqNqx6PyCyd0E9AibVtmhG cDvfldCbnSj6Ws6nkvnIrrA2RyAA59Ul32MwBvbRj9fCJmcW9kGdUA34y6HAz68xS7hc 7IXizmSS+MCqlfaQhgNJK6DzxC1c9mehmwk+A2prgoudeVglyY3u2UCiPJlmx8pJF1hb AsEw==

On Fri, Apr 6, 2018 at 12:49 AM, Zhukova, Maria <maria.zhukova@xxxxxxxxx> wrote:

Hello Gael,

Thanks for your feedback! It’s greatly appreciated.

Regarding your list of issues:

(i1) Our initial suggest was to add all code related to IE SpBLAS together with #ifdef EIGEN_USE_MKL_SPBLAS /*Code is here*/ #endif, so I believe this shouldn’t brake ABI.

This makes binaries compiled with and without MKL support incompatible., so yes it breaks ABI and users have to very careful to enable MKL support in all translation units, all libraries and dependencies...

(i2) Calling mkl_sparse_set_?_hint() and mkl_sparse_optimize() on-the-fly with every operation won’t be a problem and the overhead is expected to be minimal, because internally we’re checking for previous optimizations and what data can be reused. So adding it into the code just before calling any IE SpBLAS computational routine means that we will take care of required optimizations automatically and it won’t be user’s work.

(i3) Surely in addition we can implement any type of MKLOptimize(/*args TBD*/) routine which will be used to set user-specified hints, desired number of calls, required operations, etc.

The reason why we do not want to implement our own type is that it will be much easier for customers to benefit from Intel architecture: the only thing they should take care about is to include SpBLASSupport module and link MKL, all the optimization work will be hidden (together with IE SpBLAS calls). So if there was already an application which used sparse algebra then it will benefit automatically from MKL usage.

I just tried your patch with:

vec1_dense.noalias() += row_major_sparse * vec2_dense

and I observe a very significant drop of performance (almost x10) compared to pure Eigen (with -fopenmp) if this operation is performed only once, or very few times... Changing the 'expected_calls' parameter from 10 to 1 does not reduce the performance drop due to mkl_sparse_optimize. If I repeat this operation hundreds of times, then both Eigen and MKL achieve the same level of performance for this particular operation and the matrices I tried (1M^2 Laplacian, bi-Laplacian, and tri-Laplacian matrices).

So I still believe that silently and unconditionally falling back to IE SpBLAS is not the right approach. It could be that calling MKL's SpBLAS is never a loss (we have to bench), but that's clearly not the case of the optimize step. So, clearly, the optimize step must be explicitly activated on user selected expressions only.

gael

Best regards,

Maria

From:Gael Guennebaud [mailto:gael.guennebaud@gmail.com ]

Sent:Thursday, April 5, 2018 2:54 PM

To:eigen <eigen@xxxxxxxxxxxxxxxxxxx>

Subject:Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen

On Thu, Apr 5, 2018 at 11:00 PM, Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx

> wrote: On 2018-04-05 15:31, Edward Lam wrote:

Would it be useful to incorporate lambda's into the interface to avoid begin/end pairs? So from the user side, we would write code like this instead:

1) Analyze and run

SimplicialLDLT<MklSparseMatrix<double>> llt(A);

int it = 0;

for (int it = 0; ...; ++it) {

if (it == 0)

llt.matrixL().sparseAnalyzeAndRun(100, [&] { llt.solve(b); });

else

llt.solve(b);

// ...

}

2) Analyze only

SimplicialLDLT<MklSparseMatrix<double>> llt(A);

llt.matrixL().sparseAnalyze(100, [&] { llt.solve(b); });

// and solve as usual

For more complicated algorithms, one can always outline the lambda and pass it into the analysis.

That would certainly clean up things a lot. Having to call

A.beginXY();

doStuff();

A.endXY();

is a very C-stylish API, which is error-prone (e.g., one of the calls might not get called, because it is masked by a wrong if-condition, or due to an exception, which is caught only outside).

This can usually be avoided with proper C++ constructs.

The advantages of the begin/end pair are that 1) this keeps the API minimal, and 2) the user can easily wrap them to accommodate its taste and usages. To avoid all sorts of redundancies, one could even write a free function on top the begin/end pair to do:

while(...) {

optimize_if(iter==0, 100, [&]{x=llt.solve(b);}, llt.matrixL());

}

and with variadic templates:

while(...) {

optimize_if(iter==0, 100, [&]{x=LU.solve(b);}, LU.

matrixL(), LU.matrixU()); }

Actually, this last example is not conceivable at the moment because the factors in SparseLU are not stored in a standard compressed format, we would need to add an option to tell SparseLU to turn them to regular SparseMatrix and to use them for solving.

Anyways, all these variants are essentially syntactic sugar/implementation details, and at this stage I'm more concerned about possible shortcomings in the general approach. Is it flexible enough? I'm also not fan of introducing a MklSparseMatrix inheriting SparseMatrix, but I don't have a better offer for now that is generic and easily applicable to decompositions.

gael

However, if this is required, I would suggest to add this method not only to matrices but also to the decomposition (and pass it through to the internal matrices). Otherwise, this does not scale for decompositions which use more than one matrix (like SparseLU). And we could even let the sparseAnalyze() functions return a proxy to the decomposition which would allow writing something like:

x = llt.sparseAnalyzeAndRun(100)->solve(b);

// equivalent to

llt.sparseAnalyzeAndRun(100, [&]{x=llt.solve(b);});

or

{

auto llt_ = llt.sparseAnalyzeAndRun(100);

x = llt_-> solve(b);

y = llt_-> matrixL() * c;

} // destructor of llt_ calls `endAnalyze`

The naming of this method is still debatable, of course.

And I have no idea what actually happens inside MKL when it 'analyzes' an operation (after how many iterations do you actually benefit from the overhead of analyzing the operation?)

Christoph

Cheers,

-Edward

On 4/5/2018 8:01 AM, Gael Guennebaud wrote:Thank you for opening this discussion on the public mailing list.

So let's discuss about the public API, which currently is not very convenient as already noticed by others. Issues are:

(i1) - Storing MKL's handle in SparseMatrix breaks ABI and does not sounds very generic.

- We need a way to control:

(i2) - which operations are going to be analyzed/optimized,

(i3) - and specify the 'expected_calls' parameter.

In order to discuss these issues, let's consider the following typical pattern: (e.g., non-linear optimization, eigenvalues, ...)

SimplicialLDLT<SparseMatrix<double> > llt(A);

while(...) {

...

x = llt.solve(b);

....

}

Here the triangular L factor is going to be used for triangular and transposed-triangular solves dozens to hundreds of time but only the user of SimplicialLDLT knowns that, not SimplicialLDLT, nor SparseMatrix. Moreover, the user does not own the SparseMatrix that we want to analyze/optimize for. Other patterns are likely easier to handle, so let's focus on it for now.

Regarding (i1), I would suggest to introduce a new type, say MklSparseMatrix<> that would enhance SparseMatrix<> through inheritance. Then for (i2) and (i3) we could imagine something like:

MklSparseMatrix::beginAnalysis(Index expected_calls) const {

// turn *this to compressed mode

// create handle

// store expected_calls

// enable recording mode

}

MklSparseMatrix::endAnalysis() const {

// disable recording mode

// [optional] call mkl_sparse_optimize

}

All states in MklSparseMatrix would be mutable.

Between a pair of beginAnalysis/endAnalysis each call to a supported operation would trigger calls to mkl_sparse_set_*_hint()/mkl_sparse_optimize..

Optionally, we could even add a "dryrun" mode for which no operation would be performed, only calls to mkl_sparse_set_*_hint() and then mkl_sparse_optimize would be called in endAnalysis(). This way mkl_sparse_optimize() would be called only once.

And that's it. Our example would look-like:

SimplicialLDLT<MklSparseMatrix<double> > llt(A);

int it=0;

while(...) {

...

if(it==0) llt.matrixL().beginAnalysis(100);

x = llt.solve(b);

if(it==0) llt.matrixL().endAnalysis();

....

++it;

}

or using a "dry-run" mode:

SimplicialLDLT<MklSparseMatrix<double> > llt(A);

llt.matrixL().beginAnalysis(100, DryRun);

x = llt.solve(b); // permutation and division by the diagonal matrix D would still be performed, but calls to actual triangular solves would be by-passed

llt.matrixL().endAnalysis();

while(...) {

...

x = llt.solve(b);

....

}

If someone directly deal with the factor L, then we could follow the same pattern or copy the SparseMatrix factor L to a MklSparseMatrix:

SimplicialLLT<SparseMatrix<double> > llt(A);

MklSparseMatrix L(llt.matrixL());

L.beginAnalysis(100,DryRun);

y = L.triangularView<Lower>() * x;

L.endAnalysis();

while(...) {

...

y = L.triangularView<Lower>() * x;

....

}

This design in quite general and expendable to any sparse-optimizers, even built-in ones in the future.

In contrast to the current proposal, only selected operations would be passed to MKL (need to use a MklSparseMatrix + begin/end recording phase).

What do you think?

gael

On Tue, Apr 3, 2018 at 11:39 PM, Zhukova, Maria <maria.zhukova@xxxxxxxxx <mailto:maria.zhukova@intel.com >> wrote:

Hello Eigen community,

My name is Maria Zhukova and I’m a software development engineer at Intel ®

MKL Sparse team.

My team is interested in contributing into Eigen, so I’ve investigated our

possibilities and so far this is what I have:

Eigen support different operations for sparse matrices stored in CSR and CSC

format which can be implemented on a basis of IE SpBLAS kernels (please,

refer to

https://software.intel.com/en-us/mkl-developer-reference-c- inspector-executor-sparse- blas-routines

<https://software.intel.com/en-us/mkl-developer-reference- >c-inspector-executor-sparse- blas-routines

for the general idea of interfaces)

, basically we want to implement calls to our IE SpBLAS into next

operations:____

SparseMatrix + SparseMatrix (mkl_sparse_?_add)

SparseMatrix * DenseVector (mkl_sparse_?_mv)____

SparseMatrix * DenseMatrix (mkl_sparse_?_mm)____

SparseMatrix * SparseMatrix (mkl_sparse_spmm),

and Triangular solve (mkl_sparse_?_trsv).____

I’ve already started with implementation of sparse_time_dense_impl_mkl

kernel which is based on mkl_sparse_?_mv (included in patch).____

This is how it will look like for user:

*#include <Eigen/SpBLASSupport> *<-- *NEW:* IE SpBLAS include module ____

void main () {

SparseMatrix<double, RowMajor> A;

Matrix<double, Dynamic, 1> x, y;

A.makeCompressed(); /* Convert matrix A into CSR/CSC format */

*A.createSparseHandle();*/* *NEW*: is used to create handle required for all

IE SpBLAS routines */____

// support of IE SpBLAS is here

y = beta*y + alpha*A*x; /* call to mkl_sparse_?_mv with operation =

SPARSE_OPERATION_NON_TRANSPOSE */

y = beta*y + alpha*A.transpose()*x; /* call to mkl_sparse_?_mv with

operation = SPARSE_OPERATION_TRANSPOSE */

y = beta*y + alpha*A.adjoint()*x; /* call to mkl_sparse_?_mv with operation

= SPARSE_OPERATION_CONJUGATE_TRANSPOSE */____

*A.destroySparseHandle();* /* *NEW*: is used to delete created handle */

}____

__ __

I’ve attached a draft patch including all necessary changes and would like

to hear your feedback.

Please, let me know if you have any questions and comments.____

__ __

Best regards,

Maria____

__ __

__ __

__ __

--

Dr.-Ing. Christoph Hertzberg

Besuchsadresse der Nebengeschäftsstelle:

DFKI GmbH

Robotics Innovation Center

Robert-Hooke-Straße 5

28359 Bremen, Germany

Postadresse der Hauptgeschäftsstelle Standort Bremen:

DFKI GmbH

Robotics Innovation Center

Robert-Hooke-Straße 1

28359 Bremen, Germany

Tel.: +49 421 178 45-4021

Zentrale: +49 421 178 45-0

E-Mail: christoph.hertzberg@xxxxxxx

Weitere Informationen: http://www.dfki.de/robotik

----------------------------------------------------------- ------------

Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH

Firmensitz: Trippstadter Straße 122, D-67663 Kaiserslautern

Geschaeftsfuehrung: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster

(Vorsitzender) Dr. Walter Olthoff

Vorsitzender des Aufsichtsrats: Prof. Dr. h.c. Hans A. Aukes

Amtsgericht Kaiserslautern, HRB 2313

Sitz der Gesellschaft: Kaiserslautern (HRB 2313)

USt-Id.Nr.: DE 148646973

Steuernummer: 19/672/50006

----------------------------------------------------------- ------------

**Follow-Ups**:**Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen***From:*Gael Guennebaud

**References**:**[eigen] Intel (R) MKL IE SpBLAS support in Eigen***From:*Zhukova, Maria

**Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen***From:*Gael Guennebaud

**Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen***From:*Edward Lam

**Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen***From:*Christoph Hertzberg

**Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen***From:*Gael Guennebaud

**RE: [eigen] Intel (R) MKL IE SpBLAS support in Eigen***From:*Zhukova, Maria

**Messages sorted by:**[ date | thread ]- Prev by Date:
**Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen** - Next by Date:
**Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen** - Previous by thread:
**Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen** - Next by thread:
**Re: [eigen] Intel (R) MKL IE SpBLAS support in Eigen**

Mail converted by MHonArc 2.6.19+ | http://listengine.tuxfamily.org/ |