Re: [eigen] Sparse Arrays for Eigen? |


*To*: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
*Subject*: Re: [eigen] Sparse Arrays for Eigen?
*From*: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
*Date*: Wed, 20 Jan 2016 00:29:28 +0100

Hi,

Sorry for the late reply, I've been 100% busy these past days.

Let me try to clarify some aspects.

> Thank you for your informative reply. It looks like maybe the Eigen code is ahead of its documentation, in terms of capability.

Indeed, the documentation on sparse matrices is a bit "sparse" because it somewhat assumes that the user is already familiar with the dense part of Eigen, and thus focuses on the sparse-specific aspects. Nonetheless, the following page attempts to give a compact overview of the API:

http://eigen.tuxfamily.org/dox-devel/group__SparseQuickRefPage.html

and it does include examples for products.

> If you have two index arrays and a value array, you can directly use them in an Eigen::SparseView

> Where? It's not here:

> http://eigen.tuxfamily.org/dox-devel/namespaceEigen.html

Better use the embedded search engine to find methods and classes by name.

Anyway, SparseView is rather meant to copy a dense expression to a sparse matrix by pruning zeros (or small entries). For triplets defined through three different containers, you can glue them through a unified iterator and pass it to SparseMatrix::setFromTriplets.

> Eigen also supports, CSR, CSC and triplet format (stored as three vectors).

> A single vector of triplets can easily be converted by Eigen. Also, most algorithms

> expect CSR or CSC inputs.

> That's really nice functionality. But again... where is it? The following only mentions CSR and CSC:

> http://eigen.tuxfamily.org/dox-devel/group__TutorialSparse.html

> Had I been able to find this information on the website in a usable way, I probably wouldn't have written SpSparse. The general tensor stuff is nice, but it's not make-or-break for my current application.

Actually, internally we only support CSR and CSC because for any linear algebra algorithm you have to be able to quickly iterate over the non-zeros of a given column, and in many cases you even need the entries to be sorted.

Nonetheless, you can easily assemble a SparseMatrix (CSR or CSC) from a triplet list.

> 7. Looking at Eigen's sparse matrix multiplication source code, my guess is

> that SpSparse does this faster. SpSparse multiplies sparse matrices by

> sorting LHS row-major and RHS column-major, and then multiplying each row

> by each column. Each row-column multiplication is accomplished with a

> "join" iterator that scans through both sides in order, finding places

> where indices match. This is a direct analog to dense matrix

> multiplication. SpSparse multiplication would be faster because it has

> less nested looping. I realize that the required sorting might not be

> desirable or possible for all use cases.

> I'm not sure if that is really faster (except for special cases) -- I assume you'll end up with much more branching.

>

> Actually, I once thought the same when calculating a J^T*J product, but it turned out that transposing J and doing a classical sparse multiplication was faster (I did not check your implementation, so maybe I introduced some overhead in my implementation, back then).


I confirm that implementing sparse matrix products in terms of "dot products" is extremely slow compared to the two algorithms we have in Eigen. I've implemented both, and at that time I also played with the "dot product" strategy. There are two problems with this latter approach:

1 - unless you know that the result is 100% dense (which is rather rare), you need a way to predict the sparsity pattern of the result. Otherwise you start with an O(n^2) complexity, which is usually already much larger than the theoretical complexity O(n*nnz_per_col^2).

2 - each dot product iterates over numerous entries with no correspondence, thus wasting time.

In Eigen, implementing this “dot-product” version on SparseMatrix boils down to:

SparseMatrix<double,RowMajor> A;
SparseMatrix<double,ColMajor> B;

for(i,j …)
  res(i,j) = A.row(i).dot(B.col(j));

The “dot” is itself implemented as:

x.dot(y) <-> x.cwiseProduct(y).sum();

where cwiseProduct essentially performs a "join" by constructing an iterator over the matching structural non-zeros, and sum() performs the reduction of this iterator. All of this without any temporary.

Regarding nested loops, you might be surprised that for dense matrix products, Eigen's implementation exhibits up to 7 nested loops and thousands of lines of code, but it is orders of magnitude faster than a simple 4-lines-of-code 3-loop implementation for large matrices (16x16 is already "large").

> SpSparse join iterators can also be used to do many operations at once. I

> was able to make a single-pass matrix multiplication that computes:

> C * diag(S1) * A * diag(S2) * B * diag(S3)

> ...where C is a scalar, S1,S2,S3 are (optional) spsparse::Array<..., 1> and

> A and B are spsparse::Array<..., 2>. If this is the kind of product you

> need to compute, I doubt there is a faster way to do it.

> That might be faster in some cases (i.e., depending on the structure of A, B), though I wouldn't trust my gut-feeling on that either.

> But since Eigen has no optimization for sparse diagonal matrices, this is not unlikely to be faster with SpSparse.

Actually, diag(S) * SparseMatrix, with S a dense vector, is well implemented in Eigen: it returns a sparse expression which can be used in products.

However, I doubt that implementing A*B*C (regardless of the scalings) as a single loop is really faster. If A, B, C are dense that’s clearly not the case as the complexity is O(N^4) compared to O(2*N^3) for:

T1 = A*B;

C*T1;

and I don’t see why this would be different when A, B, C are sparse.

When it comes to matrix products, temporaries are not so evil! This is typically the kind of shortcoming we found in other expression-template libraries such as Blitz++ that motivated us to start Eigen from scratch.

> For all these reasons, I think that SpSparse might make a nice addition to

> Eigen, probably renamed to Eigen::SparseTensor or something.

> Ok, I guess essentially what is missing in Eigen and what SpSparse provides is a SparseTensor, i.e. multidimensional sparse arrays.

> What could be added (relatively) easily are element-wise products for sparse matrices -- if that is in fact needed.

> For performance improvements, I'd like to see benchmark results first.

It would indeed be very interesting to see whether a simple list of coordinates+values can be competitive for some usages, as it is considerably simpler to extend such a representation to arbitrary array dimensions (though in terms of memory consumption it is far from optimal).

cheers,

Gaël

On Mon, Jan 18, 2016 at 7:44 PM, Elizabeth Fischer <rpf2116@xxxxxxxxxxxx> wrote:

Christoph,

Thank you for your informative reply. It looks like maybe the Eigen code is ahead of its documentation, in terms of capability. For example:

> You can directly multiply SparseMatrix by SparseVector in Eigen

I see no examples of this at:

> If you have two index arrays and a value array, you can directly use them in an Eigen::SparseView

Where? It's not here:

> Eigen also supports, CSR, CSC and triplet format (stored as three vectors).
> A single vector of triplets can easily be converted by Eigen. Also, most algorithms
> expect CSR or CSC inputs.

That's really nice functionality. But again... where is it? The following only mentions CSR and CSC:

Had I been able to find this information on the website in a usable way, I probably wouldn't have written SpSparse. The general tensor stuff is nice, but it's not make-or-break for my current application.

> If I understand you correctly, spsparse::Array relates to Eigen::SparseMatrix as
> Eigen::Tensor to Eigen::Matrix, right?

Yes.

> Ok, by tensor multiplication you mean element-wise multiplication or
> something like
> Res(i,j,k) = SUM_l A(i,j,l)*B(i,l,k); (or other variants)?

The latter.

> If I understand you correctly, your library supports both?

It's a toolbox that can be used to write multiplication-like operations easily. What I needed was:

C * diag(S1) * A * diag(S2) * B * diag(S3)

... where S1, S2 and S3 are optional. I also needed:

C * diag(scalei) * A * diag(scalej) * V   # V is a vector

So I built those two multiplies. Maybe someone could be more clever to write a general n-dimensional tensor multiply template.

-- Elizabeth

On Mon, Jan 18, 2016 at 12:20 PM, Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:

Hi Elizabeth,

sorry for not having answered yet. I'll try to briefly answer/comment on your questions/suggestions below.

On 2016-01-14 14:15, Elizabeth Fischer wrote:

> 1. I would think of spsparse::Array vs. Eigen::Sparse in the same vein as

> Eigen::Tensor vs. Eigen::Matrix. spsparse::Array is desirable for the same

> reason as Eigen::Matrix.

If I understand you correctly, spsparse::Array relates to Eigen::SparseMatrix as Eigen::Tensor to Eigen::Matrix, right?

> 2. Internally, spsparse::Array is (currently) a coordinate tensor: i.e.,

> internally it has one array per index, plus an array for values. If

> nothing else, this is a convenient way to assemble the required triplets

> that can then be used as a constructor for Eigen::Matrix or Eigen::Vector.

> SpSparse is also good at consolidating duplicates in your original list.

If you have two index arrays and a value array, you can directly use them in an Eigen::SparseView and Eigen can already convert that to CSC/CSR or to dense matrices. Of course, this only works for two dimensions at the moment.

> 3. Algorithms are written on SpSparse arrays by iterating through the

> elements. Algorithms are implemented on multiple arrays together by

> sorting and joining the elements. This is efficient for many operations,

> for example sparse-sparse tensor multiplication. You cannot directly ask a

> SpSparse array for the (i,j) element.

Ok, by tensor multiplication you mean element-wise multiplication or something like

Res(i,j,k) = SUM_l A(i,j,l)*B(i,l,k); (or other variants)?

If I understand you correctly, your library supports both?

> 4. SpSparse makes no effort to offer general linear algebra. Algorithms are

> focused on tensor representation, transformation and multiplication.

> Believe it or not, my application requires lots of sparse tensors and

> tensor multiplication, and no "more sophisticated" linear algebra

> algorithms. I think the reasonable thing to do would be to convert a

> SpSparse array to an Eigen::Matrix if one wishes to do more linear algebra

> on it.

I guess it should rather be converted to an Eigen::SparseMatrix?

> 5. SpSparse has the capability of supporting multiple storage formats

> underneath: variable-length std::vector, fixed-length through blitz::Array

> (would be ported to Eigen::Tensor). A single array of triplets (or

> their rank-n analog) is also a possible format. Storage format is

> abstracted away by the iterator interface, allowing algorithms to be

> written for these multiple use cases. An underlying Eigen::Tensor is

> particularly useful because you can make a spsparse::Array out of existing

> blocks of memory (for example, stuff you read out of a netCDF file).

> [Disclaimer: I've only implemented one underlying representation. The

> design is for multiple, and a prototype library did just that. Adding more

> storage formats will be easy when desired.]

> 6. CSR and CSC formats are also possible as storage formats.

Eigen also supports CSR, CSC and triplet format (stored as three vectors). A single vector of triplets can easily be converted by Eigen. Also, most algorithms expect CSR or CSC inputs.

> 7. Looking at Eigen's sparse matrix multiplication source code, my guess is

> that SpSparse does this faster. SpSparse multiplies sparse matrices by

> sorting LHS row-major and RHS column-major, and then multiplying each row

> by each column. Each row-column multiplication is accomplished with a

> "join" iterator that scans through both sides in order, finding places

> where indices match. This is a direct analog to dense matrix

> multiplication. SpSparse multiplication would be faster because it has

> less nested looping. I realize that the required sorting might not be

> desirable or possible for all use cases.

I'm not sure if that is really faster (except for special cases) -- I assume you'll end up with much more branching.

Actually, I once thought the same when calculating a J^T*J product, but it turned out that transposing J and doing a classical sparse multiplication was faster (I did not check your implementation, so maybe I introduced some overhead in my implementation, back then).

> SpSparse join iterators can also be used to do many operations at once. I

> was able to make a single-pass matrix multiplication that computes:

> C * diag(S1) * A * diag(S2) * B * diag(S3)

> ...where C is a scalar, S1,S2,S3 are (optional) spsparse::Array<..., 1> and

> A and B are spsparse::Array<..., 2>. If this is the kind of product you

> need to compute, I doubt there is a faster way to do it.

That might be faster in some cases (i.e., depending on the structure of A, B), though I wouldn't trust my gut-feeling on that either.

But since Eigen has no optimization for sparse diagonal matrices, this is not unlikely to be faster with SpSparse.

> 7. SpSparse can also multiply sparse matrices (rank 2 tensors) by sparse

> vectors (rank 1 tensors) --- something I needed that Eigen did not provide

> in a direct way. More complex tensor multiplication would also be possible.

You can directly multiply SparseMatrix by SparseVector in Eigen

> 8. I've found that rank-1 SpSparse arrays are useful for many things I never

> thought of before.

There are SparseVectors in Eigen, if you need them.

> For all these reasons, I think that SpSparse might make a nice addition to

> Eigen, probably renamed to Eigen::SparseTensor or something.

Ok, I guess essentially what is missing in Eigen and what SpSparse provides is a SparseTensor, i.e. multidimensional sparse arrays.

What could be added (relatively) easily are element-wise products for sparse matrices -- if that is in fact needed.

For performance improvements, I'd like to see benchmark results first.

Regards,

Christoph

--

Dipl. Inf., Dipl. Math. Christoph Hertzberg

Universität Bremen

FB 3 - Mathematik und Informatik

AG Robotik

Robert-Hooke-Straße 1

28359 Bremen, Germany

Zentrale: +49 421 178 45-6611

Besuchsadresse der Nebengeschäftsstelle:

Robert-Hooke-Straße 5

28359 Bremen, Germany

Tel.: +49 421 178 45-4021

Empfang: +49 421 178 45-6600

Fax: +49 421 178 45-4150

E-Mail: chtz@xxxxxxxxxxxxxxxxxxxxxxxx

Weitere Informationen: http://www.informatik.uni-bremen.de/robotik
