Re: [eigen] first benchmark of large and sparse matrix
• To: eigen@xxxxxxxxxxxxxxxxxxx
• Subject: Re: [eigen] first benchmark of large and sparse matrix
• From: "Gael Guennebaud" <gael.guennebaud@xxxxxxxxx>
• Date: Tue, 24 Jun 2008 01:06:55 +0200

```Hi,

some news from the front. First of all, I implemented HashMatrix, which
is essentially a column vector of std::map<>. The purpose is to allow
random writes, which at first glance seem to be needed to implement
matrix-matrix products. First result: filling a HashMatrix is about 4
times slower than filling a SparseMatrix in a coherent order.
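To make the idea concrete, here is a rough sketch of such a hash-based matrix: one std::map per column, trading O(log n) access for write-anywhere flexibility. This is a toy illustration with hypothetical names, not the actual Eigen class:

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Toy hash-based sparse matrix: a column vector of std::map<>.
// Coefficients can be written in any order, at the cost of a
// logarithmic lookup per access.
struct ToyHashMatrix {
    std::size_t rows_;
    std::vector<std::map<int, double>> cols_; // cols_[j][i] = coeff(i, j)

    ToyHashMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols) {}

    // random write access: creates the entry if it does not exist yet
    double& coeffRef(int i, int j) { return cols_[j][i]; }

    // total number of stored (non-zero) entries
    std::size_t nonZeros() const {
        std::size_t n = 0;
        for (const auto& c : cols_) n += c.size();
        return n;
    }
};
```

Writing entries in arbitrary order "just works", which is exactly what a compressed-column matrix cannot offer; the 4x fill slowdown is the price of the per-entry map lookup.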

Then I implemented a <sparse column matrix> * <sparse column matrix>
product. The algorithm to compute res = lhs * rhs is pretty simple:

std::vector<Scalar> tmp(res.rows());
for (int j=0; j<rhs.cols(); ++j)
{
  std::fill(tmp.begin(), tmp.end(), Scalar(0));
  // for each coeff of the column j of rhs
  for (typename Rhs::InnerIterator rhsIt(rhs, j); rhsIt; ++rhsIt)
  {
    // for each coeff of the column rhsIt.index() of lhs
    for (typename Lhs::InnerIterator lhsIt(lhs, rhsIt.index()); lhsIt; ++lhsIt)
      tmp[lhsIt.index()] += lhsIt.value() * rhsIt.value();
  }
  // copy to the result, skipping the zero entries:
  res.col(j) = tmp;
}

As you can see it's pretty simple, and the trick is to allocate a
single dense column to avoid many writes to a sparse array. The memory
cost of this temporary should not be an issue since the SparseMatrix
already stores a dense column vector to index the start of each
column. Moreover, this algorithm still fills the result sparse matrix
in a coherent order, meaning that so far there is no real need for a
HashMatrix with random write access.
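To show the whole scheme end to end, here is a self-contained toy version of the product using a minimal compressed-column representation. The struct and member names are illustrative only, not Eigen's API:

```cpp
#include <algorithm>
#include <vector>

// Toy compressed-column sparse matrix: colStart[j]..colStart[j+1]
// delimits the coefficients of column j inside rowIndex/values.
struct ToySparse {
    int rows, cols;
    std::vector<int> colStart;   // size cols+1
    std::vector<int> rowIndex;   // row index of each stored coeff
    std::vector<double> values;  // coefficients, column by column
};

ToySparse multiply(const ToySparse& lhs, const ToySparse& rhs) {
    ToySparse res;
    res.rows = lhs.rows;
    res.cols = rhs.cols;
    res.colStart.assign(res.cols + 1, 0);
    std::vector<double> tmp(res.rows, 0.0); // the single dense column
    for (int j = 0; j < rhs.cols; ++j) {
        std::fill(tmp.begin(), tmp.end(), 0.0);
        // accumulate column j of the result into the dense temporary
        for (int r = rhs.colStart[j]; r < rhs.colStart[j + 1]; ++r) {
            int k = rhs.rowIndex[r]; // column k of lhs contributes
            for (int l = lhs.colStart[k]; l < lhs.colStart[k + 1]; ++l)
                tmp[lhs.rowIndex[l]] += lhs.values[l] * rhs.values[r];
        }
        // copy to the result, skipping zeros, in coherent order
        for (int i = 0; i < res.rows; ++i)
            if (tmp[i] != 0.0) {
                res.rowIndex.push_back(i);
                res.values.push_back(tmp[i]);
            }
        res.colStart[j + 1] = static_cast<int>(res.rowIndex.size());
    }
    return res;
}
```

Note how the result's columns are appended strictly left to right with increasing row indices, which is why the coherent-order fill interface of SparseMatrix suffices here.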

The results are pretty good:

2048^3, 10% of nonzero entries:

Eigen dense = 1.92237
Eigen sparse = 0.399866
Eigen hash = 1.30217
GMM++ sparse = 15.973

2048^3, 1% of nonzero entries:

Eigen dense = 1.86719
Eigen sparse = 0.0492229
Eigen hash = 0.186288
GMM++ sparse = 0.24993

FYI, the same algorithm without the dense temporary, using a
HashMatrix for the result instead, is much, much slower (by about one
order of magnitude).

cheers,
gael.

On Mon, Jun 23, 2008 at 9:54 PM, Gael Guennebaud
<gael.guennebaud@xxxxxxxxx> wrote:
> On Mon, Jun 23, 2008 at 7:38 PM, Schleimer, Ben <bensch128@xxxxxxxxx> wrote:
>> Hi guys,
>>   I didn't know gmm++ matrices were so slow either.
>
> well, so far I've only tested the sum, which is not a critical
> operation. I guess they mainly focused on optimizing the matrix
> product and the triangular solver, which are the main building blocks
> of linear solvers.
>
>> I wonder if we could take the gmm++ algos and use templated eigen sparse matrices with them...
>
> that was more or less the plan, but I'm not sure anymore because of
> licensing issues....
>
>>
>> btw, how are you designing sparse templated matrices?
>> Do you have any papers to point to?
>
> well, nothing is fixed yet! You might find some info on the wiki. So
> far, I have found that the most common storage scheme is the
> "Compressed Column Storage" strategy:
> http://www.netlib.org/linalg/html_templates/node90.html
> so I started to play with it in a way that lets us use optimized
> backends for the high-level algorithms. To manipulate such matrices,
> the idea is to replace the for loops (random access) with iterators:
> we can specialize such iterators for each type of expression, and we
> can even nest the iterators as we nest the expressions.
> CwiseBinaryOp::InnerIterator is a good example of what I mean (in
> src/Sparse/CoreIterators.h).
>
> One of the main challenges with such a storage scheme is filling a
> matrix dynamically. GMM++'s CCS matrix does not allow any write
> operation, so you have to use another kind of sparse matrix, i.e., a
> row of std::map, and finally do a copy to build your final, efficient
> sparse matrix. MTL4 uses a similar principle: they preallocate 5
> entries per column (or more if you want) and manage overflows using a
> std::map. Currently I have implemented something different, which
> allows filling such a matrix directly under the constraint that the
> coefficients must be set with increasing indices. I guess this is
> enough in most cases, but we definitely need something more flexible,
> at least to be able to implement any kind of matrix product.
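The increasing-index fill constraint described above can be sketched roughly like this (a toy illustration with hypothetical names, not the actual implementation):

```cpp
#include <cassert>
#include <vector>

// Toy compressed-column builder that supports direct filling under the
// constraint that coefficients arrive column by column, with strictly
// increasing row indices inside each column.
struct ToyCCSFiller {
    int rows;
    std::vector<int> colStart;  // one entry per closed column, plus start
    std::vector<int> rowIndex;
    std::vector<double> values;

    explicit ToyCCSFiller(int r) : rows(r), colStart(1, 0) {}

    // append a coefficient to the current (last open) column
    void append(int i, double v) {
        // either the current column is still empty, or the new row
        // index must be strictly greater than the previous one
        assert(static_cast<int>(rowIndex.size()) == colStart.back()
               || i > rowIndex.back());
        rowIndex.push_back(i);
        values.push_back(v);
    }

    // close the current column and open the next one
    void endColumn() {
        colStart.push_back(static_cast<int>(rowIndex.size()));
    }
};
```

Every append is an O(1) push to the end of the arrays, which is why this coherent-order fill is so much faster than map-based random writes.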
>
> of course, any ideas and suggestions are more than welcome!
>
> cheers,
> gael
>
>
>>
>> Cheers
>> Ben
>>
>>
>> --- On Sun, 6/22/08, Benoît Jacob <jacob@xxxxxxxxxxxxxxx> wrote:
>>
>>> From: Benoît Jacob <jacob@xxxxxxxxxxxxxxx>
>>> Subject: Re: [eigen] first benchmark of large and sparse matrix
>>> To: eigen@xxxxxxxxxxxxxxxxxxx
>>> Cc: "Gael Guennebaud" <gael.guennebaud@xxxxxxxxx>
>>> Date: Sunday, June 22, 2008, 11:05 PM
>>> Man, that is good...!!!
>>>
>>> It's normal that the sparse matrix is a bit slower than the dense
>>> one with 10% nonzero coeffs; also, I suppose that the memory usage
>>> is lower. So your numbers look very good!
>>>
>>> I wasn't expecting GMM++ to perform so poorly; it sounds to me as
>>> if their sparse matrices are not intended for doing matrix
>>> arithmetic, just intended for applying solving algorithms; still,
>>> being able to form sums is at least useful to construct the
>>> matrices in the first place! (And I definitely personally see use
>>> cases for full support for arithmetic on sparse matrices, as this
>>> models a mathematical object known as a 'convolution algebra' and
>>> is thus very interesting.)
>>>
>>> So it sounds like you're solving the problem of sparse matrix
>>> storage; if there exists a possibility of providing an abstraction
>>> layer for sparse algorithms supporting multiple backends (as
>>> discussed on IRC, implementing sparse algos ourselves is really too
>>> ambitious, so we want to reuse existing code for that), then we
>>> would get a very, very good Eigen-Sparse module (or auxiliary
>>> library, depending on how you see it).
>>>
>>> FYI this is of high interest for Krita and for Step, and probably
>>> has countless interesting applications; a while ago Benjamin played
>>> with some research papers on vector graphics that use sparse
>>> matrices to interpolate intelligently between vector shapes, which
>>> was very interesting (I still have the app he coded using gmm++ and
>>> eigen1, and can send it to you)... so no need to convince me about
>>> the usefulness of all this.
>>>
>>> Cheers,
>>> Benoit
>>>
>>> P.S. I'm adding vectorization to Redux
>>>
>>> On Monday 23 June 2008 02:09:15 Gael Guennebaud wrote:
>>> > Hi,
>>> >
>>> > this WE I started looking at sparse matrices, and I started to
>>> > write some experimental code to see how they could be integrated
>>> > into the current code base. The current sparse matrix class
>>> > implements the "Compressed Column Storage" (CCS) scheme and does
>>> > not require any change in Core: an InnerIterator template class
>>> > is specialized for each matrix and expression type. These
>>> > iterators make it possible to efficiently skip the zero
>>> > coefficients of a row while remaining compatible with the
>>> > expression template mechanism.
>>> >
>>> > To check whether it was worth it, I benchmarked the sum of 3
>>> > matrices of size 4000x4000 with 10% of non-zero coefficients
>>> > (random):
>>> >
>>> > for (int k=0; k<10; ++k)
>>> >    m4 = m1 + m2 + m3;
>>> >
>>> > the results:
>>> >
>>> > * dense matrix (with vectorization and all eigen optimizations):
>>> >   0.7 sec
>>> >
>>> > * dense matrix without xpr template, i.e.:
>>> >     m4 = (m1 + m2).eval() + m3
>>> >     => 1.6 sec
>>> >
>>> > * experimental sparse matrix with generic xpr template: 1.6 sec
>>> >
>>> > * experimental sparse matrix without xpr template (but with a
>>> >   specialized operator* for the sum): 2.1 sec
>>> >
>>> > * GMM++: 19 sec !! (+ very high memory consumption)
>>> >
>>> > so yeah, it was worth it! Even with sparse matrices the ET is
>>> > pretty useful. Moreover, the current implementation is far from
>>> > optimized (e.g. it uses std::vector<>), so we can expect even
>>> > better performance.
>>> >
>>> > I believe the poor perf of GMM++ is partly due to the use of a
>>> > "mapped" sparse matrix (one std::map per row) for the result,
>>> > because their implementation of the CCS matrix is read-only.
>>> >
>>> > cheers,
>>> > gael.
>>
>>
>>
>
```

 Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/