Re: [eigen] first benchmark of large and sparse matrix
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] first benchmark of large and sparse matrix
- From: "Gael Guennebaud" <gael.guennebaud@xxxxxxxxx>
- Date: Tue, 24 Jun 2008 16:29:06 +0200
Note to the list: I added a dedicated page on the wiki for the sparse
matrix. What I wrote there is only random thoughts and proposals;
you're more than welcome to edit it, propose other solutions, and
let's discuss them here.
On Tue, Jun 24, 2008 at 3:53 PM, Benoît Jacob <jacob@xxxxxxxxxxxxxxx> wrote:
> I don't know what to say, this is just plainly awesome :)
Actually, I had not realized that the product of two not-too-sparse
matrices can be pretty dense. In that case the dense temporary works
best. On the other hand, if the result is sparse (e.g., fewer than 4%
nonzeros for 10Kx10K matrices), looping over the zeros of the
temporary becomes really expensive, so you really need something
sparse there. After some attempts to exploit coherence to speed up
std::map I gave up and tried a custom linked list instead, which
appears to be an order of magnitude faster; a sketch follows below.
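To give an idea, here is a minimal sketch of what such a linked-list
temporary could look like (hypothetical code, not the actual
implementation: nodes are preallocated once and kept sorted by row
index, so that only the nonzeros are ever visited):

  #include <vector>

  template<typename Scalar>
  struct LinkedVector
  {
    struct Node { int row; Scalar value; int next; };
    std::vector<Node> nodes; // preallocated, one slot per possible row
    int first;               // head of the sorted list, -1 when empty
    int size;                // number of nodes currently in use

    LinkedVector(int rows) : nodes(rows), first(-1), size(0) {}

    // accumulate 'v' at 'row', keeping the list sorted by row index
    void add(int row, Scalar v)
    {
      int* link = &first;
      while (*link != -1 && nodes[*link].row < row)
        link = &nodes[*link].next;
      if (*link != -1 && nodes[*link].row == row)
        nodes[*link].value += v;   // entry already exists
      else
      {
        nodes[size].row   = row;   // link a fresh node into the list
        nodes[size].value = v;
        nodes[size].next  = *link;
        *link = size++;
      }
    }
  };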
Now the best we can do is to choose the kind of temporary (dense
vector or linked list) per column of the result: indeed, we know the
number of nonzeros of the corresponding column of rhs (let's call it
nnzRC), and if we assume the nonzeros of lhs are uniformly spread,
then we can estimate the ratio of nonzeros in the result column like
this:

  ratio = nnzRC * ratioLHS

where ratioLHS = lhs.nonZeros() / (lhs.rows() * lhs.cols()).
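In code, the per-column dispatch could look roughly like this (just a
sketch: nnzInColumn() and the two compute*() helpers are hypothetical
names, and the 4% threshold comes from the measurements above):

  float ratioLHS = float(lhs.nonZeros())
                 / (float(lhs.rows()) * float(lhs.cols()));
  for (int j = 0; j < res.cols(); ++j)
  {
    // nnzInColumn() is hypothetical: nonzeros in column j of rhs
    int nnzRC = nnzInColumn(rhs, j);
    // estimated fill of res.col(j); may exceed 1, fine for a test
    float ratio = float(nnzRC) * ratioLHS;
    if (ratio > 0.04f)  // dense enough => dense vector temporary
      computeColumnWithDenseTemp(lhs, rhs, res, j);
    else                // very sparse => linked-list temporary
      computeColumnWithListTemp(lhs, rhs, res, j);
  }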
> Updating the list: yesterday we vectorized sum(), giving the best
> vectorization boost we got so far (for floats, it goes 4x faster, which is
> the theoretical maximum). Today I am transposing this code to dot-product and
> I expect similar results.
Actually, there is room for some factorization of the code, since we
now have the same vectorized loops in several places:
- normal product (row major * col major)
- dot product
- redux
- sum
So sum could call redux, the dot product could call
(a.conjugate().cwiseProduct(b)).sum(), and the product could compute
each coefficient as (a.row(row).cwiseProduct(b.col(col))).sum(), as
sketched below.
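Roughly, the layering could look like this (a sketch only; names and
signatures are approximate, not a definitive API):

  // sum() reduces the expression with a scalar-sum functor,
  // so only redux() needs the vectorized loop:
  Scalar sum() const { return this->redux(/* scalar sum functor */); }

  // the dot product is then expressed on top of sum():
  template<typename Other>
  Scalar dot(const Other& b) const
  { return (this->conjugate().cwiseProduct(b)).sum(); }

  // and the (row major * col major) product, per coefficient:
  res.coeffRef(row, col) = (a.row(row).cwiseProduct(b.col(col))).sum();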
What do you think?
>
> Cheers,
>
> Benoit
>
> On Tuesday 24 June 2008 01:06:55 Gael Guennebaud wrote:
>> Hi,
>>
>> Some news from the front. First of all, I implemented HashMatrix,
>> which is essentially a column vector of std::map<>. The purpose is
>> to allow random writes, which at first sight seemed to be needed to
>> implement matrix-matrix products. First result: filling a HashMatrix
>> is about 4 times slower than filling a SparseMatrix in a coherent
>> order.
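>>
>> To fix ideas, the structure essentially boils down to something like
>> this (a simplified sketch, not the actual code):
>>
>>   #include <map>
>>   #include <vector>
>>
>>   template<typename Scalar>
>>   struct HashMatrix
>>   {
>>     // one sorted map per column, mapping row index -> value;
>>     // a random write is then an O(log nnz) insertion:
>>     std::vector<std::map<int, Scalar> > cols;
>>     Scalar& coeffRef(int row, int col) { return cols[col][row]; }
>>   };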
>>
>> Then I implemented a <sparse column matrix> * <sparse column matrix>
>> product. The algorithm to compute res = lhs * rhs is pretty simple:
>>
>> // dense temporary holding the current column of the result
>> std::vector<Scalar> tmp(res.rows());
>> for (int j=0; j<rhs.cols(); ++j)
>> {
>>   std::fill(tmp.begin(), tmp.end(), Scalar(0));
>>   // for each nonzero coeff of the column j of rhs
>>   for (Rhs::InnerIterator rhsIt(rhs, j); rhsIt; ++rhsIt)
>>   {
>>     // for each nonzero coeff of the column rhsIt.index() of lhs
>>     for (Lhs::InnerIterator lhsIt(lhs, rhsIt.index()); lhsIt; ++lhsIt)
>>       tmp[lhsIt.index()] += lhsIt.value() * rhsIt.value();
>>   }
>>   // copy to the result, skipping the zero entries:
>>   res.col(j) = tmp;
>> }
>>
>> As you can see it's pretty simple, and the trick is to allocate a
>> single dense column to avoid many writes to a sparse array. The
>> memory cost of this temporary should not be an issue, since the
>> SparseMatrix already stores a dense vector indexing the start of
>> each column. Moreover, this algorithm still fills the result sparse
>> matrix in a coherent order, meaning that so far there is no real
>> need for a HashMatrix with random write access.
>>
>> The results are pretty good:
>>
>> 2048^3, 10% of nonzero entries:
>>
>> Eigen dense = 1.92237
>> Eigen sparse = 0.399866
>> Eigen hash = 1.30217
>> GMM++ sparse = 15.973
>>
>>
>> 2048^3, 1% of nonzero entries:
>>
>> Eigen dense = 1.86719
>> Eigen sparse = 0.0492229
>> Eigen hash = 0.186288
>> GMM++ sparse = 0.24993
>>
>> FYI, the same algorithm without the dense temporary, using a
>> HashMatrix for the result instead, is much, much slower (about one
>> order of magnitude).
>>
>> cheers,
>> gael.
>>