Re: [eigen] triangular solve
• To: eigen <eigen@xxxxxxxxxxxxxxxxxxx>
• Subject: Re: [eigen] triangular solve
• From: Gael Guennebaud <gael.guennebaud@xxxxxxxxx>
• Date: Sat, 22 Dec 2012 11:02:04 +0100
• Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; bh=s5HFbkMpPS1UlINYT+gei4/qgxoHtctCBd8aRIo4Uys=; b=J/4hWqJuNOy4ZMXcg8GTRuXN2k5+5ueoK80z635CaImAgtyekyZoVe2e45F3knylof 3fNEliG14SDLD6JzX9RytOfAxpYZA0BD97Hht2o/fQtN1si/bAIbhFrBs5d31Bl9PJRX enFd5kHbg3Tk+YH2DlMNeXTFs8vfmHsLCFU2HqRpxjjvOJYbpwPpsKOkZXB6k3diGJ+Z b6i4QXWnHkEh3o2xz5n7JqjYZpBbmwIjWmROJzxu59SsZzg4AG3RK79IvE39HF+X+12z xisG9DfHy7HPY0yHukVD2Cyd+J2mY5jvXVI+eV5011X9sJEMZttPDAn2PnT1s3orh5XW xFNw==

On Sat, Dec 22, 2012 at 2:39 AM, Christoph Hertzberg wrote:
On 21.12.2012 23:31, Gael Guennebaud wrote:
Since your matrices are quite large, the cost of this temporary is
completely negligible, you can still avoid one by doing:

MatrixXf tmp(this->matrixL());
tmp = total_cov.matrixL().solve(tmp); // in-place solving
float denom = tmp.squaredNorm();

Higher performance could be achieved by taking into account that tmp is
triangular and so total_cov.matrixL().solve(tmp) is triangular too.
However, Eigen is not able to do so yet, and I'm not aware of any library
being able to do so.

You can try something like this (pseudocode/not tested):

MatrixXf tmp(this->matrixL());
for([i0, i1]; /* seperate [0..n] into intervals */) {
total_cov.matrixL().bottomRightCorner(n-i1, n-i1).solveInPlace(tmp.block(i0, i0, n-i0, i1-i0));
}

That's indeed quite compact to write.

maybe somewhere a .triangularView() has to be inserted. If you go through the matrix column by column, you can even save the copy.
The question would be how that impacts vectorization

The intervals must be large enough to profit from cache reuse, like 100 to 200 columns at once. No problem regarding vectorization.

-- I guess we need better/"smarter" support for triangular matrices, eventually. E.g. saving big triangular matrices wastes a lot of space at the moment.

That's a old question! Our initial idea was to implement the rectangular full packed format which permits to directly reuse Level3 blas routines (http://netlib.org/lapack/lapack-3.2.html#_9_6_rectangular_full_packed_format). But now I'm more in a favor of extending our product kernels to allow for a compile time outer stride increment so that efficient product kernels can be seamlessly applied to the more simple triangular packed format (the non-zeros are sequentially stored, one column after the other).

Another nice feature would be things such as multiplyInPlace(), inverse()/invertInPlace() (for small matrices it can be worthwile to invert them once and multiply, thus saving high-cost divisions).

Yes a triangular inversion routine is also missing.

too much to do!

gael

Christoph

--
----------------------------------------------
Dipl.-Inf. Christoph Hertzberg
Cartesium 0.049
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen

Tel: +49 (421) 218-64252
----------------------------------------------

 Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/