Re: [eigen] triangular solve

On Sat, Dec 22, 2012 at 2:39 AM, Christoph Hertzberg <chtz@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:

On 21.12.2012 23:31, Gael Guennebaud wrote:

Since your matrices are quite large, the cost of this temporary is
completely negligible, you can still avoid one by doing:

MatrixXf tmp(this->matrixL());
tmp = total_cov.matrixL().solve(tmp); // in-place solving
float denom = tmp.squaredNorm();

Higher performance could be achieved by taking into account that tmp is
triangular and so total_cov.matrixL().solve(tmp) is triangular too.
However, Eigen is not able to do so yet, and I'm not aware of any library
being able to do so.

You can try something like this (pseudocode/not tested):

MatrixXf tmp(this->matrixL());
for([i0, i1]; /* seperate [0..n] into intervals */) {
total_cov.matrixL().bottomRightCorner(n-i1, n-i1).solveInPlace(tmp.block(i0, i0, n-i0, i1-i0));
}

That's indeed quite compact to write.

maybe somewhere a .triangularView() has to be inserted. If you go through the matrix column by column, you can even save the copy.
The question would be how that impacts vectorization

The intervals must be large enough to profit from cache reuse, like 100 to 200 columns at once. No problem regarding vectorization.

-- I guess we need better/"smarter" support for triangular matrices, eventually. E.g. saving big triangular matrices wastes a lot of space at the moment.

That's a old question! Our initial idea was to implement the rectangular full packed format which permits to directly reuse Level3 blas routines (http://netlib.org/lapack/lapack-3.2.html#_9_6_rectangular_full_packed_format). But now I'm more in a favor of extending our product kernels to allow for a compile time outer stride increment so that efficient product kernels can be seamlessly applied to the more simple triangular packed format (the non-zeros are sequentially stored, one column after the other).

Another nice feature would be things such as multiplyInPlace(), inverse()/invertInPlace() (for small matrices it can be worthwile to invert them once and multiply, thus saving high-cost divisions).

Yes a triangular inversion routine is also missing.

too much to do!

gael

Christoph

--
----------------------------------------------
Dipl.-Inf. Christoph Hertzberg
Cartesium 0.049
Universität Bremen
Enrique-Schmidt-Straße 5
28359 Bremen

Tel: +49 (421) 218-64252
----------------------------------------------