Re: [eigen] Skyline matrix

On Mon, Nov 16, 2009 at 11:06 AM, guillaume saupin <guillaume.saupin@xxxxxx> wrote:

Gael Guennebaud wrote:

On Thu, Nov 12, 2009 at 1:04 PM, guillaume saupin <guillaume.saupin@xxxxxx <mailto:guillaume.saupin@xxxxxx>> wrote:

Gael Guennebaud wrote:

Ok, I've put it in the unsupported/ folder. To use it you just
have add path/to/eigen2/unsupported to your set of include paths.

Since I guess you will certainely want to do some
modifications in the future, I recommend you to create an
account on bitbucket, and I will give you write access. Feel
free to commit any changes you like in this module, but if
your changes affect some other modules, please let's first
dicuss them here !

I've just created my account on bitbucket. My id is simply
guillaume.saupin.

Now I have to admit there are still a lot of work to do before
having something really usable to offer to other users. In my
opinion, the SkylineMatrix class should be taillored to be
optimal for the two following tasks:
1 - setup/assemble the matrix
2 - build efficient high-level algorithms

Both tasks require a powerful API taking into account the
specificities of the skyline storage. In particular this
discard low level coeff(i,j) methods or things like that.
Iterators are not very well suited here because 1) you need
various kind of iterators (upper, lower), and 2) they cannot
take advantage of the sequential storage to enable unrolling
and vectorization.

So, for the matrix assembly, the challenge is to reduce the
need for memory reallocation and memory copies. Here we first
need a very low level API allowing to fill the matrix with
constant time insertions.

One possibility is to let the user specify the profile for each
rows and cols, and allocate the memory accordingly. Then the
insertion can be done in constant time.

Then to offer more flexibility to the user, perhaps the
simpler is to let the user fill a {Dynamic}SparseMatrix, and
then implement a fast conversion method. The latter is a must
to have feature anyway.

If we provide an easy and fast API to fill the matrix, I'm not
sure that it will be very useful to create a SkylineMatrix from a
SparseMatrix.

In order to easily implement efficient algorithms (products
and solvers), the SkylineMatrix should provide higher level
access functions than the coeff() function or the iterators.
In particular I'm thinking about the following set of functions:

diagonal(); // return a vector _expression_ of the diagonal
lowerCol(j); // returns an _expression_ of the j-th column of
the strictly lower triangular part
upperRow(i);
centeredBlock(i, rows, cols); // returns an _expression_ of the
rows x cols submatrix at the coeff (i, i)

The first two functions would return a special _expression_
wrapping a dense VectorBlock plus a start index and a length
with optimized arithmetic operators. Such an _expression_, let's
call it a RangeVector could also be useful for the Band storage.

This is a good idea. What can I use as a starting point to
implement those VectorBlocks ? Should it inherit from an already
existing class ?

For diagonal(), it is easy. Just use a Matrix<Scalar,Size,1> to store it (instead of a raw pointer), and let it return a ref to it.

For the lowerCol(), upperRow() we have to create a new matrix type, well actually a vector type inheriting AnyMatrixBase<> with support for basics operations. However, I think it is safer to wait a bit that I've done the change we discussed on a recent thread about some refactoring of the _expression_ template mechanism. Once this change done, it should be much simpler to add new special matrices. Hopefully.

Ok. No problem. I'll wait.

A quick question: I did not read your source files carefully, but I thought that this SkylineMatrix stores the lower triangular using column-major vectors, and the upper part using row-major vectors. Right ? Then what is the purpose of the StorageOrder option ?

This is indeed useless. In fact, I have initially implemented a support for column-major vectors for the lower part, and row-major vectors for the upper part, as well as row-major vectors for the lower part, and column-major vectors for the upper part, but I realize that it was useless since our algos perform better on the first storage type.

Then writting a triangular solver, a LU decomposition, a Cholesky
decomposition, etc. would be almost trivial. Actually, if we add
the concept of matrix partitionner as I described in another
thread, one could write the exact same code for both a dense or a
skyline matrix !

I'm also thinking whether it could make sense to make the
"bandwidth" of the matrix a template parameter. The default would
be "Dynamic". This would allow to optimize the code for matrices
having a small fixed size bandwidth.

This could be a good idea. What kind of optimization can be done
on such SkylineMatrices ?

Mainly loop unrolling of the most inner loops. To make that possible, the "VectorRange" discussed above should be able to deal with compile time ranges.

Does it require memory alignment ?

Only to enable vectorization. For instance, for double vectorization will be automatically enabled if the compile time "bandwidth" is a multiple of 2 (and 4 for floats).

gael.

gael.

Thanks,

guillaume

gael.

On Mon, Nov 9, 2009 at 4:30 PM, guillaume saupin
<guillaume.saupin@xxxxxx <mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>> wrote:

Hi !

Here comes the cleaned code for the skyline matrix. If you
could
integrate it inside Eigen, it would be perfect.

thanks,

guillaume

guillaume saupin wrote:

Gael Guennebaud wrote:

On Thu, Nov 5, 2009 at 2:40 PM, guillaume saupin
<guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx <mailto:guillaume.saupin@xxxxxx>>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>>> wrote:

Gael Guennebaud wrote:

Hi,

thanks for the patch !

I'd be ok to have it in Eigen. In a first
step I would
probably put it in the unsupported/ set of
modules
because it
sems quite preliminary for now.

No problem. This is perfect if we can access this
Matrix from
eigen. I've written some unit tests with boost.
Where
should I put
them ?

However before we can apply the patch some
cleaning is
required: at the very least the copyrights
have to
be updated.
It would also be nice to remove all the
commented
stuff.

Then we can discuss about the usefulness of
having a
SkylineMatrixBase. Indeed, such a base class
can be
needed for
two reasons:

The truth is that I have not really thought
about that.
I just
copy/paste the SparseMatrix code ;)

1 - To implement the _expression_ template
paradigm
for basics
operations (+, -, scaling, coefficients wise
operations like
abs, sin, cos etc.). However, here I'm not sure
that makes a
lot of sense. It seems to me that a such a
specialized matrix
is only usefull for solving problems, so
supporting
such
operators is not really needed, right ?

Our main use for SkylineMatrix is for solving
systems.
Nevertheless we will soon use at least a Skyline *
Dense product,
and probably a Dense * Skyline product. So template
expressions
may be required to do these products without
temporaries and with
a simple syntax. But it is maybe possible to do that
without a
Base class ?

Yes and no, because actually matrix products are
special
in the sense that they must be directly evaluated.
On the
other hand ET might still be useful to allow for more
complex products involving transpose, negate and
scaling ops.

I will need this kind of complex products, especially those
involving transpose, in my ColumnMatrix. Do you think
that it
must inherit MatrixBase to benefit from ET ?

The problem is that our current design does not
allow to
reuse our Transpose/CwiseUnaryOp/etc. _expression_
classes
for something else than dense matrices. So I'm more and
more convinced that our ET mechanism should be
updated to
be more flexible. But that's another topic !

2 - Another reason would be to have various
SkylineMatrix
storage variants which would be sufficiently
different to
require different classes, but close enough to
share the same
(solver) algorithms. Again, I'm not sure
that's the
case here.

Right. I don't think too that we'll introduce
another
storage.

Nevertheless, it's still fine to keep it for
now,
and wait a
bit for more use cases to see what's the
best strategy.

Thanks anyway for your advises. I'll clean the
code and
send it
back to you.

ok, cool.

gael.

gael.

On Wed, Nov 4, 2009 at 3:24 PM, guillaume saupin
<guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>>>> wrote:

You'll find enclosed the code as a patch.

guillaume saupin wrote:

Hello !

I've finally (after some performances
tests)
implemented a
basic skyline matrix using the following
storage :

- an array stores the diagonal elements
- an array stores the upper elements
in a column
oriented way
- an array stores the lower elements in a
row oriented way

I've implemented a LU decomposition
for this
format. I
use the
following algorithm (with minor
modification
to handle the
sparseness of the matrix):

for each row
-- compute the lower part
for each col < row
LU(row, col) = (LU(row, col) -
sum(k
= 0 ; k < col)
[LU(row, k) * LU(k, col)] ) /
LU(row, row)
-- elements are
access contiguously / Can be easily
vectorize
-- compute the upper part
for each rrow < row
LU(rrow, row) -= sum(k = 0 ; k < rrow)
[LU(rrow, k)
* LU(k,
row)] -- elements are access
contiguously /
Can be easily
vectorize
-- compute the diag part
LU(row, row) -= sum(k = 0 ; k < row)
[LU(rrow, k) * LU(k,
row)] -- elements are access
contiguously /
Can be easily
vectorize

This algorithm is well suited for our
storage format.
After a
few tests, it seems even more
efficient to
store the
the upper
and lower elements in the same array,
with
the elements
from
one row followed by the elements of
the cols
of same index,
and so on ...

I've also implemented a product for the
Skyline * Dense
operation.

Do you think that it would be possible to
insert this code
into eigen ?

guillaume saupin wrote:

Gael Guennebaud a écrit :

On Tue, Oct 27, 2009 at 10:40 AM,
guillaume saupin
<guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>>
<mailto:guillaume.saupin@xxxxxx <mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>>>
<mailto:guillaume.saupin@xxxxxx <mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>>
<mailto:guillaume.saupin@xxxxxx <mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>>>>> wrote:

Gael Guennebaud a écrit :

this is not an easy
question
because we are
still unsure how
to manage efficiently and
with minimal
code all
these kind of
"special" matrices.

Of course an easy solution
would be to
make it
inherit
SparseMatrixBase and
let the
generic
algorithms
based on
sparse iterators do
the job.
However, this
approach won't be
optimal because the
iterators
won't take
advantage of the
specificity of a
SkylineMatrix which, e.g.,
should allow
vectorization.

Actually skyline
matrices are
very
similar to
banded matrices,
and in particular it
seems to
me that
they are
similar enough
to share the same
algorithms.
So for
instance
these two kind
of matrices could
inherit the
same
matrix base
and the
algorithms could be
implemented in a generic
way via the
concept of "range inner
vector" which is a
column (or a row)
vector with an associated
start index... We
also need
efficient basic operators
dealing with such
vector. E.g. the
addition of two such
vector
can be
efficiently
done by first
copying the non
overlapping
part (start and
end), and then sum
the overlapping part using
Eigen's core
module
feature.

The way we implemented skyline
matrix is not
really
similar to
what is usually done for
banded
matrix, as
we store
the diagonal,
the upper, and the lower
elements
in separate
arrays. This way, we
can achieve quite efficient LU
decomposition
(and
solving) due to
coherent memory accesses.

ok I see, then it is not
really what
people call
skyline storage
(http://www.netlib.org/linalg/html_templates/node96.html),
but it is more like an
extension of
the tridiagonal
storage.

I think that our storage is actually
quite similar
to the
one proposed there for non
symetric skyline
(referred as
NSK format by Saad) matrix. As in
these
format, we
store
the upper element in a
column-oriented
way, and
the lower
element in a row-oriented way.
The only difference is that we
store the
diagonal
elements
in a separate array instead of
keeping
an index
array to
these elements. This allows
contiguous
access to
diagonal
elements.

However, I don't see how your
storage can yield
to a
more efficient implementation
of the LU
decomposition
than the standard skyline
storage.
For instance
here
is a basic algorithm without
pivoting for the
standard
skyline storage:

for(int k = 0; k+1 < rows; ++k)
{
int rrows = rows-k-1;
int rsize = size-k-1;

// line 1
lu.col(k).end(rrows) /=
lu.coeff(k,k);

// line 2
lu.corner(BottomRight,rrows,rsize).noalias() -=
lu.col(k).end(rrows) *
lu.row(k).end(rsize);
}

where lu is the working matrix,
lu.col(k) is
assumed
to return a "range vector" as I
described in my
previous email.

Here line 1 would be trivially
optimized (i.e.,
vectorized) since
lu.col(k).end(rrows) is just
a small
dense vector.

Line 2 is an outer product which
again is
trivially/automatically
vectorized
sicne it is
impelmented as a sequence of:
"col_range_vector_i -=
scalar * col_range_vector".

Here the locality is pretty good
because the vector
"lu.col(k).end(rrows)" which is
reused multiple
times
is sequentially stored in memory.

But perhaps there exists a
special
algorithm which
perfectly fit your storage ?
Is it
possible to see
your code somewhere ?
Finally, if it
is really more
efficient then that would
make sense
to have it
in Eigen.

The benefit of using the NSK, i.e. of
storing the lower
matrix in a column-oriented way,
and the
upper
matrix in a
row-oriented way is that in line 1 we
access contigous
elements (the column k of the lower
matrix), and in
line 2
we also access contiguous element
(the
row elements
of the
upper matrix). I don't know if it's
efficient for
vectorization, but without, it is
quite
efficient,
as the
locality is very good.

gael.

So I don't think that our
algorithms share
common
features with
band matrix algorithms.
They seem
closer to what
can be done with
diagonal matrices.

Anyway, I've used the
SparseMatrix code as a
base
for my
SkylineMatrix code.
Currently it only
supports very
basic
operations. I'll implement
the LU
decomposition
soon. I will also
try to use vectorization.

guillaume

These are just some
initial
thoughts and the
discussion is
very open!

gael

On Mon, Oct 19, 2009
at 12:55 PM,
guillaume saupin
<guillaume.saupin@xxxxxx <mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>>
<mailto:guillaume.saupin@xxxxxx <mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>>>
<mailto:guillaume.saupin@xxxxxx <mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>>
<mailto:guillaume.saupin@xxxxxx <mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>>>>
<mailto:guillaume.saupin@xxxxxx <mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>>
<mailto:guillaume.saupin@xxxxxx <mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>>>
<mailto:guillaume.saupin@xxxxxx <mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>>
<mailto:guillaume.saupin@xxxxxx <mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>
<mailto:guillaume.saupin@xxxxxx
<mailto:guillaume.saupin@xxxxxx>>>>>>> wrote:

Hello,

We are planning to use
your library
in our
projects, but we
need a
skyline matrix.
Therefore
I'd like to
implement one for
eigen, but
I don't now where
to start.

Is there a specific
class
that can be
a good
starting point /
skeleton to start
with ? The
SparseMatrixBase might be a
good choice.
Should this
SkylineMatrix
inherit from
SparseMatrixBase, or
be a
separate class ?

Thanks,

guillaume