Re: [eigen] Skyline matrix

[ Thread Index | Date Index | More Archives ]

On Thu, Nov 12, 2009 at 1:04 PM, guillaume saupin <guillaume.saupin@xxxxxx> wrote:
Gael Guennebaud wrote:

Ok, I've put it in the unsupported/ folder. To use it you just have add path/to/eigen2/unsupported to your set of include paths.

Since I guess you will certainely want to do some modifications in the future, I recommend you to create an account on bitbucket, and I will give you write access. Feel free to commit any changes you like in this module, but if your changes affect some other modules, please let's first dicuss them here !

I've just created my account on bitbucket. My id is simply guillaume.saupin.

Now I have to admit there are still a lot of work to do before having something really usable to offer to other users. In my opinion, the SkylineMatrix class should be taillored to be optimal for the two following tasks:
1 - setup/assemble the matrix
2 - build efficient high-level algorithms

Both tasks require a powerful API taking into account the specificities of the skyline storage. In particular this discard low level coeff(i,j) methods or things like that. Iterators are not very well suited here because 1) you need various kind of iterators (upper, lower), and 2) they cannot take advantage of the sequential storage to enable unrolling and vectorization.

So, for the matrix assembly, the challenge is to reduce the need for memory reallocation and memory copies. Here we first need a very low level API allowing to fill the matrix with constant time insertions.
One possibility is to let the user specify the profile for each rows and cols, and allocate the memory accordingly. Then the insertion can be done in constant time.

Then to offer more flexibility to the user, perhaps the simpler is to let the user fill a {Dynamic}SparseMatrix, and then implement a fast conversion method. The latter is a must to have feature anyway.

If we provide an easy and fast API to fill the matrix, I'm not sure that it will be very useful to create a SkylineMatrix from a SparseMatrix.

In order to easily implement efficient algorithms (products and solvers), the SkylineMatrix should provide higher level access functions than the coeff() function or the iterators. In particular I'm thinking about the following set of functions:

diagonal(); // return a vector _expression_ of the diagonal
lowerCol(j);    // returns an _expression_ of the j-th column of the strictly lower triangular part
centeredBlock(i, rows, cols); // returns an _expression_ of the rows x cols submatrix at the coeff (i, i)

The first two functions would return a special _expression_ wrapping a dense VectorBlock plus a start index and a length with optimized arithmetic operators. Such an _expression_, let's call it a RangeVector could also be useful for the Band storage.

This is a good idea. What can I use as a starting point to implement those VectorBlocks ? Should it inherit from an already existing class ?

For diagonal(), it is easy. Just use a Matrix<Scalar,Size,1> to store it (instead of a raw pointer), and let it return a ref to it.

For the lowerCol(), upperRow() we have to create a new matrix type, well actually a vector type inheriting AnyMatrixBase<> with support for basics operations. However, I think it is safer to wait a bit that I've done the change we discussed on a recent thread about some refactoring of the _expression_ template mechanism. Once this change done, it should be much simpler to add new special matrices. Hopefully.

A quick question: I did not read your source files carefully, but I thought that this SkylineMatrix stores the lower triangular using column-major vectors, and the upper part using row-major vectors. Right ? Then what is the purpose of the StorageOrder option ?

Then writting a  triangular solver, a LU decomposition, a Cholesky decomposition, etc. would be almost trivial. Actually, if we add the concept of matrix partitionner as I described in another thread, one could write the exact same code for both a dense or a skyline matrix !

I'm also thinking whether it could make sense to make the "bandwidth" of the matrix a template parameter. The default would be "Dynamic". This would allow to optimize the code for matrices having a small fixed size bandwidth.

This could be a good idea. What kind of optimization can be done on such SkylineMatrices ?

Mainly loop unrolling of the most inner loops. To make that possible, the "VectorRange" discussed above should be able to deal with compile time ranges.



On Mon, Nov 9, 2009 at 4:30 PM, guillaume saupin <guillaume.saupin@xxxxxx <mailto:guillaume.saupin@xxxxxx>> wrote:

   Hi !

   Here comes the cleaned code for the skyline matrix. If you could
   integrate it inside Eigen, it would be perfect.



   guillaume saupin wrote:

       Gael Guennebaud wrote:

           On Thu, Nov 5, 2009 at 2:40 PM, guillaume saupin
           <guillaume.saupin@xxxxxx <mailto:guillaume.saupin@xxxxxx>
           <mailto:guillaume.saupin@xxxxxx>>> wrote:

              Gael Guennebaud wrote:


                  thanks for the patch !

                  I'd be ok to have it in Eigen. In a first step I would
                  probably put it in the unsupported/ set of modules
           because it
                  sems quite preliminary for now.

              No problem. This is perfect if we can access this
           Matrix from
              eigen. I've written some unit tests with boost. Where
           should I put
              them ?

                  However before we can apply the patch some cleaning is
                  required: at the very least the copyrights have to
           be updated.
                  It would also be nice to remove all the commented

                  Then we can discuss about the usefulness of having a
                  SkylineMatrixBase. Indeed, such a base class can be
           needed for
                  two reasons:

              The truth is that I have not really thought about that.
           I just
              copy/paste the SparseMatrix code ;)

                  1 - To implement the _expression_ template paradigm
           for basics
                  operations (+, -, scaling, coefficients wise
           operations like
                  abs, sin, cos etc.). However, here I'm not sure
           that makes a
                  lot of sense. It seems to me that a such a
           specialized matrix
                  is only usefull for solving problems, so supporting
                  operators is not really needed, right ?

              Our main use for SkylineMatrix is for solving systems.
              Nevertheless we will soon use at least a Skyline *
           Dense product,
              and probably a Dense * Skyline product. So template
              may be required to do these products without
           temporaries and with
              a simple syntax. But it is maybe possible to do that
           without a
              Base class ?

           Yes and no, because actually matrix products are special
           in the sense that they must be directly evaluated. On the
           other hand ET might still be useful to allow for more
           complex products involving transpose, negate and scaling ops.

       I will need this kind of complex products, especially those
       involving transpose, in my ColumnMatrix. Do you think that it
       must inherit MatrixBase to benefit from ET ?

           The problem is that our current design does not allow to
           reuse our Transpose/CwiseUnaryOp/etc. _expression_ classes
           for something else than dense matrices. So I'm more and
           more convinced that our ET mechanism should be updated to
           be more flexible. But that's another topic !


                  2 - Another reason would be to have various
                  storage variants which would be sufficiently
           different to
                  require different classes, but close enough to
           share the same
                  (solver) algorithms. Again, I'm not sure that's the
           case here.

              Right. I don't think too that we'll introduce another

                  Nevertheless, it's still fine to keep it for now,
           and wait a
                  bit for more use cases to see what's the best strategy.

              Thanks anyway for your advises. I'll clean the code and
           send it
              back to you.

           ok, cool.



                  On Wed, Nov 4, 2009 at 3:24 PM, guillaume saupin
           <mailto:guillaume.saupin@xxxxxx>>>> wrote:

                     You'll find enclosed the code as a patch.

                     guillaume saupin wrote:

                         Hello !

                         I've finally (after some performances tests)
           implemented a
                         basic skyline matrix using the following
           storage :

                         - an array stores the diagonal elements
                         - an array stores the upper elements in a column
                  oriented way
                         - an array stores the lower elements in a
           row oriented way

                         I've implemented a LU decomposition for this
           format. I
                  use the
                         following algorithm (with minor modification
           to handle the
                         sparseness of the matrix):

                         for each row
                         -- compute the lower part
                           for each col < row
                              LU(row, col) =  (LU(row, col)  - sum(k
           = 0 ; k < col)
                         [LU(row, k)  * LU(k, col)] ) / LU(row, row)
           -- elements are
                         access contiguously / Can be easily vectorize
                         -- compute the upper part
                         for each rrow < row
                           LU(rrow, row) -= sum(k = 0 ; k < rrow)
           [LU(rrow, k)
                   * LU(k,
                         row)] -- elements are access contiguously /
           Can be easily
                         -- compute the diag part
                         LU(row, row) -= sum(k = 0 ; k < row)
           [LU(rrow, k)  * LU(k,
                         row)] -- elements are access contiguously /
           Can be easily

                         This algorithm is well suited for our
           storage format.
                  After a
                         few tests, it seems even more efficient to
           store the
                  the upper
                         and lower elements in the same array, with
           the elements
                         one row followed by the elements of the cols
           of same index,
                         and so on ...

                         I've also implemented a product for the
           Skyline *  Dense

                         Do you think that it would be possible to
           insert this code
                         into eigen ?

                         guillaume saupin wrote:

                             Gael Guennebaud a écrit :

                                 On Tue, Oct 27, 2009 at 10:40 AM,
           guillaume saupin
           <mailto:guillaume.saupin@xxxxxx>>>>> wrote:

                                    Gael Guennebaud a écrit :

                                        this is not an easy question
           because we are
                                 still unsure how
                                        to manage efficiently and
           with minimal
                  code all
                                 these kind of
                                        "special" matrices.

                                        Of course an easy solution
           would be to
                  make it
                                        SparseMatrixBase and let the
                                 based on
                                        sparse iterators do the job.
           However, this
                                 approach won't be
                                        optimal because the iterators
           won't take
                                 advantage of the
                                        specificity of a
           SkylineMatrix which, e.g.,
                                 should allow

                                        Actually skyline matrices are
                  similar to
                                 banded matrices,
                                        and in particular it seems to
           me that
                  they are
                                 similar enough
                                        to share the same algorithms.
           So for
                                 these two kind
                                        of matrices could inherit the
                  matrix base
                                 and the
                                        algorithms could be
           implemented in a generic
                                 way via the
                                        concept of "range inner
           vector" which is a
                                 column (or a row)
                                        vector with an associated
           start index... We
                                 also need
                                        efficient basic operators
           dealing with such
                                 vector. E.g. the
                                        addition of two such vector
           can be
                                 done by first
                                        copying the non overlapping
           part (start and
                                 end), and then sum
                                        the overlapping part using
           Eigen's core

                                    The way we implemented skyline
           matrix is not
                                 similar to
                                    what is usually done for banded
           matrix, as
                  we store
                                 the diagonal,
                                    the upper, and the lower elements
           in separate
                                 arrays. This way, we
                                    can achieve quite efficient LU
                                 solving) due to
                                    coherent  memory accesses.

                                 ok I see, then it is not really what
           people call
                                 skyline storage
                                 but it is more like an extension of
           the tridiagonal

                             I think that our storage is actually
           quite similar
                  to the
                             one proposed there for non symetric skyline
                  (referred as
                             NSK format by Saad) matrix. As in these
           format, we
                             the upper element in a column-oriented
            way, and
                  the lower
                             element in a row-oriented way.
                             The only difference is that we store the
                             in a separate array instead of keeping
           an index
                  array to
                             these elements. This allows contiguous
           access to

                                 However, I don't see how your
           storage can yield
                  to a
                                 more efficient implementation of the LU
                                 than the standard skyline storage.
           For instance
                                 is a basic algorithm without
           pivoting for the
                                 skyline storage:

                                 for(int k = 0; k+1 < rows; ++k)
                                  int rrows = rows-k-1;
                                  int rsize = size-k-1;

                                  // line 1
                                  lu.col(k).end(rrows) /= lu.coeff(k,k);

                                  // line 2
                                            lu.corner(BottomRight,rrows,rsize).noalias() -=
                                 lu.col(k).end(rrows) *

                                 where lu is the working matrix,
           lu.col(k) is
                                 to return a "range vector" as I
           described in my
                                 previous email.

                                 Here line 1 would be trivially
           optimized (i.e.,
                                 vectorized) since
           lu.col(k).end(rrows) is just
                  a small
                                 dense vector.

                                 Line 2 is an outer product which
           again is
                                 trivially/automatically vectorized
           sicne it is
                                 impelmented as a sequence of:
                  "col_range_vector_i -=
                                 scalar * col_range_vector".

                                 Here the locality is pretty good
           because the vector
                                 "lu.col(k).end(rrows)" which is
           reused multiple
                                 is sequentially stored in memory.

                                 But perhaps there exists a special
           algorithm which
                                 perfectly fit your storage ? Is it
           possible to see
                                 your code somewhere ? Finally, if it
           is really more
                                 efficient then that would make sense
           to have it
                  in Eigen.

                             The benefit of using the NSK, i.e. of
           storing the lower
                             matrix in a column-oriented way, and the
                  matrix in a
                             row-oriented way is that in line 1 we
           access contigous
                             elements (the column k of the lower
           matrix), and in
                  line 2
                             we also access contiguous element (the
           row elements
                  of the
                             upper matrix). I don't know if it's
           efficient for
                             vectorization, but without, it is quite
                  as the
                             locality is very good.


                                    So I don't think that our
           algorithms share
                                 features with
                                    band matrix algorithms. They seem
           closer to what
                                 can be done with
                                    diagonal matrices.

                                    Anyway, I've used the
           SparseMatrix code as a
                                 for my
                                    SkylineMatrix code. Currently it only
                  supports very
                                    operations. I'll implement the LU
                                 soon. I will also
                                    try to use vectorization.


                                        These are just some initial
           thoughts and the
                                 discussion is
                                        very open!


                                        On Mon, Oct 19, 2009 at 12:55 PM,
                  guillaume saupin
           <mailto:guillaume.saupin@xxxxxx>>>>>> wrote:


                                           We are planning to use
           your library
                  in our
                                 projects, but we
                                        need a
                                           skyline matrix. Therefore
           I'd like to
                                 implement one for
                                        eigen, but
                                           I don't now where to start.

                                           Is there a specific class
           that can be
                  a good
                                 starting point /
                                           skeleton to start with ? The
                                 SparseMatrixBase might be a
                                        good choice.
                                           Should this SkylineMatrix
           inherit from
                                 SparseMatrixBase, or
                                        be a
                                           separate class ?



Mail converted by MHonArc 2.6.19+