[eigen] SVD



Hi,

this time I'm really doing the SVD :) I know it's not the first time
I've said that.

I would like to run my design choices by this list. I have had a good
look at LAPACK's API and code; it is a good source of inspiration, but
I am not 100% satisfied with their design. Here are the areas where I
plan to do things differently:

*Bidiagonalization*

LAPACK function: DGEBRD, http://www.netlib.org/lapack/double/dgebrd.f

They reduce to either upper or lower bidiagonal form depending on
whether M>=N. I am not convinced by this and would like to standardize
on upper bidiagonalization, restricting to rows>=cols, for these
reasons:
 - it makes my life easier, and the code simpler and a bit faster.
 - it's all I need to implement the SVD in all cases (see next section).
 - people don't care too much about bidiagonalization for its own
sake, do they? It's only a step towards the SVD, as I see it.
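To make the restriction concrete, here is a minimal, dependency-free
sketch of Householder-based upper bidiagonalization for rows >= cols.
This is plain C++ with hypothetical names (`Mat`, `upperBidiagonalize`),
not Eigen's actual API; left reflections zero each column below the
diagonal, right reflections zero each row past the superdiagonal.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal dense row-major matrix (illustrative; not Eigen's MatrixXd).
struct Mat {
    int rows, cols;
    std::vector<double> d;
    Mat(int r, int c) : rows(r), cols(c), d(r * c, 0.0) {}
    double& operator()(int i, int j) { return d[i * cols + j]; }
    double  at(int i, int j) const   { return d[i * cols + j]; }
};

// Apply H = I - 2 v v^T / (v^T v) from the left, touching rows i0..rows-1.
static void reflectLeft(Mat& A, const std::vector<double>& v, int i0) {
    double vtv = 0;
    for (int i = i0; i < A.rows; ++i) vtv += v[i] * v[i];
    if (vtv == 0) return;
    for (int j = 0; j < A.cols; ++j) {
        double dot = 0;
        for (int i = i0; i < A.rows; ++i) dot += v[i] * A(i, j);
        double s = 2.0 * dot / vtv;
        for (int i = i0; i < A.rows; ++i) A(i, j) -= s * v[i];
    }
}

// Same reflection applied from the right, touching columns j0..cols-1.
static void reflectRight(Mat& A, const std::vector<double>& v, int j0) {
    double vtv = 0;
    for (int j = j0; j < A.cols; ++j) vtv += v[j] * v[j];
    if (vtv == 0) return;
    for (int i = 0; i < A.rows; ++i) {
        double dot = 0;
        for (int j = j0; j < A.cols; ++j) dot += A(i, j) * v[j];
        double s = 2.0 * dot / vtv;
        for (int j = j0; j < A.cols; ++j) A(i, j) -= s * v[j];
    }
}

// Reduce A (requires rows >= cols) to upper bidiagonal form in place.
void upperBidiagonalize(Mat& A) {
    for (int k = 0; k < A.cols; ++k) {
        // Left reflection: zero out A(k+1.., k).
        std::vector<double> v(A.rows, 0.0);
        double n2 = 0;
        for (int i = k; i < A.rows; ++i) n2 += A.at(i, k) * A.at(i, k);
        if (n2 > 0) {
            double alpha = (A.at(k, k) >= 0) ? -std::sqrt(n2) : std::sqrt(n2);
            for (int i = k; i < A.rows; ++i) v[i] = A.at(i, k);
            v[k] -= alpha;  // v = x - alpha e_k, so H x = alpha e_k
            reflectLeft(A, v, k);
        }
        // Right reflection: zero out A(k, k+2..).
        if (k + 2 < A.cols) {
            std::vector<double> w(A.cols, 0.0);
            double m2 = 0;
            for (int j = k + 1; j < A.cols; ++j) m2 += A.at(k, j) * A.at(k, j);
            if (m2 > 0) {
                double alpha = (A.at(k, k + 1) >= 0) ? -std::sqrt(m2)
                                                     :  std::sqrt(m2);
                for (int j = k + 1; j < A.cols; ++j) w[j] = A.at(k, j);
                w[k + 1] -= alpha;
                reflectRight(A, w, k + 1);
            }
        }
    }
}

// Self-check on a 4x3 example: bidiagonal structure, and the Frobenius
// norm is preserved since we only applied orthogonal transformations.
bool bidiagonalizeDemo() {
    Mat A(4, 3);
    double vals[12] = {4, 1, 2, 2, 5, 1, 1, 2, 6, 3, 1, 2};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 3; ++j) A(i, j) = vals[i * 3 + j];
    double before = 0;
    for (double x : A.d) before += x * x;
    upperBidiagonalize(A);
    double after = 0;
    for (double x : A.d) after += x * x;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 3; ++j)
            if ((j < i || j > i + 1) && std::fabs(A.at(i, j)) > 1e-12)
                return false;
    return std::fabs(before - after) < 1e-9;
}
```

Note how restricting to rows>=cols keeps the two reflection kinds
symmetric and the loop simple; supporting the lower bidiagonal case
would roughly double this logic.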

*General SVD approach*

Golub-Reinsch SVD is up to 50% faster for approximately square
matrices (aspect ratio < 5/3), while R-SVD is up to 2x faster for more
rectangular matrices.

Let's implement this:
1) if the aspect ratio > 5/3, do an R-SVD: perform a QR
decomposition, then go to 2) to compute the SVD of R;
2) else if rows>=cols, do Golub-Reinsch using the UpperBidiagonalization;
3) else if rows<cols, do as in 2) on the transpose.

So the only inefficiency is evaluating the transpose in case 3),
which is not too bad, I think.

Automatically selecting between Golub-Reinsch and R-SVD is good, I
think, since it is an implementation detail that few users would know
about; and it doesn't cost much, as most of the code is shared
(HouseholderQR doesn't cost much as it uses the same Householder code
as the bidiagonalization).
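Just as a sketch, the automatic selection of steps 1)-3) could look
like this (the names `SvdPath` and `chooseSvdPath` are hypothetical,
not an actual Eigen API):

```cpp
#include <cassert>

// Hypothetical dispatch mirroring steps 1)-3) above.
enum class SvdPath { RSvdViaQR, GolubReinsch, GolubReinschTransposed };

SvdPath chooseSvdPath(int rows, int cols) {
    int big   = rows > cols ? rows : cols;
    int small = rows > cols ? cols : rows;
    // 1) aspect ratio > 5/3: QR first, then the SVD of the small factor R.
    if (3 * big > 5 * small) return SvdPath::RSvdViaQR;
    // 2) rows >= cols: Golub-Reinsch on the upper bidiagonalization.
    if (rows >= cols) return SvdPath::GolubReinsch;
    // 3) rows < cols: same as 2) on the transpose.
    return SvdPath::GolubReinschTransposed;
}
```

The integer comparison 3*big > 5*small avoids floating-point division
when testing the 5/3 threshold.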

*Storage of the unitaries*

The question is how the SVD class can allow one to compute:

 svd.u() * matrix

efficiently without forming U itself. If we can do this, then in
particular we are able to compute selected singular vectors
efficiently, but we can also do much more, like implementing least
squares efficiently. The same goes for V.

As you know, in Householder-based decompositions it's very convenient
to store the Householder vectors and subsequently be able to apply the
transformation without forming the unitary matrix explicitly. But in
the SVD and in eigensolvers there are also Givens rotations, the
number of which cannot be predicted. So LAPACK gives up on applying
the transformations efficiently. There is one exception: in the SVD,
A = U D V^T, they allow applying the transpose of U on the fly to a
matrix. This is useful for implementing least squares. But it is very
ad hoc, doesn't help in other cases (how to apply U itself? V? in
their least-squares implementation they have to form the V matrix
explicitly), and generates code that may go unused.
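For the Householder part this is indeed straightforward. Here is a
minimal sketch (plain C++, hypothetical names, not Eigen's actual API)
of storing the vectors during a QR-style triangularization and later
applying Q^T to any matrix without ever forming Q, which is exactly
what least squares needs:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major, illustrative

// Apply H = I - 2 v v^T / (v^T v) from the left; v acts on rows k..m-1.
static void reflect(Matrix& B, const std::vector<double>& v, std::size_t k) {
    double vtv = 0;
    for (std::size_t i = k; i < v.size(); ++i) vtv += v[i] * v[i];
    if (vtv == 0) return;
    for (std::size_t j = 0; j < B[0].size(); ++j) {
        double dot = 0;
        for (std::size_t i = k; i < v.size(); ++i) dot += v[i] * B[i][j];
        double s = 2.0 * dot / vtv;
        for (std::size_t i = k; i < v.size(); ++i) B[i][j] -= s * v[i];
    }
}

// The stored vectors; since each reflection is its own transpose,
// replaying them in factorization order on B computes Q^T * B
// without ever forming Q.
struct HouseholderSequence {
    std::vector<std::vector<double>> vs;  // vs[k] acts on rows k..m-1
    void applyTransposeTo(Matrix& B) const {
        for (std::size_t k = 0; k < vs.size(); ++k) reflect(B, vs[k], k);
    }
};

// Triangularize A in place (A becomes R = Q^T A), recording the vectors.
HouseholderSequence householderQR(Matrix& A) {
    std::size_t m = A.size(), n = A[0].size();
    HouseholderSequence H;
    for (std::size_t k = 0; k < n && k + 1 < m; ++k) {
        std::vector<double> v(m, 0.0);
        double n2 = 0;
        for (std::size_t i = k; i < m; ++i) n2 += A[i][k] * A[i][k];
        if (n2 > 0) {
            double alpha = (A[k][k] >= 0) ? -std::sqrt(n2) : std::sqrt(n2);
            for (std::size_t i = k; i < m; ++i) v[i] = A[i][k];
            v[k] -= alpha;
            reflect(A, v, k);
        }
        H.vs.push_back(v);
    }
    return H;
}

// Self-check: replaying the stored vectors on the original matrix
// reproduces R, i.e. the sequence really is Q^T.
bool householderDemo() {
    Matrix A = {{2, 1}, {1, 3}, {4, 1}};
    Matrix orig = A;
    HouseholderSequence H = householderQR(A);  // A is now R
    H.applyTransposeTo(orig);                  // orig becomes Q^T * original
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 2; ++j)
            if (std::fabs(A[i][j] - orig[i][j]) > 1e-12) return false;
    return true;
}
```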

So here is an idea that in some sense comes from Andrea's GivensQR:
let's store the Givens rotations themselves. Each sweep produces an
array of N Givens rotations. What is unknown is how many sweeps there
will be, so they could be stored in a linked list. What is nontrivial
then is to apply these Givens rotations efficiently, as each sweep
implies a full traversal of the matrix to which the transformation is
applied. Here the simplest solution is to apply the transformations to
only a few columns (e.g. 16) of the matrix at a time, since the
columns are mutually independent in this problem. Another approach is
divide-and-conquer bidiagonal SVD; it should also be inherently
cache-friendly, but it is much more complicated to implement, and
since the complexity of this step is negligible compared to the O(n^3)
bidiagonalization, I'm not sure it's worth it.
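The stored-rotations idea can be sketched as follows (again plain C++
with hypothetical names, `Givens`/`applySweepsBlocked`, not a real
API): sweeps go into a linked list, and application walks the matrix
a block of 16 columns at a time; because the columns are independent,
the blocked order gives exactly the same result as the naive
sweep-by-sweep order, just with better locality.

```cpp
#include <cassert>
#include <cmath>
#include <list>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // row-major, illustrative

// One Givens rotation acting on rows i and i+1; a sweep stores N of these.
struct Givens { std::size_t i; double c, s; };

// Sweeps kept in a list, since their number is not known in advance.
using SweepList = std::list<std::vector<Givens>>;

// Apply every sweep to A, but only blockCols columns at a time, so the
// repeated full traversals stay cache-friendly.
void applySweepsBlocked(const SweepList& sweeps, Matrix& A,
                        std::size_t blockCols = 16) {
    std::size_t cols = A[0].size();
    for (std::size_t j0 = 0; j0 < cols; j0 += blockCols) {
        std::size_t j1 = (j0 + blockCols < cols) ? j0 + blockCols : cols;
        for (const std::vector<Givens>& sweep : sweeps)
            for (const Givens& g : sweep)
                for (std::size_t j = j0; j < j1; ++j) {
                    double x = A[g.i][j], y = A[g.i + 1][j];
                    A[g.i][j]     =  g.c * x + g.s * y;
                    A[g.i + 1][j] = -g.s * x + g.c * y;
                }
    }
}

// Self-check: blocked application matches the naive full-width one.
bool givensDemo() {
    std::size_t rows = 6, cols = 40;
    Matrix A(rows, std::vector<double>(cols));
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            A[i][j] = double((i * 31 + j * 7) % 13) - 6.0;
    Matrix B = A;

    // A few synthetic sweeps of rotations on adjacent row pairs.
    SweepList sweeps;
    for (int sweep = 0; sweep < 3; ++sweep) {
        std::vector<Givens> s;
        for (std::size_t i = 0; i + 1 < rows; ++i) {
            double theta = 0.1 * double(sweep + 1) * double(i + 1);
            s.push_back({i, std::cos(theta), std::sin(theta)});
        }
        sweeps.push_back(s);
    }

    applySweepsBlocked(sweeps, A, 16);
    // Naive reference: each sweep traverses all columns at once.
    for (const std::vector<Givens>& sweep : sweeps)
        for (const Givens& g : sweep)
            for (std::size_t j = 0; j < cols; ++j) {
                double x = B[g.i][j], y = B[g.i + 1][j];
                B[g.i][j]     =  g.c * x + g.s * y;
                B[g.i + 1][j] = -g.s * x + g.c * y;
            }
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            if (std::fabs(A[i][j] - B[i][j]) > 1e-12) return false;
    return true;
}
```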

Benoit


