Re: [eigen] Including Spectra within Eigen 3.4

Hi All,

I'm Yixuan, the author of the Spectra library. First of all I should thank Gael to move forward and put this plan on the table. I feel really excited to hear about the possibility to contribute and add this feature to the 3.4 branch of Eigen. As Gael said in the email, I'm glad to listen to suggestions and comments from this mailing list about the design of this future module, especially on the API part.

To partly answer Bill's question on the "smallest"/"largest" option, I want to say that both ways are workable, with their own pros and cons.

Reasons for template parameter:

1. For most cases the selection rule, that is, requesting the smallest or largest eigenvalues, is predefined for a specific problem. For example in matrix approximation and PCA (principal component analysis), it is already known that largest eigenvalues are needed.

2. According to Gael's plan, proper solvers will be automatically deduced for different selection rules. This means that we need to store different types of matrix operators in the class according to the selection rules, and hence we need that information at compile time.

3. Template parameter brings some performance gain. Fewer if-else conditions are evaluated.

Reasons for run-time option:

1. It allows for control in the run time.

2. Compatible with other ARPACK wrappers.

3. Smaller object file size and faster compilation. If users need to frequently switch between different selection rules, they only need to compile one solver class.

In my own opinion we can benefit from the template parameter approach for a cleaner internal implementation. We can have more discussions on this of course.

Also, I want to mention that "Spectra" was the name I used for my original library, and it may not be so meaningful when integrated into Eigen. I guess something like "IterativeEigenSolvers" makes more sense, and is closer to the naming convention of the existing solvers.

The same applies to my naming of various classes. I would suggest changing all "Sym"-like classes to "SelfAdoint" to be consistent with the current Eigen system.

Any other comments are highly appreciated. Thanks!

Best,

Yixuan

2017-02-22 18:40 GMT-05:00 Bill Greene <w.h.greene@xxxxxxxxx>:

This is great news! I have wanted a built in sparse eigensolver
capability in Eigen
for many years now.

I have used the "unsupported" ArpackGeneralizedSelfAdjointEigenSolver and
find it quite useful except, of course, the problem of having to match it to
the appropriately-compiled Arpack library and that it is somewhat limited in
functionality.

I read over your thoughts, Gael, on an appropriate API and, of course,
agree with the
goals of something much simpler than the current Spectra one and one that will
be stable going forward.

At the risk of complicating the discussion or asking something naive,
let me step back and ask a question about the currrent, dense Eigen eigensolver
API. Why are there separate classes for standard and generalized problems?
Couldn't that decision be made at run time based on having a constructor
(or compute function) with either one or two matrix arguments? This
is, of course,
the MATLAB approach with the eig function.
If so, it seems like that would simply the API somewhat.

Regarding the "smallest"/"largest" option, isn't this also something that can be
a runtime option rather than a template parameter? That is the way
that option is
handled in ArpackGeneralizedSelfAdjointEigenSolver.

I'm assuming that if an option can be deferred to runtime without degrading
performance, it will likely simplify the API.

Bill Greene

On Wed, Feb 22, 2017 at 1:37 PM, Rasmus Larsen <rmlarsen@xxxxxxxxxx> wrote:
> This sounds great!! I've been meaning to contribute a rewrite of my own
> PROPACK library that can be viewed as an optimized version of ARPACK for the
> sparse SVD problem. The two would complement each other nicely, and make
> Eigen pretty powerful for sparse spectral problems. One of these days, I'll
> actually get around to it...
>
> On Wed, Feb 22, 2017 at 10:31 AM, Gael Guennebaud
> <gael.guennebaud@xxxxxxxxxx> wrote:
>>
>>
>> Hi all,
>>
>> the last major item I'd like to include for the next 3.4 release is
>> Spectra (http://yixuan.cos.name/spectra) that is a set of iterative solvers
>> for large (sparse or dense) eigenvalue problems based on the implicitly
>> restarted Arnoldi's method. It is essentially a re-write from scratch of the
>> ARPACK library on top of Eigen. Congrats to its author, Yixuan Qiu, for this
>> amazing work.
>>
>> I've already contacted Yixuan about his opinion on the integration of
>> Spectra within Eigen, and he approved such a move. The general idea is to
>> make Spectra a new module in Eigen (Eigen/Spectra) and Yixuan would remain
>> the main maintainer of this module.
>>
>>
>> Since Eigen guarantees API backward compatibility, the main work for now
>> will be the stabilize the public API. This is were ANY OF YOU CAN HELP.
>>
>>
>> Let's first have a brief look at the current API. The user has to provide
>> the following compile-time information:
>>
>> - Choose the *right* solver class among the following six ones:
>> SymEigsSolver, GenEigsSolver, SymEigsShiftSolver, GenEigsRealShiftSolver,
>> GenEigsComplexShiftSolver, SymGEigsSolver.
>>
>> - Pick the *right* operation among: DenseCholesky,
>> DenseGenComplexShiftSolve, DenseGenMatProd, DenseGenRealShiftSolve,
>> DenseSymMatProd, DenseSymShiftSolve, SparseCholesky, SparseGenMatProd,
>> SparseGenRealShiftSolve, SparseRegularInverse, SparseSymMatProd,
>> SparseSymShiftSolve
>>
>> - Tell whether you want the *largest* or *smallest* eigenvalues.
>>
>> This already makes 6*12*2=144 configurations neglecting the numerous
>> variants of "largest" and "smallest".
>>
>> Of course, among them only a few really make sense. For instance,
>> requesting for the *smallest* eigenvalues is always going to be slow (please
>> correct me if I'm wrong). If you want the smallest ones then you must use a
>> shift-and-invert solver with the associated "ShiftSolve" operation and
>> request for the *largest*... yes you read it right, you need to request for
>> the *largest* even though you're going to get the smallest.
>>
>>
>> So I think that the high-level public API should be simplified so that the
>> user has to provide the minimal amount of information without risk of
>> miss-configurations.
>>
>>
>> To initiate discussions, let me share my current reflexion on this matter.
>> There might be several shortcomings, so please correct me if I'm overseeing
>> some usages or technical difficulties.
>>
>> In a normal usage, the user should only has to provide the following 3
>> compile-time information:
>>
>> 1 - Problem kind: self-adjoint, generic, self-adjoint-generalized,
>> generalized
>>
>> This makes four different kinds of problems with different underlying
>> algorithms and different inputs/outputs. So, as in the Eigen/Eigenvalues
>> module, this can be handled through 4 classes. Currently the last one
>> (generalized) is not supported yet, so this makes only 3 solver classes:
>>
>> SymEigsSolver, EigsSolver, SymGEigsSolver
>>
>>
>> 2 - The matrix abstraction: do I work on Matrix<>, on SparseMatrix<> or on
>> custom ones? This question is the same as for the iterative linear solvers
>> (e.g., ConjugateGradient), and for consistency we could pass it as the first
>> template argument of the solver class. (I'm neglecting the multiple
>> lowest/largest variants)
>>
>>
>> 3 - Do I want the largest or smallest ones? This information has to be
>> provided at compile-time so that the right operation and mode can be
>> automatically instantiated for the user (regular for the largest,
>> invert-shift for the smallest).
>>
>>
>> And that's all. Other parameters (number of eigenvalues, basis size,
>> initial shift, ...) can be specified at runtime. Say I want the 100 smallest
>> eigenvalues/vectors of a sparse symmetric real matrices, then all I've to
>> write would be:
>>
>> SymEigsSolver <SparseMatrix<double>, Smallest> eig(A,100);
>>
>> This will imply the usage of the build-in SparseSymShiftSolve operation
>> based on SimplicialLDLT.
>>
>>
>> So far so good, but what if I want to use my own operator instead of the
>> default one instantiated from the previous information? Offering this
>> flexibility if important for custom/matrix-free solving or to choose the
>> underlying linear solver in the shift-invert mode. This could be
>> accomplished through a third template parameters with a default value
>> automatically determined from the previous ones as in the LenvergMarquart<>
>> solver.
>>
>> Unfortunately, by doing so we end up with similar redundancies and
>> possible conflicts as in the current design. This is because the operation
>> implies the matrix-kind, the selection rule (largest or smallest), and
>> whether we have to use a regular or shift-invert mode.
>>
>> Perhaps it could be possible to play with C++11 variadic templates to
>> allow for the two modes, e.g.:
>>
>> SymEigsSolver <SparseMatrix<double>, Smallest > eig1(A,100);
>>
>> SymEigsSolver<MyCholmodOperator<double> > eig2(A,100); // implies lowest
>> and shift-invert mode
>>
>> Of course, MyCholmodOperator should provide some compile-time-information
>> to tell whether it is applying A^1 or A.
>>
>> Using variadic template for that seems over-complicated tough.
>>
>> Moreover, it is also unclear to me whether the different definitions of
>> smallest/largest (i.e., signed, magnitude, real part, imaginary part)
>> impacts the algorithmic choices? Can this detail be decided at runtime? Does
>> it influence the choice of the mode and operation?
>>
>> So in the end, some redundancy is perhaps not that bad:
>>
>> SymEigsSolver <SparseMatrix<double>, Smallest, MyCholmodOperator<...> >
>> eig3(A,100);
>>
>> because only power users will start playing this third template parameter.
>> With this proposal, it is still not possible to explicitly choose between
>> the regular and shift-invert modes. If that's important, then we could
>> extend the second parameter, e.g.:
>>
>> SymEigsSolver <SparseMatrix<double>, Smallest|RegularMode,
>> MyCholmodOperator<...> > eig3(A,100);
>>
>> SymEigsSolver <SparseMatrix<double>, Largest|ShiftInvertMode,
>> MyCholmodOperator<...> > eig3(A,100);
>>
>> Then we get all the flexibility of the current design, but with much less
>> risk to shoot his own foot (IMO).
>>
>> In contrast to the current design, I would also make sure that Smallest
>> and Largest always refer to the output eigenvalues returned by the
>> eigenvalues() method (which are not necessarily the eigenvalues of the input
>> matrix A, e.g., if you silently pass an invert operation while asking for
>> Largest|Regular).
>>
>> cheers,
>> gael.
>
>

Yixuan Qiu <yixuanq@xxxxxxxxx>
Department of Statistics,
Purdue University