Re: [eigen] initial sparse PCG solver

[ Thread Index | Date Index | More lists.tuxfamily.org/eigen Archives ]


Hi Daniel, Gael and others

   I have been working on a sparse solver suite built on top of Eigen but with 
a different architecture. Here are some points from my approach, just for 
discussion if anyone is interested. Also, anyone interested in joining this 
effort is welcome! :-)

1. I have to handle a block-tridiagonal, complex, non-symmetric matrix with 
some blocks being diagonal, some blocks being real symmetric Toeplitz, and 
some blocks being Hermitian.

2. I have implementations of BiCGStab, CGLS, and LSQR with QMR, CGNE and CGNR 
to follow soon.

3. To be able to freely swap preconditioners and sparse solvers (the two are 
independent AFAICT) my architecture uses abstract classes with pure virtual 
functions. This goes against Eigen's Core philosophy but (a) this isn't 
Eigen/Core :-), (b) performance degradation is not an issue if the penalty 
incurred in virtual function lookup is negligible compared to the computation 
inside the body of the functions, and (c) branch-predictors and instruction 
caches should mitigate penalties in inner loops.

4. Virtual calls can be replaced with static polymorphism once the module is 
mature. I am not sure the marginal performance improvement (IIUC) will be 
worth the effort invested in the development  though.

5. I am attaching code for two classes: SparseSolver and MatrixFunctor. I am 
also attaching my implementation of BiCGStab.

6. In many PDE/ODE cases on structured grids, esp. with domain decomposition, 
we get block-banded matrices with symmetry/self-adjoint nature. On the other 
hand, non-regular grids and adaptive mesh-refinement etc. lead to a sparse 
graph of a mesh for which more general Sparse Matrix classes are required. A 
flexible architecture that handles all these cases must have, at the top 
level, a matrix-functor base class since the details of the matrix are so 
user- and case-dependent. Many sparse methods based on Krylov subspace 
principles require only the application of the matrix (and its adjoint). A 
preconditioned MatrixFunctor must also apply the preconditioner in these 
steps. An advantage of a MatrixFunctor class instead of a matrix class 
(sparse or dense) is that temporary results can be transparently cached in 
its Derived instance while applying the preconditioner^H to the RHS in the 
case of multiplication-by-matrix adjoint (for example, consider a 
diagonal-matrix preconditioner)

7. The SparseSolver base class takes a MatrixFunctor* in its solve() method. 
So the two can be instantiated independently. Derivatives of both can 
pre-allocate memory required for specific temporaries at CTOR time and the 
classes can then be reused without new[]/delete[] penalties. This is very 
helpful with Monte Carlo methods, say. Also, different equation systems with 
the same dimension can be solved with the same SparseSolver instance. The use 
of free-standing functions like those currently in the Sparse module will 
require instantiation of temporaries each time the function is called unless 
the FORTRAN paradigm of stateless functions with all scratch space passed as 
function-arguments is followed. A class with mutable scratch-space objects 
hides this complexity. I am making the case for abstract base classes that 
serve as functors here.

8. The classes are currently not templated but it should be easy to do this to 
allow various precisions (float/double/long double) and real/complex.

9. It doesn't seem possible to optimize these algorithms at the top-level any 
further but since Eigen is the engine for implementation, we get good 
performance with convenient expressivity! :-)

10. Though I have begun with specific targets (see items #1 and #2), the 
architecture is flexible enough to allow Hermitian matrices and relevant 
solution methods like PCG and derivatives.

Thanks,
Manoj


On Friday 23 July 2010 10:24:36 am Gael Guennebaud wrote:
> Hi,
>
> excellent!
>
> I don't see any obvious way to optimize the implementation w/r to the
> performance.
>
> So if we put aside pure API discussion, I only have remarks you
> probably already know:
> - It should accept for the rhs and the result any objects
> (sparse/dense, matrix/vector).  That means replacing the dots by
> matrix products.
> - The EPSILON must be a user parameter with a default value depending
> on the scalar type.
> - Check it works with complexes.
> - The two divisions should be checked for 0.
> - For preconditioners you can also try our SparseLLT with a relatively
> large epsilon such that it is really fast and stay sparse.
> - The preconditionner should be an arbitrary type such that we could
> use "virtual" matrix, i.e., we would wrap SparseLLT into a tiny struct
> having operator* overloaded doing:
>
> XXXX operator* (const T& x) const {
>   return m_llt.solve(x);
> }
>
> but this first requires we port the solve API we now have in Dense to
> Sparse, i.e., returning proxy objects.... Not difficult, mostly
> copy/pasta from the dense LLT dec for instance.
>
> cheers,
> Gael.
>
> On Fri, Jul 23, 2010 at 11:40 AM, Daniel Lowengrub <lowdanie@xxxxxxxxx> 
wrote:
> > Hi,
> > I started working on a preconditioned conjugate gradient solver for
> > the Sparse module.  Before working on the interface and eigen
> > integration I decided to try and get the basic functionality setup.  I
> > attached a file with the implementation of the algorithm and a small
> > test program.  The test program uses the simple jacobi preconditioner.
> >  Obviously better preconditioners will have to be implemented in order
> > for the PCG algorithm to be feasible, but for now I'd like to focus on
> > the algorithm itself.
> > Any feedback on the implementation would be appreciated.
> > The implementation used here is based on this paper:
> > www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf
> >
> > Daniel

#ifndef __MatrixFunctor_H__
#define __MatrixFunctor_H__

#include "vector-matrix-types.h"

struct MatrixFunctor
{
    MatrixFunctor() {}
    virtual ~MatrixFunctor() {}
    virtual void applyMatrix(complex_submatrix & b,
                             complex_submatrix const& x) const =0;
    virtual void applyMatrixAdjoint(complex_submatrix & b,
                                    complex_submatrix const& x) const =0;
    virtual void computeResidual(complex_submatrix & r,
                                 complex_submatrix const& x) const =0;
    virtual unsigned long memoryUsage() const { return 0ul; };
};


#endif // __MatrixFunctor_H__
#ifndef __SparseSolver_H__
#define __SparseSolver_H__

#include <eigen-aliases.h>
#include <vector-matrix-types.h>
#include <common.h>

// fwd decl
class MatrixFunctor;

struct VectorCollectionUtils
{
    VectorCollectionUtils() {}
    ~VectorCollectionUtils() {}
    /// z = x^H . y
    static void InnerProd(complex_vector_view & z,
                          complex_submatrix const& x,
                          complex_submatrix const& y)
    {
        assert(x.rows() == y.rows());
        assert(x.cols() == y.cols());
        assert(z.size() == x.cols());
        for(int i=0; i<x.cols(); ++i) {
            z[i] = x.col(i).conjugate().dot(y.col(i));
        }
    }
};


class SparseSolver : public VectorCollectionUtils
{
public:
    SparseSolver() {}
    SparseSolver(uint const ndim, uint const nrhs_max)
        : ndim_(ndim), nrhs_(0u), norms_(nrhs_max),
        r_(ndim,nrhs_max), x_(ndim, nrhs_max)
    {
    }
    virtual ~SparseSolver() {}
    
    virtual void resize(uint const ndim, uint const nrhs_max) {
        ndim_ = ndim;
        norms_.resize(nrhs_max);
        r_.resize(ndim, nrhs_max);
        x_.resize(ndim, nrhs_max);
    }
    
    uint const& dim() const { return ndim_; }
    void setNumRHS(uint nrhs) { assert(int(nrhs) <= norms_.size()); nrhs_ = nrhs; }
    uint const& numRHS() const { return nrhs_; }
    
    complex_submatrix solution() { return eigen_aliases::col_range(x_, 0, nrhs_); }
    complex_submatrix const solution() const { return eigen_aliases::const_col_range(x_, 0, nrhs_); }
    
    complex_submatrix residual() { return eigen_aliases::col_range(r_, 0, nrhs_); }
    complex_submatrix const residual() const { return eigen_aliases::const_col_range(r_, 0, nrhs_); }
    
    virtual int solve(MatrixFunctor const *const matrix,
                      uint const max_iter,
                      real_value const tol) =0;
    virtual unsigned long memoryUsage() const;
    real_vector const& norms() const { return norms_; }
    
protected:
//     MatrixFunctor const& matrix_;
    uint ndim_;
    uint nrhs_;
    real_vector norms_;
    complex_matrix r_, x_;
};


inline
unsigned long
SparseSolver::memoryUsage() const
{
    unsigned long result = norms_.size() * sizeof(real_vector::Scalar);
    result += (r_.size() + x_.size()) * sizeof(complex_matrix::Scalar);
    return result;
}

#endif // __SparseSolver_H__
#include "BiCGStab.h"
#include "MatrixFunctor.h"
#include <iostream>

using namespace eigen_aliases;
using namespace Eigen;
using namespace std;


BiCGStab_Solver::
BiCGStab_Solver(uint const ndim, uint nrhs_max)
: BaseClass(ndim, nrhs_max),
resid_(nrhs_max), temp_(nrhs_max), rho_1_(nrhs_max), rho_2_(nrhs_max),
alpha_(nrhs_max), beta_(nrhs_max), omega_(nrhs_max),
p_(ndim, nrhs_max), s_(ndim, nrhs_max), t_(ndim, nrhs_max), v_(ndim, nrhs_max),
r1_(ndim, nrhs_max)
{
}


void
BiCGStab_Solver::
resize(uint const ndim, uint const nrhs_max)
{
    BaseClass::resize(ndim, nrhs_max);
    resid_.resize(nrhs_max);
    temp_.resize(nrhs_max);
    rho_1_.resize(nrhs_max);
    rho_2_.resize(nrhs_max);
    alpha_.resize(nrhs_max);
    beta_.resize(nrhs_max);
    omega_.resize(nrhs_max);
    p_.resize(ndim, nrhs_max);
    s_.resize(ndim, nrhs_max);
    t_.resize(ndim, nrhs_max);
    v_.resize(ndim, nrhs_max);
    r1_.resize(ndim, nrhs_max);
}


int
BiCGStab_Solver::
solve(MatrixFunctor const *const matrix,
      uint const max_iter,
      real_value const tol)
{
    real_vector_view resid(resid_.data(), BaseClass::nrhs_);
    complex_vector_view temp(temp_.data(), BaseClass::nrhs_);
    complex_vector_view rho_1(rho_1_.data(), BaseClass::nrhs_);
    complex_vector_view rho_2(rho_2_.data(), BaseClass::nrhs_);
    complex_vector_view alpha(alpha_.data(), BaseClass::nrhs_);
    complex_vector_view beta(beta_.data(), BaseClass::nrhs_);
    complex_vector_view omega(omega_.data(), BaseClass::nrhs_);
    complex_submatrix p = col_range(p_, 0, BaseClass::nrhs_);
    complex_submatrix s = col_range(s_, 0, BaseClass::nrhs_);
    complex_submatrix t = col_range(t_, 0, BaseClass::nrhs_);
    complex_submatrix v = col_range(v_, 0, BaseClass::nrhs_);
    complex_submatrix r1 = col_range(r1_, 0, BaseClass::nrhs_);
    
    complex_submatrix r = col_range(BaseClass::r_, 0, BaseClass::nrhs_);
    complex_submatrix x = col_range(BaseClass::r_, 0, BaseClass::nrhs_);
    
    real_value const _zero_(0.0);
    real_constant_vector const _ZERO_(BaseClass::nrhs_, _zero_);
    real_constant_vector const _TOLS_(BaseClass::nrhs_, tol);
    
    matrix->computeResidual(r, x);
    r1 = r;
    resid = r.colwise().norm();
    if((resid_.array() < tol).all())
        return 0;
    
    for(uint iter = 0u; iter < max_iter; ++iter) {
        
        VectorCollectionUtils::InnerProd(rho_1, r1, r);
        if((rho_1.array() == _zero_).any())
            return 2;
        
        if(iter == 0u)
            p = r;
        else {
            beta = (rho_1.cwiseQuotient(rho_2))
                .cwiseProduct(alpha.cwiseQuotient(omega));
            p = r + (p - v*omega.asDiagonal()) * beta.asDiagonal();
        }
        matrix->applyMatrix(v, p);
        VectorCollectionUtils::InnerProd(temp, r1, v);
        alpha = rho_1.cwiseQuotient(temp);
        s = r - v *alpha.asDiagonal();
        resid = s.colwise().norm();
        if((resid.array() < tol /*_TOLS_.array()*/).all()) {
            x += p * alpha.asDiagonal();
            return 0;
        }
        
        matrix->applyMatrix(t,s);
        VectorCollectionUtils::InnerProd(temp, t, s);
        resid = t.colwise().squaredNorm().real();
        omega = temp.cwiseQuotient(resid.cast<complex_value>());
        x += p*alpha.asDiagonal() + s*omega.asDiagonal();
        r = s - t*omega.asDiagonal();
        
        rho_2 = rho_1;
        resid = r.colwise().norm();
        
        cerr << "BiCGStab_Solver::solve(): "
            "iter=" << iter << ", norm2=" << resid.transpose() << endl;
        if((resid.array() < tol).all())
            return 0;
        
        if((omega.array() == _zero_).any())
            return 3;
    }
    
    return 1;
}


unsigned long
BiCGStab_Solver::
memoryUsage() const
{
    unsigned long result = BaseClass::memoryUsage()
        + resid_.size() * sizeof(*resid_.data())
        + (temp_.size() + rho_1_.size() + rho_2_.size()
           + alpha_.size() + beta_.size() + omega_.size()
           + p_.size() + s_.size() + t_.size() + v_.size() + r1_.size())
        * sizeof(complex_vector::Scalar);
    return result;
}

#ifndef __BiCGStab_Solver_H__
#define __BiCGStab_Solver_H__


#include "SparseSolver.h"


class BiCGStab_Solver : public SparseSolver
{
    typedef SparseSolver BaseClass;
public:
    BiCGStab_Solver() {}
    BiCGStab_Solver(uint const ndim, uint const nrhs_max);
    ~BiCGStab_Solver() {}
    void resize(uint const ndim, uint const nrhs_max);
    int solve(MatrixFunctor const *const matrix,
              uint const max_iter,
              real_value const tol);
    unsigned long memoryUsage() const;
private:
    real_vector resid_;
    complex_vector temp_;
    complex_vector rho_1_, rho_2_;
    complex_vector alpha_, beta_, omega_;
    complex_matrix p_, s_, t_, v_, r1_;
};



#endif // __BICGSTAB_SOLVER_H__


Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/