Re: [eigen] initial sparse PCG solver |
[ Thread Index |
Date Index
| More lists.tuxfamily.org/eigen Archives
]
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] initial sparse PCG solver
- From: Manoj Rajagopalan <rmanoj@xxxxxxxxx>
- Date: Fri, 23 Jul 2010 12:30:46 -0400
- Organization: EECS Dept., University of Michigan, Ann Arbor, MI, USA
Hi Daniel, Gael and others
I have been working on a sparse solver suite built on top of Eigen but with
a different architecture. Here are some points from my approach, just for
discussion if anyone is interested. Also, anyone interested in joining this
effort is welcome! :-)
1. I have to handle a block-tridiagonal, complex, non-symmetric matrix with
some blocks being diagonal, some blocks being real symmetric Toeplitz, and
some blocks being Hermitian.
2. I have implementations of BiCGStab, CGLS, and LSQR with QMR, CGNE and CGNR
to follow soon.
3. To be able to freely swap preconditioners and sparse solvers (the two are
independent AFAICT) my architecture uses abstract classes with pure virtual
functions. This goes against Eigen's Core philosophy but (a) this isn't
Eigen/Core :-), (b) performance degradation is not an issue if the penalty
incurred in virtual function lookup is negligible compared to the computation
inside the body of the functions, and (c) branch-predictors and instruction
caches should mitigate penalties in inner loops.
4. Virtual calls can be replaced with static polymorphism once the module is
mature. I am not sure the marginal performance improvement (IIUC) will be
worth the effort invested in the development though.
5. I am attaching code for two classes: SparseSolver and MatrixFunctor. I am
also attaching my implementation of BiCGStab.
6. In many PDE/ODE cases on structured grids, esp. with domain decomposition,
we get block-banded matrices with symmetry/self-adjoint nature. On the other
hand, non-regular grids and adaptive mesh-refinement etc. lead to a sparse
graph of a mesh for which more general Sparse Matrix classes are required. A
flexible architecture that handles all these cases must have, at the top
level, a matrix-functor base class since the details of the matrix are so
user- and case-dependent. Many sparse methods based on Krylov subspace
principles require only the application of the matrix (and its adjoint). A
preconditioned MatrixFunctor must also apply the preconditioner in these
steps. An advantage of a MatrixFunctor class instead of a matrix class
(sparse or dense) is that temporary results can be transparently cached in
its Derived instance while applying the preconditioner^H to the RHS in the
case of multiplication-by-matrix adjoint (for example, consider a
diagonal-matrix preconditioner)
7. The SparseSolver base class takes a MatrixFunctor* in its solve() method.
So the two can be instantiated independently. Derivatives of both can
pre-allocate memory required for specific temporaries at CTOR time and the
classes can then be reused without new[]/delete[] penalties. This is very
helpful with Monte Carlo methods, say. Also, different equation systems with
the same dimension can be solved with the same SparseSolver instance. The use
of free-standing functions like those currently in the Sparse module will
require instantiation of temporaries each time the function is called unless
the FORTRAN paradigm of stateless functions with all scratch space passed as
function-arguments is followed. A class with mutable scratch-space objects
hides this complexity. I am making the case for abstract base classes that
serve as functors here.
8. The classes are currently not templated but it should be easy to do this to
allow various precisions (float/double/long double) and real/complex.
9. It doesn't seem possible to optimize these algorithms at the top-level any
further but since Eigen is the engine for implementation, we get good
performance with convenient expressivity! :-)
10. Though I have begun with specific targets (see items #1 and #2), the
architecture is flexible enough to allow Hermitian matrices and relevant
solution methods like PCG and derivatives.
Thanks,
Manoj
On Friday 23 July 2010 10:24:36 am Gael Guennebaud wrote:
> Hi,
>
> excellent!
>
> I don't see any obvious way to optimize the implementation w/r to the
> performance.
>
> So if we put aside pure API discussion, I only have remarks you
> probably already know:
> - It should accept for the rhs and the result any objects
> (sparse/dense, matrix/vector). That means replacing the dots by
> matrix products.
> - The EPSILON must be a user parameter with a default value depending
> on the scalar type.
> - Check it works with complexes.
> - The two divisions should be checked for 0.
> - For preconditioners you can also try our SparseLLT with a relatively
> large epsilon such that it is really fast and stay sparse.
> - The preconditionner should be an arbitrary type such that we could
> use "virtual" matrix, i.e., we would wrap SparseLLT into a tiny struct
> having operator* overloaded doing:
>
> XXXX operator* (const T& x) const {
> return m_llt.solve(x);
> }
>
> but this first requires we port the solve API we now have in Dense to
> Sparse, i.e., returning proxy objects.... Not difficult, mostly
> copy/pasta from the dense LLT dec for instance.
>
> cheers,
> Gael.
>
> On Fri, Jul 23, 2010 at 11:40 AM, Daniel Lowengrub <lowdanie@xxxxxxxxx>
wrote:
> > Hi,
> > I started working on a preconditioned conjugate gradient solver for
> > the Sparse module. Before working on the interface and eigen
> > integration I decided to try and get the basic functionality setup. I
> > attached a file with the implementation of the algorithm and a small
> > test program. The test program uses the simple jacobi preconditioner.
> > Obviously better preconditioners will have to be implemented in order
> > for the PCG algorithm to be feasible, but for now I'd like to focus on
> > the algorithm itself.
> > Any feedback on the implementation would be appreciated.
> > The implementation used here is based on this paper:
> > www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf
> >
> > Daniel
#ifndef __MatrixFunctor_H__
#define __MatrixFunctor_H__
#include "vector-matrix-types.h"
struct MatrixFunctor
{
MatrixFunctor() {}
virtual ~MatrixFunctor() {}
virtual void applyMatrix(complex_submatrix & b,
complex_submatrix const& x) const =0;
virtual void applyMatrixAdjoint(complex_submatrix & b,
complex_submatrix const& x) const =0;
virtual void computeResidual(complex_submatrix & r,
complex_submatrix const& x) const =0;
virtual unsigned long memoryUsage() const { return 0ul; };
};
#endif // __MatrixFunctor_H__
#ifndef __SparseSolver_H__
#define __SparseSolver_H__
#include <eigen-aliases.h>
#include <vector-matrix-types.h>
#include <common.h>
// fwd decl
class MatrixFunctor;
struct VectorCollectionUtils
{
VectorCollectionUtils() {}
~VectorCollectionUtils() {}
/// z = x^H . y
static void InnerProd(complex_vector_view & z,
complex_submatrix const& x,
complex_submatrix const& y)
{
assert(x.rows() == y.rows());
assert(x.cols() == y.cols());
assert(z.size() == x.cols());
for(int i=0; i<x.cols(); ++i) {
z[i] = x.col(i).conjugate().dot(y.col(i));
}
}
};
class SparseSolver : public VectorCollectionUtils
{
public:
SparseSolver() {}
SparseSolver(uint const ndim, uint const nrhs_max)
: ndim_(ndim), nrhs_(0u), norms_(nrhs_max),
r_(ndim,nrhs_max), x_(ndim, nrhs_max)
{
}
virtual ~SparseSolver() {}
virtual void resize(uint const ndim, uint const nrhs_max) {
ndim_ = ndim;
norms_.resize(nrhs_max);
r_.resize(ndim, nrhs_max);
x_.resize(ndim, nrhs_max);
}
uint const& dim() const { return ndim_; }
void setNumRHS(uint nrhs) { assert(int(nrhs) <= norms_.size()); nrhs_ = nrhs; }
uint const& numRHS() const { return nrhs_; }
complex_submatrix solution() { return eigen_aliases::col_range(x_, 0, nrhs_); }
complex_submatrix const solution() const { return eigen_aliases::const_col_range(x_, 0, nrhs_); }
complex_submatrix residual() { return eigen_aliases::col_range(r_, 0, nrhs_); }
complex_submatrix const residual() const { return eigen_aliases::const_col_range(r_, 0, nrhs_); }
virtual int solve(MatrixFunctor const *const matrix,
uint const max_iter,
real_value const tol) =0;
virtual unsigned long memoryUsage() const;
real_vector const& norms() const { return norms_; }
protected:
// MatrixFunctor const& matrix_;
uint ndim_;
uint nrhs_;
real_vector norms_;
complex_matrix r_, x_;
};
inline
unsigned long
SparseSolver::memoryUsage() const
{
unsigned long result = norms_.size() * sizeof(real_vector::Scalar);
result += (r_.size() + x_.size()) * sizeof(complex_matrix::Scalar);
return result;
}
#endif // __SparseSolver_H__
#include "BiCGStab.h"
#include "MatrixFunctor.h"
#include <iostream>
using namespace eigen_aliases;
using namespace Eigen;
using namespace std;
BiCGStab_Solver::
BiCGStab_Solver(uint const ndim, uint nrhs_max)
: BaseClass(ndim, nrhs_max),
resid_(nrhs_max), temp_(nrhs_max), rho_1_(nrhs_max), rho_2_(nrhs_max),
alpha_(nrhs_max), beta_(nrhs_max), omega_(nrhs_max),
p_(ndim, nrhs_max), s_(ndim, nrhs_max), t_(ndim, nrhs_max), v_(ndim, nrhs_max),
r1_(ndim, nrhs_max)
{
}
void
BiCGStab_Solver::
resize(uint const ndim, uint const nrhs_max)
{
BaseClass::resize(ndim, nrhs_max);
resid_.resize(nrhs_max);
temp_.resize(nrhs_max);
rho_1_.resize(nrhs_max);
rho_2_.resize(nrhs_max);
alpha_.resize(nrhs_max);
beta_.resize(nrhs_max);
omega_.resize(nrhs_max);
p_.resize(ndim, nrhs_max);
s_.resize(ndim, nrhs_max);
t_.resize(ndim, nrhs_max);
v_.resize(ndim, nrhs_max);
r1_.resize(ndim, nrhs_max);
}
int
BiCGStab_Solver::
solve(MatrixFunctor const *const matrix,
uint const max_iter,
real_value const tol)
{
real_vector_view resid(resid_.data(), BaseClass::nrhs_);
complex_vector_view temp(temp_.data(), BaseClass::nrhs_);
complex_vector_view rho_1(rho_1_.data(), BaseClass::nrhs_);
complex_vector_view rho_2(rho_2_.data(), BaseClass::nrhs_);
complex_vector_view alpha(alpha_.data(), BaseClass::nrhs_);
complex_vector_view beta(beta_.data(), BaseClass::nrhs_);
complex_vector_view omega(omega_.data(), BaseClass::nrhs_);
complex_submatrix p = col_range(p_, 0, BaseClass::nrhs_);
complex_submatrix s = col_range(s_, 0, BaseClass::nrhs_);
complex_submatrix t = col_range(t_, 0, BaseClass::nrhs_);
complex_submatrix v = col_range(v_, 0, BaseClass::nrhs_);
complex_submatrix r1 = col_range(r1_, 0, BaseClass::nrhs_);
complex_submatrix r = col_range(BaseClass::r_, 0, BaseClass::nrhs_);
complex_submatrix x = col_range(BaseClass::r_, 0, BaseClass::nrhs_);
real_value const _zero_(0.0);
real_constant_vector const _ZERO_(BaseClass::nrhs_, _zero_);
real_constant_vector const _TOLS_(BaseClass::nrhs_, tol);
matrix->computeResidual(r, x);
r1 = r;
resid = r.colwise().norm();
if((resid_.array() < tol).all())
return 0;
for(uint iter = 0u; iter < max_iter; ++iter) {
VectorCollectionUtils::InnerProd(rho_1, r1, r);
if((rho_1.array() == _zero_).any())
return 2;
if(iter == 0u)
p = r;
else {
beta = (rho_1.cwiseQuotient(rho_2))
.cwiseProduct(alpha.cwiseQuotient(omega));
p = r + (p - v*omega.asDiagonal()) * beta.asDiagonal();
}
matrix->applyMatrix(v, p);
VectorCollectionUtils::InnerProd(temp, r1, v);
alpha = rho_1.cwiseQuotient(temp);
s = r - v *alpha.asDiagonal();
resid = s.colwise().norm();
if((resid.array() < tol /*_TOLS_.array()*/).all()) {
x += p * alpha.asDiagonal();
return 0;
}
matrix->applyMatrix(t,s);
VectorCollectionUtils::InnerProd(temp, t, s);
resid = t.colwise().squaredNorm().real();
omega = temp.cwiseQuotient(resid.cast<complex_value>());
x += p*alpha.asDiagonal() + s*omega.asDiagonal();
r = s - t*omega.asDiagonal();
rho_2 = rho_1;
resid = r.colwise().norm();
cerr << "BiCGStab_Solver::solve(): "
"iter=" << iter << ", norm2=" << resid.transpose() << endl;
if((resid.array() < tol).all())
return 0;
if((omega.array() == _zero_).any())
return 3;
}
return 1;
}
unsigned long
BiCGStab_Solver::
memoryUsage() const
{
unsigned long result = BaseClass::memoryUsage()
+ resid_.size() * sizeof(*resid_.data())
+ (temp_.size() + rho_1_.size() + rho_2_.size()
+ alpha_.size() + beta_.size() + omega_.size()
+ p_.size() + s_.size() + t_.size() + v_.size() + r1_.size())
* sizeof(complex_vector::Scalar);
return result;
}
#ifndef __BiCGStab_Solver_H__
#define __BiCGStab_Solver_H__
#include "SparseSolver.h"
class BiCGStab_Solver : public SparseSolver
{
typedef SparseSolver BaseClass;
public:
BiCGStab_Solver() {}
BiCGStab_Solver(uint const ndim, uint const nrhs_max);
~BiCGStab_Solver() {}
void resize(uint const ndim, uint const nrhs_max);
int solve(MatrixFunctor const *const matrix,
uint const max_iter,
real_value const tol);
unsigned long memoryUsage() const;
private:
real_vector resid_;
complex_vector temp_;
complex_vector rho_1_, rho_2_;
complex_vector alpha_, beta_, omega_;
complex_matrix p_, s_, t_, v_, r1_;
};
#endif // __BICGSTAB_SOLVER_H__