Re: [eigen] Advice on contributing a module for a multithreaded, superno

On Sat, Mar 9, 2019 at 6:28 PM Jack Poulson <jack@xxxxxxxxxxxxx> wrote:

Hi Gael,

Interesting! I hadn't run a comparison against CholMod yet (but it is obviously the gold standard). There is a slight amount of automatic algorithmic switching that needs to be added in (https://gitlab.com/hodge_star/catamari/issues/2) before I would assume a blackbox comparison across the gamut of matrix types would be competitive. Catamari currently implements:
* Sequential and multithreaded, supernodal right-looking (multifrontal) Cholesky, LDL^T, and LDL^H,
* Sequential and multithreaded, supernodal left-looking Cholesky, LDL^T, and LDL^H,
* Sequential scalar left-looking Cholesky, LDL^T, and LDL^H
* Sequential scalar up-looking Cholesky, LDL^T, and LDL^H.

At the moment, the first category is used by default, but there are more refined categories:
* For large, reasonably dense, multithreaded environments, the supernodal right-looking approach is preferred.
* For large, reasonably dense, sequential environments, the supernodal left-looking approach might be preferred.
* For small/sparse problems, the up-looking approach should be preferred, but there is not yet parallel support for this in catamari (not that it would be difficult).

parallel simplicial would be nice, and the first publicly available implementation AFAIK.

According to CholMod's publications, it intelligently switches between the supernodal left-looking factorization and the scalar up-looking factorization (based upon the computational intensity as determined from the symbolic factorization). For higher core counts, the supernodal right-looking approach becomes critical so that the top of the assembly tree is parallelized.

I did not know that CholMod was supposed to be smart in this regard. On my own experienced I always had to explicitly choose between simplicial and supernodal modes to get best performance (and for simplicial I always use our built-in one of course!).

On the subject of Eigen integration: I would defer to Eigen experts on the best/most-useful means of integration. If you believe that there would be added usefulness in transplanting Catamari's logic into core Eigen (on top of its BLAS and matrix classes), I think that could be interesting.

That would definitely be the ideal option as duplicating code and efforts is rarely a good idea. Of course, this is also more work on the short term.

I believe that my permissively-licensed AMD implementation (quotient) should be roughly equivalent to SuiteSparse's (and, I suppose, Eigen's) in terms of performance.

Thanks to Google, we very recently managed to relicense SuiteSparse's AMD as MPL2. But the code is still horrible to us, C++ developers, so yours might still be very desirable if it's in par regarding performance/robustness. About AMD, I slightly modified our copy to support matrices with explicit zeros on the diagonal: those must be moved/kept at the bottom-right corner. Typical use case is equality constraints handled through Lagrange multipliers.

A wrapper might be a good first step, but this conversation has gotten me thinking that improving the algorithmic switching logic is a priority.

Yes, starting with a wrapper would be a good start to make catamari visible and easily testable/usable to the Eigen's community. If we envision an integration within Eigen's official module, a large refactoring will be required through.

Gaël.

With Thanks,
Jack

On Sat, Mar 9, 2019, at 2:31 AM, Gael Guennebaud wrote:
Hi Jack,

This is a very nice piece of work, congrats! I did one benchmark if catamari+MKL on the "ldoor" matrix [1] and got same speed as suitesparse/Cholmod: about 8s on my Haswell quad-core (15s without openmp).

Requiring c++14 for new features is not a big deal anymore.

How do you envision this "Eigen/Catamari" module? A simple wrapper as we do with Cholmod/UmfPack/Pastix/Pardiso? or a built-in Eigen-ified version? As you probably already figured out Eigen already provides several features redundant with what you developed in Catamari:
- BLAS/LAPACK layer with optional compile-time fallback to any BLAS / LAPACKE / MKL
- Sparse storage
- Fill-in reorderings, we have AMD and COLAMD, so yours might rather be complementary.

Gaël

[1] https://sparse.tamu.edu/GHS_psdef/ldoor

On Fri, Mar 8, 2019 at 4:59 PM Jack Poulson <jack@xxxxxxxxxxxxx> wrote:

Dear Eigen Maintainers,

I noticed that Eigen's current documentation only lists a simplicial LDLT sparse-direct solver; I spent my last several months building a header-only C++ (optionally multithreaded) supernodal Cholesky/LDL^T/LDL^H sparse-direct solver (which also implements high-performance Determinantal Point Process sampling with similar techniques) and am curious if it would be seen as a good fit for an unsupported Eigen module. The solver achieves as much as 1 TF in double-precision on my 16-core skylake workstation.

The current project is described here:
https://hodgestar.com/catamari/
I released v0.1 a few days ago, but I have improved the performance in 'master' substantially since then.

My assumption is that the biggest hurdle will be due to the solver (catamari) being implemented in C++14, and that Eigen is currently restricted to C++03. And, while catamari optionally incorporates OpenMP task scheduling for the multithreaded parallelism, this support is only enabled if an appropriate preprocessor directive is defined (CATAMARI_OPENMP).

Any advice would be appreciated.

Sincerely,
Jack Poulson