|Re: [eigen] Advice on contributing a module for a multithreaded, supernodal, MPL-licensed sparse-direct LDLT/LDLH solver?|
[ Thread Index |
| More lists.tuxfamily.org/eigen Archives
- To: eigen@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [eigen] Advice on contributing a module for a multithreaded, supernodal, MPL-licensed sparse-direct LDLT/LDLH solver?
- From: "Jack Poulson" <jack@xxxxxxxxxxxxx>
- Date: Sat, 09 Mar 2019 12:27:41 -0500
- Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=hodgestar.com; h=message-id:in-reply-to:references:date:from:to:subject :content-type; s=fm2; bh=g1M0+KH7CIwln5T3kcigaS/csTi2zMKUkzDEprl mXxY=; b=Um81W+LHlizxl62Fc5h0mfrfyoG9b2bziIoKsiKxEySpmeXk6AvJXqe iYVdZmFMzGH1qfPsqMuKg8nOOccFo6qB3nAmv5lNGsQjzkF2UoEd4bVE6pg/XlPS +tng/rBjF3lhmTsShm/gC1H45kTVXsfOxIfAfyi+24F2GpMBUi984cZ8qKecm1/u i2nh50zxFFemMKWJ0E+a8GR8ELvtFbaSSxWlrC9gFtFRJvC2bPsFohh4li/XwZVc qwTiIlVPLUd5HPX8nPmvBPEpeRaqIJRaLE6gWu+ekc9tUJO4yVOzHQsFC/e9gKQL gBgkaY9zsr07rM3uK3W+YACb4nG/0/Q==
- Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=content-type:date:from:in-reply-to :message-id:references:subject:to:x-me-proxy:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm2; bh=g1M0+KH7CIwln5T3k cigaS/csTi2zMKUkzDEprlmXxY=; b=I2lb0BGi538coJLwe2mXoLK+7TYjVG8wu rmof68Tbw/d/s0mVO3jzkSz3WTwTTumEkpPsAmOZBdNbFFoAd+9gI82oTQ/499Qe R5QnwCHJ2+jxhN0MeEaw0Qdkz+LT715ko2QOS8GBtGcObiuIBtqg90TPJPDSE0Or KVliBuA1F7KtAlXquLhbY0RTmTKdsDX4mlpptIonA3x4xBRJNAiU8zAVwWmNEDib wDrowkDKYJrZicBnPxzAcb6zKg7MKfz0gjUthOT8T8A6NJAffRQqBWHYBrKr1YP3 lBMvWdiolEvmUcujXeUEc4sN7OPHdtiXi+y5KBoye27tDA9cwVPPg==
Interesting! I hadn't run a comparison against CholMod yet (but it is obviously the gold standard). There is a slight amount of automatic algorithmic switching that needs to be added in (https://gitlab.com/hodge_star/catamari/issues/2
) before I would assume a blackbox comparison across the gamut of matrix types would be competitive. Catamari currently implements:
* Sequential and multithreaded, supernodal right-looking (multifrontal) Cholesky, LDL^T, and LDL^H,
* Sequential and multithreaded, supernodal left-looking Cholesky, LDL^T, and LDL^H,
* Sequential scalar left-looking Cholesky, LDL^T, and LDL^H
* Sequential scalar up-looking Cholesky, LDL^T, and LDL^H.
At the moment, the first category is used by default, but there are more refined categories:
* For large, reasonably dense, multithreaded environments, the supernodal right-looking approach is preferred.
* For large, reasonably dense, sequential environments, the supernodal left-looking approach might be preferred.
* For small/sparse problems, the up-looking approach should be preferred, but there is not yet parallel support for this in catamari (not that it would be difficult).
According to CholMod's publications, it intelligently switches between the supernodal left-looking factorization and the scalar up-looking factorization (based upon the computational intensity as determined from the symbolic factorization). For higher core counts, the supernodal right-looking approach becomes critical so that the top of the assembly tree is parallelized.
On the subject of Eigen integration: I would defer to Eigen experts on the best/most-useful means of integration. If you believe that there would be added usefulness in transplanting Catamari's logic into core Eigen (on top of its BLAS and matrix classes), I think that could be interesting. I believe that my permissively-licensed AMD implementation (quotient) should be roughly equivalent to SuiteSparse's (and, I suppose, Eigen's) in terms of performance.
A wrapper might be a good first step, but this conversation has gotten me thinking that improving the algorithmic switching logic is a priority.
On Sat, Mar 9, 2019, at 2:31 AM, Gael Guennebaud wrote:
This is a very nice piece of work, congrats! I did one benchmark if catamari+MKL on the "ldoor" matrix  and got same speed as suitesparse/Cholmod: about 8s on my Haswell quad-core (15s without openmp).
Requiring c++14 for new features is not a big deal anymore.
How do you envision this "Eigen/Catamari" module? A simple wrapper as we do with Cholmod/UmfPack/Pastix/Pardiso? or a built-in Eigen-ified version? As you probably already figured out Eigen already provides several features redundant with what you developed in Catamari:
- BLAS/LAPACK layer with optional compile-time fallback to any BLAS / LAPACKE / MKL
- Sparse storage
- Fill-in reorderings, we have AMD and COLAMD, so yours might rather be complementary.
Dear Eigen Maintainers,
I noticed that Eigen's current documentation only lists a simplicial LDLT sparse-direct solver; I spent my last several months building a header-only C++ (optionally multithreaded) supernodal Cholesky/LDL^T/LDL^H sparse-direct solver (which also implements high-performance Determinantal Point Process sampling with similar techniques) and am curious if it would be seen as a good fit for an unsupported Eigen module. The solver achieves as much as 1 TF in double-precision on my 16-core skylake workstation.
The current project is described here:
I released v0.1 a few days ago, but I have improved the performance in 'master' substantially since then.
My assumption is that the biggest hurdle will be due to the solver (catamari) being implemented in C++14, and that Eigen is currently restricted to C++03. And, while catamari optionally incorporates OpenMP task scheduling for the multithreaded parallelism, this support is only enabled if an appropriate preprocessor directive is defined (CATAMARI_OPENMP).
Any advice would be appreciated.