Re: [eigen] Accelerate RealSchur and EigenSolver

[ Thread Index | Date Index | More Archives ]

I'm the one that filed the original note about the Hessenberg stuff, and I was able to get it "working", but not without some issues, that should probably be addressed here. 

1) I did the decomposition with:

schur.computeFromHessenberg(Abalanced, Eigen::MatrixXd::Zero(Abalanced.rows(), Abalanced.cols()), false);

and then walked the block-diagonal matrix to find the eigenvalues:

const Eigen::MatrixXd &T = schur.matrixT();
Eigen::Index j = 0;
for (int i = 0; i < T.cols(); ++i) {
    if (i+1 < T.cols()-1 && std::abs(T(i+1,i)) > DBL_EPSILON){
        // Nope, this is a 2x2 block, keep moving
        i += 1;
        // This is a 1x1 block, keep this (real) eigenvalue
        roots(j) = T(i, i);

2) My positive experience was that this method was approximately 20% faster than the conventional eigenvalue solve.  I would have expected/ hoped for better performance
3) My negative experience was that the real eigenvalues returned were not identical to the real eigenvalues from the eigenvalues function.  Probably there are some scaling and conditioning steps that I need to add, and I know my matrix is particularly nasty, but if it doesn't yield the correct values, its not very useful !

All that said, I would love to have a working method for this problem, since basically all my runtime is in the eigenvalue solving problem.

On Mon, Apr 17, 2017 at 2:39 PM, Yixuan Qiu <yixuanq@xxxxxxxxx> wrote:
Hi All,

Motivated by a previous thread here (, I was considering how to speed up EigenSolver for small- and medium-sized matrices. My focus was on optimizing RealSchur::computeFromHessenberg(), since it is the major computation burden of EigenSolver::compute().

After some profiling and analysis, I find that the bottleneck of RealSchur is the QR iteration that repeatedly calls applyHouseholderOnTheLeft() and applyHouseholderOnTheRight() on some blocks of the matrix. These two functions come from the Householder transformation functions implemented in Householder/Householder.h.

And this is where we have room for optimization. In RealSchur, the size of Householder reflector is known to be 2 or 3, so we can write some specialized Householder functions to reduce the overhead in the generic implementation. I have created a patch for Householder.h (, and some benchmark results on my machine are shown in the plot: The -O2 flag was used for all settings.

It can be seen that with AVX enabled, the patched version reduces a visible proportion of running time compared with the current implementation.

If you think this makes sense, I can then submit a PR to Eigen. But before that, the problem I mentioned previously ( may need to be addressed first.




Yixuan Qiu <yixuanq@xxxxxxxxx>
Department of Statistics,
Purdue University

Mail converted by MHonArc 2.6.19+