Trying to write a cache-friendly implementation directly is probably a bit too difficult. I'd recommend first getting a working implementation based on vector-vector operations, and then seeing how to leverage the more efficient matrix-vector or even matrix-matrix operations.
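To make the "working implementation first" step concrete, here is a minimal, library-independent sketch. It assumes the equation has the form A*X + X*B = C with both A and B upper triangular (the thread does not spell out the exact form), and that A(i,i) + B(j,j) is never zero. The `Mat` type and the function name are hypothetical and are not part of Eigen.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix, used only for this sketch.
struct Mat {
    std::size_t rows, cols;
    std::vector<double> a;
    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), a(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return a[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return a[i * cols + j]; }
};

// Reference solver for A*X + X*B = C, with A (m x m) and B (n x n)
// upper triangular. Assumption: A(i,i) + B(j,j) != 0 for all i, j.
Mat solveTriangularSylvesterRef(const Mat& A, const Mat& B, const Mat& C) {
    assert(A.rows == A.cols && B.rows == B.cols);
    assert(C.rows == A.rows && C.cols == B.rows);
    const std::size_t m = A.rows, n = B.rows;
    Mat X(m, n);
    // Columns left to right (B is upper triangular), rows bottom to top
    // (A is upper triangular), so every entry read is already final.
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = m; i-- > 0;) {
            double s = C(i, j);
            for (std::size_t k = i + 1; k < m; ++k) s -= A(i, k) * X(k, j);
            for (std::size_t k = 0; k < j; ++k) s -= X(i, k) * B(k, j);
            X(i, j) = s / (A(i, i) + B(j, j));
        }
    }
    return X;
}
```

A slow but obviously correct version like this is mainly useful as an oracle: any blocked variant can be checked against it on small random inputs.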

Moreover, it would be better to write a high-level blocking strategy, as in the PartialPivLU and LLT solvers, and let the existing triangular solver and matrix products deal with the nasty details. Such an approach should lead to much simpler code with less redundancy, and the result will be more future-proof, since the internal matrix product kernels are subject to change from one version to the next.
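As a rough illustration of what such a high-level blocking could look like, here is a library-independent sketch. It again assumes the form A*X + X*B = C with A and B upper triangular and A(i,i) + B(j,j) nonzero. All names (`Mat`, `panelUpdate`, `shiftedUpperSolve`, `solveTriangularSylvesterBlocked`) and the panel width `nb` are hypothetical. The two helpers are plain-loop stand-ins that mark roughly where a library matrix product and a library triangular solve would go; this is one possible column-panel scheme, not necessarily the best blocking.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix, used only for this sketch.
struct Mat {
    std::size_t rows, cols;
    std::vector<double> a;
    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), a(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return a[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return a[i * cols + j]; }
};

// Stand-in for a matrix-matrix product:
// X(:, j0..j1) -= X(:, 0..j0) * B(0..j0, j0..j1).
// Reads only columns < j0 and writes only columns in [j0, j1).
void panelUpdate(Mat& X, const Mat& B, std::size_t j0, std::size_t j1) {
    for (std::size_t i = 0; i < X.rows; ++i)
        for (std::size_t k = 0; k < j0; ++k) {
            const double x = X(i, k);
            for (std::size_t j = j0; j < j1; ++j) X(i, j) -= x * B(k, j);
        }
}

// Stand-in for a triangular solve: solves (A + shift*I) x = X(:, j)
// in place by back substitution, A upper triangular.
void shiftedUpperSolve(const Mat& A, double shift, Mat& X, std::size_t j) {
    for (std::size_t i = A.rows; i-- > 0;) {
        double s = X(i, j);
        for (std::size_t k = i + 1; k < A.rows; ++k) s -= A(i, k) * X(k, j);
        X(i, j) = s / (A(i, i) + shift);
    }
}

// Column-panel blocking: one large update per panel, then small
// column-by-column work inside the panel.
Mat solveTriangularSylvesterBlocked(const Mat& A, const Mat& B, const Mat& C,
                                    std::size_t nb) {
    assert(nb > 0 && C.rows == A.rows && C.cols == B.rows);
    const std::size_t n = B.rows;
    Mat X = C;  // right-hand side, overwritten by the solution panel by panel
    for (std::size_t j0 = 0; j0 < n; j0 += nb) {
        const std::size_t j1 = std::min(j0 + nb, n);
        panelUpdate(X, B, j0, j1);  // contribution of all earlier panels
        for (std::size_t j = j0; j < j1; ++j) {
            // Contribution of columns already solved inside this panel.
            for (std::size_t k = j0; k < j; ++k)
                for (std::size_t i = 0; i < X.rows; ++i) X(i, j) -= X(i, k) * B(k, j);
            shiftedUpperSolve(A, B(j, j), X, j);
        }
    }
    return X;
}
```

The design intent is that most of the arithmetic ends up in `panelUpdate`, which is exactly the part a tuned product kernel would take over, while the driver itself stays short.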


I am writing Sylvester-like solvers.  I began with the triangular Sylvester
equation.  I have now finished the vector case, i.e. when the solution is a
column vector or a row vector.

However, when I look at Eigen/src/Core/products/TriangularSolverMatrix.h, I see
different blocking methods for the on-the-left and on-the-right cases.  What is
the best blocking when the solution is a general matrix?

The attachment is my current work.  The TODO comment marks where I got stuck.


