On Tue, Jul 28, 2009 at 8:49 PM, Jitse Niesen
<jitse@xxxxxxxxxxxxxxxxx> wrote:
On Tue, 28 Jul 2009, Gael Guennebaud wrote:
* all these new routines are just high-level blocking algorithms built on
top of a single highly optimized product kernel.
By "blocking algorithm", do you mean that you divide the matrices into blocks to reduce cache misses? For instance, to compute the product of two N-by-N matrices, partition each as an (N/n)-by-(N/n) block matrix of n-by-n blocks, and multiply block by block. I had a quick look at the code, and that does not seem to be what you're doing ...
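To make the question concrete, here is a minimal sketch of the kind of blocking I mean. It is only an illustration and not taken from the Eigen sources: the name blocked_product, the row-major std::vector storage and the loop order are all my own assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Eigen): returns C = A * B for row-major
// N-by-N matrices, visiting the operands in n-by-n blocks so that each
// block can stay in cache while it is being reused.
std::vector<double> blocked_product(const std::vector<double>& A,
                                    const std::vector<double>& B,
                                    std::size_t N, std::size_t n)
{
    assert(n > 0 && A.size() == N * N && B.size() == N * N);
    std::vector<double> C(N * N, 0.0);

    // Outer loops walk over blocks; std::min handles a ragged last block
    // when n does not divide N.
    for (std::size_t ii = 0; ii < N; ii += n) {
        const std::size_t iend = std::min(ii + n, N);
        for (std::size_t kk = 0; kk < N; kk += n) {
            const std::size_t kend = std::min(kk + n, N);
            for (std::size_t jj = 0; jj < N; jj += n) {
                const std::size_t jend = std::min(jj + n, N);

                // Inner loops: C(ii-block, jj-block) +=
                //              A(ii-block, kk-block) * B(kk-block, jj-block)
                for (std::size_t i = ii; i < iend; ++i)
                    for (std::size_t k = kk; k < kend; ++k) {
                        const double a = A[i * N + k];
                        for (std::size_t j = jj; j < jend; ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
            }
        }
    }
    return C;
}
```

Calling blocked_product(A, B, N, n) with n chosen so that a few n-by-n blocks fit in cache gives the same result as the plain triple loop, just with a different traversal order.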
Cheers,
Jitse