2009/5/13, Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
> it spends most of its time in cache friendly matrix product

oops, this is inexact.

i wanted to say that i tried reducing all the way to tiny fixed-size
blocks, and then it spends most of its time in the cache-friendly product,
indicating that i couldn't squeeze any better performance out of this
approach; still, the performance wasn't better than what i attached to
the previous email, where it calls partial LU when blocks are small.


