2009/5/14 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:
> Indeed, for dynamic sizes, there's no unrolling, but there's another
> issue: cache-friendliness. Gael made a very good (actually, according
> to the benchmarks on the wiki, the best all-around) cache-friendly
> matrix-vector product implementation. This becomes increasingly
> important as the matrix size increases. By contrast, our triangular
> solver does not yet have comparable optimizations (and, according to
> our benchmarks, neither do other libraries).

Actually I was wrong here. I was thinking about what Gael said the other day on IRC:

[Wed. May 13 2009] [14:16:14] <gael__> actually, we should also implement a cache friendly "inv(triangular) * matrix" which is currently done one column after the other....

But what he meant was the case where you solve with a matrix as the right-hand side. In that case, there is room for improvement.

But in _your_ case (I checked your benchmark code) you are only solving with a VECTOR as the right-hand side. Our triangular solver (again written by Gael) does use the cache-friendly matrix-vector product internally, so it's already cache-friendly in that case.

Unless Gael thinks otherwise, I'm tempted to think that there is not much room left to improve the speed of the triangular solver in your case... and so it might be that your numbers really do show that the matrix-vector product approach is faster.

Cheers,
Benoit
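
P.S. To make the vector vs. matrix right-hand-side distinction concrete, here is a minimal sketch in plain Python. It is not Eigen's code: the names solve_upper_vector and solve_upper_matrix are made up for illustration, and it assumes a dense upper-triangular matrix stored as a list of rows with a nonzero diagonal. The second function does what Gael describes, solving one column after the other, so the whole triangular matrix is traversed again for every column of the right-hand side. That repeated traversal is where a blocked version could do better.

```python
def solve_upper_vector(U, b):
    """Solve U x = b by back substitution (U upper triangular, list of rows)."""
    n = len(b)
    x = [0.0] * n
    # Work from the last row upwards: row i only needs x[i+1:], already known.
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]
    return x


def solve_upper_matrix(U, B):
    """Solve U X = B one column at a time (hypothetical helper, not Eigen's)."""
    n = len(B)
    m = len(B[0])
    X = [[0.0] * m for _ in range(n)]
    for c in range(m):
        # Each column is an independent vector solve, so U is re-read m times.
        col = [B[r][c] for r in range(n)]
        x = solve_upper_vector(U, col)
        for r in range(n):
            X[r][c] = x[r]
    return X
```

With a vector right-hand side only the first function runs, which matches the case in your benchmark.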

