Hi List,

Thanks a lot for your feedback and results on the first iteration of this benchmark.

A new version is checked into the repo: bench/benchmark-blocking-sizes.cpp


 1. Uses Eigen's BenchTimer.h, so it should run everywhere.
 2. Tries to empty caches by default. You can override/tweak this behavior by playing with the --min-working-set-size command-line parameter.
 3. Measures sizes up to 2048, up from 1024. Gael rightly pointed out to me that 1024 was not quite large enough to show all of the impact of M/N blocking sizes.
 4. Displays progress info on stderr -- so it's fun to watch and you'll want to run it on lots of machines for me!
 5. Doesn't try to do any analysis anymore. It just dumps a raw, easy-to-parse table of GFlops for each combination of product size and blocking parameters.
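
To make point 2 a bit more concrete, here is a minimal sketch of one way a benchmark can avoid measuring warm caches: keep a pool of independent operand sets whose combined footprint reaches some minimum working-set size, and rotate through the pool so that consecutive measurements touch different memory. This is only an illustration under that assumption, not the code in benchmark-blocking-sizes.cpp; the helper names (product_bytes, pool_size, pool_index) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration only; they are not taken from
// benchmark-blocking-sizes.cpp.

// Bytes touched by one (m x k) * (k x n) product: LHS, RHS and result.
std::size_t product_bytes(std::size_t m, std::size_t k, std::size_t n,
                          std::size_t scalar_size) {
  return (m * k + k * n + m * n) * scalar_size;
}

// Number of independent operand sets needed so that their combined
// footprint is at least min_working_set_bytes (always at least 1).
std::size_t pool_size(std::size_t bytes_per_product,
                      std::size_t min_working_set_bytes) {
  assert(bytes_per_product > 0);
  std::size_t n = (min_working_set_bytes + bytes_per_product - 1) /
                  bytes_per_product;  // ceiling division
  return n == 0 ? 1 : n;
}

// Round-robin choice of the operand set used for a given iteration, so
// that a set is only reused after every other set has been touched.
std::size_t pool_index(std::size_t iteration, std::size_t pool) {
  assert(pool > 0);
  return iteration % pool;
}
```

With a minimum working-set size of 0 the pool collapses to a single operand set, so every iteration reuses the same (cached) data; that is the kind of trade-off a parameter like --min-working-set-size lets you explore.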

Downside: it now takes longer to run --- typically 3 hours on a PC, 9 hours on an Android device.

Example compilation command line (note -mfma for Haswell):

$ c++ -O3 -DNDEBUG -mavx -mfma ../eigen/bench/benchmark-blocking-sizes.cpp -o b -I ../eigen --std=c++11

You'll want to redirect stdout to a file. stderr carries everything you'll want to watch in your terminal, so there's no need for 'tee'.

$ ./b  foo > log-benchmark-blocking-sizes

Obviously, remove the 'foo' in the above command line --- unless you want to see the usage message :)
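
For anyone wondering why no 'tee' is needed: the redirection works because results and progress go to different streams. A minimal sketch of that pattern follows; the report function and its line format are hypothetical and are not the benchmark's actual output format.

```cpp
#include <iostream>
#include <ostream>
#include <sstream>

// Hypothetical reporting helper for illustration only; the real
// benchmark's output format is not reproduced here.
// Progress goes to one stream (stderr in practice), results to another
// (stdout in practice), so "> file" captures only the results.
void report(std::ostream& results, std::ostream& progress,
            int done, int total, double gflops) {
  // Carriage return lets the progress counter overwrite itself in place.
  progress << "\r" << done << "/" << total << std::flush;
  // One result per line keeps the captured file easy to parse.
  results << gflops << "\n";
}
```

In a real program this would be called as report(std::cout, std::cerr, ...), so the terminal shows only the progress counter while the log file gets only the numbers.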


