[eigen] Re: Benchmark for blocking sizes

2015-02-20 17:34 GMT-05:00 Benoit Jacob <jacob.benoit.1@xxxxxxxxxx>:

What's next:
The new data format doesn't contain analysis, just raw timings. Once we have data from several different machines, we can start to generate tables of blocking parameters that are "least bad" across multiple machines. Of course, users who want absolutely optimal perf on their machine will have to generate a table for it, but Eigen should default to parameters that maximize the efficiency across all machines. So the next step is to generate such a table of parameters by aggregating data from multiple logs.

An analysis program is now in bench/analyze-blocking-sizes.cpp. It takes a list of logs and tries to find good partitionings.

Example on current log-blocking-sizes/v2 data:

Partition into 4 subsets for 91.2% efficiency
Subset 0, efficiency 91.2%:
    tmp2/sandy-bridge-xeon-e5-1650.txt
    tmp2/sandy-bridge-xeon-e5-1650-warm-caches.txt
    tmp2/mac-haswell-fma-i7-4770hq-warm-caches.txt
Subset 1, efficiency 100%:
    tmp2/amd-opteron-6376-trevor-irons.txt
Subset 2, efficiency 100%:
    tmp2/mac-haswell-fma-i7-4770hq.txt
Subset 3, efficiency 100%:
    tmp2/nexus5.txt

(Obviously, when a subset contains only 1 log, efficiency is 100% by definition.)
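
To make the "least bad" idea concrete, here is a minimal sketch for a single product size. It is not taken from bench/analyze-blocking-sizes.cpp. The names (Table, Choice, efficiency, least_bad) and the data layout are hypothetical, and it assumes that efficiency means throughput relative to the best candidate measured on the same machine, and that a subset's efficiency is the worst case over its logs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout (an assumption, not the real log format):
// gflops[m][c] is the throughput measured on machine m with candidate
// blocking-parameter set c, for one fixed product size.
using Table = std::vector<std::vector<double>>;

struct Choice {
  std::size_t candidate;  // index of the selected candidate
  double efficiency;      // worst-case efficiency over all machines, in [0, 1]
};

// Assumed definition: efficiency of candidate c on one machine is its
// throughput divided by the best throughput measured on that machine.
double efficiency(const std::vector<double>& machine, std::size_t c) {
  const double best = *std::max_element(machine.begin(), machine.end());
  return machine[c] / best;
}

// Select the candidate whose worst-case efficiency across machines is highest.
// Expects a non-empty table with rows of equal length and positive entries.
Choice least_bad(const Table& gflops) {
  assert(!gflops.empty() && !gflops.front().empty());
  Choice result{0, -1.0};
  const std::size_t candidates = gflops.front().size();
  for (std::size_t c = 0; c < candidates; ++c) {
    double worst = 1.0;
    for (const auto& machine : gflops) {
      worst = std::min(worst, efficiency(machine, c));
    }
    if (worst > result.efficiency) {
      result = {c, worst};
    }
  }
  return result;
}
```

With a single machine in the table, least_bad always returns that machine's own best candidate at efficiency 1.0, which is the "100% by definition" case. With several machines, the returned efficiency drops below 1.0 whenever no single candidate is optimal on all of them.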

Benoit