Re: [eigen] Tracking performance regressions


I've updated the plots with more reliable ones (less busy system).

* The 3px4 kernel clearly slowed down the 2400x2400x24 case.
* The regression for complexes is more clearly visible on SSE-only runs. (http://download.tuxfamily.org/eigen/perf-monitoring/haswell-sse-cgemm.pdf)
* The prefetching change slows down the 2400^3 case, but improves the 2400 24 24 one.

Regarding complexes, I think that we should move to 4 separate real products instead of torturing the product kernel to handle complexes. With such an approach, GFLOPS would be guaranteed to match those of reals. Now that the packing functions take a generic data_mapper type as input, this should be fairly easy.
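
To make the idea concrete, here is a rough sketch of a complex product built from 4 real products, using (Ar + i Ai)(Br + i Bi) = (Ar Br - Ai Bi) + i (Ar Bi + Ai Br). This is not Eigen code: RealMatrix, SplitComplexMatrix, real_gemm_accumulate and complex_gemm_via_real are hypothetical names, the naive triple loop only stands in for the optimized real kernel, and the real and imaginary parts are assumed to be stored separately rather than viewed through a data_mapper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major real matrix (hypothetical helper, not an Eigen type).
struct RealMatrix {
    std::size_t rows, cols;
    std::vector<double> data;
    RealMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// C += alpha * A * B. A naive loop standing in for the optimized real kernel.
void real_gemm_accumulate(const RealMatrix& a, const RealMatrix& b,
                          double alpha, RealMatrix& c) {
    assert(a.cols == b.rows && c.rows == a.rows && c.cols == b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t k = 0; k < a.cols; ++k) {
            const double aik = alpha * a(i, k);
            for (std::size_t j = 0; j < b.cols; ++j)
                c(i, j) += aik * b(k, j);
        }
}

// Complex matrix with real and imaginary parts stored separately (assumption).
struct SplitComplexMatrix {
    RealMatrix re, im;
};

// Complex product expressed as exactly 4 real products.
SplitComplexMatrix complex_gemm_via_real(const SplitComplexMatrix& a,
                                         const SplitComplexMatrix& b) {
    SplitComplexMatrix c{RealMatrix(a.re.rows, b.re.cols),
                         RealMatrix(a.re.rows, b.re.cols)};
    real_gemm_accumulate(a.re, b.re,  1.0, c.re);  // Re += Ar * Br
    real_gemm_accumulate(a.im, b.im, -1.0, c.re);  // Re -= Ai * Bi
    real_gemm_accumulate(a.re, b.im,  1.0, c.im);  // Im += Ar * Bi
    real_gemm_accumulate(a.im, b.re,  1.0, c.im);  // Im += Ai * Br
    return c;
}
```

Since every flop goes through the real kernel, the complex product should run at whatever rate that kernel reaches, at the cost of reading each operand twice.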

gael


On Sat, Feb 21, 2015 at 3:50 PM, Benoit Jacob <jacob.benoit.1@xxxxxxxxx> wrote:
Never mind, I had been looking at the CGEMM file. SGEMM is much more like what was expected! Interesting how we've been regressing complex while optimizing float.
Benoit

2015-02-21 9:48 GMT-05:00 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:

That's strange: the graph also reports the original introduction of the 3px4 kernel as a regression on Haswell for 2400^3 products? Does that make sense?

Maybe it will be easier to make sense of results once we have optimal blocking parameters.

Benoit

2015-02-21 9:44 GMT-05:00 Benoit Jacob <jacob.benoit.1@xxxxxxxxx>:

Wow, that's great!

Thanks a lot --- performance regression testing has been most needed for a long time.

It already looks like my prefetch changes are bad for Haswell and we likely need to make them ARM-only.

Benoit

2015-02-20 13:42 GMT-05:00 Gael Guennebaud <gael.guennebaud@xxxxxxxxx>:


Hi,

I've started a very simple approach to help us track performance regressions. The code is in bench/perf_monitoring/gemm, and some results are there:


These plots have been generated from a single command line:
$ CXX_FLAGS=" -mfma " ./run_gemm.sh

Currently this makes it possible to see the performance evolution of matrix*matrix products for float, double, and complex<double> for some predefined matrix size configurations (gemm/settings.txt) and some predefined changesets (gemm/changesets.txt).

The idea is to add in gemm/changesets.txt any new changeset that might explicitly affect the performance of matrix products.

Then, by adjusting the file gemm/settings.txt, one can track its favorite size configurations.

Currently all changesets are tested on every run, but in the very near future I plan to make the script test only the new ones by default, with an option to update previous ones to reduce false positives.
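
The selection logic could look roughly like the sketch below. It only illustrates the plan and is not what run_gemm.sh does: changesets_to_run, the cached set of already-measured changesets, and the update_all flag are all hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the changesets to benchmark in this run.
//   all        : every changeset listed (e.g. in gemm/changesets.txt), in order
//   cached     : changesets that already have a stored result (hypothetical cache)
//   update_all : when true, re-run everything to refresh earlier measurements
std::vector<std::string> changesets_to_run(const std::vector<std::string>& all,
                                           const std::set<std::string>& cached,
                                           bool update_all) {
    std::vector<std::string> todo;
    for (const std::string& id : all) {
        // Keep a changeset if a full refresh is requested or it has no result yet.
        if (update_all || cached.count(id) == 0)
            todo.push_back(id);
    }
    return todo;
}
```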

Since the generated pdf files are very small, we could try to install a very simple webservice for sharing the plot files.

In the future, we could try to couple this with Mozilla performance monitoring tools:


cheers,
gael

