Re: [eigen] benchmarking weirdness
Thanks for your reply. I have repeated these tests enough times to be sure
that the performance differences are real, not just noise.

I will try to get VTune and learn to use it. I am concerned, however, about
the non-freeness; if it's only for the cache miss estimations, have you
tried cachegrind?

Good idea about the cache misses, although I would like to understand why a
specific matrix storage/traversal order is better than another one wrt cache
misses. Also, I still have no idea why disabling asserts hurts performance.

Cheers,
Benoit

On Saturday 05 January 2008 13:31:01 Christian Mayer wrote:
> Hi,
>
> I didn't look at the code recently so I can't explain the measured
> results. But I can say a few things about benchmarking/measuring itself :)
>
> In German we've got a saying: "Wer misst, misst Mist"
> (Once you are measuring you'll only get crap...)
>
> The idea is basically that you must reduce any possible errors and take
> care that the remaining error won't hit you badly... In our case I'd say
> that you should kill all processes that you don't need (especially the X
> server!), make sure that all drives are synchronized, run the benchmark
> a few times, so that it's in the HDD cache and *then* run the benchmark
> many times and save the performance. In the end throw most of the
> measured data away, only keep the values close to the median and take
> the mean.
> This becomes especially necessary once we start to compare the speed to
> the major players like the Intel Math Kernel Library (free for Linux)
> that contains an extremely optimized BLAS and LAPACK or to ATLAS or any
> other library.
>
> As long as we are only developing along and need a trend it can be a bit
> more relaxed. But you should get a good performance monitor - which is
> basically identical to Intel VTune:
> http://www.intel.com/cd/software/products/asmo-na/eng/vtune/239145.htm
> (free for Linux -> "Free Non-Commercial Download")
>
> There you find out *why* one code is slower than the other. Take special
> care about cache hits and misses - they are most likely to cause the
> described behaviour (once you know that the compiler's machine code
> generation is optimal - but VTune also shows you the assembly, so you
> can check it at the same time)
>
> So get the VTune and analyze the results. Once it's clear why one result
> is slower than the other we can try to optimize it.
>
> CU,
> Christian
>
> Benoît Jacob wrote:
> > Hi List
> >
> > A lot of progress has happened since alpha1 -- much more than I expected
> > to remain to be done. I'll write more about this later, but now I would
> > like to discuss benchmarking.
> >
> > We now have two benchmarks in doc/ : benchmark.cpp is our traditional
> > benchmark on 3x3 fixed-size matrices, and benchmarkX.cpp is a 20x20
> > dynamic-size variant.
> >
> > There is also a script, benchmark_suite, running these benchmarks several
> > times with various compile options:
> > *with and without -DNDEBUG (disabling asserts)
> > *with matrix storage order set to RowMajor and ColumnMajor
> >
> > I should insist on the fact that the matrix storage order influences not
> > only the storage of coefficients, but also the traversal order when e.g.
> > copying matrices. Expressions are recursively aware of the preferred
> > traversal order.
> >
> > The reason why I'm writing this is that this benchmark_suite gives me
> > some very unexpected results:
> >
> > gaston@kiwi:~/cuisine/branches/work/eigen2/doc$ g++ --version
> > g++ (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4)
> > gaston@kiwi:~/cuisine/branches/work/eigen2/doc$ ./benchmark_suite
> > Fixed size 3x3, ColumnMajor, -DNDEBUG
> >
> > real    0m19.942s
> > user    0m19.893s
> > sys     0m0.024s
> > Fixed size 3x3, ColumnMajor, with asserts
> >
> > real    0m32.434s
> > user    0m32.406s
> > sys     0m0.008s
> > Fixed size 3x3, RowMajor, -DNDEBUG
> >
> > real    0m21.497s
> > user    0m21.497s
> > sys     0m0.000s
> > Fixed size 3x3, RowMajor, with asserts
> >
> > real    0m32.133s
> > user    0m32.122s
> > sys     0m0.012s
> > Dynamic size 20x20, ColumnMajor, -DNDEBUG
> >
> > real    0m33.014s
> > user    0m33.006s
> > sys     0m0.000s
> > Dynamic size 20x20, ColumnMajor, with asserts
> >
> > real    0m27.599s
> > user    0m27.554s
> > sys     0m0.024s
> > Dynamic size 20x20, RowMajor, -DNDEBUG
> >
> > real    0m28.343s
> > user    0m28.342s
> > sys     0m0.000s
> > Dynamic size 20x20, RowMajor, with asserts
> >
> > real    0m26.597s
> > user    0m26.562s
> > sys     0m0.012s
> >
> > We see two strange things here, which I can't explain.
> >
> > First, with dynamic-size 20x20, disabling asserts (defining NDEBUG)
> > REDUCES speed! What's going on?
> >
> > Second, the storage order has a nonnegligible impact. More precisely, with
> > 3x3 fixed-size, ColumnMajor is almost 10% faster than RowMajor, while with
> > 20x20 dynamic-size, RowMajor is faster than ColumnMajor! Also, how to
> > explain the fact that RowMajor suffers less than ColumnMajor from the
> > slowdown induced by defining NDEBUG ?
> >
> > All this is in SVN so please help me!
> >
> > Cheers,
> > Benoit
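
For reference, the measuring procedure suggested in the quoted reply (warm up, run many times, discard most samples, average the ones near the median) could be sketched roughly as follows. This is only an illustration and not code from the Eigen tree: the names time_runs and central_mean are hypothetical, and "close to the median" is interpreted here as the central block of the sorted samples, which is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: run `work` a few times untimed (warm-up), then
// time each of `measured_runs` further calls and return the wall-clock
// durations in seconds.
template <typename Fn>
std::vector<double> time_runs(Fn&& work, int warmup_runs, int measured_runs) {
    for (int i = 0; i < warmup_runs; ++i) work();

    std::vector<double> seconds;
    seconds.reserve(static_cast<std::size_t>(measured_runs));
    for (int i = 0; i < measured_runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}

// Hypothetical helper: sort the samples, keep only the central
// `keep_fraction` of them (the ones nearest the median by rank),
// and return their mean. Assumption: a rank-centred window is an
// acceptable reading of "values close to the median".
double central_mean(std::vector<double> samples, double keep_fraction = 0.5) {
    assert(!samples.empty());
    assert(keep_fraction > 0.0 && keep_fraction <= 1.0);

    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    std::size_t keep = static_cast<std::size_t>(static_cast<double>(n) * keep_fraction);
    if (keep == 0) keep = 1;                  // always keep at least one sample
    const std::size_t first = (n - keep) / 2; // centre the window on the median

    const auto begin = samples.begin() + static_cast<std::ptrdiff_t>(first);
    const auto end = begin + static_cast<std::ptrdiff_t>(keep);
    return std::accumulate(begin, end, 0.0) / static_cast<double>(keep);
}
```

A caller would pass the benchmark body as a callable, e.g. central_mean(time_runs(run_benchmark, 3, 50)), where run_benchmark stands for whatever is being measured. Timing inside the process like this avoids counting process start-up, but it does not replace the external precautions mentioned above (idle machine, no X server).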