Hi List

A lot of progress has been made since alpha1 -- much more than I thought was 
left to do. I'll write more about this later, but for now I would like to 
discuss benchmarking.

We now have two benchmarks in doc/ : benchmark.cpp is our traditional 
benchmark on 3x3 fixed-size matrices, and benchmarkX.cpp is a 20x20 dynamic 
size variant.

There is also a script, benchmark_suite, running these benchmarks several 
times with various compile options:
* with and without -DNDEBUG (which disables asserts)
* with matrix storage order set to RowMajor and to ColumnMajor

I should stress that the matrix storage order influences not only the storage 
of coefficients, but also the traversal order when e.g. copying matrices. 
Expressions are recursively aware of the preferred traversal order.
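
To make the traversal-order point concrete, here is a minimal standalone 
sketch. It is not the actual eigen2 code: StorageOrder, flatIndex and 
copyMatrix are hypothetical names invented for this illustration, and the 
matrix is just a flat std::vector. It only shows the idea that the inner loop 
of a copy follows whichever direction is contiguous in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only; this is not Eigen's API.
enum class StorageOrder { RowMajor, ColumnMajor };

// Position of coefficient (row, col) inside the flat coefficient array
// of a rows x cols matrix.
inline std::size_t flatIndex(StorageOrder order,
                             std::size_t rows, std::size_t cols,
                             std::size_t row, std::size_t col)
{
    // Bounds check; compiled out when NDEBUG is defined.
    assert(row < rows && col < cols);
    return order == StorageOrder::RowMajor ? row * cols + col
                                           : col * rows + row;
}

// Copy src into dst (both using the same storage order). The inner loop
// always runs along the direction that is contiguous in memory.
inline void copyMatrix(StorageOrder order,
                       std::size_t rows, std::size_t cols,
                       const std::vector<double>& src,
                       std::vector<double>& dst)
{
    assert(src.size() == rows * cols);
    dst.resize(rows * cols);
    if (order == StorageOrder::RowMajor) {
        // Rows are contiguous: outer loop over rows, inner over columns.
        for (std::size_t r = 0; r < rows; ++r)
            for (std::size_t c = 0; c < cols; ++c)
                dst[flatIndex(order, rows, cols, r, c)] =
                    src[flatIndex(order, rows, cols, r, c)];
    } else {
        // Columns are contiguous: outer loop over columns, inner over rows.
        for (std::size_t c = 0; c < cols; ++c)
            for (std::size_t r = 0; r < rows; ++r)
                dst[flatIndex(order, rows, cols, r, c)] =
                    src[flatIndex(order, rows, cols, r, c)];
    }
}
```

In this sketch the storage order is a runtime argument to keep it short; a 
compile-time parameter would be the more natural choice in a template library.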

I'm writing because benchmark_suite gives me some very unexpected results:

gaston@kiwi:~/cuisine/branches/work/eigen2/doc$ g++ --version
g++ (GCC) 4.2.1 (Ubuntu 4.2.1-5ubuntu4)
gaston@kiwi:~/cuisine/branches/work/eigen2/doc$ ./benchmark_suite
Fixed size 3x3, ColumnMajor, -DNDEBUG

real    0m19.942s
user    0m19.893s
sys     0m0.024s
Fixed size 3x3, ColumnMajor, with asserts

real    0m32.434s
user    0m32.406s
sys     0m0.008s
Fixed size 3x3, RowMajor, -DNDEBUG

real    0m21.497s
user    0m21.497s
sys     0m0.000s
Fixed size 3x3, RowMajor, with asserts

real    0m32.133s
user    0m32.122s
sys     0m0.012s
Dynamic size 20x20, ColumnMajor, -DNDEBUG

real    0m33.014s
user    0m33.006s
sys     0m0.000s
Dynamic size 20x20, ColumnMajor, with asserts

real    0m27.599s
user    0m27.554s
sys     0m0.024s
Dynamic size 20x20, RowMajor, -DNDEBUG

real    0m28.343s
user    0m28.342s
sys     0m0.000s
Dynamic size 20x20, RowMajor, with asserts

real    0m26.597s
user    0m26.562s
sys     0m0.012s

We see two strange things here, which I can't explain.

First, with dynamic size 20x20, disabling asserts (defining NDEBUG) REDUCES 
speed! What's going on?

Second, the storage order has a non-negligible impact. More precisely, with 
3x3 fixed size and -DNDEBUG, ColumnMajor is about 8% faster than RowMajor, 
while with 20x20 dynamic size, RowMajor is faster than ColumnMajor! Also, how 
can we explain the fact that, at 20x20, RowMajor suffers less than ColumnMajor 
from the slowdown induced by defining NDEBUG?
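
To put numbers on these comparisons, here is a small sketch that computes 
relative slowdowns from the "real" times in the output above. The helper 
slowdownPercent and the constant names are made up for this illustration; only 
the timings come from the benchmark run.

```cpp
#include <cassert>

// Hypothetical helper: by how many percent slowSeconds exceeds fastSeconds.
inline double slowdownPercent(double fastSeconds, double slowSeconds)
{
    assert(fastSeconds > 0.0);
    return (slowSeconds / fastSeconds - 1.0) * 100.0;
}

// "real" times in seconds, copied from the benchmark_suite output.
constexpr double kFixedColNdebug = 19.942; // 3x3, ColumnMajor, -DNDEBUG
constexpr double kFixedRowNdebug = 21.497; // 3x3, RowMajor, -DNDEBUG
constexpr double kDynColNdebug   = 33.014; // 20x20, ColumnMajor, -DNDEBUG
constexpr double kDynColAsserts  = 27.599; // 20x20, ColumnMajor, with asserts
constexpr double kDynRowNdebug   = 28.343; // 20x20, RowMajor, -DNDEBUG
constexpr double kDynRowAsserts  = 26.597; // 20x20, RowMajor, with asserts

// Cost of defining NDEBUG at 20x20, per storage order.
inline double ndebugCostColumnMajor()
{
    return slowdownPercent(kDynColAsserts, kDynColNdebug);
}

inline double ndebugCostRowMajor()
{
    return slowdownPercent(kDynRowAsserts, kDynRowNdebug);
}

// Cost of RowMajor relative to ColumnMajor at 3x3 with -DNDEBUG.
inline double rowMajorCostFixedSize()
{
    return slowdownPercent(kFixedColNdebug, kFixedRowNdebug);
}
```

By this measure, defining NDEBUG costs ColumnMajor roughly 20% at 20x20 but 
RowMajor only roughly 7%, and at 3x3 with -DNDEBUG RowMajor is roughly 8% 
slower than ColumnMajor.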

All this is in SVN so please help me!


