Hi List,
Here's an update (the best part is the last one):
0) I committed a file that I had forgotten to "svn add", sorry about that.
1) Michael contributed the first meta loop unrolling one week ago. Michael,
your work is now in CopyHelper.h. I added your copyright to that file.
2) numeric traits, providing a uniform way to deal with various number types.
3) row-vectors are now treated on an equal footing with column vectors. The
Vector typedefs are still column vectors, but there are now also RowVector
typedefs. The row() method now returns a row vector. Operator[] accesses the
coefficients of a vector in a uniform way, for both row and column vectors.
4) matrix conjugation, transposition, adjunction, trace; vector dot product. I
decided not to overload operator|. So write v.dot(w), that's all. I don't think
that I want to add a global dot(v,w) after all.
5) I extended meta unrolling to all the loops we have. I could think of some
cases where the compiler would have failed to unroll them, so at least now
that won't happen. We won't unroll every loop in the future, but these ones
were really important.
6) Big reorganization of the header files
7) Optimization: I reversed the order of some loops (like the inner loop of
matrix-matrix multiplication) and got a *huge* speedup.
Here's the result I get with our benchmark (g++ 4.2.1, Intel Core1 1.66GHz):
TVMET: 6.1 seconds
Eigen2 with hand-unrolling of the matrix-product: 5.2 seconds
Eigen2 with meta-unrolling: 5.5 seconds
Eigen2 with reversed meta-unrolling: 3.4 seconds
Trying to understand this speedup I ran cachegrind (with only 100000
repetitions) and found this difference:
without reversing of loops:
==9840== D refs: 8,648,008 (6,065,012 rd + 2,582,996 wr)
with reversing of loops:
==9834== D refs: 5,048,066 (2,765,055 rd + 2,283,011 wr)
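For readers unfamiliar with the "meta-unrolling" of items 1 and 5, here is a
minimal sketch of the general technique, applied to the kind of sum that forms
the inner loop of a matrix product. It is not the code in CopyHelper.h:
DotUnroller and dotLoop are hypothetical names made up for this illustration,
and it does not try to reproduce the loop reversal of item 7.

```cpp
#include <cstddef>

// Hypothetical helper for illustration only (not Eigen2's actual code).
// Computes a[0]*b[0] + ... + a[N-1]*b[N-1], with the loop expanded at
// compile time through template recursion instead of a runtime counter.
template <std::size_t N>
struct DotUnroller {
    template <typename T>
    static T run(const T* a, const T* b) {
        // Handle the first N-1 terms recursively, then add the last term.
        return DotUnroller<N - 1>::run(a, b) + a[N - 1] * b[N - 1];
    }
};

// Base case: a single term ends the recursion.
template <>
struct DotUnroller<1> {
    template <typename T>
    static T run(const T* a, const T* b) {
        return a[0] * b[0];
    }
};

// Plain runtime loop for comparison; the compiler may or may not unroll it.
template <typename T>
T dotLoop(const T* a, const T* b, std::size_t n) {
    T sum = T(0);
    for (std::size_t k = 0; k < n; ++k) sum += a[k] * b[k];
    return sum;
}
```

With a fixed size known at compile time, DotUnroller<3>::run(a, b) expands to
a straight-line expression, so unrolling no longer depends on the optimizer.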
Anyway, Eigen2 is now almost twice as fast as before, and it was already
faster than TVMET and Eigen1.
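To give an idea of what the numeric traits of item 2 are for, here is a rough
sketch of the pattern. The names NumericTraits and conjDot are hypothetical and
chosen only for this example; the real interface in the repository may look
different. The point is that generic code can ask the traits class how to
treat a number type instead of special-casing real and complex numbers.

```cpp
#include <complex>

// Hypothetical traits class for illustration (not Eigen2's actual API).
// Generic version: covers real types such as int, float, double.
template <typename T>
struct NumericTraits {
    typedef T Real;                        // underlying real type
    static const bool isComplex = false;
    static T conj(const T& x) { return x; } // conjugation is a no-op for reals
};

// Partial specialization for std::complex.
template <typename R>
struct NumericTraits<std::complex<R> > {
    typedef R Real;
    static const bool isComplex = true;
    static std::complex<R> conj(const std::complex<R>& x) { return std::conj(x); }
};

// Generic code written once: sum over k of conj(a[k]) * b[k].
template <typename T>
T conjDot(const T* a, const T* b, int n) {
    T sum = T(0);
    for (int k = 0; k < n; ++k) sum += NumericTraits<T>::conj(a[k]) * b[k];
    return sum;
}
```

The same conjDot works for double and for std::complex<double>; only the
traits specialization knows that complex numbers need conjugation.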
Cheers,
Benoit