[eigen] need help optimizing with __restrict__ and such
Hi List,

First of all: doing m = m*m now works as expected in eigen2, because operator* now performs immediate evaluation; see the explanation in a previous mail. The old behavior is now available under the new name lazyMul().

I've started taking care of performance. So I've reused my old benchmark with fixed-size 3x3 matrices and adapted it to eigen2, see attached file b.cpp; and I've compared it with the TVMET equivalent, see attached file a.cpp. The compilation command line is included as a comment inside each file.

Result: on my machine, ./a runs in 6 seconds while ./b runs in 24 seconds. So Eigen2 is currently 4x slower than tvmet.

Wait, don't leave! Come back!

As far as I can see, there are 3 "keys" to optimization here (not yet talking about handwriting assembly code), on top of the baseline 0):

0) g++ -O3 -DNDEBUG
1) Loop unrolling
2) Function inlining
3) The "restrict" keyword

For 1), I've hand-unrolled the case of fixed-size 3x3 matrices, just for the needs of my benchmark.

About 2): once loops are unrolled, GCC is very good at inlining functions. In my benchmark there was only one function left that wasn't inlined: eval(). I solved that by marking it with __attribute__((always_inline)). Before somebody complains about nonstandard keywords: this is of course not an issue, since we use the nonstandard stuff only on those compilers that support it. Excerpt from Util.h:

    #ifdef __GNUC__
    # define EI_ALWAYS_INLINE __attribute__((always_inline))
    # define EI_RESTRICT __restrict__
    #else
    # define EI_ALWAYS_INLINE
    # define EI_RESTRICT
    #endif

This is of course only a draft; it will be possible to fine-tune those defines using cmake platform checks (I have already written a CheckRestrictKeyword.cmake).

Which takes us to 3): the restrict keyword. This is the only explanation I can see for eigen2 being 4x slower than tvmet. The problem is that I'm very inexperienced with that keyword.
I read http://www.cellperformance.com/mike_acton/2006/05/demystifying_the_restrict_keyw.html

I tried adding restrict in many places in eigen2 but never got more than a 3% performance increase, and sometimes got worse performance. I don't fully understand restrict, so if someone is comfortable with it, I'd welcome some help. The code is in SVN; feel free to add or remove EI_RESTRICT wherever you think it is appropriate. This restrict stuff seems to be the only difference that might explain why tvmet performs well and eigen2 doesn't.

Cheers,
Benoit
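To make keys 1) to 3) concrete, here is a minimal sketch of how they might combine on a 3x3 product. It is not eigen2 code: the names entry3x3 and mul3x3, the raw double arrays and the row-major layout are assumptions made for illustration only, and the macros simply repeat the Util.h excerpt. Whether the restrict qualifiers actually speed anything up depends on the compiler and on the surrounding code.

```cpp
#include <cassert>

// Portability macros, repeated from the Util.h excerpt in the mail.
#ifdef __GNUC__
# define EI_ALWAYS_INLINE __attribute__((always_inline))
# define EI_RESTRICT __restrict__
#else
# define EI_ALWAYS_INLINE
# define EI_RESTRICT
#endif

// Hypothetical helper: entry (row, col) of a*b for 3x3 row-major arrays,
// with the loop over the inner index unrolled by hand (key 1).
// a and b are only read here, so they may point to the same matrix.
inline EI_ALWAYS_INLINE double entry3x3(const double* EI_RESTRICT a,
                                        const double* EI_RESTRICT b,
                                        int row, int col)
{
    return a[3*row + 0] * b[0 + col]
         + a[3*row + 1] * b[3 + col]
         + a[3*row + 2] * b[6 + col];
}

// Hypothetical kernel: dst = a*b, fully unrolled and forced inline (key 2).
// EI_RESTRICT (key 3) promises the compiler that the memory written through
// dst is not reachable through a or b, so stores to dst cannot invalidate
// values already loaded from a and b.
inline EI_ALWAYS_INLINE void mul3x3(double* EI_RESTRICT dst,
                                    const double* EI_RESTRICT a,
                                    const double* EI_RESTRICT b)
{
    // The caller must honor the promise; this check vanishes with -DNDEBUG.
    assert(dst != a && dst != b);

    dst[0] = entry3x3(a, b, 0, 0);
    dst[1] = entry3x3(a, b, 0, 1);
    dst[2] = entry3x3(a, b, 0, 2);
    dst[3] = entry3x3(a, b, 1, 0);
    dst[4] = entry3x3(a, b, 1, 1);
    dst[5] = entry3x3(a, b, 1, 2);
    dst[6] = entry3x3(a, b, 2, 0);
    dst[7] = entry3x3(a, b, 2, 1);
    dst[8] = entry3x3(a, b, 2, 2);
}
```

Calling mul3x3(n, m, m) is within the rules because m is only read, while mul3x3(m, m, m) would break the restrict promise. That is the same situation as m = m*m, where the product has to be evaluated into a temporary first.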
// g++ -O3 -I /home/gaston/tvmet-1.7.1/include/ -DNDEBUG a.cpp -o a
#include<iostream>
#include<tvmet/Matrix.h>
#include<tvmet/Vector.h>

using namespace std;
using namespace tvmet;

int main(int argc, char *argv[])
{
  Matrix<double,3,3> I;
  I = 1,0,0,
      0,1,0,
      0,0,1;
  Matrix<double,3,3> m, n;
  m = 1,2,3,
      4,5,6,
      7,8,9;
  for(int a = 0; a < 100000000; a++)
  {
    n = m*m;
    alias(m) = I + 0.05 * (m + n);
  }
  cout << m << endl;
  return 0;
}
// g++ -O3 -I /home/gaston/cuisine/branches/work/eigen2/src/ -DNDEBUG b.cpp -o b
#include<All>

using namespace std;

int main(int argc, char *argv[])
{
  EiMatrix3d I;
  EiMatrix3d m;
  for(int i = 0; i < 3; i++) for(int j = 0; j < 3; j++)
  {
    I(i,j) = (i==j);
    m(i,j) = (i+3*j);
  }
  for(int a = 0; a < 100000000; a++)
  {
    m = I + 0.00005 * (m + m*m);
  }
  cout << m << endl;
  return 0;
}
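The loop in b.cpp assigns to m an expression that contains m*m, which is the aliasing case from the top of the mail. The following sketch shows why a product written straight into one of its operands goes wrong, and why evaluating into a temporary first fixes it. The Mat3 struct and the functions mulInto, mulViaTemporary and squareExample are hypothetical stand-ins, not eigen2's classes or its lazyMul() implementation.

```cpp
#include <cassert>

// Hypothetical 3x3 row-major matrix, for illustration only.
struct Mat3 { double v[9]; };

// Writes a*b entry by entry directly into dst, the way an unevaluated
// product would be assigned. If dst is the same object as a or b, entries
// that are still needed get overwritten before they are read.
inline void mulInto(Mat3& dst, const Mat3& a, const Mat3& b)
{
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            double s = 0.0;
            for (int k = 0; k < 3; ++k)
                s += a.v[3*i + k] * b.v[3*k + j];
            dst.v[3*i + j] = s;
        }
}

// Evaluates the product into a temporary first and copies it afterwards,
// so dst may safely be the same object as a or b.
inline void mulViaTemporary(Mat3& dst, const Mat3& a, const Mat3& b)
{
    Mat3 tmp;
    mulInto(tmp, a, b);
    dst = tmp;
}

// Usage: squaring m = [1..9] through the temporary gives the expected
// first row (30, 36, 42).
inline void squareExample()
{
    Mat3 m = {{1, 2, 3, 4, 5, 6, 7, 8, 9}};
    mulViaTemporary(m, m, m);
    assert(m.v[0] == 30.0 && m.v[1] == 36.0 && m.v[2] == 42.0);
}
```

With the same starting matrix, mulInto(m, m, m) already goes wrong at the second entry: entry (0,0) is overwritten with 30 before entry (0,1) is computed, so (0,1) comes out as 94 instead of 36. The price of the safe version is one extra copy of the matrix per product.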