[eigen] need help optimizing with __restrict__ and such
Hi List,

First of all: doing m = m*m now works as expected in eigen2, because operator* now performs immediate evaluation; see the explanation in a previous mail. The old behavior is now available under the new name lazyMul().

I've started taking care of performance. So I've reused my old benchmark with fixed-size 3x3 matrices and adapted it to eigen2, see attached file b.cpp; and I've compared it with the TVMET equivalent, see attached file a.cpp. The compilation command line is included as a comment inside each file.

Result: on my machine, ./a runs in 6 seconds while ./b runs in 24 seconds. So Eigen2 is currently 4x slower than tvmet.

Wait, don't leave! Come back!

As far as I can see, there are 3 "keys" to optimization here, on top of the baseline compiler flags in 0) (not yet talking about handwriting assembly code):

0) g++ -O3 -DNDEBUG
1) Loop unrolling
2) Function inlining
3) The "restrict" keyword

For 1), I've hand-unrolled the case of fixed-size 3x3 matrices, just for the needs of my benchmark.

About 2): Once loops are unrolled, GCC is very good at inlining functions. In my benchmark there was only one function left that wasn't inlined: eval(). I solved that by marking it with __attribute__((always_inline)). Before somebody complains about nonstandard keywords: this is of course not an issue, since we use the nonstandard stuff only on those compilers that support it. Excerpt from Util.h:

    #ifdef __GNUC__
    # define EI_ALWAYS_INLINE __attribute__((always_inline))
    # define EI_RESTRICT __restrict__
    #else
    # define EI_ALWAYS_INLINE
    # define EI_RESTRICT
    #endif

This is of course only a draft; it will be possible to fine-tune those defines in a better way using cmake platform checks (I have already written a CheckRestrictKeyword.cmake).

Which takes us to 3): the restrict keyword. This is the only explanation I can see for why eigen2 is 4x slower than tvmet. The problem is that I'm very inexperienced with that keyword.
I read http://www.cellperformance.com/mike_acton/2006/05/demystifying_the_restrict_keyw.html

I tried adding restrict in many places in eigen2 but never got more than a +3% performance increase, and sometimes got worse performance. I don't fully understand restrict, so if someone is comfortable with it, I'd welcome some help. The code is in SVN; feel free to add or remove EI_RESTRICT wherever you think it is appropriate.

This restrict stuff seems to be the only difference that might explain why tvmet performs well and eigen2 doesn't.

Cheers,
Benoit
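To make points 1) to 3) a little more concrete, here is a minimal sketch of what a hand-unrolled, force-inlined, restrict-qualified 3x3 product could look like on plain arrays. It reuses the two defines from the Util.h excerpt, but mul3x3 and the column-major raw-array layout are assumptions made for illustration only; this is not eigen2's actual code. The restrict qualifiers are a promise that dst does not overlap a or b, which lets the compiler keep loaded coefficients in registers across the stores to dst. Whether that gives a measurable speedup depends on the compiler and the surrounding code.

```cpp
#include <cassert>

// Same shape as the Util.h excerpt: nonstandard keywords only where supported.
#ifdef __GNUC__
# define EI_ALWAYS_INLINE __attribute__((always_inline))
# define EI_RESTRICT __restrict__
#else
# define EI_ALWAYS_INLINE
# define EI_RESTRICT
#endif

// Hypothetical helper (not an eigen2 function): dst = a * b for 3x3 matrices
// stored column-major, i.e. element (i,j) lives at index i + 3*j.
// EI_RESTRICT promises that dst aliases neither a nor b; breaking that
// promise is undefined behavior, so the assert is a debug-only sanity check
// (it compiles away under -DNDEBUG).
inline EI_ALWAYS_INLINE void mul3x3(double* EI_RESTRICT dst,
                                    const double* EI_RESTRICT a,
                                    const double* EI_RESTRICT b)
{
    assert(dst != a && dst != b);
    // Fully unrolled: one statement per output coefficient, no loops left.
    dst[0] = a[0]*b[0] + a[3]*b[1] + a[6]*b[2];
    dst[1] = a[1]*b[0] + a[4]*b[1] + a[7]*b[2];
    dst[2] = a[2]*b[0] + a[5]*b[1] + a[8]*b[2];
    dst[3] = a[0]*b[3] + a[3]*b[4] + a[6]*b[5];
    dst[4] = a[1]*b[3] + a[4]*b[4] + a[7]*b[5];
    dst[5] = a[2]*b[3] + a[5]*b[4] + a[8]*b[5];
    dst[6] = a[0]*b[6] + a[3]*b[7] + a[6]*b[8];
    dst[7] = a[1]*b[6] + a[4]*b[7] + a[7]*b[8];
    dst[8] = a[2]*b[6] + a[5]*b[7] + a[8]*b[8];
}
```

A call such as mul3x3(n, m, m) with a separate destination array n respects the no-alias promise for dst; mul3x3(m, m, m) would not, which is the same hazard that immediate evaluation of operator* is meant to avoid.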
// g++ -O3 -I /home/gaston/tvmet-1.7.1/include/ -DNDEBUG a.cpp -o a
#include<iostream>
#include<tvmet/Matrix.h>
#include<tvmet/Vector.h>
using namespace std;
using namespace tvmet;
int main(int argc, char *argv[])
{
  Matrix<double,3,3> I;
  I = 1,0,0,
      0,1,0,
      0,0,1;
  Matrix<double,3,3> m, n;
  m = 1,2,3,
      4,5,6,
      7,8,9;
  for(int a = 0; a < 100000000; a++)
  {
    n = m*m;
    alias(m) = I + 0.05 * (m + n);
  }
  cout << m << endl;
  return 0;
}
// g++ -O3 -I /home/gaston/cuisine/branches/work/eigen2/src/ -DNDEBUG b.cpp -o b
#include<All>
using namespace std;
int main(int argc, char *argv[])
{
  EiMatrix3d I;
  EiMatrix3d m;
  for(int i = 0; i < 3; i++) for(int j = 0; j < 3; j++)
  {
    I(i,j) = (i==j);
    m(i,j) = (i+3*j);
  }
  for(int a = 0; a < 100000000; a++)
  {
    m = I + 0.00005 * (m + m*m);
  }
  cout << m << endl;
  return 0;
}
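The loop body in b.cpp assigns to m an expression that contains m*m, so it relies on the immediate evaluation of operator* mentioned at the top of the mail. The following plain-array sketch shows why that matters; it is not eigen2 code, and the helper names squareInPlaceLazy and squareViaTemporary are hypothetical. Writing each product coefficient straight into the operand reads entries that were already overwritten, while evaluating into a temporary first gives the expected result.

```cpp
#include <cassert>
#include <cstring>

// Column-major 3x3 storage: element (i,j) lives at index i + 3*j.
// Returns coefficient (i,j) of the product a * b.
static double coeffOfProduct(const double* a, const double* b, int i, int j)
{
    assert(0 <= i && i < 3 && 0 <= j && j < 3);
    double s = 0.0;
    for (int k = 0; k < 3; ++k)
        s += a[i + 3*k] * b[k + 3*j];
    return s;
}

// Hypothetical: mimics assigning a lazily evaluated product to its own
// operand. Each coefficient is stored into m as soon as it is computed,
// so later coefficients are computed from partially overwritten data.
void squareInPlaceLazy(double* m)
{
    for (int j = 0; j < 3; ++j)
        for (int i = 0; i < 3; ++i)
            m[i + 3*j] = coeffOfProduct(m, m, i, j);
}

// Hypothetical: mimics immediate evaluation. The whole product is computed
// into a temporary before anything is written back to m.
void squareViaTemporary(double* m)
{
    double tmp[9];
    for (int j = 0; j < 3; ++j)
        for (int i = 0; i < 3; ++i)
            tmp[i + 3*j] = coeffOfProduct(m, m, i, j);
    std::memcpy(m, tmp, sizeof tmp);
}
```

Starting from the same m as in b.cpp (m(i,j) = i+3*j), the two functions already disagree on the second coefficient they write, because the in-place version has replaced m(0,0) before computing m(1,0).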