[eigen] gcc bug hit by eigen, workaround proposal


Hi List,

While measuring Eigen performance, I realized that gcc doesn't completely unroll _nested_ loops. More precisely, for a nested loop it fully unrolls only the innermost loop, while the outer loop gets only partially unrolled.

Namely, with fixed-size matrices, Eigen does a lot of nested loops like

for( col = 0; col < size(); col++)
  for( row = 0; row < size(); row++)
    do_something( row, col );

and for fixed-size classes, size() returns the template parameter Size, which is known at compile time, so one would expect gcc to be able to unroll these nested loops. I always assumed it would, and designed Eigen around this assumption, but that is not the case. I filed a bug report here,

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30201

and one gcc developer said this would be something for gcc 4.3. Obviously we can't wait for gcc 4.3 to get good performance in Eigen: I measured a 6x speed difference in some methods. So what I propose is a system of preprocessor macros that manually unroll loops. I need to think more about it, but the rough idea is to replace

for( col = 0; col < size(); col++)
  for( row = 0; row < size(); row++)
  {
    ...
  }

with

for( int foo = 0; foo < size() * size(); foo++)
{
  col = foo / size();
  row = foo % size();

  ...
}

The rationale is that gcc is able to unroll single loops correctly; it only fails to unroll _nested_ loops.
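
To make the transformation concrete, here is a minimal sketch of the nested and flattened forms side by side. The FixedMatrix class, the column-major layout, and the function names addNested, addFlattened and checkFormsAgree are all made up for illustration; this is not Eigen's actual code. It only shows that the two forms visit the same (row, col) pairs. Whether the flattened loop really gets fully unrolled, so that the division and modulo become compile-time constants, depends on the compiler version and flags.

```cpp
#include <cassert>

// Hypothetical fixed-size, column-major square matrix (NOT Eigen's real class).
template <typename T, int Size>
struct FixedMatrix {
    T data[Size * Size];

    // Returns the template parameter, so the value is known at compile time.
    static int size() { return Size; }

    T& operator()(int row, int col) { return data[col * Size + row]; }
    const T& operator()(int row, int col) const { return data[col * Size + row]; }
};

// Nested form: two loops, each with a compile-time trip count.
template <typename T, int Size>
void addNested(FixedMatrix<T, Size>& dst, const FixedMatrix<T, Size>& src) {
    for (int col = 0; col < dst.size(); col++)
        for (int row = 0; row < dst.size(); row++)
            dst(row, col) += src(row, col);
}

// Flattened form: a single loop over size() * size() iterations,
// recovering (row, col) from the single counter.
template <typename T, int Size>
void addFlattened(FixedMatrix<T, Size>& dst, const FixedMatrix<T, Size>& src) {
    for (int foo = 0; foo < dst.size() * dst.size(); foo++) {
        const int col = foo / dst.size();
        const int row = foo % dst.size();
        dst(row, col) += src(row, col);
    }
}

// Sanity check (hypothetical helper): both forms give the same result.
inline void checkFormsAgree() {
    FixedMatrix<int, 3> a, b, src;
    for (int i = 0; i < 9; i++) {
        a.data[i] = i;
        b.data[i] = i;
        src.data[i] = 10 * i;
    }
    addNested(a, src);
    addFlattened(b, src);
    for (int i = 0; i < 9; i++)
        assert(a.data[i] == b.data[i]);
}
```

The flattened loop walks the pairs in the same order as the nested one (all rows of column 0, then column 1, and so on), so the memory access pattern is unchanged for a column-major layout.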

I would like to have this sorted out and implemented before 1.0, which means in 2 weeks. I guess that some of us need good performance right now.

Benoit
