Sounds reasonable, but we'd have to keep in the back of our heads
that we want to revert this once gcc is fixed (to make the code readable again).

On 14/12/06, Benoit Jacob <jacob@xxxxxxxxxxxxxxx> wrote:
Hi List,

While measuring Eigen performance I realized that gcc doesn't
completely unroll _nested_ loops. More precisely, in the case of a
nested loop, it fully unrolls only the innermost loop, while the outer
loop gets only partially unrolled.

Namely, with fixed-size matrices, Eigen does a lot of nested loops like

for( col = 0; col < size(); col++)
   for( row = 0; row < size(); row++)
     do_something( row, col );

and for fixed-size classes, size() returns the template parameter Size,
which is known at compile-time, so one would expect gcc to be able to
unroll these nested loops. I always assumed it would, and designed Eigen
around this assumption. But that's not the case. I filed a bug report here,

and one gcc developer said this would be something for gcc 4.3. Obviously
we can't wait for gcc 4.3 to get good performance in Eigen. I measured a
6x speed difference in some methods. So what I propose is to have a system
of preprocessor macros manually unrolling loops. Need to think more about
that though, but the rough idea is to replace

for( col = 0; col < size(); col++)
   for( row = 0; row < size(); row++)
     do_something( row, col );

with

for( int foo = 0; foo < size() * size(); foo++)
{
   col = foo / size();
   row = foo % size();
   do_something( row, col );
}

The rationale is that gcc is able to unroll single loops correctly; it
only fails to unroll _nested_ loops.
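
To make the idea concrete, here is a minimal sketch of the two loop forms
side by side. FixedSquare, Recorder and same_order are hypothetical names
made up for this mail, not actual Eigen code, and the sketch uses a plain
template rather than the preprocessor macros I have in mind. It only shows
that the single flattened loop visits the same (row, col) pairs in the same
order as the nested one; it says nothing by itself about what gcc unrolls.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for a fixed-size class: Size is a template
// parameter, so size() is known at compile-time.
template<int Size>
struct FixedSquare
{
    static int size() { return Size; }

    // Nested form: walks the entries column by column.
    template<typename Visitor>
    static void visit_nested( Visitor & visit )
    {
        for( int col = 0; col < size(); col++ )
            for( int row = 0; row < size(); row++ )
                visit( row, col );
    }

    // Flattened form: a single loop over size() * size() iterations,
    // recovering col and row from the loop counter.
    template<typename Visitor>
    static void visit_flat( Visitor & visit )
    {
        for( int foo = 0; foo < size() * size(); foo++ )
        {
            int col = foo / size();
            int row = foo % size();
            visit( row, col );
        }
    }
};

// Hypothetical helper: records the order in which pairs are visited.
struct Recorder
{
    std::vector< std::pair<int, int> > seen;

    void operator()( int row, int col )
    {
        seen.push_back( std::make_pair( row, col ) );
    }
};

// Returns true if both loop forms visit the same pairs in the same order.
template<int Size>
bool same_order()
{
    Recorder nested, flat;
    FixedSquare<Size>::visit_nested( nested );
    FixedSquare<Size>::visit_flat( flat );
    return nested.seen == flat.seen;
}
```

Calling same_order<3>() should return true. Note that the division and
modulo only disappear if the loop really gets unrolled with Size known at
compile-time, so this would have to be measured again.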

I would like to have this sorted out and implemented before 1.0, which
means in 2 weeks. I guess that some of us need good performance right now.


