[eigen] gcc bug hit by eigen, workaround proposal
Hi List,
While measuring Eigen's performance, I realized that gcc doesn't
completely unroll _nested_ loops. More precisely, for a nested loop
it fully unrolls only the innermost loop, while the outer loop
gets only partially unrolled.
Namely, with fixed-size matrices, Eigen does a lot of nested loops like
    for( col = 0; col < size(); col++)
        for( row = 0; row < size(); row++)
            do_something( row, col );
and for fixed-size classes, size() returns the template parameter Size,
which is known at compile-time, so one would expect gcc to be able to
unroll these nested loops. I always assumed it would, and designed Eigen
around that assumption, but it's not the case. I filed a bug report here,
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30201
and one gcc developer said this would be something for gcc 4.3. Obviously
we can't wait for gcc 4.3 to get good performance in Eigen: I measured a
6x speed difference in some methods. So what I propose is a system of
preprocessor macros that unroll loops manually. I still need to think more
about it, but the rough idea is to replace
    for( col = 0; col < size(); col++)
        for( row = 0; row < size(); row++)
        {
            ...
        }
with
    for( int foo = 0; foo < size() * size(); foo++)
    {
        col = foo / size();
        row = foo % size();
        ...
    }
The rationale is that gcc is able to unroll single loops correctly; it
only fails to unroll _nested_ loops.
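
To make the rewrite concrete, here is a minimal sketch of both loop forms side by side. FixedMatrix, fill_nested and fill_flat are hypothetical names made up for this illustration, not Eigen classes, and the loop body is an arbitrary placeholder. The sketch only shows that the single flattened loop visits the same (row, col) pairs in the same order as the nested one; whether a given compiler then unrolls it fully still has to be checked in the generated code.

```cpp
#include <array>

// Hypothetical stand-in for a fixed-size matrix class (not Eigen's actual API).
// Size is a template parameter, so size() is known at compile time.
template <int Size>
struct FixedMatrix {
    std::array<int, Size * Size> data{};

    static constexpr int size() { return Size; }

    // Column-major element access.
    int& operator()(int row, int col) { return data[col * Size + row]; }
};

// Nested form: outer loop over columns, inner loop over rows.
template <int Size>
void fill_nested(FixedMatrix<Size>& m) {
    for (int col = 0; col < m.size(); col++)
        for (int row = 0; row < m.size(); row++)
            m(row, col) = 10 * row + col;  // placeholder loop body
}

// Flattened form: one loop over size() * size() iterations,
// recovering col and row from the single index by division and remainder.
template <int Size>
void fill_flat(FixedMatrix<Size>& m) {
    for (int foo = 0; foo < m.size() * m.size(); foo++) {
        const int col = foo / m.size();
        const int row = foo % m.size();
        m(row, col) = 10 * row + col;  // same placeholder loop body
    }
}
```

Calling fill_nested and fill_flat on two FixedMatrix<3> objects leaves both with identical contents, since the division and remainder enumerate the columns first and the rows within each column, exactly as the nested loops do.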
I would like to have this sorted out and implemented before 1.0, which
means in 2 weeks. I guess that some of us need good performance right now.
Benoit