Re: [eigen] generic unrollers

On Tue, Jun 10, 2008 at 5:33 PM, Benoît Jacob <jacob@xxxxxxxxxxxxxxx> wrote:

On Monday 09 June 2008 16:15:33 Gael Guennebaud wrote:
> First, the unrolling decision should take into account the vectorization
> since for float it reduces the instruction count by a factor four, so the
> unrolling cost should be around: (4*4)*2 flops + 2*16 reads = 96 that is
> just below 100.

Ah OK, good to know.

but we still have to divide by the size of the packet...

> Secondly, this also makes me think again that maybe it could be enough
> to approximately count the number of instructions rather than a pseudo
> evaluation cost. Indeed, this makes the unrolling decision more accurate
> (unless I miss something) and simpler to control intuitively.

In fact it is a dilemma. For a perfect solution we would need to know both the
code size and the computation cost. Indeed, unrolling is trading code size
for a speed improvement; and the relative speed improvement is bigger if the
computation cost is smaller.

Since in NumTraits we basically set all costs to 1, it means that we
implicitly say that code-size is proportional to cost anyway. It is still
useful IMO to call that Cost, and assign a higher value to operations such as
sqrt().
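
To make the cost-versus-code-size trade-off concrete, here is a minimal sketch of a compile-time unrolling decision. It is not Eigen's code: CostTraits, PacketSize, ShouldUnroll, the limit of 100 and the sqrt cost of 10 are all hypothetical names and values chosen for illustration. It only shows the idea of summing per-coefficient costs, dividing by the packet size when vectorization applies, and comparing against a limit.

```cpp
// Hypothetical cost table, only loosely inspired by the idea behind NumTraits.
// Every name and value here is an assumption made for illustration.
template <typename T>
struct CostTraits {
    static constexpr int ReadCost = 1;
    static constexpr int AddCost  = 1;
    static constexpr int MulCost  = 1;
    static constexpr int SqrtCost = 10;  // assumed: sqrt is pricier than a basic op
};

// Assumed SIMD packet width: 4 coefficients for float, 1 (no vectorization) otherwise.
template <typename T> struct PacketSize        { static constexpr int value = 1; };
template <>           struct PacketSize<float> { static constexpr int value = 4; };

// Assumed threshold above which explicit unrolling is rejected.
constexpr int UnrollingLimit = 100;

// Decide whether to fully unroll a coefficient-wise operation over Size
// coefficients, where CoeffCost is the estimated cost of one coefficient.
template <typename T, int Size, int CoeffCost>
struct ShouldUnroll {
    static constexpr int  TotalCost = Size * CoeffCost / PacketSize<T>::value;
    static constexpr bool value     = TotalCost <= UnrollingLimit;
};

// Per-coefficient cost of a + b: two reads and one addition.
template <typename T>
struct SumCoeffCost {
    static constexpr int value = 2 * CostTraits<T>::ReadCost + CostTraits<T>::AddCost;
};

// Per-coefficient cost of sqrt(a): one read and one sqrt.
template <typename T>
struct SqrtCoeffCost {
    static constexpr int value = CostTraits<T>::ReadCost + CostTraits<T>::SqrtCost;
};
```

Under these assumed numbers, a 16-coefficient sqrt is unrolled for float (the packet size divides the cost by four) but not for double, which is the kind of effect the vectorization remark above is about.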

[...]

Here I don't understand: looking at NumTraits.h you already set the cost of
basic operations to 1, not 2.

pop pop pop.... you're damn right, the cost of basic operations is already 1, I was really sure it was 2..... That makes the rest of my discussion wrong. Actually, the point was to show that we can set it to 1 while preserving the same behavior, but since that is already the case.... However, that's weird, because it means that currently in (a+b)*Matrix(?,2), (a+b) should not be evaluated, while in my experiments I got performance similar to explicit evaluation (and removing the condition in ei_nested was indeed slower). So I'm puzzled.

So I'll try again and see which is best: the current "<" or the proposed "<=".

Anyway, at least my discussion helps to see the effect of the choice of the cost value with respect to the evaluation of nested arguments. Basically, setting the cost to something greater than the cost of a basic operation will have the same effect on nesting whatever the value, so no need to bother.
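
The "<" versus "<=" question can be illustrated with a small sketch. The cost model below is an assumption made for illustration, not a copy of ei_nested: NestingDecision and its members are hypothetical names, and N = 2 assumes each coefficient of (a+b) is read once per column of the right-hand side.

```cpp
// Hypothetical model of the "evaluate the nested argument into a temporary?"
// decision. Names and formula are assumptions, not the code in ei_nested.
//
// ReadCost      : cost of reading one stored coefficient
// CoeffReadCost : cost of computing one coefficient of the nested expression
// N             : how many times the outer expression reads each coefficient
//
// With a temporary   : CoeffReadCost + ReadCost (store) + N * ReadCost
// Without a temporary: N * CoeffReadCost
// Dropping the common CoeffReadCost term gives the two quantities below.
template <int ReadCost, int CoeffReadCost, int N>
struct NestingDecision {
    static constexpr int  CostEval      = (N + 1) * ReadCost;
    static constexpr int  CostNoEval    = (N - 1) * CoeffReadCost;
    static constexpr bool EvalStrict    = CostEval <  CostNoEval;  // current "<"
    static constexpr bool EvalNonStrict = CostEval <= CostNoEval;  // proposed "<="
};

// (a+b) read twice with every basic cost equal to 1:
// CoeffReadCost = 2 reads + 1 add = 3, so both sides equal 3 (a tie).
static_assert(!NestingDecision<1, 3, 2>::EvalStrict,
              "with '<' the tie keeps (a+b) nested");
static_assert(NestingDecision<1, 3, 2>::EvalNonStrict,
              "with '<=' the tie evaluates (a+b) into a temporary");

// If an addition cost 2 instead, CoeffReadCost = 4 and both comparisons agree.
static_assert(NestingDecision<1, 4, 2>::EvalStrict,
              "a higher op cost forces evaluation even with '<'");
```

Under this assumed model the two comparisons differ only on exact ties, which is where the (a+b)*Matrix(?,2) case lands when all basic costs are 1.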

> Third remark: instead of the three options:
> 1 - full unrolling
> 2 - inner loop unrolling
> 3 - no unrolling (at least no explicit unrolling)
> what about trying to support loop peeling (partial unrolling of dynamic
> loops) which ultimately could lead to a full unrolling. Of course this is
> not so easy, but the 3 options solution is not very simple either (must take
> into account the largest/smallest dimensions, etc.) and much more limited.
> IMO this is worth trying.

Sure, but I regard loop peeling as a separate problem from inner loop
unrolling. I'd go for first choosing one of the three options 1,2,3 you list
above, and then for those loops that are not yet explicitly unrolled (in
cases 2,3, which is always the case for dynamic sizes) consider loop
peeling.

my point was that maybe there exists a very smart way to write a loop peeling unroller that would naturally handle all the situations, that's it! But I have not thought much about it....
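
As a rough illustration of loop peeling in the sense used here (partial unrolling of a dynamic-size loop), the sketch below processes the range in chunks of a compile-time factor and finishes with a plain remainder loop. UnrolledBody, peeledLoop and sumPeeled are hypothetical names, and the peel factor of 4 is an arbitrary assumption; this is not the "very smart" unroller hoped for above, only the basic mechanism.

```cpp
#include <cstddef>

// Hypothetical helper: calls f(base), f(base + 1), ..., f(base + K - 1)
// through template recursion, so no runtime loop remains for the K calls.
template <int K>
struct UnrolledBody {
    template <typename Func>
    static void run(std::size_t base, Func& f) {
        UnrolledBody<K - 1>::run(base, f);
        f(base + K - 1);
    }
};

// Recursion terminator: nothing left to call.
template <>
struct UnrolledBody<0> {
    template <typename Func>
    static void run(std::size_t, Func&) {}
};

// Applies f(i) for every i in [0, n), Peel iterations at a time,
// then handles the leftover n % Peel iterations one by one.
template <int Peel, typename Func>
void peeledLoop(std::size_t n, Func f) {
    static_assert(Peel >= 1, "Peel must be positive");
    const std::size_t peeledEnd = n - n % Peel;
    std::size_t i = 0;
    for (; i < peeledEnd; i += Peel)
        UnrolledBody<Peel>::run(i, f);
    for (; i < n; ++i)
        f(i);
}

// Example use: sum the first n entries of an array with a peel factor of 4.
inline int sumPeeled(const int* data, std::size_t n) {
    int total = 0;
    peeledLoop<4>(n, [&total, data](std::size_t i) { total += data[i]; });
    return total;
}
```

When n is known at compile time and equals Peel, the main loop runs exactly once and the remainder loop is empty, so the same code degenerates into a full unrolling.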

> > Since this is making the unrollers even more complicated, and since we
> > now have several different places in eigen with such unrollers (Assign.h,
> > Redux.h, Part.h, Visitor.h) I think it's now more than time to move to
> > generic unrollers. These would take as template argument a struct
> > providing the method to apply to each coeff, etc.
> >
> > OK if I do it?
>
> would be awesome ! though I'm a bit worried about the impact on compilation
> time.... But it's definitely worth trying !

My rationale is that the unrollers already take many template parameters, so
one more parameter is not too bad! But yes the only way to tell is to
actually measure it.

well, the additional template parameter will be a functor, so one more indirection during compilation... but we'll see, given that some unrollers already take a functor as argument.... but don't get me wrong, I'm not at all against this idea :)
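
For reference, a generic unroller parameterized by a struct that provides the per-coefficient method could look roughly like the sketch below. GenericUnroller, AssignOp and SumOp are hypothetical names invented for this example; nothing here is taken from Assign.h or Redux.h.

```cpp
// Hypothetical generic unroller: applies Op::apply(dst, src, i) for every
// i in [Index, Size) through template recursion, with no runtime loop.
template <typename Op, int Index, int Size>
struct GenericUnroller {
    template <typename Dst, typename Src>
    static void run(Dst& dst, const Src& src) {
        Op::apply(dst, src, Index);
        GenericUnroller<Op, Index + 1, Size>::run(dst, src);
    }
};

// Recursion terminator, selected when Index has reached Size.
template <typename Op, int Size>
struct GenericUnroller<Op, Size, Size> {
    template <typename Dst, typename Src>
    static void run(Dst&, const Src&) {}
};

// Functor for an assignment-style unroller: copy coefficient i.
struct AssignOp {
    template <typename Dst, typename Src>
    static void apply(Dst& dst, const Src& src, int i) { dst[i] = src[i]; }
};

// Functor for a reduction-style unroller: accumulate coefficient i.
struct SumOp {
    template <typename Acc, typename Src>
    static void apply(Acc& acc, const Src& src, int i) { acc += src[i]; }
};
```

With this shape, GenericUnroller<AssignOp, 0, 4>::run(dst, src) copies four coefficients and GenericUnroller<SumOp, 0, 4>::run(acc, src) sums them. Each distinct functor triggers its own chain of instantiations, so the compile-time impact would still have to be measured.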

Cheers,
Benoit