Re: [eigen] generic unrollers

*To*: eigen@xxxxxxxxxxxxxxxxxxx
*Subject*: Re: [eigen] generic unrollers
*From*: "Gael Guennebaud" <gael.guennebaud@xxxxxxxxx>
*Date*: Tue, 10 Jun 2008 20:28:38 +0200

On Tue, Jun 10, 2008 at 5:33 PM, Benoît Jacob <jacob@xxxxxxxxxxxxxxx> wrote:


On Monday 09 June 2008 16:15:33 Gael Guennebaud wrote:

> First, the unrolling decision should take into account the vectorization
> since for float it reduces the instruction count by a factor four, so the
> unrolling cost should be around: (4*4)*2 flops + 2*16 reads = 96 that is
> just below 100.

Ah OK, good to know.

but we still have to divide by the size of the packet...
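To make the arithmetic above concrete, here is a minimal sketch of such a cost estimate. All names are hypothetical (none of this is Eigen's actual code), and the reading of the 96 figure is an assumption: it only comes out if each flop is counted with cost 2, which is corrected to 1 further down in this message.

```cpp
// Hypothetical cost model for the unrolling decision; not Eigen's code.
// The limit of 100 is the value mentioned in the quoted text.
constexpr int kUnrollingLimit = 100;

// Estimated cost of a rows x cols expression: each coefficient needs
// flopsPerCoeff operations of cost flopCost plus readsPerCoeff reads of
// cost readCost. With vectorization, the total is divided by the packet size.
constexpr int unrollingCost(int rows, int cols,
                            int flopsPerCoeff, int flopCost,
                            int readsPerCoeff, int readCost,
                            int packetSize)
{
    return rows * cols * (flopsPerCoeff * flopCost + readsPerCoeff * readCost)
           / packetSize;
}

// Unroll only while the estimated cost stays within the limit.
constexpr bool shouldUnroll(int cost) { return cost <= kUnrollingLimit; }

// Assumed reading of the quoted figure: 4x4, 2 flops per coefficient at
// cost 2, 2 reads per coefficient at cost 1, no division by packet size.
static_assert(unrollingCost(4, 4, 2, 2, 2, 1, 1) == 96, "quoted estimate");
// Same expression divided by a packet size of 4 (floats).
static_assert(unrollingCost(4, 4, 2, 2, 2, 1, 4) == 24, "with packets");
// With the flop cost set to 1 instead of 2, still divided by 4.
static_assert(unrollingCost(4, 4, 2, 1, 2, 1, 4) == 16, "flop cost 1");
```

Under these assumptions the vectorized 4x4 case lands far below the limit, which is the point of dividing by the packet size.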

> Secondly, this also makes me think again that maybe it could be enough
> to approximately count the number of instructions rather than a pseudo
> evaluation cost. Indeed, this makes the unrolling decision more accurate
> (unless I miss something) and simpler to control intuitively.

In fact it is a dilemma. For a perfect solution we would need to know both the code size and the computation cost. Indeed, unrolling is trading code size for a speed improvement; and the relative speed improvement is bigger if the computation cost is smaller.

Since in NumTraits we basically set all costs to 1, it means that we implicitly say that code-size is proportional to cost anyway. It is still useful IMO to call that Cost, and assign a higher value to operations such as sqrt().

[...]

Here I don't understand: looking at NumTraits.h you already set the cost of

basic operations to 1, not 2.

pop pop pop.... you're damn right, the cost of basic operations is already 1, I was really sure about the 2..... That makes all the rest of my discussion wrong. Actually, the point was to show that we can set it to 1 while preserving the same behavior, but since it is already the case.... However, that's weird, because that means that currently in (a+b)*Matrix(?,2), (a+b) should not be evaluated, while in my experiments I got performance similar to explicit evaluation (and removing the condition in ei_nested was indeed slower). So I'm puzzled.

So I'll try again and see which is best: the current "<" or the proposed "<=".

Anyway, at least my discussion helps to see the effect of the choice of the cost value on the evaluation of nested arguments. Basically, setting the cost to anything greater than the cost of a basic operation has the same effect on nesting whatever the value, so no need to bother.
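The effect of "<" versus "<=" can be sketched with a small model. This is an assumed, simplified rule and not the actual ei_nested code: evaluating into a temporary is taken to cost (n+1) reads, not evaluating is taken to cost (n-1) extra computations of the coefficient, where n is how many times the outer expression reads each coefficient (n=2 is my reading of the Matrix(?,2) example). All names below are hypothetical.

```cpp
// Hypothetical model of the nesting decision; not the real ei_nested.
enum class Compare { Less, LessOrEqual };

// Cost of evaluating into a temporary: one write (assumed to cost the same
// as a read) plus n reads of the temporary.
constexpr int costEval(int n, int readCost) { return (n + 1) * readCost; }

// Extra cost of not evaluating: the coefficient is recomputed n-1 more times.
constexpr int costNoEval(int n, int coeffReadCost)
{
    return (n - 1) * coeffReadCost;
}

// True if the nested expression should be evaluated into a temporary.
constexpr bool evaluateBeforeNesting(int n, int readCost, int coeffReadCost,
                                     Compare cmp)
{
    return cmp == Compare::Less
        ? costEval(n, readCost) < costNoEval(n, coeffReadCost)
        : costEval(n, readCost) <= costNoEval(n, coeffReadCost);
}

// Cost of reading one coefficient of a+b: two reads plus one addition.
constexpr int sumCoeffReadCost(int readCost, int addCost)
{
    return 2 * readCost + addCost;
}

// AddCost = 1, n = 2: both sides equal 3, so "<" keeps (a+b) lazy
// while "<=" evaluates it.
static_assert(!evaluateBeforeNesting(2, 1, sumCoeffReadCost(1, 1), Compare::Less), "");
static_assert(evaluateBeforeNesting(2, 1, sumCoeffReadCost(1, 1), Compare::LessOrEqual), "");
// Any AddCost above 1 gives the same decision, whatever the comparison.
static_assert(evaluateBeforeNesting(2, 1, sumCoeffReadCost(1, 2), Compare::Less), "");
static_assert(evaluateBeforeNesting(2, 1, sumCoeffReadCost(1, 5), Compare::Less), "");
```

In this model the tie at cost 1 is exactly the case where the choice of comparison matters, and any larger cost removes the tie.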

> Third remark: instead of the three options:
> 1 - full unrolling
> 2 - inner loop unrolling
> 3 - no unrolling (at least no explicit unrolling)
> what about trying to support loop peeling (partial unrolling of dynamic
> loops) which ultimately could lead to a full unrolling. Of course this is
> not so easy, but the 3 options solution is not very simple either (must take
> into account the largest/smallest dimensions, etc.) and quite more limited.
> IMO this is worth trying.

Sure, but I regard loop peeling as a separate problem from inner loop unrolling. I'd go for first choosing one of the three options 1,2,3 you list above, and then for these loops that are not yet explicitly unrolled (in cases 2,3, which is always the case for dynamic-size) consider loop peeling.

my point was that maybe there exists a very smart way to write a loop peeling unroller that would naturally handle all the situations, that's it! But I have not thought so much about it....
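For reference, loop peeling in the sense used here (partial unrolling of a loop whose size is only known at run time) could look like the sketch below. The names are hypothetical and this is not a proposal for the actual unroller, just an illustration of the technique: the body is unrolled Peel times at compile time, and a plain loop handles the remainder.

```cpp
#include <cstddef>

// Hypothetical helper: calls func(base + K), ..., func(base + Peel - 1)
// through compile-time recursion, so the calls are unrolled explicitly.
template<int K, int Peel>
struct PeelStep
{
    template<typename Func>
    static void run(std::size_t base, Func& func)
    {
        func(base + K);
        PeelStep<K + 1, Peel>::run(base, func);
    }
};

// End of the recursion: nothing left to unroll.
template<int Peel>
struct PeelStep<Peel, Peel>
{
    template<typename Func>
    static void run(std::size_t, Func&) {}
};

// Calls func(i) for every i in [0, size), in increasing order.
// The main loop advances by Peel with an unrolled body; the tail loop
// handles the size % Peel remaining iterations.
template<int Peel, typename Func>
void peeledLoop(std::size_t size, Func func)
{
    static_assert(Peel > 0, "Peel must be positive");
    const std::size_t peeledEnd = size - size % Peel;
    std::size_t i = 0;
    for (; i < peeledEnd; i += Peel)
        PeelStep<0, Peel>::run(i, func);
    for (; i < size; ++i)
        func(i);
}
```

When size happens to equal Peel, the main loop runs its unrolled body once and the tail loop does nothing, which is the sense in which peeling can degenerate into a full unrolling.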

> > Since this is making the unrollers even more complicated, and since we
> > now have several different places in eigen with such unrollers (Assign.h,
> > Redux.h, Part.h, Visitor.h) I think it's now more than time to move to
> > generic unrollers. These would take as template argument a struct
> > providing the method to apply to each coeff, etc.
> >
> > OK that I do it?
>
> would be awesome ! though I'm a bit worried about the impact on compilation
> time.... But it's definitely worth trying !

My rationale is that the unrollers already take many template parameters, so one more parameter is not too bad! But yes the only way to tell is to actually measure it.

well, the additional template parameter will be a functor, so one more indirection during the compilation... but we'll see, given the fact that some unrollers already take a functor as argument.... but don't interpret it wrong, I'm not at all against this idea :)
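To fix ideas, a generic unroller parameterized by a functor struct might look roughly like this. It is only a sketch with made-up names (GenericUnroller, AssignCoeff, AddAssignCoeff), not the design that would actually go into Assign.h or Redux.h, and it only covers the assignment-like case.

```cpp
// Hypothetical generic unroller: applies Functor::apply to coefficients
// Index, Index+1, ..., Size-1 through compile-time recursion.
template<typename Functor, int Index, int Size>
struct GenericUnroller
{
    template<typename Dst, typename Src>
    static void run(Dst& dst, const Src& src)
    {
        Functor::apply(dst, src, Index);
        GenericUnroller<Functor, Index + 1, Size>::run(dst, src);
    }
};

// End of the recursion.
template<typename Functor, int Size>
struct GenericUnroller<Functor, Size, Size>
{
    template<typename Dst, typename Src>
    static void run(Dst&, const Src&) {}
};

// Functor struct providing the method to apply to each coefficient:
// plain assignment.
struct AssignCoeff
{
    template<typename Dst, typename Src>
    static void apply(Dst& dst, const Src& src, int i) { dst[i] = src[i]; }
};

// Same unroller, different per-coefficient operation: dst += src.
struct AddAssignCoeff
{
    template<typename Dst, typename Src>
    static void apply(Dst& dst, const Src& src, int i) { dst[i] += src[i]; }
};
```

With this shape, GenericUnroller<AssignCoeff, 0, 4>::run(dst, src) expands to four assignments, and swapping the functor changes the operation without touching the unroller. The extra template parameter is where the additional compile-time indirection comes from.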

Cheers,

Benoit
