[eigen] sums peeling



The attached diff is the outcome of my work today on peeling sum()...

Executive summary: I got speed improvements of up to +35%, but this is
going to be material for Eigen 2.1.

Explanation: the benefit of peeling depends a lot on the expression
that is being summed, but Eigen's current cost model is too simplistic
to tell apart the expressions that benefit from peeling.

Let's take a VectorXf v(10000); with SSE2 enabled.

v.sum();  // peeling speed improvement: +6%
v.cwise().square().sum();  // peeling improvement: +35%

and so on. So we get a nice improvement when CoeffReadCost is roughly
<= 5, and an especially nice one when it is 2 or 3.
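
For readers who don't have the diff at hand, here is a minimal sketch of
the idea in plain scalar C++. It assumes that "peeling" means unrolling
the reduction loop so that two independent accumulators are updated per
iteration; the names sum_simple and sum_peeled are made up for
illustration, and this is not the code from the attached diff (which
works on SSE packets inside Eigen).

```cpp
#include <cstddef>
#include <vector>

// Plain reduction: a single accumulator, so every addition has to wait
// for the previous one to finish.
float sum_simple(const std::vector<float>& v) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) acc += v[i];
    return acc;
}

// Hypothetical "peeled" reduction (assumed meaning: unroll by 2).
// The two accumulators do not depend on each other, so the CPU can
// overlap the two additions of one iteration.
float sum_peeled(const std::vector<float>& v) {
    const std::size_t n = v.size();
    const std::size_t even_end = n - (n % 2);  // largest even count <= n
    float acc0 = 0.0f;
    float acc1 = 0.0f;
    for (std::size_t i = 0; i < even_end; i += 2) {
        acc0 += v[i];
        acc1 += v[i + 1];
    }
    float acc = acc0 + acc1;
    if (even_end < n) acc += v[even_end];  // leftover element when n is odd
    return acc;
}
```

Note that the two versions add the coefficients in a different order, so
with floats the results can differ in the last bits.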

Now let's take two vectors, VectorXf v, w;

(v.cwise()/w).sum(); // peeling decreases speed slightly

Conclusion: peeling is beneficial when CoeffReadCost is roughly <= 5 AND
the expression involves only 1 load per coefficient.
So in addition to CoeffReadCost we need to keep track of how many
loads are involved per coeff access.
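
To make the proposal concrete, here is a rough sketch of what such a
two-part cost model could look like. Everything in it is hypothetical:
ExprCost, leaf_cost, unary_cost, binary_cost and should_peel are not
Eigen names, the per-operation costs are assumed values, and the
threshold 5 is just the rough figure from the measurements above.

```cpp
// Hypothetical cost descriptor; not part of Eigen.
struct ExprCost {
    int coeff_read_cost;  // plays the role of CoeffReadCost
    int loads_per_coeff;  // memory loads needed to read one coefficient
};

// A plain vector: assumed cost 1, one load per coefficient.
constexpr ExprCost leaf_cost() { return {1, 1}; }

// A coefficient-wise unary op (e.g. square) adds its own cost
// but does not add any load.
constexpr ExprCost unary_cost(ExprCost x, int op_cost) {
    return {x.coeff_read_cost + op_cost, x.loads_per_coeff};
}

// A coefficient-wise binary op (e.g. quotient) reads both operands,
// so costs and loads add up.
constexpr ExprCost binary_cost(ExprCost a, ExprCost b, int op_cost) {
    return {a.coeff_read_cost + b.coeff_read_cost + op_cost,
            a.loads_per_coeff + b.loads_per_coeff};
}

// Rule of thumb from the measurements: peel only cheap expressions
// that need at most one load per coefficient.
constexpr bool should_peel(ExprCost c) {
    return c.coeff_read_cost <= 5 && c.loads_per_coeff <= 1;
}
```

With these assumed costs, v.sum() and v.cwise().square().sum() would be
peeled, while (v.cwise()/w).sum() would not, because it needs two loads
per coefficient whatever cost is assigned to the division.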


Attachment: diff
Description: Binary data
