[eigen] sums peeling
Hi,

This (attached diff) is the outcome of my work today on peeling sum()...

Executive summary: I got speed improvements of up to +35%, but this is going to be material for Eigen 2.1.

Explanation: the benefit of peeling depends a lot on the expression being summed, but Eigen's current cost model is too simplistic to tell apart the expressions that benefit from peeling.

Take a VectorXf v(10000); with SSE2:

  v.sum();                  // peeling speed improvement: +6%
  v.cwise().square().sum(); // peeling speed improvement: +35%

etc. So we get a nice improvement when CoeffReadCost is roughly <= 5, and an especially nice one when it is 2 or 3.

Now take two VectorXf v, w;

  (v.cwise()/w).sum();      // peeling decreases speed slightly

Conclusion: peeling is beneficial when CoeffReadCost <= 5 (roughly) AND the expression involves only one load per coefficient. So in addition to CoeffReadCost, we need to keep track of how many loads are involved per coefficient access.

Cheers,
Benoit
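For readers who have not seen the diff: one plausible reading of "peeling" a reduction (an assumption here, not taken from the attachment) is splitting the loop so each iteration feeds two independent accumulators instead of one. With a single accumulator every addition waits on the previous one; with two, consecutive additions can overlap in the pipeline. That matters most when computing a coefficient is cheap, which fits the CoeffReadCost observation above. The sketch uses scalar floats rather than SSE2 packets for portability (with SSE2 each accumulator would be a packet of 4 floats). `plain_sum`, `peeled_sum` and the `coeff` functor, which stands in for the per-coefficient expression such as `square()`, are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Plain reduction: one accumulator, so each addition depends on the previous
// one. `coeff` (hypothetical) plays the role of the summed expression.
template <typename F>
float plain_sum(const std::vector<float>& v, F coeff) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < v.size(); ++i) acc += coeff(v[i]);
  return acc;
}

// Peeled reduction (assumed form): two coefficients per iteration go into two
// independent accumulators, so their additions do not wait on each other.
// The leftover element for odd sizes is handled after the loop.
template <typename F>
float peeled_sum(const std::vector<float>& v, F coeff) {
  const std::size_t n = v.size();
  const std::size_t even_end = n - n % 2;  // largest multiple of 2 <= n
  float acc0 = 0.0f, acc1 = 0.0f;
  for (std::size_t i = 0; i < even_end; i += 2) {
    acc0 += coeff(v[i]);
    acc1 += coeff(v[i + 1]);
  }
  if (even_end < n) acc0 += coeff(v[even_end]);  // remainder, at most one element
  return acc0 + acc1;  // combine the two partial sums
}
```

Because the additions are reassociated, `peeled_sum` can differ from `plain_sum` in the last bits for general float data; the two agree exactly only when every partial sum is representable, e.g. for small integer-valued inputs.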
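The proposed extension of the cost model can be sketched as a compile-time decision in the enum-in-traits style Eigen 2 uses. Everything below is hypothetical: the trait structs, the `LoadsPerCoeff` field, `ShouldPeel`/`ShouldPeelCostOnly`, and in particular the numeric costs, which are illustrative and not Eigen's actual values. Only the rule itself (CoeffReadCost roughly <= 5 AND a single load) comes from the mail.

```cpp
// Hypothetical per-expression traits. CoeffReadCost mirrors the existing
// notion; LoadsPerCoeff is the extra field proposed in the mail.
// All numeric values are illustrative, not taken from Eigen.
struct VectorTraits   { enum { CoeffReadCost = 1, LoadsPerCoeff = 1 }; };  // v
struct SquareTraits   { enum { CoeffReadCost = 2, LoadsPerCoeff = 1 }; };  // v.cwise().square()
struct QuotientTraits { enum { CoeffReadCost = 4, LoadsPerCoeff = 2 }; };  // v.cwise()/w

// Rough threshold from the measurements ("CoeffReadCost <= 5 roughly").
enum { PeelingCostThreshold = 5 };

// Cost-only decision: cannot distinguish the quotient case.
template <typename Traits>
struct ShouldPeelCostOnly {
  enum { value = int(Traits::CoeffReadCost) <= int(PeelingCostThreshold) };
};

// Proposed decision: cheap coefficient access AND exactly one load per coeff.
template <typename Traits>
struct ShouldPeel {
  enum {
    value = int(Traits::CoeffReadCost) <= int(PeelingCostThreshold)
            && int(Traits::LoadsPerCoeff) == 1
  };
};
```

With these illustrative numbers a cost-only rule would peel the quotient sum, which the measurements say is slightly slower; the load count is what rules it out, while `v.sum()` and `v.cwise().square().sum()` still get peeled.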
Attachment:
diff